For this markup,
<a>Ask Question<other/>more text</a>
notice that the a element has a text node child ("Ask Question"), an empty element child (other), and a second text node child ("more text").
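To check that structure yourself, here is a minimal sketch using Python's standard xml.dom.minidom parser. It is not part of the original question; the variable names are illustrative only. It lists the direct children of a in document order.

```python
from xml.dom import minidom

# Parse the markup from above and grab the root <a> element.
doc = minidom.parseString("<a>Ask Question<other/>more text</a>")
a = doc.documentElement

# Describe each direct child of <a>, in document order.
children = []
for node in a.childNodes:
    if node.nodeType == node.TEXT_NODE:
        children.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        children.append(("element", node.tagName))

for kind, value in children:
    print(kind, repr(value))
# text 'Ask Question'
# element 'other'
# text 'more text'
```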
Here’s how to reason through what’s happening when evaluating //a[contains(text(),'Ask Question')] against that markup:
1. contains(x,y) expects x to be a string, but text() matches two text nodes.
2. In XPath 1.0, the rule for converting multiple nodes to a string is this:
   A node-set is converted to a string by returning the string-value of
   the node in the node-set that is first in document order. If the
   node-set is empty, an empty string is returned. [Emphasis added]
3. In XPath 2.0+, it is an error to provide a sequence of text nodes to a function expecting a string, so contains(text(),'substr') will cause an error for more than one matching text node.
In your case…
- XPath 1.0 would treat contains(text(),'Ask Question') as contains('Ask Question','Ask Question'), which is true. On the other hand, be sure to notice that contains(text(),'more text') will evaluate to false in XPath 1.0, because only the first text node ("Ask Question") is converted to a string and the second one is ignored. Without knowing (1)-(3) above, this can be counter-intuitive.
- XPath 2.0 would treat it as an error.
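The XPath 1.0 behavior can be imitated in plain Python. This is only a sketch that emulates the first-node conversion rule on top of xml.dom.minidom; it does not run a real XPath engine. The helper names (text_children, xpath1_string, contains_text_xpath1) are made up for illustration.

```python
from xml.dom import minidom

def text_children(elem):
    """Text node children of elem in document order (what text() selects)."""
    return [n.data for n in elem.childNodes if n.nodeType == n.TEXT_NODE]

def xpath1_string(node_set):
    """XPath 1.0 rule: string-value of the first node, or '' if the set is empty."""
    return node_set[0] if node_set else ""

def contains_text_xpath1(elem, needle):
    """Emulates contains(text(), needle) under XPath 1.0 semantics (hypothetical helper)."""
    return needle in xpath1_string(text_children(elem))

a = minidom.parseString("<a>Ask Question<other/>more text</a>").documentElement

print(contains_text_xpath1(a, "Ask Question"))  # True: the first text node matches
print(contains_text_xpath1(a, "more text"))     # False: the second text node is never consulted
```

An XPath 2.0 processor would not get this far: passing the two-node sequence to contains() is reported as an error rather than silently truncated.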
Better alternatives
- If the goal is to find all a elements whose string value contains the substring "Ask Question":
  //a[contains(.,'Ask Question')]
  This is the most common requirement.
- If the goal is to find all a elements with an immediate text node child equal to "Ask Question":
  //a[text()='Ask Question']
  This can be useful when you want to exclude strings from descendant elements in a, such as if you want this a,
  <a>Ask Question<other/>more text</a>
  but not this a:
  <a>more text before <not>Ask Question</not> more text after</a>
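The difference between the two alternatives can be sketched in Python as well. Again, this is an emulation built on xml.dom.minidom under the assumption that only text and element nodes are present. The helper names (string_value, contains_dot, has_text_child_equal) are hypothetical and not part of any XPath API.

```python
from xml.dom import minidom

def string_value(node):
    """Concatenation of all descendant text nodes, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)

def contains_dot(elem, needle):
    """Emulates the predicate contains(., needle)."""
    return needle in string_value(elem)

def has_text_child_equal(elem, value):
    """Emulates the predicate text() = value: true if any text node child equals value."""
    return any(
        n.nodeType == n.TEXT_NODE and n.data == value for n in elem.childNodes
    )

first = minidom.parseString(
    "<a>Ask Question<other/>more text</a>"
).documentElement
second = minidom.parseString(
    "<a>more text before <not>Ask Question</not> more text after</a>"
).documentElement

print(contains_dot(first, "Ask Question"))           # True
print(contains_dot(second, "Ask Question"))          # True: the text sits in a descendant
print(has_text_child_equal(first, "Ask Question"))   # True
print(has_text_child_equal(second, "Ask Question"))  # False: no immediate text child is equal
```

Note that contains_dot(first, "more text") is also True, unlike contains(text(),'more text') in XPath 1.0, because the string value covers every text node rather than just the first.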