
XPath query to get nth instance of an element

There is an HTML file (whose contents I do not control) that has several input elements all with the same fixed id attribute of "search_query". The contents of the file can change, but I know that I always want to get the second input element with the id attribute "search_query".

I need an XPath expression to do this. I tried //input[@id="search_query"][2] but that does not work. Here is an example XML string where this query failed:

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

Keep in mind that the above is merely an example. The other HTML code can be quite different, and the input elements can appear anywhere with no consistent document structure (except that I am guaranteed there will always be at least two input elements with an id attribute of "search_query").

What is the correct XPath expression?

Good question, +1. See my answer for a complete explanation of the problem and for the wanted solution.
Minor point: you should never have more than one element with a given ID (and so the HTML in the question is actually invalid). In practice, browsers will let you do it anyway, but if you do you're missing out on the only benefit of using IDs, which is that they signal "I'm unique" (whereas classes are designed to be used for non-unique signifiers).
Not a minor point @machineghost ! It is actually a bug! ID stands for unique identifier!

Dimitre Novatchev

This is a FAQ:

//somexpression[$N]

means "Find every node selected by //somexpression that is the $Nth child of its parent".

What you want is:

(//input[@id="search_query"])[2]

Remember: The [] operator has higher precedence (priority) than the // abbreviation.
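
To make the difference concrete, here is a minimal sketch in Python. It rests on several assumptions: the question's sample is wrapped in a hypothetical <body> root so that it parses as one document, and each input gets a hypothetical name attribute (a, b, c) so the result can be identified. Python's standard xml.etree.ElementTree supports only a subset of XPath and does not accept a parenthesized expression, so the two helper functions (names invented for illustration) emulate each reading rather than evaluate the XPath strings directly.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <body> root.
# The name attributes are added only to tell the three inputs apart.
DOC = """
<body>
  <div><form><input id="search_query" name="a" /></form></div>
  <div><form><input id="search_query" name="b" /></form></div>
  <div><form><input id="search_query" name="c" /></form></div>
</body>
"""

root = ET.fromstring(DOC)
QUERY = ".//input[@id='search_query']"


def second_per_parent(root):
    """Emulate //input[@id="search_query"][2]:
    for every parent, keep its second matching child (if it has one)."""
    hits = []
    for parent in root.iter():
        matches = [
            child for child in parent
            if child.tag == "input" and child.get("id") == "search_query"
        ]
        if len(matches) >= 2:
            hits.append(matches[1])
    return hits


def second_in_document(root):
    """Emulate (//input[@id="search_query"])[2]:
    collect all matches in document order, then take the second."""
    matches = root.findall(QUERY)
    return matches[1] if len(matches) >= 2 else None


print(second_per_parent(root))               # [] : no form holds two inputs
print(second_in_document(root).get("name"))  # b
```

In the first reading the position is counted separately under each parent, which is why the sample yields nothing. In the second reading the whole result is built first and only then indexed, which is what the parentheses ask for.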


I like this answer. I had not considered a precedence issue (I just assumed simple left-to-right precedence).
@rlandster: The word "precedence" may be confusing. The unabbreviated form of //input[@id='search_query'][2] is: /descendant-or-self::node()/child::input[attribute::id='search_query'][position()=2]
For those who got here from Google - the numbering starts from 1 - [1] being the first element and so on
Weird that in these XPath queries these kinds of arrays start with 1, confused me.
@Ivotje50 Yes XPath sequences and arrays are 1-based
rlandster

This seems to work:

/descendant::input[@id="search_query"][2]

I got this from "XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition" by Michael Kay.

There is also a note in the "Abbreviated Syntax" section of the XML Path Language specification http://www.w3.org/TR/xpath/#path-abbrev that provided a clue.


Many thanks for this answer. In my case the accepted solution would not work as I'm using the xpath in robot framework, which wouldn't accept paths starting with brackets. This one however, should do the trick
When I try this: ${el_my_value}= XML.Get Element ${x} .//isbn
It leads to this: Multiple elements (6) matching './/isbn' found. How can I find the 4th?