XPath to select multiple tags

Given this simplified data format:

<a>
    <b>
        <c>C1</c>
        <d>D1</d>
        <e>E1</e>
        <f>don't select this one</f>
    </b>
    <b>
        <c>C2</c>
        <d>D2</d>
        <e>E1</e>
        <g>don't select me</g>
    </b>
    <c>not this one</c>
    <d>nor this one</d>
    <e>definitely not this one</e>
</a>

How would you select all the Cs, Ds and Es that are children of B elements?

Basically, something like:

a/b/(c|d|e)

In my own situation, instead of just a/b/, the query leading up to selecting those C, D, E nodes is actually quite complex so I'd like to avoid doing this:

a/b/c|a/b/d|a/b/e

Is this possible?


Dimitre Novatchev

One correct answer is:

/a/b/*[self::c or self::d or self::e]

Do note that this

a/b/*[local-name()='c' or local-name()='d' or local-name()='e']

is both too long and incorrect. This XPath expression will also select elements in any namespace, such as:

OhMy:c

NotWanted:d 

QuiteDifferent:e
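
A rough illustration of the difference, sketched in Python with the standard-library ElementTree. ElementTree's XPath subset supports neither the self:: axis nor local-name(), so both predicates are only emulated here with plain Python filters. The namespace URI urn:example:ohmy, the ohmy prefix, and the helper local_name are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "ohmy" prefix and its URI are invented for this sketch.
DOC = """
<a xmlns:ohmy="urn:example:ohmy">
    <b>
        <c>C1</c>
        <d>D1</d>
        <ohmy:c>namespaced C</ohmy:c>
        <f>don't select this one</f>
    </b>
    <c>not this one</c>
</a>
""".strip()

WANTED = {"c", "d", "e"}


def local_name(tag):
    """Strip ElementTree's '{namespace-uri}' prefix, leaving the local name."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(DOC)        # root is the <a> element
children = root.findall("b/*")   # every element child of every <b>

# Emulates self::c or self::d or self::e: only names in no namespace match.
by_self = [el.text for el in children if el.tag in WANTED]

# Emulates the local-name() predicate: the namespace is ignored entirely.
by_local_name = [el.text for el in children if local_name(el.tag) in WANTED]

print(by_self)
print(by_local_name)
```

The second list also contains the text of the ohmy:c element, which is the kind of node the local-name() version picks up and the self:: version does not.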

'or' does not work in a for-each; you would need to use a vertical bar '|' instead.
@Guasqueño, 'or' is a logical operator: it operates on two Boolean values. The XPath union operator '|' operates on two sets of nodes. These are quite different, and each has its own use cases. Using '|' can solve the original problem, but it results in a longer, more complex XPath expression that is harder to understand. The simpler expression in this answer, which uses the 'or' operator, produces the wanted node-set and can be specified in the "select" attribute of an <xsl:for-each> XSLT instruction. Just try it.
@JonathanBenn, anyone who "doesn't care about namespaces" actually doesn't care about XML, and doesn't use XML. The use of local-name() is only correct if we want to select all elements with that local name, regardless of the namespace the element is in. This is a very rare case. In general, people do care about the differences between kitchen:table and sql:table, or between architecture:column, sql:column, array:column, and military:column.
@DimitreNovatchev you make a good point. I'm using XPath for HTML inspection, which is an edge case where the namespace is not so important...
That is super. Where did you come up with that?
the Tin Man

You can avoid the repetition with a predicate on the element name instead:

a/b/*[local-name()='c' or local-name()='d' or local-name()='e']

Contrary to Dimitre's antagonistic opinion, the above is not incorrect in a vacuum where the OP has not specified the interaction with namespaces. The self:: axis is namespace restrictive, local-name() is not. If the OP's intention is to capture c|d|e regardless of namespace (which I'd suggest is even a likely scenario given the OR nature of the problem) then it is "another answer that still has some positive votes" which is incorrect.

You can't be definitive without definition, though I'm quite happy to delete my answer as genuinely incorrect if the OP clarifies his question such that I am incorrect.


Speaking as a 3rd party here -- personally, I find Dimitre's suggestion to be the better practice except in cases where the user has an explicit (and good) reason to care about the tag name irrespective of namespace; if anyone did this against a document in which I was mixing differently-namespaced content (presumably intended to be read by a different toolchain), I would consider their behavior very inappropriate. That said, the argument is -- as you suggest -- a bit unbecoming.
Exactly what I was looking for. XML namespaces, the way they are used in real life, are an unholy mess. For lack of being able to specify something like /a/b/(*:c|*:d|*:e), your solution is exactly what is needed. Purists can argue all they want, but users don't care that the app breaks because whatever generated their input file screwed up the namespaces. They just want it to work.
I have only the vaguest idea what the difference would be between these two answers and nobody has bothered to explain. What does "namespace restrictive" mean? If I use local-name(), does that mean it would match tags with any namespace? If I use self::, what namespace would it have to match? How would I match only OhMy:c?
Pavel Repin

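Why not a/b/(c|d|e)? I just tried it with the Saxon XML library (wrapped up nicely with some Clojure goodness), and it seems to work. Note that Saxon implements XPath 2.0, which allows a parenthesized union as a path step; this is not valid XPath 1.0. abc.xml is the doc described by the OP.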

(require '[saxon :as xml])
(def abc-doc (xml/compile-xml (slurp "abc.xml")))
(xml/query "a/b/(c|d|e)" abc-doc)
=> (#<XdmNode <c>C1</c>>
    #<XdmNode <d>D1</d>>
    #<XdmNode <e>E1</e>>
    #<XdmNode <c>C2</c>>
    #<XdmNode <d>D2</d>>
    #<XdmNode <e>E1</e>>)

This worked well for me. It seems XPath 2.0 is the default for HTML parsing in lxml on Python 2.
Calvin

Not sure if this helps, but with XSL, I'd do something like:

<xsl:for-each select="a/b">
    <xsl:value-of select="c"/>
    <xsl:value-of select="d"/>
    <xsl:value-of select="e"/>
</xsl:for-each>

And won't this XPath select all children of the B nodes?

a/b/*
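
For what it's worth, a quick check in Python suggests that a/b/* does pick up every element child of the B nodes, f and g included, so it would still need a filter on top. This is only a sketch using the standard-library ElementTree (whose limited XPath subset does handle this path) against the document from the question; the variable names are my own.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
DOC = """<a>
    <b>
        <c>C1</c>
        <d>D1</d>
        <e>E1</e>
        <f>don't select this one</f>
    </b>
    <b>
        <c>C2</c>
        <d>D2</d>
        <e>E1</e>
        <g>don't select me</g>
    </b>
    <c>not this one</c>
    <d>nor this one</d>
    <e>definitely not this one</e>
</a>"""

root = ET.fromstring(DOC)  # root is the <a> element

# Relative to the root <a>, "b/*" plays the role of a/b/*.
all_children = [el.tag for el in root.findall("b/*")]
print(all_children)

# Filtering on the tag name afterwards keeps only the wanted elements.
wanted = [el.text for el in root.findall("b/*") if el.tag in {"c", "d", "e"}]
print(wanted)
```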

Thanks Calvin, but I'm not using XSL, and there are actually more elements underneath B which I don't want to select. I'll update my example to be clearer.
Oh, well in that case annakata seems to have the solution.