Using XPath, how do I select a node based on its text content and the value of an attribute?

xml xpath xquery

Given this XML:

<DocText>
<WithQuads>
    <Page pageNumber="3">
        <Word>
            July
            <Quad>
                <P1 X="84" Y="711.25" />
                <P2 X="102.062" Y="711.25" />
                <P3 X="102.062" Y="723.658" />
                <P4 X="84.0" Y="723.658" />
            </Quad>
        </Word>
        <Word>
        </Word>
        <Word>
            30,
            <Quad>
                <P1 X="104.812" Y="711.25" />
                <P2 X="118.562" Y="711.25" />
                <P3 X="118.562" Y="723.658" />
                <P4 X="104.812" Y="723.658" />
            </Quad>
        </Word>
    </Page>
</WithQuads>
</DocText>

I'd like to find the Word nodes that have the text 'July' and a Quad/P1/@X attribute greater than 90. In this case, that should return no matches. However, whether I use greater-than (>) or less-than (<), I get a match on the first Word element. If I use equals (=), I get no match.

So:

//Word[text()='July' and //P1[@X < 90]]

returns a match, as does

//Word[text()='July' and //P1[@X > 90]]

How do I constrain this properly on the P1/@X attribute?

In addition, imagine I have multiple Page elements for different page numbers. How would I further constrain the search above to find nodes with text()='July', P1/@X < 90, and Page/@pageNumber=3?

One thing about this particular XML that may not be obvious to every reader: because it uses a mixed content model, matching its elements with XPath is tricky. I ran into this very issue recently and, being rusty with XPath, was about to conclude that mixed-content elements could not be matched until I found Michael Kay's answer below. I have not found any other reference that discusses the pitfalls of mixed content in XPath.

Your question answered my question. In my case, the string literals in the XPath had to use 'single apostrophes', not "double quotation marks". Thanks for the clue.

AnthonyWJones

Generally I would consider the use of an unprefixed // as a bad smell in an XPath.

Try this:

/DocText/WithQuads/Page/Word[text()='July' and Quad/P1/@X > 90]

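Your problem is that //P1[@X < 90] starts back at the root of the document and hunts for any P1 anywhere, so it is true whenever any P1 in the document has an X below 90, regardless of which Word is being tested. Similarly, //P1[@X > 90] is true whenever any P1 has an X above 90. With this sample, both hold.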
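The difference between the document-wide test and the relative one can be sketched outside XPath as well. The snippet below is only an illustration, not an XPath evaluator: it uses Python's standard xml.etree.ElementTree, which supports just a subset of XPath, so the numeric comparison is done in Python. The helper names doc_wide and relative are hypothetical, and the XML is a trimmed, "tight" version of the sample from the question.

```python
import xml.etree.ElementTree as ET

# "Tight" version of the question's sample (no whitespace around the words).
XML = (
    '<DocText><WithQuads><Page pageNumber="3">'
    '<Word>July<Quad><P1 X="84" Y="711.25"/></Quad></Word>'
    '<Word>30,<Quad><P1 X="104.812" Y="711.25"/></Quad></Word>'
    '</Page></WithQuads></DocText>'
)
root = ET.fromstring(XML)


def doc_wide(word, test):
    # Mimics //P1[...]: deliberately ignores `word` and scans every P1
    # in the whole document.
    return any(test(float(p.get("X"))) for p in root.iter("P1"))


def relative(word, test):
    # Mimics Quad/P1/@X: looks only at P1 elements under this Word's Quad.
    return any(test(float(p.get("X"))) for p in word.findall("Quad/P1"))


july = [w for w in root.iter("Word") if w.text == "July"]

doc_wide_lt = [w for w in july if doc_wide(w, lambda x: x < 90)]
doc_wide_gt = [w for w in july if doc_wide(w, lambda x: x > 90)]
relative_lt = [w for w in july if relative(w, lambda x: x < 90)]
relative_gt = [w for w in july if relative(w, lambda x: x > 90)]

print(len(doc_wide_lt), len(doc_wide_gt))  # 1 1  (both tests "match")
print(len(relative_lt), len(relative_gt))  # 1 0  (only X < 90 matches)
```

The document-wide check succeeds for both comparisons because some P1 somewhere satisfies each one, which is the behaviour described in the question. The relative check looks only at the P1 that belongs to the 'July' word.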

I am surprised that this worked at all, given the whitespace issues addressed in Michael Kay's answer. I tried this answer in a couple of different XPath evaluators and it failed to match in both. Once I switched to the predicate with normalize-space, it matched.

You could use .//P1 to start hunting from the current node instead of specifying a fixed path.

Mads Hansen

Apart from the "//" issue, this XML is a very weird use of mixed content. The predicate text()='July' will match the element only if some child text node is exactly equal to July, which isn't true in your example because of the surrounding whitespace. Depending on the exact definition of the source XML, I would go for [text()[normalize-space(.)='July'] and Quad/P1/@X > 90]
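Combining the two answers with a page predicate, one plausible expression for the follow-up part of the question is /DocText/WithQuads/Page[@pageNumber='3']/Word[text()[normalize-space(.)='July'] and Quad/P1/@X < 90]. The sketch below mirrors that logic with Python's standard xml.etree.ElementTree. It is an approximation rather than a real XPath evaluation: ElementTree handles only the [@pageNumber='3'] predicate natively, so whitespace normalization and the numeric test are done in Python. The helpers normalize_space, text_nodes and find_words are hypothetical names, and the second Page element is invented purely to exercise the page filter.

```python
import xml.etree.ElementTree as ET

# Formatted sample: each word is padded with newlines and indentation.
# The pageNumber="4" page is an assumption added for illustration.
XML = """
<DocText>
  <WithQuads>
    <Page pageNumber="3">
      <Word>
        July
        <Quad><P1 X="84" Y="711.25"/></Quad>
      </Word>
    </Page>
    <Page pageNumber="4">
      <Word>
        July
        <Quad><P1 X="84" Y="711.25"/></Quad>
      </Word>
    </Page>
  </WithQuads>
</DocText>
"""
root = ET.fromstring(XML.strip())


def normalize_space(s):
    # Rough equivalent of XPath normalize-space(): trim both ends and
    # collapse internal runs of whitespace to a single space.
    return " ".join((s or "").split())


def text_nodes(elem):
    # The child text nodes of elem, i.e. roughly what text() selects.
    return [elem.text or ""] + [child.tail or "" for child in elem]


def find_words(page_number, word, x_test):
    hits = []
    path = f"./WithQuads/Page[@pageNumber='{page_number}']/Word"
    for w in root.findall(path):
        has_text = any(normalize_space(t) == word for t in text_nodes(w))
        has_x = any(x_test(float(p.get("X"))) for p in w.findall("Quad/P1"))
        if has_text and has_x:
            hits.append(w)
    return hits


# Exact comparison fails because the text node is "\n        July\n  ...".
exact = [w for w in root.iter("Word") if w.text == "July"]
print(len(exact))                                      # 0
print(len(find_words(3, "July", lambda x: x < 90)))    # 1
print(len(find_words(3, "July", lambda x: x > 90)))    # 0
```

With the padded text, the exact comparison finds nothing, while the normalized comparison finds the single 'July' word on page 3 and correctly rejects it for X > 90.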

Thank you, Michael. I was wondering about the whitespace. I formatted the sample before pasting it into Stack Overflow, but my source XML is all "tight". When I ran the XPath against the formatted version, it did indeed fail to work correctly. I will try using normalize-space(.).
