
Selecting a CSS class with XPath

I want to select just the elements that have the class .date

For some reason, I cannot get this to work. If anyone knows what is wrong with my code, it would be much appreciated.

@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');                             
foreach ($images as $img)
{
    echo  $img." ";
}
And what about the piece of HTML? (Preferably show us the SimpleXML output from asXML(), as it is closer to what XPath sees.)
If there are multiple classes, you need to use contains(@class, 'date').
@Gordon's answer is dangerous: if the class attribute is "datetime", it would also match. user716736's answer is more complete.

John Smith

I want to write the canonical answer to this question because the answer above has a problem.

Our problem

The CSS selector:

.foo

will select any element that has the class foo.

How do you do this in XPath?

Although XPath is more powerful than CSS, XPath doesn't have a native equivalent of a CSS class selector. However, there is a solution.

The right way to do it

The equivalent selector in XPath is:

//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]

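The function normalize-space strips leading and trailing whitespace and also replaces each sequence of whitespace characters (spaces, tabs, newlines) by a single space. That is what lets the space-padded " foo " test work even when class names are separated by tabs or newlines, or padded with extra spaces.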
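A quick way to see what normalize-space does is to evaluate it with PHP's DOMXPath, the same DOM API the question already uses. This is a minimal sketch; the sample markup is invented for illustration.

```php
<?php
// Sample markup (made up): extra spaces around the names and a tab between them.
$doc = new DOMDocument();
$doc->loadHTML("<div class=\"  foo\tbar  \">x</div>");
$xpath = new DOMXPath($doc);

$div = $xpath->query('//div')->item(0);

// Raw attribute value, whitespace as parsed.
$raw = $xpath->evaluate('string(@class)', $div);

// normalize-space() trims both ends and collapses every whitespace run to one space.
$normalized = $xpath->evaluate('normalize-space(@class)', $div);

var_dump($raw);
var_dump($normalized); // string(7) "foo bar"
```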

(In a more general sense) this is also the equivalent of the CSS selector:

*[class~="foo"]

which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
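To reuse the expression from PHP, one option is a small helper that fills in the class name. classToXPath below is a hypothetical name, not part of any library, and the sketch assumes the class name contains no double quote, since it is pasted into an XPath string literal.

```php
<?php
// Hypothetical helper: XPath 1.0 equivalent of the CSS selector ".className".
// Assumes $className contains no double quote (it is embedded in a "..." literal).
function classToXPath(string $className): string
{
    return sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );
}

// Invented sample markup covering the cases discussed in this answer.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="foo bar">1</div>'
    . '<div class="  foo ">2</div>'
    . '<div class="foobar">3</div>'
    . '<div class="bar">4</div>'
);
$xpath = new DOMXPath($doc);

// Collect the text of every element that has the class "foo".
$matches = [];
foreach ($xpath->query(classToXPath('foo')) as $node) {
    $matches[] = $node->textContent;
}
echo implode(',', $matches), "\n"; // 1,2
```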

A couple of obvious but wrong ways to do it

The XPath selector:

//*[@class="foo"]

doesn't work, because it won't match an element that has more than one class, for example:

<div class="foo bar">

It also won't match if there is any extra whitespace around the class name:

<div class="  foo ">

The 'improved' XPath selector

//*[contains(@class, "foo")]

doesn't work either, because it wrongly matches elements with the class foobar, for example:

<div class="foobar">
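The difference between the two naive selectors and the right one is easy to check side by side. This is a sketch using PHP's DOMXPath on made-up markup; the labels exact, contains and robust are just names for this example.

```php
<?php
// Invented markup: plain class, multiple classes, padded class, and a lookalike.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="foo">a</div>'
    . '<div class="foo bar">b</div>'
    . '<div class="  foo ">c</div>'
    . '<div class="foobar">d</div>'
);
$xpath = new DOMXPath($doc);

$selectors = [
    'exact'    => '//div[@class="foo"]',
    'contains' => '//div[contains(@class, "foo")]',
    'robust'   => '//div[contains(concat(" ", normalize-space(@class), " "), " foo ")]',
];

// For each selector, concatenate the text of the matched divs.
$results = [];
foreach ($selectors as $label => $expr) {
    $texts = [];
    foreach ($xpath->query($expr) as $node) {
        $texts[] = $node->textContent;
    }
    $results[$label] = implode('', $texts);
    echo $label, ': ', $results[$label], "\n";
}
// exact: a          (misses "foo bar" and "  foo ")
// contains: abcd    (wrongly includes "foobar")
// robust: abc
```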

Credit goes to this fella, who published the earliest solution to this problem that I found on the web: http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/


What's the need for normalize-space?
"the answer above" probably refers to MrGlass's.
Is this possible: <div class="foo\tbar">, i.e. class names separated by a tab?
but
and
is the same for $x('//div[contains(concat(" ", normalize-space(@class), " "), "condition")]')
@testerjoe2 did you try //*[contains(concat(" ", normalize-space(@class), " "), " foo ")] ?
MrGlass

//[@class="date"] is not a valid XPath expression: // has to be followed by a node test such as * or img.

Try //*[@class="date"], or, if you know it is an image, //img[@class="date"].
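For reference, here is the question's code with the corrected expression. It is a sketch under assumptions: the question didn't show its $html, so the markup below is invented, and libxml_use_internal_errors() stands in for the @ error suppression to silence parser warnings.

```php
<?php
// Invented sample input; the question's real $html was not shown.
$html = '<p><span class="date">2011-05-01</span><img class="date" src="x.png"></p>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xml = simplexml_import_dom($doc);  // as in the question, to use SimpleXML's xpath()

// "//*" means "any element"; a bare "//" must be followed by a node test.
$dates = $xml->xpath('//*[@class="date"]');
$names = [];
foreach ($dates as $el) {
    $names[] = $el->getName();
    echo $el->getName(), ': ', (string) $el, "\n";
}

// Narrowed to <img> elements only.
$images = $xml->xpath('//img[@class="date"]');
echo count($images), "\n"; // 1
```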


Robin Pokorny

XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.

Example:

//*[contains-token(@class, "foo")]

This function makes sure that whitespace (not only the space character U+0020) is handled correctly, works when a class name is repeated, and generally covers the edge cases.

Note: As of today (2016-12-13), XPath 3.1 has the status of a Candidate Recommendation.
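PHP's DOMXPath is backed by libxml2, which implements XPath 1.0 only, so contains-token is not available there. If you want similar token semantics in PHP, one workaround is to register a PHP function with DOMXPath::registerPhpFunctions and call it from the expression via php:functionString. hasClassToken is a hypothetical helper, and it only approximates contains-token: it splits on whitespace and compares tokens exactly.

```php
<?php
// Hypothetical helper: true if $token is one of the whitespace-separated tokens in $value.
function hasClassToken(string $value, string $token): bool
{
    $tokens = preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

// Invented markup: a repeated class, a tab separator, and a lookalike class.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="foo foo">1</div>'
    . "<div class=\"bar\tfoo\">2</div>"
    . '<div class="foobar">3</div>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('hasClassToken'); // allow only this one function

// php:functionString passes @class to the PHP function as a string.
$hits = [];
foreach ($xpath->query('//div[php:functionString("hasClassToken", @class, "foo")]') as $node) {
    $hits[] = $node->textContent;
}
echo implode(',', $hits), "\n"; // 1,2
```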


It does not work in today's latest Chrome. Until it does, how do we get around the limitation that //*[contains(@class, "foo")] also selects any class that contains foo, such as foobar, fooz, etc.?
Memke

In XPath 2.0 you can:

//*[count(index-of(tokenize(@class, '\s+' ), 'foo')) = 1]

as stated by Christian Weiske in: https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm
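The same tokenize-and-compare idea can be done in PHP, whose DOMXPath has no XPath 2.0 functions: use a loose contains() query as a pre-filter, then split the class attribute on whitespace in PHP. This is a sketch on invented markup; unlike the = 1 test above, it also accepts an element whose class repeats foo.

```php
<?php
// Invented markup: exact class, lookalike class, and a multi-class value.
$doc = new DOMDocument();
$doc->loadHTML(
    '<span class="foo">1</span>'
    . '<span class="foobar">2</span>'
    . '<span class="bar  foo">3</span>'
);
$xpath = new DOMXPath($doc);

$matches = [];
// contains() is only a cheap pre-filter here; it may over-match (e.g. "foobar").
foreach ($xpath->query('//*[contains(@class, "foo")]') as $el) {
    // Like tokenize(@class, '\s+'): split the attribute on runs of whitespace.
    $tokens = preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    // Like index-of(..., 'foo'): keep the element if the exact token occurs.
    if (in_array('foo', $tokens, true)) {
        $matches[] = $el->textContent;
    }
}
echo implode(',', $matches), "\n"; // 1,3
```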


Unfortunately, this doesn't seem to be implemented by Chrome as of 6/12/2017. Based on en.wikipedia.org/wiki/…, it seems to be lacking pretty much across the board.
hakre

HTML allows case-insensitive element and attribute names, and class is a space-separated list of class names. Here is an expression for an img tag and the class named date (note that it selects the class attribute node, not the element itself):

//*['IMG' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]
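Running that expression from PHP might look like this. It's a sketch with invented markup; since the expression ends in /@*[...], each result is a DOMAttr, and its ownerElement is the matching element.

```php
<?php
// Invented markup: an upper-case IMG with the class "date" and one without it.
$doc = new DOMDocument();
$doc->loadHTML('<p><IMG CLASS="photo date" SRC="a.png"><img class="photo"></p>');
$xpath = new DOMXPath($doc);

// Upper-cases name(.) so the comparison ignores the case of the markup.
$upper = "translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')";
$expr = "//*['IMG' = " . $upper . "]/@*['CLASS' = " . $upper
      . " and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]";

// Each result is the class attribute; ownerElement is the <img> it belongs to.
$attrs = $xpath->query($expr);
foreach ($attrs as $attr) {
    echo $attr->ownerElement->nodeName, ' ', $attr->value, "\n";
}
```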

See also: CSS Selector to XPath conversion


Vlado

Beware of minus signs (hyphens) in templates! If you are querying for "my-ownclass" in the DOM:

<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>

$finder = new DomXPath($dom);
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]"); // This will NOT behave as expected! This will strangely match all the <ul> elements in DOM.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]"); // This will match the element.