Selecting a CSS class with XPath

php html xml xpath web

I want to select elements by a single class on its own, called .date

For some reason, I cannot get this to work. If anyone knows what is wrong with my code, it would be much appreciated.

@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');                             
foreach ($images as $img)
{
    echo  $img." ";
}

And what about a piece of the HTML? (Preferably show us the SimpleXML output from asXML(), as it is closer to what XPath sees.)

If there are multiple classes, you need to use contains(@class, 'date').

possible duplicate of PHP - Parse All Links That Contain A Speciffic Word In "href" Tag

possible duplicate of XPath: How to match attributes that contain a certain string

@Gordon's answer is dangerous, if the class attribute is "datetime" it would also match. user716736's answer is more complete.

John Smith

I want to write the canonical answer to this question because the answer above has a problem.

Our problem

The CSS selector:

.foo

will select any element that has the class foo.

How do you do this in XPath?

Although XPath is more powerful than CSS, XPath doesn't have a native equivalent of a CSS class selector. However, there is a solution.

The right way to do it

The equivalent selector in XPath is:

//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]

The function normalize-space strips leading and trailing whitespace (and also replaces sequences of whitespace characters by a single space).

(In a more general sense) this is also the equivalent of the CSS selector:

*[class~="foo"]

which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
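As a minimal sketch of how this selector might be used from PHP (the language of the question), the snippet below wraps it in a helper. The function name classSelector and the sample markup are made up for illustration, and the helper assumes the class name contains no whitespace or double quotes.

```php
<?php
// Hypothetical helper: builds the XPath 1.0 class-token test shown above.
// Assumes $class contains no whitespace and no double quotes.
function classSelector(string $class, string $element = '*'): string
{
    return sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $element,
        $class
    );
}

// Sample markup, invented for this example.
$html = '<div>'
      . '<p class="date">2016-12-13</p>'
      . '<p class="entry date">2017-06-12</p>'
      . '<p class="datetime">12:00</p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);            // libxml adds the html/body wrappers
$xpath = new DOMXPath($doc);

// Collect the text of every element that carries the class "date".
$matches = [];
foreach ($xpath->query(classSelector('date')) as $node) {
    $matches[] = $node->textContent;
}

echo implode(', ', $matches), "\n";
```

With this markup, the two paragraphs that carry the date class are selected and the datetime paragraph is skipped.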

A couple of obvious, but wrong ways to do it

The XPath selector:

//*[@class="foo"]

doesn't work, because it won't match an element that has more than one class, for example:

<div class="foo bar">

It also won't match if there is any extra whitespace around the class name:

<div class="  foo ">

The 'improved' XPath selector

//*[contains(@class, "foo")]

doesn't work either, because it wrongly matches elements with the class foobar, for example:

<div class="foobar">
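To see the difference between the three selectors side by side, here is a small sketch that runs each of them against the example elements from this answer. The id attributes and the labels exact, contains and token are only there for this illustration.

```php
<?php
// The example elements from the answer, with ids added so results are easy to read.
$html = '<div>'
      . '<div id="a" class="foo">a</div>'
      . '<div id="b" class="foo bar">b</div>'
      . '<div id="c" class="  foo ">c</div>'
      . '<div id="d" class="foobar">d</div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$selectors = [
    'exact'    => '//*[@class="foo"]',
    'contains' => '//*[contains(@class, "foo")]',
    'token'    => '//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]',
];

// For each selector, list the ids of the elements it matches.
$results = [];
foreach ($selectors as $label => $expr) {
    $ids = [];
    foreach ($xpath->query($expr) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    $results[$label] = implode(',', $ids);
    echo $label, ': ', $results[$label], "\n";
}
```

The exact comparison should find only a, the plain contains() should also pick up d (foobar), and the token test should find a, b and c.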

Credit goes to this fella, whose post was the earliest published solution to this problem that I found on the web: http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/

What's the need for normalize-space?

"the answer above" probably refers to MrGlass's.

Is this possible <div class="foo\tbar">? I mean, class names separated by a tab.

but

and

is the same for $x('//div[contains(concat(" ", normalize-space(@class), " "), "condition")]')

@testerjoe2 did you try //*[contains(concat(" ", normalize-space(@class), " "), " foo ")] ?

MrGlass

//[@class="date"] is not a valid XPath expression: the step after // needs a node test such as * or an element name.

Try //*[@class="date"], or if you know it is an image, //img[@class="date"]
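For reference, the question's code with that one change applied could look roughly like the sketch below. The sample HTML is an assumption, since the question does not show its markup, and libxml_use_internal_errors() is used in place of the @ operator to silence parser warnings.

```php
<?php
// Assumed sample markup; the question does not include its HTML.
$html = '<html><body>'
      . '<span class="date">2011-04-20</span>'
      . '<img class="date" src="a.png" alt="pic">'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of suppressing with @
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xml = simplexml_import_dom($doc);  // just to make XPath simpler, as in the question
$nodes = $xml->xpath('//*[@class="date"]');

// Record the element name of every match.
$names = [];
foreach ($nodes as $node) {
    $names[] = $node->getName();
}

echo implode(' ', $names), "\n";
```

Note that this exact-match form only finds elements whose class attribute is precisely "date".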

Robin Pokorny

XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.

Example:

//*[contains-token(@class, "foo")]

This function makes sure that white space (not only the space character, U+0020) is handled correctly, works in case of class name repetition, and generally covers the edge cases.

Note: As of today (2016-12-13) XPath 3.1 has status of Candidate Recommendation.
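PHP's DOMXPath only implements XPath 1.0, so contains-token is not available there. One possible workaround, sketched below, is to register a PHP function and call it from the expression. The function has_token is a hypothetical helper written for this example; it only approximates contains-token by splitting on whitespace and comparing tokens case-sensitively.

```php
<?php
// Hypothetical helper that approximates contains-token():
// split the value on runs of whitespace and look for an exact token.
function has_token(string $value, string $token): bool
{
    $tokens = preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

// Sample markup, invented for this example: tab-separated classes,
// a look-alike class, and a repeated class name.
$html = '<div>'
      . '<p id="a" class="foo' . "\t" . 'bar">a</p>'
      . '<p id="b" class="foobar">b</p>'
      . '<p id="c" class="foo foo">c</p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('has_token');   // expose only this one function

$ids = [];
foreach ($xpath->query('//*[php:function("has_token", string(@class), "foo")]') as $node) {
    $ids[] = $node->getAttribute('id');
}

echo implode(',', $ids), "\n";
```

This ties the query to PHP, so it is not portable to other XPath engines; the concat/normalize-space form remains the portable XPath 1.0 option.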

It does not work in today's latest Chrome. Until it works, how do we get around the limitation that //*[contains(@class, "foo")] will also select any class that contains foo, such as foobar, fooz, etc.?

Memke

In XPath 2.0 you can:

//*[count(index-of(tokenize(@class, '\s+' ), 'foo')) = 1]

as stated by Christian Weiske in: https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm

Unfortunately, this doesn't seem to be implemented by Chrome as of 6/12/2017. Based on en.wikipedia.org/wiki/…, support seems to be lacking pretty much across the board.

hakre

HTML allows case-insensitive element and attribute names, and class is a space-separated list of class names. Here is an expression for an img tag and the class named date:

//*['IMG' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]

See as well: CSS Selector to XPath conversion

Vlado

BEWARE OF MINUS SIGNS IN TEMPLATE !!! If you are querying for "my-ownclass" in DOM:

<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>

$finder = new DomXPath($dom);
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]"); // This will NOT behave as expected! This will strangely match all the <ul> elements in DOM.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]"); // This will match the element.
