
xpath expression to remove whitespace

I have this HTML:

 <tr class="even  expanded first">
   <td class="score-time status">
     <a href="/matches/2012/08/02/europe/uefa-cup/">

            16 : 00

     </a>
    </td>        
  </tr>

I want to extract the (16 : 00) string without the extra whitespace. Is this possible?

Using what implementation - PHP, or what? XPath is concerned with the retrieval of nodes, not string handling. Any removal of whitespace would need to be done separately after retrieval.
I think there is an expression to get the desired text without spaces.
If we're talking about PHP (which I've somehow assumed since it's about HTML), you can set preserveWhiteSpace to false on your DOMDocument object, resulting in the automatic removal of redundant white space. php.net/manual/de/…
As I say, XPath is not a string-handling mechanism; it cannot remove spaces. It is concerned solely with the retrieval of data. Anything you want to do TO that data must be done separately, and currently we don't know what language you're using to do that in.
@Utkanos: the absolute statement about the string-handling capabilities of XPath is proven wrong -- by my answer. :)

Dimitre Novatchev

I. Use this single XPath expression:

translate(normalize-space(/tr/td/a), ' ', '')

Explanation:

normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character. translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.

II. Alternatively:

translate(/tr/td/a, ' &#9;&#10;&#13;', '')
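
To make the two expressions concrete, here is a small pure-Python sketch that mimics what normalize-space() and translate() do to the text of the <a> element. The helper names xpath_normalize_space and xpath_translate are made up for illustration; they are not part of any XPath library, only an approximation of the XPath 1.0 definitions.

```python
import re

def xpath_normalize_space(s):
    # Hypothetical helper: collapse runs of XPath whitespace (space, tab, LF, CR)
    # into one space, then drop the leading and trailing space.
    return re.sub(r"[ \t\n\r]+", " ", s).strip(" ")

def xpath_translate(s, src, dst):
    # Hypothetical helper: map each character of src to the character at the
    # same position in dst; characters of src with no counterpart are removed.
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # the first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)

raw = "\n\n            16 : 00\n\n     "  # string value of the <a> element

normalized = xpath_normalize_space(raw)          # "16 : 00"
expr_one = xpath_translate(normalized, " ", "")  # what expression I returns
expr_two = xpath_translate(raw, " \t\n\r", "")   # what expression II returns

print(repr(normalized), repr(expr_one), repr(expr_two))
```

Both expressions end up with 16:00. If the single spaces around the colon should stay (16 : 00), normalize-space(/tr/td/a) on its own is enough.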

Is there a shortest XPath expression to get only the CDATA nodes throughout an XML file?
@ArupRakshit, There are no "CDATA nodes" in the XPath Data Model and thus it is not possible to distinguish CDATA as part of the text node that contains it. In the same way, it is not possible to know if the short tag was used for an element without children, or if quotes or apostrophes were used as delimiters around an attribute value.
@DimitreNovatchev Thanks for the reply. So it means I need to find it the same way I search for regular nodes.
@ArupRakshit, Yes, one can only select whole text nodes in XPath. You could filter these nodes with predicate(s) if you know something more (like a substring) about the text you are looking for.
Rob

Please try the XPath expression below:

//td[@class='score-time status']/a[normalize-space() = '16 : 00']

Udhav Sarvaiya

You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]


jerrythebum

I came across this thread while having a similar issue of my own.

HTML

<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
  <a href="/nsomar/OAStackView/releases/tag/1.0.1">

    1.0.1
  </a>

XPath start command

tree.xpath('//div[@class="d-flex"]/h4/a/text()')

However, this also grabbed a whitespace-only text node and gave me this output:

['\n          ', '\n        1.0.1\n      ']

Adding a [normalize-space()] predicate filtered out the whitespace-only text node and left me with just the one I wanted:

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')

['\n        1.0.1\n      ']

I could then grab the first element of the list and use strip() to remove the remaining leading and trailing whitespace.

XPath final command

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()

Which left me with exactly what I required:

1.0.1
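
The filtering and strip() steps can probably be folded into the XPath itself by making normalize-space() the outermost call, so that the expression returns a string instead of a list of text nodes. A minimal sketch, assuming lxml is installed and using a cut-down copy of the HTML above (the page variable and the closing tags are assumptions, not the real page markup):

```python
from lxml import etree

page = """
<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
  <a href="/nsomar/OAStackView/releases/tag/1.0.1">

    1.0.1
  </a>
</h4>
</div>
"""

tree = etree.HTML(page)  # lenient HTML parser, returns the <html> root element

# normalize-space() takes the string value of the first matching <a>,
# trims it and collapses inner whitespace, so no strip() is needed afterwards.
version = tree.xpath('normalize-space(//div[@class="d-flex"]/h4/a)')
print(repr(version))
```

Note that this only looks at the first matching <a>; for several matches, selecting the nodes and cleaning each one in Python is still the simpler route.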

Chris Noe

You can filter out empty text() nodes: /path/text()[not(.='')]. To skip whitespace-only nodes as well, test normalize-space(.) instead of the raw value.

It may be useful with axes like following-sibling:: if these are not containers, or with child::.

You can use string(), or the regular-expression functions of XPath 2.0 such as matches() and replace().

NOTE: some comments say that XPath cannot do string manipulation. Even though it is not really designed for that, you can do basic things with contains(), starts-with() and, in XPath 2.0, replace().

If you want to check for whitespace nodes it is much harder, as you will generally have a node list as the result set, and most XPath functions, like matches() or replace(), only operate on one node.

You can separate node and string manipulation.

So you may use XPath to retrieve a container, or a list of text nodes, and then process it with another language (Java, PHP, Python or Perl, for instance).
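
As a rough illustration of that split, here is a sketch in Python using only the standard library: ElementTree's limited XPath subset picks the node, and plain string handling cleans the text. It assumes the snippet from the question is well-formed enough to parse as XML, and collapse_whitespace is a made-up helper name.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<tr class="even  expanded first">
  <td class="score-time status">
    <a href="/matches/2012/08/02/europe/uefa-cup/">

            16 : 00

    </a>
  </td>
</tr>
"""

def collapse_whitespace(text):
    # Hypothetical helper: trim the ends and squeeze inner whitespace runs
    # down to single spaces.
    return " ".join(text.split())

root = ET.fromstring(SNIPPET.strip())

# Step 1: node retrieval (ElementTree supports only a small XPath subset).
links = root.findall("./td/a")

# Step 2: string handling in the host language.
times = [collapse_whitespace(a.text or "") for a in links]
print(times)
```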