
How to select following sibling/XML tag using XPath

I have an HTML file (from Newegg) whose markup is organized as shown below. In the specifications table, every value is in a td with class 'desc' and every label is in a td with class 'name'. Below are two examples of data from Newegg pages.

<tr>
    <td class="name">Brand</td>
    <td class="desc">Intel</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Core i5</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">LGA 1156</td>
</tr>

<tr>
    <td class="name">Brand</td>
    <td class="desc">AMD</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Phenom II X4</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">Socket AM3</td>
</tr>

In the end I would like to have a CPU class (which is already set up) with Brand, Series, Cores, and Socket fields to store each value. This is the only way I can think of to go about doing this:

if(parsedDocument.xpath(tr/td[@class="name"])=='Brand'):
    CPU.brand = parsedDocument.xpath(tr/td[@class="name"]/nextsibling?).text

I would then do the same for the rest of the values. How would I accomplish the nextsibling and is there an easier way of doing this?


Dimitre Novatchev

How would I accomplish the nextsibling and is there an easier way of doing this?

You may use:

tr/td[@class='name']/following-sibling::td

but I'd rather use directly:

tr[td[@class='name'] ='Brand']/td[@class='desc']

This assumes that:

The context node, against which the XPath expression is evaluated, is the parent of all tr elements -- not shown in your question.

Each tr element has only one td with class attribute valued 'name' and only one td with class attribute valued 'desc'.
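
As a rough illustration of the second approach, here is a minimal Python sketch that fills a CPU object from the rows in the question. It uses only the standard library's xml.etree.ElementTree, which implements just a subset of XPath without nested predicates, so the expression is simplified to tr[td='Brand']/td[@class='desc']. With a full XPath 1.0 engine, the expression above can be used unchanged. The table wrapper, the CPU dataclass, and the helper names spec_value and parse_cpu are assumptions made for this example and are not taken from the question.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Rows from the question, wrapped in a <table> so that the context node is
# the parent of all <tr> elements (the wrapper itself is an assumption).
SPEC_HTML = """
<table>
  <tr><td class="name">Brand</td><td class="desc">Intel</td></tr>
  <tr><td class="name">Series</td><td class="desc">Core i5</td></tr>
  <tr><td class="name">Cores</td><td class="desc">4</td></tr>
  <tr><td class="name">Socket</td><td class="desc">LGA 1156</td></tr>
</table>
"""


@dataclass
class CPU:
    """Hypothetical stand-in for the asker's existing CPU class."""
    brand: str
    series: str
    cores: str
    socket: str


def spec_value(table, label):
    """Return the text of the 'desc' cell in the row labelled `label`."""
    # Simplified predicate: tr[td='Brand'] matches a row if ANY td child has
    # the text 'Brand'. That is equivalent to the nested-predicate form only
    # as long as no 'desc' cell happens to contain a label string.
    cell = table.find(f"tr[td='{label}']/td[@class='desc']")
    return cell.text if cell is not None else None


def parse_cpu(markup):
    """Build a CPU from a well-formed specifications table."""
    table = ET.fromstring(markup.strip())
    return CPU(
        brand=spec_value(table, "Brand"),
        series=spec_value(table, "Series"),
        cores=spec_value(table, "Cores"),
        socket=spec_value(table, "Socket"),
    )


cpu = parse_cpu(SPEC_HTML)
print(cpu)
```

Note that ElementTree needs well-formed markup, so a real Newegg page would probably have to go through a lenient HTML parser first.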


Note that you have to be careful about using class. If an element with class 'name' also carries any other class, td[@class='name'] will no longer match it. See this question for details.
@gm2008, Yes, in case there is more than one class in the value of the @class attribute, the predicate to use is: contains(concat(' ', @class, ' '), ' name ') . But in this question the @class attributes do have only single values.
Relative to an element: ./following-sibling::td
@JohnGietzen, Re: "Relative to an element" -- you mean if the context node is the element we are interested in. In this case you can omit ./ . Also, if you want to select only the immediate following sibling, use following-sibling::td[1]; otherwise, if there is more than one following sibling, all of them will be selected.
Philipp

Try the following-sibling axis (following-sibling::td).


Milan

For completeness, adding to the accepted answer above: if you are interested in any following sibling regardless of the element type, you can use this variation:

following-sibling::*
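
The difference between following-sibling::td and following-sibling::* only shows up when a row mixes element types. Python's standard xml.etree.ElementTree has no sibling axes, so the sketch below emulates both by slicing the parent's list of children. The th cell in the sample row and the helper name following_siblings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical row with a <th> between the two <td> cells, so that the two
# axis expressions give different results.
ROW = ET.fromstring(
    '<tr><td class="name">Brand</td><th>note</th>'
    '<td class="desc">Intel</td></tr>'
)


def following_siblings(parent, node, tag=None):
    """Emulate following-sibling::tag, or following-sibling::* if tag is None."""
    children = list(parent)
    # Everything after `node` in document order.
    later = children[children.index(node) + 1:]
    return [child for child in later if tag is None or child.tag == tag]


name_cell = ROW.find("td[@class='name']")

any_siblings = following_siblings(ROW, name_cell)       # following-sibling::*
td_siblings = following_siblings(ROW, name_cell, "td")  # following-sibling::td
first_td = td_siblings[0] if td_siblings else None      # following-sibling::td[1]

print([child.tag for child in any_siblings])
print([child.text for child in td_siblings])
```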