XPath examples

Date: 2021-03-28 22:28:24

Selecting an element's parent

<li>
<a href="/hot/page/4/" rel="nofollow">
<!--<a href="/hot/page/4/" rel="nofollow">-->
<span class="next">
Next page
</span>
</a>
</li>

Select the href attribute of the a element:
response.xpath('//a/span[@class="next"]/parent::a/@href').extract()
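The parent axis can be tried outside Scrapy as well. Below is a minimal sketch using lxml instead of a Scrapy response; the SAMPLE string is adapted from the markup above (wrapped in a ul, with the commented-out line dropped), and the second query is an assumed alternative rather than something from the original note.

```python
from lxml import etree

# Sample markup adapted from the snippet above (assumption: wrapped in <ul>).
SAMPLE = '''
<ul>
<li>
<a href="/hot/page/4/" rel="nofollow">
<span class="next">Next page</span>
</a>
</li>
</ul>
'''

tree = etree.HTML(SAMPLE)

# Walk down to the span, then step back up to its parent <a>.
via_parent = tree.xpath('//a/span[@class="next"]/parent::a/@href')

# Alternative: stay on the <a> and filter it with a predicate on its child.
via_predicate = tree.xpath('//a[span[@class="next"]]/@href')

print(via_parent)
print(via_predicate)
```

Both queries select the same attribute here. The predicate form avoids the down-then-up traversal, which some readers find easier to follow.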

Getting the text after a ul element

python - How can I get the text with xPath between and ? - Stack Overflow

//div[@class='oc_info']/ul[@class='list']/following-sibling::text()
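To see what following-sibling::text() returns, here is a small sketch with lxml. The markup is hypothetical (the page from the Stack Overflow question is not shown in this note); it only assumes that loose text sits after the ul inside the same div.

```python
from lxml import etree

# Hypothetical markup: loose text follows the <ul> inside the same <div>.
SAMPLE = '''
<div class="oc_info">
  <ul class="list"><li>item</li></ul>
  Loose text after the list
  <br/>
  More text
</div>
'''

tree = etree.HTML(SAMPLE)

# Every text node that is a later sibling of the <ul>.
nodes = tree.xpath(
    "//div[@class='oc_info']/ul[@class='list']/following-sibling::text()"
)

# Drop whitespace-only nodes and trim the rest.
texts = [t.strip() for t in nodes if t.strip()]
print(texts)
```

The axis returns all later sibling text nodes, not just the first one, so filtering or indexing (for example following-sibling::text()[1]) is usually needed.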

Matching elements with multiple class values

How to get html elements with multiple css classes - Stack Overflow

//div[contains(@class, 'class1') and contains(@class, 'class2')]
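One caveat: contains() is a substring test, so 'class2' also matches 'class22'. The sketch below compares the expression above with a stricter, commonly used idiom that pads the attribute with spaces. The helper has_class and the sample markup are illustrative names, not part of the original note.

```python
from lxml import etree

# Hypothetical markup with a lookalike class name.
SAMPLE = '''
<div class="class1 class2">both</div>
<div class="class1">only one</div>
<div class="class1 class22">lookalike</div>
'''

tree = etree.HTML(SAMPLE)

# Substring match: also picks up "class22".
loose = tree.xpath(
    "//div[contains(@class, 'class1') and contains(@class, 'class2')]/text()"
)

def has_class(name):
    """Build an XPath test that matches a whole class token (illustrative helper)."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"

# Token match: only elements that really carry both classes.
strict = tree.xpath(
    f"//div[{has_class('class1')} and {has_class('class2')}]/text()"
)

print(loose)
print(strict)
```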

Tips for extracting data blocks (structured data) with XPath

Extracting text under multiple tags with XPath - Jianshu

articles = selector.xpath('//ul[@class="article-list thumbnails"]/li')

# Query each <li> with relative paths so the fields stay grouped per article.
for article in articles:
    title = article.xpath('div/h4/a/text()').extract()
    url = article.xpath('div/h4/a/@href').extract()
    author = article.xpath('div/p/a/text()').extract()
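The reason for looping over each li is that per-item queries keep fields aligned even when one of them is missing. Here is a rough, self-contained sketch of the same pattern with lxml; the sample markup, the first helper, and the record layout are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical markup: the second article has no author link.
SAMPLE = '''
<ul class="article-list thumbnails">
  <li><div><h4><a href="/p/1">First post</a></h4><p><a href="/u/a">alice</a></p></div></li>
  <li><div><h4><a href="/p/2">Second post</a></h4><p></p></div></li>
</ul>
'''

tree = etree.HTML(SAMPLE)

def first(nodes, default=None):
    """Return the first XPath result, or a default when nothing matched."""
    return nodes[0] if nodes else default

records = []
for li in tree.xpath('//ul[@class="article-list thumbnails"]/li'):
    # Relative paths (no leading //) are evaluated against this <li> only.
    records.append({
        'title': first(li.xpath('div/h4/a/text()')),
        'url': first(li.xpath('div/h4/a/@href')),
        'author': first(li.xpath('div/p/a/text()')),
    })

for record in records:
    print(record)
```

Running three page-wide queries and zipping the results would pair the wrong author with the wrong title as soon as one article lacks a field; the per-item loop records None instead.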

Extracting the text content under multiple tags with XPath


Looping over the matches

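# Version 1
for r in response.xpath('//li[@class="clearfix"]'):
    # Title
    item['title'] = r.xpath('./h3/a/text()').extract()
    # Summary: string() of a node-set uses only its first node,
    # so this yields the first text node directly under <p>.
    item['desc'] = r.xpath('string(./p/text())').extract()
    # Time
    item['time'] = r.xpath('./div/span/text()').extract()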
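The difference between text(), string(...) and //text() is easy to get wrong, so here is a small comparison. It is a sketch with lxml rather than Scrapy, run on a hypothetical li whose p contains a nested b element.

```python
from lxml import etree

# Hypothetical markup: the <p> mixes plain text with a nested <b>.
SAMPLE = '<li class="clearfix"><p>Hello <b>big</b> world</p></li>'

tree = etree.HTML(SAMPLE)
li = tree.xpath('//li[@class="clearfix"]')[0]

# Text nodes that are direct children of <p>; nested text is skipped.
direct = li.xpath('./p/text()')

# string() on a node-set converts only its first node.
first_only = li.xpath('string(./p/text())')

# string() on the element concatenates all descendant text.
whole = li.xpath('string(./p)')

# Same result assembled in Python from every descendant text node.
joined = ''.join(li.xpath('./p//text()'))

print(direct, repr(first_only), repr(whole), repr(joined))
```

When the summary may contain inline tags such as b or a, string(./p) is usually the safer choice for getting the full sentence.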

Version 2

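from lxml import etree

# `html` is assumed to already hold the page source as a string.
tree = etree.HTML(html)

li = tree.xpath("//li[@class='clear']")
print(type(li))  # xpath() returns a list of elements
print(len(li))

for item in li:
    # string() returns a single string instead of a list
    title = item.xpath("string(.//div[@class='title']/a/text())")
    print(title)

# Or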

from lxml import etree

text = '''
<div>
    <ul>
         <li class="item-0"><a href="link1.html">first item</a></li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-inactive"><a href="link3.html">third item</a></li>
         <li class="item-1"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a>
     </ul>
 </div>
'''
# etree.HTML() repairs the markup: it closes the last <li> and adds html/body.
tmp_html = etree.HTML(text)
result = etree.tostring(tmp_html)
print(result)
li = tmp_html.xpath("//li")
print(len(li))
for i in li:
    # Collect every text node under the <a> descendants of this <li>.
    a = i.xpath(".//a//text()")
    print(a)


Source: https://www.cnblogs.com/ministep/p/14589531.html
