Python static web page scraping with XPath

Date: 2016-05-18 23:42:20

Common expressions:

1. starts-with(@attribute_name, shared_prefix). Use case: several elements whose attribute values begin with the same characters.

<div id="test-1">Desired content 1</div>
<div id="test-2">Desired content 2</div>
<div id="test-3">Desired content 3</div>

from lxml import etree

selector = etree.HTML(html)  # html holds the page source as a string
content = selector.xpath('//div[starts-with(@id, "test")]/text()')
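To see the starts-with() filter in action, here is a minimal self-contained sketch. The html string is sample data made up for illustration, and the extra div with id="other" is an assumption added only to show that non-matching elements are skipped.

```python
from lxml import etree

# Sample markup (illustrative only); the last div is added to show filtering.
html = """
<div id="test-1">Desired content 1</div>
<div id="test-2">Desired content 2</div>
<div id="test-3">Desired content 3</div>
<div id="other">Unrelated content</div>
"""

selector = etree.HTML(html)

# Keep only the divs whose id attribute begins with "test".
content = selector.xpath('//div[starts-with(@id, "test")]/text()')
print(content)
# → ['Desired content 1', 'Desired content 2', 'Desired content 3']
```

The div with id="other" is left out because its id does not begin with "test".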

2. string(.). Use case: nested tags, where the text you want is split between an element and its child elements.

<div id="test3">Hey beautiful,
    <font color="red">what is your WeChat ID?</font>
</div>

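selector = etree.HTML(html)
data = selector.xpath('//div[@id="test3"]')[0]   # select the outer element first, then narrow down
info = data.xpath('string(.)')                   # all text inside the div, including child tags
content = info.replace('\n', '').replace('  ', '')  # strip newlines and indentation whitespace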
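The difference between text() and string(.) on nested tags can be sketched as follows. The markup is sample data made up for illustration. For cleanup, this sketch swaps the chained replace() calls for " ".join(info.split()), and also shows XPath's built-in normalize-space() as an alternative; both are substitutions for illustration, not the original author's approach.

```python
from lxml import etree

# Sample markup (illustrative only): a div whose text is split by a child tag.
html = """
<div id="test3">Hello,
    <font color="red">what is your WeChat ID?</font>
</div>
"""

selector = etree.HTML(html)
data = selector.xpath('//div[@id="test3"]')[0]

# text() returns only the div's own text nodes, so the <font> text is missing.
direct_text = data.xpath('text()')

# string(.) concatenates the text of the div and all of its descendants.
info = data.xpath('string(.)')

# Collapse newlines and indentation into single spaces.
content = " ".join(info.split())
print(content)
# → Hello, what is your WeChat ID?

# XPath 1.0 alternative that trims and collapses whitespace in one step.
normalized = data.xpath('normalize-space(.)')
```

Splitting on whitespace and rejoining keeps a single space between words, whereas deleting newlines and indentation outright can glue neighboring words together.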

Original article: http://www.cnblogs.com/alan-babyblog/p/5506968.html
