Crawler 10: Scraping the Piaohua Movie Site

Date: 2020-03-12 01:22:08

import requests
from lxml import etree
url="https://www.piaohua.com/"
headers={
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}
# 1. Request the page
response=requests.get(url,headers=headers)
content=response.content.decode("utf-8")
# 2. Parse the HTML into an element tree that supports XPath
html=etree.HTML(content)
# 3. Select nodes with XPath expressions
ul=html.xpath("//ul[@class='ul-imgtxt1 row']")[0]
lis=ul.xpath("./li")
# for li in lis:
#     print(etree.tostring(li,encoding='utf-8').decode('utf-8'))  # debug check: confirm each li was parsed correctly
movies=[]
for li in lis:
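    texts=li.xpath(".//h3//text()")  # all text nodes under <h3>
    title=texts[0]
    clear=texts[1]  # second text node, presumably the clarity/quality label
    playbill=li.xpath(".//img/@src")  # @ selects an attribute value (here src); xpath() returns a list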
    movie={
        "title":title,
        "clear":clear,
        "playbill":playbill
    }
    movies.append(movie)
print(movies)
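
The script above indexes `[0]` and `[1]` directly, so it raises `IndexError` whenever a list item lacks a second text node or the `ul` is missing. The following is a minimal offline sketch of the same XPath extraction with defensive lookups. The `SAMPLE_HTML` markup and the `parse_movies` helper are hypothetical, written only for illustration; the real page structure may differ.

```python
from lxml import etree

# Hypothetical sample markup for illustration; the live page may be structured differently.
SAMPLE_HTML = """
<html><body>
<ul class="ul-imgtxt1 row">
  <li>
    <a href="/movie/1.html"><img src="/images/poster1.jpg"></a>
    <h3><a href="/movie/1.html">Movie One</a><span>HD</span></h3>
  </li>
  <li>
    <a href="/movie/2.html"><img src="/images/poster2.jpg"></a>
    <h3><a href="/movie/2.html">Movie Two</a></h3>
  </li>
</ul>
</body></html>
"""


def parse_movies(html_text):
    """Extract title, clarity label, and poster URL from each list item."""
    tree = etree.HTML(html_text)
    movies = []
    # An empty result here simply yields an empty list instead of an IndexError.
    for li in tree.xpath("//ul[@class='ul-imgtxt1 row']/li"):
        # Keep only non-empty text nodes under <h3>, stripped of whitespace.
        texts = [t.strip() for t in li.xpath(".//h3//text()") if t.strip()]
        posters = li.xpath(".//img/@src")
        movies.append({
            "title": texts[0] if texts else None,
            "clear": texts[1] if len(texts) > 1 else None,
            "playbill": posters[0] if posters else None,
        })
    return movies


if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie)
```

Unlike the original script, this sketch stores `playbill` as a single string rather than a list, and fills missing fields with `None`. To run it against the live site, pass the decoded `content` from the request step to `parse_movies`.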


Original: https://www.cnblogs.com/wcyMiracle/p/12466647.html
