
spider

Date: 2019-08-03 19:40:37

from lxml import etree
import requests
import csv

fp = open('./douban.csv', 'w+', encoding='utf-8', newline='')
writer = csv.writer(fp)
writer.writerow(('name', 'url', 'author', 'publisher', 'date', 'price', 'rate', 'comment'))  # write the header row

urls = ['https://book.douban.com/top250?start={}'.format(num) for num in range(0, 250, 25)]
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}

for url in urls:
    # headers must be passed by keyword; the second positional argument of requests.get is params
    page = requests.get(url, headers=headers).text
    tree = etree.HTML(page)
    infos = tree.xpath('//tr[@class="item"]')
    for info in infos:
        name = info.xpath('td/div/a/@title')[0]
        book_url = info.xpath('td/div/a/@href')[0]  # renamed so it does not shadow the page url
        # relative path: a leading // would search the whole page and always return the first book
        book_infos = info.xpath('td/p/text()')[0]
        fields = book_infos.split('/')
        author = fields[0]
        pub = fields[-3]
        date = fields[-2]
        price = fields[-1]
        rate = info.xpath('td/div/span[2]/text()')[0]
        comments = info.xpath('td/p/span/text()')
        comment = comments[0] if len(comments) != 0 else ''  # some books have no one-line comment
        writer.writerow((name, book_url, author, pub, date, price, rate, comment))
fp.close()
Douban Books Top 250 saved to a CSV file
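The spider takes the author, publisher, date and price by splitting the info line on "/" and indexing from the end, which raises an IndexError if an entry has fewer than four fields. Below is a minimal sketch of a more defensive version of that one step. The helper name parse_book_info and the sample string are hypothetical and used only for illustration; the "author / ... / publisher / date / price" layout is assumed from the indexing in the code above.

```python
def parse_book_info(text):
    """Split an 'author / ... / publisher / date / price' line into fields.

    Hypothetical helper: the field layout is assumed from the spider's
    own indexing (first item is the author, last three are publisher,
    date and price).
    """
    parts = [part.strip() for part in text.split("/")]
    if len(parts) < 4:
        # Too few fields to index from the end; keep the raw text as the author.
        return {"author": text.strip(), "publisher": "", "date": "", "price": ""}
    return {
        "author": parts[0],
        "publisher": parts[-3],
        "date": parts[-2],
        "price": parts[-1],
    }


# Made-up sample line, not real scraped data.
sample = "Author Name / Translator Name / Some Publisher / 2006-5 / 29.00元"
print(parse_book_info(sample))
```

Inside the loop, the four assignments could then read from the returned dict instead of indexing the split list directly, so a malformed row is written with empty fields rather than stopping the crawl.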

 


Original: https://www.cnblogs.com/zhangchen-sx/p/11295690.html
