Scraping news from a news site and saving it to Excel

Posted: 2018-01-30 19:16:47

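```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: Jacky
# Python 3 / Selenium 4 version (find_element_by_* was removed in Selenium 4.3)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import xlwt

driver = webdriver.Firefox()
driver.implicitly_wait(3)  # wait up to 3 s when locating elements
first_url = 'http://www.yidianzixun.com/channel/c6'
driver.get(first_url)
# Click the refresh icon so the channel's news feed loads
driver.find_element(By.CLASS_NAME, 'icon-refresh').click()
# Press the Down arrow repeatedly so the infinite-scroll feed loads more items.
# The keys go to <body>, since the icon itself may not accept keyboard input.
body = driver.find_element(By.TAG_NAME, 'body')
for _ in range(89):
    body.send_keys(Keys.DOWN)
```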
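Pressing Down a fixed 89 times can stop before the feed runs out, or keep going long after it has. A common alternative, swapped in here rather than taken from the original post, is to scroll with JavaScript through Selenium's `execute_script` and stop once the page height no longer grows. The sketch assumes the feed's growth shows up in `document.body.scrollHeight`; the function name `scroll_until_stable` and its `max_rounds` and `pause` parameters are hypothetical.

```python
import time


def scroll_until_stable(driver, max_rounds=30, pause=1.0):
    """Scroll to the bottom until the page height stops growing (hypothetical helper).

    `driver` is any object with Selenium's execute_script(script) method.
    Returns the last page height seen.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the bottom, which is assumed to trigger the feed's lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new items time to arrive (assumed delay)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so assume the feed is exhausted
        last_height = new_height
    return last_height
```

With a real Selenium driver, `scroll_until_stable(driver)` could take the place of the key-press loop while the rest of the script stays the same.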
```python
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)
articles = []
# Each news item is an element whose class attribute is exactly this string
for article in soup.find_all(class_='item doc style-small-image style-content-middle'):
    title = article.find(class_='doc-title').get_text()
    source = article.find(class_='source').get_text()
    comment = article.find(class_='comment-count').get_text()
    link = 'http://www.yidianzixun.com' + article.get('href')
    articles.append([title, source, comment, link])
print(articles)
driver.quit()
```
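The class filter in the parsing loop compares the whole `class` attribute string, so an item whose classes appear in a different order is silently skipped. Any item missing a child element also makes `.find(...)` return `None`, which crashes the script. The sketch below runs against hypothetical markup modeled on the class names in the script (the live page's HTML may differ). It uses a CSS selector through `select()` to match all four classes in any order, a hypothetical `text_of` helper that returns an empty string for missing fields, and `urljoin` to build absolute links. It parses with the built-in `html.parser` so that `lxml` is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://www.yidianzixun.com'

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<div>
  <a class="item doc style-small-image style-content-middle" href="/article/0001">
    <div class="doc-title">First headline</div>
    <span class="source">Example Daily</span>
    <span class="comment-count">12</span>
  </a>
  <a class="doc item style-small-image style-content-middle" href="/article/0002">
    <div class="doc-title">Second headline</div>
    <span class="source">Example Weekly</span>
  </a>
</div>
"""


def text_of(tag, cls):
    """Return the stripped text of the first descendant with class `cls`, or '' if absent."""
    found = tag.find(class_=cls)
    return found.get_text(strip=True) if found else ''


def parse_articles(html):
    """Return [title, source, comment, link] rows for every news item in `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    # The CSS selector requires all four classes but ignores their order
    for article in soup.select('.item.doc.style-small-image.style-content-middle'):
        rows.append([
            text_of(article, 'doc-title'),
            text_of(article, 'source'),
            text_of(article, 'comment-count'),
            urljoin(BASE_URL, article.get('href', '')),
        ])
    return rows
```

On this sample, `parse_articles` returns both items and leaves the second item's comment count empty. The exact-string `find_all` filter from the script would match only the first item.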

```python
import os

# Write a header row, then one row per article
wbk = xlwt.Workbook(encoding='utf-8')
sheet = wbk.add_sheet('yidianzixun')
for col, header in enumerate(['title', 'source', 'comment', 'link']):
    sheet.write(0, col, header)
for i, row in enumerate(articles, start=1):
    for col, value in enumerate(row):
        sheet.write(i, col, value)
# Create the output folder if needed; os.path.join replaces the Windows-only backslash path
os.makedirs('zixun', exist_ok=True)
wbk.save(os.path.join('zixun', 'zixun.xls'))
```
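xlwt writes the legacy `.xls` format, which caps a sheet at 65,536 rows and 256 columns. If that limit matters, or xlwt is not installed, one option is the standard-library `csv` module. This is a substitution, not what the original post does. Excel opens CSV files directly, and the `utf-8-sig` encoding prepends a byte-order mark that Excel uses to recognize UTF-8, so Chinese titles display correctly. `save_articles_csv` is a hypothetical helper that takes the same `articles` list the script builds.

```python
import csv


def save_articles_csv(articles, path, header=('title', 'source', 'comment', 'link')):
    """Write a header row plus one row per article to a CSV file Excel can open (hypothetical helper)."""
    # 'utf-8-sig' prepends a byte-order mark so Excel detects UTF-8;
    # newline='' lets the csv module control line endings, as its docs recommend.
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(articles)
```

For example, `save_articles_csv(articles, 'zixun.csv')` would write the scraped rows to a file next to the script.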

Source: http://blog.51cto.com/jackyxin/2066959
