Scraping News from the Campus News Front Page

Date: 2018-04-02 12:20:25

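import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# Fetch and parse the campus news index page.
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Take the first <li> that is a news entry: title, time, and article link.
for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:
        title = news.select('.news-list-title')[0].text
        time = news.select('.news-list-info')[0].contents[0].text
        a = news.select('a')[0].attrs['href']
        print(a, title, time)
        break

# Fetch the article page that the first entry links to.
res1 = requests.get(a)
res1.encoding = 'utf-8'
soup1 = BeautifulSoup(res1.text, 'html.parser')
sp1 = soup1.select('#content')[0].text     # article body
info = soup1.select('.show-info')[0].text  # metadata line
print(info)

# Publication time: match the timestamp pattern instead of slicing at fixed
# offsets, which breaks if the label or the spacing differs.
dt = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info).group()
print(dt)

# Source ('来源:'), if present. find() returns -1 when the label is missing.
# The label is cut off by length, because lstrip() removes a set of
# characters rather than a prefix.
ly = info.find('来源:')
if ly >= 0:
    s = info[ly:].split()[0][len('来源:'):]
    print(s)

# Photographer ('摄影:'), handled the same way.
ly = info.find('摄影:')
if ly >= 0:
    s = info[ly:].split()[0][len('摄影:'):]
    print(s)

# Convert the timestamp string to a datetime object.
da = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')

# Format the current time with the same pattern.
now = datetime.now()
print(now.strftime('%Y-%m-%d %H:%M:%S'))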
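
The find/split extraction above handles one label at a time. As a rough sketch of an alternative, the labelled fields could be collected into a dict with regular expressions. The function name `parse_show_info` and the sample string are hypothetical, made up for illustration; the real `.show-info` text may use other labels, spacing, or full-width colons, so treat this as a starting point rather than the post's method.

```python
import re
from datetime import datetime


# Hypothetical helper, not part of the original post.
def parse_show_info(info):
    """Return a dict of the labelled fields found in a .show-info string."""
    fields = {}
    # Timestamp in 'YYYY-MM-DD HH:MM:SS' form, anywhere in the text.
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
    if m:
        fields['published'] = datetime.strptime(m.group(), '%Y-%m-%d %H:%M:%S')
    # Each label is followed by an ASCII or full-width colon and a value
    # that runs up to the next whitespace.
    for label in ('来源', '摄影'):
        m = re.search(label + r'[::](\S+)', info)
        if m:
            fields[label] = m.group(1)
    return fields


# Made-up sample in the assumed layout (not copied from the site).
sample = '发布时间:2018-04-01 11:57:00  来源:学院办公室  摄影:张三'
result = parse_show_info(sample)
print(result)
```

A missing label simply leaves its key out of the dict, so the caller can use `result.get('摄影')` instead of checking an index first.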

Original post: https://www.cnblogs.com/mimimi/p/8691974.html
