Scraping the news from the campus news homepage

Posted: 2018-04-03 23:25:04
import re
from datetime import datetime
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

new_list, add, pa = [], [], []  # news titles, article links, article bodies

# Fetch the campus news list page
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Every news item is an <a> inside an <li> of div.list-container
news = soup.select('div[class="list-container"] li a')
for item in news:
    # Read the href attribute directly instead of regex-matching the tag's HTML;
    # urljoin leaves absolute links unchanged and resolves relative ones
    add.append(urljoin(url, item['href']))
    new_list.append(item.get_text().strip())
print(add)
print(new_list)

# Fetch the first article and extract its body and its info line
resd = requests.get(add[0])
resd.encoding = 'utf-8'
soupd = BeautifulSoup(resd.text, 'html.parser')
passage = soupd.select('div[class="show-container"]')  # article body
title = soupd.select('div[class="show-info"]')         # info line containing the date
for p in passage:
    pa.append(p.get_text().strip())
print(pa)
print(title)

# Extract YYYY-MM-DD dates from the info line
tm = re.findall(r'\d{4}-\d{2}-\d{2}', str(title))
print(tm)
# strptime parses a string into a datetime (strftime formats a datetime into a string)
if tm:
    sst = datetime.strptime(tm[0], '%Y-%m-%d')
    print(sst)
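
The date step can also be pulled out into a small reusable function. This is a minimal sketch, assuming the info line contains a date in YYYY-MM-DD form, as the regex in the script expects; the names DATE_RE and extract_date and the sample string are hypothetical. Note that datetime.strptime parses a string into a datetime, while datetime.strftime does the reverse (formats a datetime as a string), so only strptime fits here.

```python
import re
from datetime import datetime
from typing import Optional

# A date written as YYYY-MM-DD, e.g. 2018-04-03 (hypothetical helper pattern)
DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')


def extract_date(info_text: str) -> Optional[datetime]:
    """Return the first YYYY-MM-DD date in info_text as a datetime, or None."""
    match = DATE_RE.search(info_text)
    if match is None:
        return None
    try:
        # strptime parses the matched string into a datetime object
        return datetime.strptime(match.group(), '%Y-%m-%d')
    except ValueError:
        # The pattern matched, but it is not a real calendar date
        return None


if __name__ == '__main__':
    # Hypothetical info-line text; the real page wording may differ
    print(extract_date('Published: 2018-04-03 11:29:43'))  # → 2018-04-03 00:00:00
```

Passing str(title) from the script would give the article's publication date as a datetime (with the time set to midnight), or None when no valid date is found. An out-of-range match such as 2018-13-45 is rejected by strptime and returned as None instead of crashing the scraper.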

Original post: https://www.cnblogs.com/miranda-76/p/8711234.html