Scraping the news from the campus news homepage

Posted: 2018-04-03 23:25:04

import re
from datetime import datetime
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Index page of the campus news section
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url, timeout=10)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Every news entry is an <a href=...> inside an <li> of div.list-container
news = soup.select('div.list-container li a[href]')

add = []       # article URLs
new_list = []  # visible text of each link
for link in news:
    # Read the href attribute directly instead of regex-matching the tag's HTML;
    # urljoin also turns relative links into absolute ones
    add.append(urljoin(url, link['href']))
    new_list.append(link.get_text().strip())
print(add)
print(new_list)

# Fetch and parse the first article
resd = requests.get(add[0], timeout=10)
resd.encoding = 'utf-8'
soupd = BeautifulSoup(resd.text, 'html.parser')

passage = soupd.select('div.show-container')  # article body
title = soupd.select('div.show-info')         # info line containing the publish date

pa = [p.get_text().strip() for p in passage]
print(pa)
print(title)

# Pull the publish date (YYYY-MM-DD) out of the info line
tm = re.findall(r'\d{4}-\d{2}-\d{2}', str(title))
print(tm)

# strptime parses a string into a datetime (strftime does the reverse)
if tm:
    sst = datetime.strptime(tm[0], '%Y-%m-%d')
    print(sst)
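The date-extraction step can be isolated and tested offline. The following is a minimal sketch, assuming the `show-info` text contains the publish date as `YYYY-MM-DD`, optionally followed by an `HH:MM:SS` time; the helper names `extract_publish_time` and `absolute_links` and the sample info string are hypothetical, not taken from the original script or the site.

```python
import re
from datetime import datetime
from urllib.parse import urljoin

# Hypothetical helper: turn the hrefs collected from the list page into
# absolute URLs, so relative links work as well as absolute ones.
def absolute_links(base_url, hrefs):
    return [urljoin(base_url, h) for h in hrefs]

# Matches "YYYY-MM-DD", optionally followed by whitespace and "HH:MM:SS".
DATE_RE = re.compile(r'(\d{4}-\d{2}-\d{2})(?:\s+(\d{2}:\d{2}:\d{2}))?')

# Hypothetical helper: find the first date (and time, if present) in the
# article's info text and parse it into a datetime. Returns None if absent.
def extract_publish_time(info_text):
    m = DATE_RE.search(info_text)
    if m is None:
        return None
    date_part, time_part = m.group(1), m.group(2)
    if time_part:
        return datetime.strptime(date_part + ' ' + time_part, '%Y-%m-%d %H:%M:%S')
    return datetime.strptime(date_part, '%Y-%m-%d')

if __name__ == '__main__':
    # Assumed sample of the info line; the real page text may differ.
    info = 'Published: 2018-04-03 09:15:00  Author: staff'
    print(extract_publish_time(info))
    print(absolute_links('http://news.gzcc.cn/html/xiaoyuanxinwen/', ['a.html']))
```

Returning `None` instead of raising lets a crawler skip articles whose info line carries no date. Note the direction of the two `datetime` methods: `strptime` parses a string into a `datetime`, while `strftime` formats an existing `datetime` as a string.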


Original: https://www.cnblogs.com/miranda-76/p/8711234.html
