
A simple crawler for scraping images from a website

Posted: 2020-04-26 15:38:17
import re

import requests
from bs4 import BeautifulSoup

# Fetch the news page and decode it with the encoding detected from the body
response = requests.get(url="https://www.autohome.com.cn/news/")
response.encoding = response.apparent_encoding

soup = BeautifulSoup(response.text, features="lxml")
target = soup.find(id="auto-channel-lazyload-article")
li_list = target.find_all("li")

for i in li_list:
    a = i.find("a")
    if not a:
        continue
    print(a.attrs.get("href"))
    # Keep only CJK characters, letters, and digits so the title is a safe file name
    title = a.find("h3").text
    txt = "".join(re.findall(r"[\u4e00-\u9fa5a-zA-Z0-9]+", title))
    print(txt)
    for pic_tag in a.find_all("img"):
        pic_link = pic_tag.get("src")
        img_url = "http:" + str(pic_link)
        print(img_url)
        # Download inside the loop: the original downloaded after the loop,
        # which saved only the last image and crashed when no <img> was found
        img_response = requests.get(url=img_url)
        file_name = txt + ".jpg"
        with open(file_name, "wb") as f:
            f.write(img_response.content)
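The parsing and filename-sanitizing steps above can be exercised without hitting the live site. This is a minimal sketch over an inline HTML snippet (the markup and URLs below are invented to mimic the structure the scraper expects, and it uses the stdlib `html.parser` instead of `lxml`):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML shaped like the autohome news list
html = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="//www.autohome.com.cn/news/1.html">
      <h3>新车速递: Model-X 2020!</h3>
      <img src="//img.example.com/a.jpg">
    </a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, features="html.parser")
target = soup.find(id="auto-channel-lazyload-article")

results = []
for li in target.find_all("li"):
    a = li.find("a")
    if not a:
        continue
    # Same sanitization as the scraper: keep CJK, letters, digits only
    title = a.find("h3").text
    safe = "".join(re.findall(r"[\u4e00-\u9fa5a-zA-Z0-9]+", title))
    for img in a.find_all("img"):
        results.append((safe, "http:" + img.get("src")))

print(results)  # → [('新车速递ModelX2020', 'http://img.example.com/a.jpg')]
```

Punctuation, spaces, and the hyphen are dropped by the character-class regex, which is why the saved file name is compact.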

  


Original: https://www.cnblogs.com/linglinglingling/p/12233581.html
