
A simple crawler for scraping images from a website

Posted: 2020-04-26 15:38:17
import re

import requests
from bs4 import BeautifulSoup

# Fetch the news page and decode it with the encoding detected from the body
response = requests.get(url="https://www.autohome.com.cn/news/")
response.encoding = response.apparent_encoding

soup = BeautifulSoup(response.text, features="lxml")
target = soup.find(id="auto-channel-lazyload-article")
li_list = target.find_all("li")

for i in li_list:
    a = i.find("a")
    if not a:
        continue
    print(a.attrs.get("href"))
    # Keep only CJK characters, letters, and digits so the title is a safe file name
    title = a.find("h3").text
    txt = "".join(re.findall(r"[\u4e00-\u9fa5a-zA-Z0-9]+", title))
    print(txt)
    for pic_tag in a.find_all("img"):
        pic_link = pic_tag.get("src")
        img_url = "http:" + str(pic_link)
        print(img_url)
        # Download inside the loop: the original downloaded after the loop,
        # which saved only the last image and crashed when no <img> was found
        img_response = requests.get(url=img_url)
        file_name = txt + ".jpg"
        with open(file_name, "wb") as f:
            f.write(img_response.content)
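The parsing and filename-sanitizing steps above can be exercised without hitting the live site. This is a minimal sketch over an inline HTML snippet (the markup and URLs below are invented to mimic the structure the scraper expects, and it uses the stdlib `html.parser` instead of `lxml`):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML shaped like the autohome news list
html = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="//www.autohome.com.cn/news/1.html">
      <h3>新车速递: Model-X 2020!</h3>
      <img src="//img.example.com/a.jpg">
    </a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, features="html.parser")
target = soup.find(id="auto-channel-lazyload-article")

results = []
for li in target.find_all("li"):
    a = li.find("a")
    if not a:
        continue
    # Same sanitization as the scraper: keep CJK, letters, digits only
    title = a.find("h3").text
    safe = "".join(re.findall(r"[\u4e00-\u9fa5a-zA-Z0-9]+", title))
    for img in a.find_all("img"):
        results.append((safe, "http:" + img.get("src")))

print(results)  # → [('新车速递ModelX2020', 'http://img.example.com/a.jpg')]
```

Punctuation, spaces, and the hyphen are dropped by the character-class regex, which is why the saved file name is compact.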

  


Original: https://www.cnblogs.com/linglinglingling/p/12233581.html
