
Downloading Popular Images from Duitang

Posted: 2020-06-04 21:55:30

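The goal is to download the popular images on Duitang (duitang.com). Opening the page and scrolling down shows that the images are loaded via ajax. Press F12 to open the developer tools, select the Network tab and filter by XHR, then keep scrolling to find the API that the ajax requests call, as shown in the screenshot below.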

 

[Screenshot: the ajax request shown in the developer tools]
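An XHR request found this way can be reproduced outside the browser. As a minimal sketch, the snippet below only builds the URL for a given page and sends nothing. The endpoint and the `limit`/`start` parameters are taken from the script later in this post; `build_page_url` is a hypothetical helper name, the `include_fields` parameter is left out for brevity, and the standard library is used instead of `requests` so the snippet runs without any dependency.

```python
from urllib.parse import urlencode

API_URL = 'https://www.duitang.com/napi/index/hot/'  # endpoint used by the script in this post
PAGE_SIZE = 24  # items per request, as in the script


def build_page_url(page, page_size=PAGE_SIZE):
    """Return the full URL for one page of results (hypothetical helper)."""
    # `start` is an item offset, so page i begins at page_size * i.
    query = urlencode({'limit': page_size, 'start': page_size * page})
    return f'{API_URL}?{query}'


if __name__ == '__main__':
    for page in range(3):
        print(build_page_url(page))
```

Each successive page only changes the `start` offset, which is why the script can walk through the results with a simple loop counter.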

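With the API identified, you can construct requests to fetch the JSON data containing the image URLs. For I/O-bound tasks such as network requests, using a process pool can speed up the downloads.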
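The point about pools can be illustrated without touching the network. The sketch below is a stand-in under stated assumptions, not the code from this post: `fake_download` simulates a request with `time.sleep`, the latency value is made up, and it uses a thread pool from `concurrent.futures` rather than a process pool, since threads are usually sufficient when tasks spend most of their time waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item):
    """Stand-in for a network request: wait briefly, then return a result."""
    time.sleep(0.05)  # simulated network latency (assumed value)
    return item * 2


def run_sequential(items):
    """Process the items one after another."""
    return [fake_download(item) for item in items]


def run_pooled(items, workers=8):
    """Process the items with a pool so that waiting times overlap."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map() returns results in the order of the inputs.
        return list(executor.map(fake_download, items))


if __name__ == '__main__':
    items = list(range(16))
    for runner in (run_sequential, run_pooled):
        started = time.perf_counter()
        runner(items)
        print(f'{runner.__name__}: {time.perf_counter() - started:.2f}s')
```

Both functions return the same results; only the elapsed time should differ, because the pooled version waits on several simulated requests at once.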

The code is as follows:

import os
import re
from multiprocessing import Pool

import requests
from requests import exceptions


def get_pic_info():
    """Yield a dict with a file name and a URL for each popular image."""
    url = 'https://www.duitang.com/napi/index/hot/?'
    for i in range(1000):
        params = {
            'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
            'limit': 24,
            'start': 24 * i,
        }
        response = requests.get(url, params=params)
        json_data = response.json()
        pic_list = json_data['data']['object_list']
        for pic_ in pic_list:
            image = {}
            pic_info = pic_['album']
            pic_url = pic_info['covers'][0]
            # Strip characters that are unsafe in file names, then append the URL's extension.
            image['pic_name'] = re.sub(r'[\\/:*?"<>|\r\n。,.? ]+', '', pic_info['name']) + '.' + pic_url.split('.')[-1]
            image['pic_url'] = pic_url
            yield image


def download_pic(image):
    """Download one image into ./img unless the file already exists."""
    path = f'./img/{image["pic_name"]}'
    if not os.path.exists(path):
        try:
            resp = requests.get(image['pic_url'])
            if resp.status_code == 200:
                with open(path, 'wb') as f:
                    f.write(resp.content)
        except exceptions.RequestException:
            # Skip images whose request fails.
            return None
    else:
        print(image['pic_name'] + ' has already been downloaded')


if __name__ == '__main__':
    if not os.path.exists('./img'):
        os.mkdir('./img')
    pool = Pool()
    # Pool.map() reads the whole generator into a list before dispatching work.
    pool.map(download_pic, get_pic_info())
    pool.close()
    pool.join()
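The file-name step in `get_pic_info` is easy to check in isolation. The following sketch reuses the same character class as the script; `make_file_name` is a hypothetical helper name, and the sample album name and URL are made up for illustration.

```python
import re

# Same character class as in the script: characters that are invalid in
# file names on common systems, plus some punctuation and spaces.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\r\n。,.? ]+')


def make_file_name(album_name, pic_url):
    """Build a file name from an album name and the URL's extension."""
    stem = INVALID_CHARS.sub('', album_name)
    extension = pic_url.split('.')[-1]
    return f'{stem}.{extension}'


if __name__ == '__main__':
    # Both arguments are made-up sample values.
    print(make_file_name('cats: cute? yes', 'https://example.com/a/b/photo.jpeg'))
```

One consequence of naming files this way is that two albums with the same name map to the same path, so the second one would be reported as already downloaded. Adding a unique field to the name would avoid that, if the API response provides one.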

 


Original: https://www.cnblogs.com/pau1fang/p/13045835.html
