Downloading Popular Images from Duitang

Date: 2020-06-04 21:55:30

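The goal is to download the popular images on Duitang. Opening the page and scrolling down shows that the images are loaded via AJAX. Press F12 to open the developer tools, select the Network tab, and filter by XHR. Then keep scrolling the page to find the API that the AJAX requests call, as shown in the screenshot below.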

[Image: screenshot of the AJAX request in the developer tools]

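With the API known, we can construct requests that return JSON data containing the image URLs. For I/O-bound tasks such as network requests, running the downloads in a process pool improves download speed.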
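To make the shape of that JSON concrete, here is a minimal sketch that builds the paging parameters and pulls the file name and URL out of one decoded response. The key names (`data`, `object_list`, `album`, `covers`, `name`) are the ones the script reads; the helper names `page_params` and `extract_images` and the sample payload are assumptions made up for illustration, and the `include_fields` parameter is left out for brevity.

```python
import re

# Characters that are invalid or awkward in file names (same pattern as the script).
UNSAFE = re.compile(r'[\\/:*?"<>|\r\n。，.？ ]+')


def page_params(page, limit=24):
    """Query parameters for one page; `start` is the offset of the first item."""
    return {'limit': str(limit), 'start': limit * page}


def extract_images(json_data):
    """Turn one decoded API response into a list of {pic_name, pic_url} dicts."""
    images = []
    for item in json_data['data']['object_list']:
        album = item['album']
        url = album['covers'][0]          # first cover image of the album
        ext = url.rsplit('.', 1)[-1]      # reuse the extension from the URL
        images.append({
            'pic_name': UNSAFE.sub('', album['name']) + '.' + ext,
            'pic_url': url,
        })
    return images


# Hand-written sample payload; the values are invented for illustration only.
sample = {
    'data': {
        'object_list': [
            {'album': {'name': 'cat: photos?',
                       'covers': ['https://example.com/a/b.jpeg']}},
        ]
    }
}

print(page_params(2))
print(extract_images(sample))
```

Keeping the parsing separate from the network call makes it possible to check the extraction logic against a saved response without hitting the site.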

The code is as follows:

import requests
from requests import exceptions
import re
from multiprocessing import Pool
import os

def get_pic_info():
    """Yield a dict with a file name and a URL for each popular image."""
    url = 'https://www.duitang.com/napi/index/hot/?'
    for i in range(1000):
        params = {
            'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
            'limit': '24',
            'start': 24 * i,  # offset of the first item on this page
        }
        response = requests.get(url, params=params)
        json_data = response.json()
        pic_list = json_data['data']['object_list']
        for pic_ in pic_list:
            image = {}
            pic_info = pic_['album']
            pic_url = pic_info['covers'][0]
            # Strip characters that are invalid or awkward in file names,
            # then reuse the extension from the image URL
            image['pic_name'] = re.sub(r'[\\/:*?"<>|\r\n。，.？ ]+', '', pic_info['name']) + '.' + pic_url.split('.')[-1]
            image['pic_url'] = pic_url
            yield image

def download_pic(image):
    """Download one image into ./img, skipping files that already exist."""
    path = f'./img/{image["pic_name"]}'
    if not os.path.exists(path):
        try:
            resp = requests.get(image['pic_url'])
            if resp.status_code == 200:
                with open(path, 'wb') as f:
                    f.write(resp.content)
        except exceptions.RequestException:
            # `exceptions` is a module, so catch its base exception class instead
            return None
    else:
        print(image['pic_name'] + ' has already been downloaded')

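if __name__ == '__main__':
    if not os.path.exists('./img'):
        os.mkdir('./img')
    pool = Pool()  # defaults to one worker process per CPU core
    # Note: pool.map turns the generator into a list before dispatching any work
    pool.map(download_pic, get_pic_info())
    pool.close()
    pool.join()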
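Because the downloads are I/O-bound, a thread pool is a common alternative to the process pool above: threads spend most of their time waiting on the network, and they avoid the cost of starting processes and pickling arguments. The following is a minimal sketch of that variant, not the original author's method. The names `save_image`, `download_all`, `fetch_with_requests`, and the `fetch` and `workers` parameters are hypothetical; the 10-second timeout is likewise an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_with_requests(url):
    """Return the response body as bytes, or None on any request failure."""
    # Imported lazily so the rest of the sketch can be tried with a fake fetch.
    import requests
    try:
        resp = requests.get(url, timeout=10)  # timeout value is an assumption
    except requests.exceptions.RequestException:
        return None
    return resp.content if resp.status_code == 200 else None


def save_image(image, fetch, out_dir='./img'):
    """Write one image to disk unless it already exists; return True if written."""
    path = os.path.join(out_dir, image['pic_name'])
    if os.path.exists(path):
        return False
    data = fetch(image['pic_url'])
    if data is None:
        return False
    with open(path, 'wb') as f:
        f.write(data)
    return True


def download_all(images, fetch=fetch_with_requests, out_dir='./img', workers=8):
    """Download every image using a pool of threads; return per-image results."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order in its results
        return list(pool.map(lambda img: save_image(img, fetch, out_dir), images))
```

With the script above, this could be called as `download_all(get_pic_info())`. Passing `fetch` as a parameter is a design choice that lets the file handling be exercised with a stand-in function instead of real network calls.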

Original: https://www.cnblogs.com/pau1fang/p/13045835.html
