Scraping Baidu Tieba images with a Python crawler!

Posted: 2017-02-22 15:36:42

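This post downloads Baidu Tieba images with a regular expression. A Tieba page can be fetched directly with urllib.urlopen, but sites such as Qiushibaike (糗事百科) refuse that kind of request. In a later post I will share code that uses urllib2.Request to get around the block and scrape Qiushibaike.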
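The urllib2.Request code mentioned above is not part of this post. As a rough illustration of the general idea, here is a minimal Python 3 sketch (in Python 3, urllib2.Request became urllib.request.Request) that attaches a browser-like User-Agent header instead of urllib's default "Python-urllib/x.y" one. It assumes the block is based on the User-Agent header, which the post does not confirm; the header value and the helper names build_request and fetch_html are hypothetical.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; the exact value is an assumption.
DEFAULT_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Return a Request that sends the given User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, encoding="utf-8"):
    """Download url with the custom header and decode it to str (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode(encoding, errors="replace")
```

For example, fetch_html('http://tieba.baidu.com/p/4968182754') would return the Tieba page as a str. A site that also checks cookies, the Referer header or request rates would need more than this.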

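#!/usr/bin/python
# coding=utf-8
# Python 2 code: urllib.urlopen, urllib.urlretrieve and the print
# statement used here do not exist in Python 3.
import re
import urllib


def gethtml(url):
    # Download the page and return its raw HTML
    page = urllib.urlopen(url)
    html = page.read()
    page.close()
    return html


def getjpgurl(html):
    # The pattern assumes post images are written as <img ... src="....jpg" size="...">;
    # [^"]+? keeps each match inside a single src="..." attribute
    r = r'src="([^"]+?\.jpg)" size'
    jpgurl = re.findall(r, html)
    print jpgurl
    # Save the images in the current directory as 1.jpg, 2.jpg, ...
    for x, url in enumerate(jpgurl, 1):
        urllib.urlretrieve(url, '%s.jpg' % x)


html = gethtml('http://tieba.baidu.com/p/4968182754')
getjpgurl(html)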
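On Python 3, where urllib.urlopen, urllib.urlretrieve and the print statement no longer exist, the extraction and download steps above might be ported roughly as follows. This is a sketch, not the author's code: it keeps the src="...jpg" size anchor from the post, adds duplicate removal (the post does not do this), and uses made-up HTML so the extraction can be checked offline. SAMPLE_HTML, extract_jpg_urls and download_all are hypothetical names. Note that urlopen().read() returns bytes in Python 3, so the page must be decoded before a str pattern can be applied to it.

```python
import re
import urllib.request

# [^"]+? stops at the closing quote, so each match stays inside one
# src="..." attribute; a looser (.*?\.jpg) can span several tags on a line.
JPG_PATTERN = re.compile(r'src="([^"]+?\.jpg)" size')

# Made-up HTML fragment standing in for a Tieba post (not real page markup).
SAMPLE_HTML = (
    '<img class="BDE_Image" src="http://example.com/a.jpg" size="1234">'
    '<img src="http://example.com/logo.png" width="10">'
    '<img class="BDE_Image" src="http://example.com/b.jpg" size="5678">'
    '<img class="BDE_Image" src="http://example.com/a.jpg" size="1234">'
)


def extract_jpg_urls(html):
    """Return the .jpg URLs found in html (a str), in page order, without duplicates."""
    urls = []
    for url in JPG_PATTERN.findall(html):
        if url not in urls:
            urls.append(url)
    return urls


def download_all(urls, prefix=""):
    """Save each URL as <prefix>1.jpg, <prefix>2.jpg, ... (needs network access)."""
    for index, url in enumerate(urls, start=1):
        urllib.request.urlretrieve(url, "%s%d.jpg" % (prefix, index))


if __name__ == "__main__":
    # A real page would first be decoded, e.g. raw.decode("utf-8", errors="replace")
    print(extract_jpg_urls(SAMPLE_HTML))
```

Running it prints the two distinct .jpg URLs; the logo.png tag is skipped rather than being glued onto the next image URL. For a real post, pass the decoded page HTML to extract_jpg_urls and the resulting list to download_all.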

If you know a better way to write this, please share your suggestions so everyone can learn from them.

This article originally appeared on the blog "全球互联云主机Q874247458"; please retain this attribution: http://gosweet.blog.51cto.com/11759495/1900076
