
Python Web Crawler

Posted: 2016-10-25 14:04:00
1. You can get the response headers via r.headers:
>>> r = requests.get('http://www.zhidaow.com')
>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'content-type': 'text/html; charset=utf-8',
    ...
}
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r.headers.get('content-type')
'text/html; charset=utf-8'

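Note that the lookup above works with both 'Content-Type' and 'content-type': the headers object in requests is a case-insensitive mapping. A minimal pure-Python sketch of how such a mapping behaves (illustrative only, not the actual requests implementation):

```python
class CaseInsensitiveHeaders:
    """Minimal case-insensitive string mapping, illustrating why
    headers['Content-Type'] and headers.get('content-type') agree."""

    def __init__(self, items):
        # Store keys lowercased; values are kept as given.
        self._store = {k.lower(): v for k, v in items.items()}

    def __getitem__(self, key):
        return self._store[key.lower()]

    def get(self, key, default=None):
        return self._store.get(key.lower(), default)


headers = CaseInsensitiveHeaders({"Content-Type": "text/html; charset=utf-8"})
print(headers["Content-Type"])      # text/html; charset=utf-8
print(headers.get("content-type"))  # text/html; charset=utf-8
```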
2. Set a timeout

>>> requests.get('http://github.com', timeout=0.001)
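A 0.001-second timeout is far too short for a real request, so the call above raises an exception; in requests it is requests.exceptions.Timeout, which you should catch. To keep the example self-contained, the sketch below uses a hypothetical stand-in fetch function that always times out, but the handling pattern is the same:

```python
def fetch(url, timeout):
    """Hypothetical stand-in for requests.get: always times out,
    used only to demonstrate the exception-handling pattern."""
    raise TimeoutError(f"request to {url} exceeded {timeout}s")


def fetch_or_none(url, timeout=0.001):
    # With requests you would catch requests.exceptions.Timeout here.
    try:
        return fetch(url, timeout)
    except TimeoutError:
        return None


result = fetch_or_none("http://github.com")
print(result)  # None: the request timed out
```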
3. Access through a proxy
proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://www.zhidaow.com", proxies=proxies)

If the proxy requires a username and password, configure it like this:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
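If the username or password contains characters such as '@' or ':', they must be percent-encoded before being embedded in the proxy URL. A sketch using the standard library (the credentials and proxy address here are made up):

```python
from urllib.parse import quote

user = "user@example"   # hypothetical username containing '@'
password = "p:ss@word"  # hypothetical password containing ':' and '@'

# safe="" ensures ':' and '@' are encoded too, so they cannot be
# confused with the URL's own delimiters.
proxy_url = "http://{}:{}@10.10.1.10:3128/".format(
    quote(user, safe=""), quote(password, safe="")
)
print(proxy_url)  # http://user%40example:p%3Ass%40word@10.10.1.10:3128/
```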
4. Disabling redirects
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
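With allow_redirects=False, a redirect is returned to you as-is (e.g. status 302 with a Location header) and you decide what to do with it. A sketch of following redirects by hand; the fetch callable and URLs are hypothetical stand-ins for real HTTP traffic:

```python
def follow_redirects(url, fetch, max_redirects=5):
    """Follow HTTP redirects manually, as one must when
    allow_redirects=False. `fetch` is any callable returning
    (status_code, headers) for a URL (stubbed below)."""
    for _ in range(max_redirects):
        status, headers = fetch(url)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = headers["Location"]  # hop to the redirect target
        else:
            return url, status
    raise RuntimeError("too many redirects")


# Stub responses standing in for real servers (hypothetical URLs).
responses = {
    "http://short.example/a": (302, {"Location": "http://final.example/"}),
    "http://final.example/": (200, {}),
}

final_url, status = follow_redirects("http://short.example/a", responses.__getitem__)
print(final_url, status)  # http://final.example/ 200
```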
5. Uploading files
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

You can also set the filename explicitly:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'))}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

If you like, you can also send a string that will be received as a file:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\\nanother,row,to,send\\n"
  },
  ...
}
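Under the hood, files= makes requests encode the body as multipart/form-data. A rough standard-library sketch of what such a body looks like for one file field (simplified: a real boundary is randomly generated, and requests handles encodings and edge cases this sketch ignores):

```python
def build_multipart(field, filename, data, boundary="boundary123"):
    """Build a minimal multipart/form-data body for one file field.
    Simplified illustration of what requests' files= produces."""
    lines = [
        "--" + boundary,
        'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
            field, filename
        ),
        "Content-Type: text/csv",
        "",  # blank line separates part headers from part body
        data,
        "--" + boundary + "--",  # closing boundary
        "",
    ]
    return "\r\n".join(lines)


body = build_multipart("file", "report.csv",
                       "some,data,to,send\nanother,row,to,send\n")
print(body)
```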

Original: http://www.cnblogs.com/auto-tester-space/p/5996010.html
