The Data Road - Python Crawlers - The Requests Library

Posted: 2019-08-01 22:48:58

Reposted for study from: https://www.cnblogs.com/alex3714/articles/8359404.html

I. Introduction to Requests

Requests is an HTTP library written in Python on top of urllib3 and released under the Apache 2.0 license.

II. Basic Usage

import requests

response  = requests.get("https://www.baidu.com")
print(response.status_code)
print(response.text)
print(response.cookies)
print(response.content)

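On many sites, printing response.text directly produces garbled characters, because Requests guessed the wrong encoding. Two common fixes:

Method 1: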

import requests

response  = requests.get("https://www.baidu.com")
print(response.content.decode("utf-8"))

Method 2:

import requests

response  = requests.get("https://www.baidu.com")
response.encoding="utf-8"
print(response.text)
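
Why the garbling happens can be reproduced without any network access. The sketch below assumes the server sent UTF-8 bytes without declaring a charset, a case in which Requests falls back to ISO-8859-1 for text responses. The sample string and variable names are illustrative only.

```python
# Bytes as they might arrive in response.content (the sample text is made up).
raw = "百度一下".encode("utf-8")

# Decoding with the wrong codec yields mojibake; this is what response.text
# shows when the guessed encoding is ISO-8859-1.
garbled = raw.decode("iso-8859-1")

# Decoding with the real codec recovers the text, which is what both
# methods above achieve.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

Method 1 decodes the raw bytes yourself; Method 2 tells Requests which codec to use before response.text is read. Either way the underlying bytes are the same.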

III. Request Methods

Besides GET, Requests provides one function for each of the other HTTP methods:

import requests

requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")

IV. GET and POST

1. GET requests

Without parameters:

import requests

response = requests.get("http://httpbin.org/get")
print(response.text)

Two ways to pass parameters:

import requests

response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")
print(response.text)

import requests

data = {
    "name":"zhaofan",
    "age":22
}
response = requests.get("http://httpbin.org/get",params=data)
print(response.url)
print(response.text)
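
To see that the two forms produce the same kind of URL, the request can be prepared without sending it. This is a small offline sketch; the variable names are illustrative.

```python
import requests

params = {"name": "zhaofan", "age": 22}

# Build the request without sending it, then inspect the final URL
# that Requests would use on the wire.
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
url = prepared.url

print(url)
```

Passing a dict through params is usually the safer choice, since Requests takes care of URL-encoding the keys and values.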

Some pages can only be fetched with custom request headers, such as a browser User-Agent:

import requests
headers = {
        "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
      }
response =requests.get("https://www.zhihu.com",headers=headers)

print(response.text)
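
One likely reason a site rejects the bare request is the default User-Agent, which identifies the client as python-requests. The offline sketch below compares the default with a custom value; it uses a Session only to show how per-request headers override the defaults, and nothing is sent.

```python
import requests

# The User-Agent Requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()

custom_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
)

s = requests.Session()
req = requests.Request("GET", "https://www.zhihu.com", headers={"User-Agent": custom_ua})

# prepare_request merges session defaults with the per-request headers;
# the per-request value wins.
prepared = s.prepare_request(req)

print(default_ua)
print(prepared.headers["User-Agent"])
```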

2. POST requests

import requests

data = {
    "name":"zhaofan",
    "age":23
}
response = requests.post("http://httpbin.org/post",data=data)
print(response.text)
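
What data= actually puts on the wire can be checked by preparing the request without sending it. A minimal offline sketch, with illustrative variable names:

```python
import requests

data = {"name": "zhaofan", "age": 23}

# Prepare the POST without sending it.
prepared = requests.Request("POST", "http://httpbin.org/post", data=data).prepare()

# A dict passed via data= is form-encoded into the request body.
content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)
print(body)
```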

V. Advanced Usage

1. File upload

import requests

files= {"files":open("git.jpeg","rb")}
response = requests.post("http://httpbin.org/post",files=files)
print(response.text)
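
The upload is sent as a multipart/form-data body. The sketch below shows this offline, using an in-memory buffer as a stand-in for git.jpeg (the bytes are placeholder data, not a real image) and the optional (filename, file object, content type) tuple form.

```python
import io

import requests

# In-memory stand-in for open("git.jpeg", "rb"); placeholder bytes only.
fake_image = io.BytesIO(b"\xff\xd8\xff placeholder bytes")

# The tuple form sets the filename and content type of the part explicitly.
files = {"files": ("git.jpeg", fake_image, "image/jpeg")}

# Prepare the request without sending it and inspect the result.
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()
content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)
```

With a real file, opening it in a with block (with open("git.jpeg", "rb") as f:) makes sure the handle is closed after the upload.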

2. Getting cookies

import requests

response = requests.get("http://www.baidu.com")
print(response.cookies)

for key,value in response.cookies.items():
    print(key+"="+value)

3. Session persistence

Session persistence relies on cookies and is mainly used to simulate a logged-in session.

import requests

s = requests.Session()
s.get("http://httpbin.org/cookies/set/number/123456")
response = s.get("http://httpbin.org/cookies")
print(response.text)
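
The mechanism can be sketched offline: a cookie stored on the Session is attached to every later request made through it. Here the cookie is set by hand as a stand-in for what the /cookies/set/number/123456 call does, and the request is only prepared, not sent.

```python
import requests

s = requests.Session()

# Stand-in for the server's Set-Cookie response: store the cookie on the session.
s.cookies.set("number", "123456")

# Any request prepared through the same session carries the cookie.
prepared = s.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))
cookie_header = prepared.headers.get("Cookie")

print(cookie_header)
```

Two separate requests.get() calls do not share this state, which is why a Session is needed for login flows.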

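4. Certificate verification

This matters for HTTPS sites whose certificate fails verification, such as the 12306 website. A common workaround: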

import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get("https://www.12306.cn",verify=False)
print(response.status_code)

With this, the request no longer prints a warning. A certificate path can also be supplied through the cert parameter.
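
Turning verification off removes the protection HTTPS provides, so it may be preferable to keep it on where possible. The helper below is a hypothetical sketch (tls_kwargs and the paths are made up for illustration) that builds keyword arguments for requests.get: verify accepts a path to a CA bundle, and cert accepts a client certificate path or a (cert, key) tuple.

```python
def tls_kwargs(ca_bundle=None, client_cert=None):
    """Build TLS-related keyword arguments for requests.get().

    Hypothetical helper; all paths passed in are the caller's own.
    """
    # Keep verification on: use the given CA bundle, else the default store.
    kwargs = {"verify": ca_bundle if ca_bundle else True}
    if client_cert:
        # Either one combined PEM path or a (cert_path, key_path) tuple.
        kwargs["cert"] = client_cert
    return kwargs


# Usage (not executed here):
#   requests.get("https://www.12306.cn", **tls_kwargs("/path/to/ca.pem"))
print(tls_kwargs("/path/to/ca.pem"))
```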

5. Proxy settings

import requests

proxies= {
    "http":"http://127.0.0.1:9999",
    "https":"http://127.0.0.1:8888"
}
response = requests.get("https://www.baidu.com",proxies=proxies)
print(response.text)

If the proxy requires a username and password, change the dictionary as follows:

import requests

proxies = {
    "http":"http://user:password@127.0.0.1:9999",
    "https":"http://user:password@127.0.0.1:9999"
}
response = requests.get("https://www.baidu.com",proxies=proxies)
print(response.text)
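
Requests picks the proxy by the scheme of the target URL, so an https:// request only goes through the proxy if the dictionary has an "https" entry. The helper below is a hypothetical sketch (build_proxies is not part of Requests) that fills in both schemes and percent-encodes credentials, which matters when a password contains characters such as @ or :.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None):
    """Return a proxies dict covering both http and https targets.

    Hypothetical helper for illustration only.
    """
    auth = ""
    if user is not None:
        # Percent-encode so special characters do not break the URL.
        auth = quote(user, safe="") + ":" + quote(password or "", safe="") + "@"
    proxy_url = f"http://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("127.0.0.1", 9999, "user", "p@ss:word"))
```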

If the proxy uses SOCKS, first run pip install "requests[socks]":

proxies= {
    "http":"socks5://127.0.0.1:9999",
    "https":"socks5://127.0.0.1:8888"
}

6. Timeout settings

import requests

# Raises an exception if the timeout is exceeded
r = requests.get("https://www.taobao.com", timeout=1)
print(r.status_code)

# A request has two phases, connect and read; pass a (connect, read) tuple to set them separately
r = requests.get("https://www.taobao.com", timeout=(5, 11))

# Wait forever
r = requests.get("https://www.taobao.com", timeout=None)
r = requests.get("https://www.taobao.com")
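
Since the two phases raise different exceptions, they can be told apart when handling the failure. A minimal sketch, assuming a hypothetical wrapper named fetch_status and the same 5 and 11 second limits as above:

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


def fetch_status(url, connect_timeout=5, read_timeout=11):
    """Return the status code, or a short label naming the phase that timed out."""
    try:
        r = requests.get(url, timeout=(connect_timeout, read_timeout))
    except ConnectTimeout:
        # No connection was established within connect_timeout seconds.
        return "connect timeout"
    except ReadTimeout:
        # Connected, but the server sent no data within read_timeout seconds.
        return "read timeout"
    return r.status_code
```

A call such as fetch_status("https://www.taobao.com") then returns either an integer status or one of the two labels.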

7. Authentication

If a site requires authentication, use the requests.auth module.

Method 1:

import requests
from requests.auth import HTTPBasicAuth

response = requests.get("http://120.27.34.24:9001/",auth=HTTPBasicAuth("user","123"))
print(response.status_code)

Method 2:

import requests

response = requests.get("http://120.27.34.24:9001/",auth=("user","123"))
print(response.status_code)
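
The two methods are equivalent: a (user, password) tuple is shorthand for HTTPBasicAuth. The offline sketch below prepares both requests without sending them and compares the resulting Authorization header.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://120.27.34.24:9001/"

# Prepare both variants without sending anything.
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "123")).prepare()
shorthand = requests.Request("GET", url, auth=("user", "123")).prepare()

# Basic auth is "Basic " plus base64("user:password").
expected = "Basic " + base64.b64encode(b"user:123").decode("ascii")

print(explicit.headers["Authorization"])
print(shorthand.headers["Authorization"])
```

Because the credentials are only base64-encoded, not encrypted, Basic auth should normally be used over HTTPS.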

8. Exception handling

import requests
from requests.exceptions import ReadTimeout,ConnectionError,RequestException

try:
    response = requests.get("http://httpbin.org/get",timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection Error")
except RequestException:
    print("error")
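
The order of the except clauses matters because the exception classes form a hierarchy, with RequestException at the root. The sketch below mirrors the chain above in a hypothetical classify function so the routing can be checked without a network call.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    ReadTimeout,
    RequestException,
    Timeout,
)


def classify(exc):
    """Mirror the except chain above: most specific class first."""
    if isinstance(exc, ReadTimeout):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connection Error"
    if isinstance(exc, RequestException):
        return "error"
    return "not a requests exception"


print(classify(ReadTimeout()))
print(classify(ConnectTimeout()))
```

Note that ConnectTimeout inherits from both ConnectionError and Timeout, so with this chain a connect timeout is reported as a connection error rather than as a timeout.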

Original post: https://www.cnblogs.com/Iceredtea/p/11285574.html
