Reposted for study from: https://www.cnblogs.com/alex3714/articles/8359404.html
Requests is an HTTP library for Python. It is built on top of urllib3 and released under the Apache 2 license.
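A basic GET request and the most commonly used response attributes:

```python
import requests

response = requests.get("https://www.baidu.com")
print(response.status_code)  # HTTP status code, e.g. 200
print(response.text)         # body decoded to str
print(response.cookies)      # cookies set by the server (RequestsCookieJar)
print(response.content)      # raw body as bytes
```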
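On many sites, reading response.text directly gives garbled text (mojibake). When the server's Content-Type header is text/* but declares no charset, Requests falls back to ISO-8859-1 instead of the page's real encoding. Common fixes: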
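Method 1: decode the raw bytes in response.content yourself:

```python
import requests

response = requests.get("https://www.baidu.com")
# Decode the raw bytes with the page's actual encoding
print(response.content.decode("utf-8"))
```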
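Method 2: set response.encoding before reading response.text:

```python
import requests

response = requests.get("https://www.baidu.com")
# Override the guessed encoding; response.text is then decoded as UTF-8
response.encoding = "utf-8"
print(response.text)
```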
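To see why both methods work, the sketch below reproduces the mojibake offline in plain Python. It assumes, as in the Baidu case, that the page is UTF-8 but was decoded as ISO-8859-1; the sample string is only illustrative.

```python
# Assumption: the page body is UTF-8, but response.text decoded it as ISO-8859-1.
raw = "百度一下".encode("utf-8")        # what response.content holds (bytes)
garbled = raw.decode("iso-8859-1")     # what response.text would show
fixed = raw.decode("utf-8")            # Method 1: decode the bytes correctly

# ISO-8859-1 maps each byte to exactly one character, so no information is lost
# and the garbled text can still be turned back into the original bytes.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(ascii(garbled))     # escaped, because the mojibake contains control characters
print(fixed == repaired)
```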
The other HTTP methods work the same way:

```python
import requests

requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")
```
GET request without parameters:

```python
import requests

response = requests.get("http://httpbin.org/get")
print(response.text)
```
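There are two ways to send a GET request with parameters. Either write them into the URL yourself:

```python
import requests

response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")
print(response.text)
```

or pass a dict via params and let Requests build the query string (print(response.url) shows the resulting URL):

```python
import requests

data = {
    "name": "zhaofan",
    "age": 22,
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.url)
print(response.text)
```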
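Both forms send the same kind of query string. The sketch below checks this offline: it builds the request with requests.Request(...).prepare() without sending it and inspects the final URL. The extra "kw" value is a made-up example showing that params also escapes special characters.

```python
import requests

url = "http://httpbin.org/get"

# Build the request but do not send it, so the final URL can be inspected offline
prepared = requests.Request("GET", url, params={"name": "zhaofan", "age": 22}).prepare()
print(prepared.url)

# params also URL-encodes values: the space becomes "+" and "&" becomes "%26"
escaped = requests.Request("GET", url, params={"kw": "a b&c"}).prepare()
print(escaped.url)
```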
Some sites reject requests that lack browser-like headers, most often the User-Agent. Pass custom headers like this:

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
response = requests.get("https://www.zhihu.com", headers=headers)
print(response.text)
```
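Why the User-Agent matters: without an override, Requests identifies itself as python-requests, which some sites block. The offline sketch below (nothing is sent) prints that default and shows a Session applying a custom header to every request it prepares. The User-Agent value used here is a placeholder, not a real browser string.

```python
import requests

# The User-Agent Requests sends when you do not set one yourself
default_ua = requests.utils.default_headers()["User-Agent"]
print(default_ua)

# Headers set on a Session are merged into every request it prepares.
# "Mozilla/5.0 (placeholder)" is a hypothetical value.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (placeholder)"})
prepared = session.prepare_request(requests.Request("GET", "https://www.zhihu.com"))
print(prepared.headers["User-Agent"])
```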
POST request with form data:

```python
import requests

data = {
    "name": "zhaofan",
    "age": 23,
}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)
```
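With data=, Requests form-encodes the dict into the request body. A quick offline check, preparing the POST without sending it:

```python
import requests

form = {"name": "zhaofan", "age": 23}
# Prepare the POST without sending it, then inspect what would go over the wire
prepared = requests.Request("POST", "http://httpbin.org/post", data=form).prepare()
print(prepared.headers["Content-Type"])
print(prepared.body)
```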
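File upload:

```python
import requests

# Open the file in binary mode; "with" closes it after the upload
with open("git.jpeg", "rb") as f:
    files = {"files": f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
```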
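files= switches the body to multipart/form-data. To try this without a real git.jpeg on disk, the sketch below assumes an in-memory io.BytesIO object holding placeholder bytes, passes it with the (filename, file object, content type) tuple form, and only prepares the request:

```python
import io

import requests

# Placeholder bytes standing in for the contents of git.jpeg
fake_image = io.BytesIO(b"not really a jpeg")
files = {"files": ("git.jpeg", fake_image, "image/jpeg")}

prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()
print(prepared.headers["Content-Type"])          # includes a random boundary string
print(b'filename="git.jpeg"' in prepared.body)   # the part carries the file name
```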
Reading cookies:

```python
import requests

response = requests.get("http://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(f"{key}={value}")
```
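response.cookies is a RequestsCookieJar, which also behaves like a dict. The offline sketch below fills a jar by hand (the cookie name and value are made up) to show the same items() loop plus get_dict():

```python
import requests

# A hand-built jar standing in for response.cookies; "sessionid"/"abc123" are made up
jar = requests.cookies.RequestsCookieJar()
jar.set("sessionid", "abc123", domain="www.baidu.com", path="/")

for key, value in jar.items():
    print(f"{key}={value}")

# Convert to a plain dict, e.g. to store it or pass it as cookies= later
cookie_dict = jar.get_dict()
print(cookie_dict)
```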
Session persistence: a Session object keeps cookies across requests, which is mainly used to simulate a logged-in user.
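```python
import requests

s = requests.Session()
# The first request makes the server set a cookie; the session stores it
s.get("http://httpbin.org/cookies/set/number/123456")
# The second request sends that cookie back automatically
response = s.get("http://httpbin.org/cookies")
print(response.text)
```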
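This works because the Session stores cookies in session.cookies and attaches them to every later request. The sketch below shows that offline: it puts the number=123456 cookie into the jar by hand (standing in for the /cookies/set call) and compares a request prepared by the session with one prepared on its own.

```python
import requests

session = requests.Session()
# Stand-in for what /cookies/set/number/123456 makes the server send back
session.cookies.set("number", "123456", domain="httpbin.org", path="/")

# Prepared through the session: the stored cookie is attached automatically
with_session = session.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))
# Prepared on its own: no cookie, just like two separate requests.get() calls
standalone = requests.Request("GET", "http://httpbin.org/cookies").prepare()

print(with_session.headers.get("Cookie"))
print(standalone.headers.get("Cookie"))
```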
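Certificate verification mainly concerns HTTPS sites whose certificate fails verification, such as the 12306 website. A common workaround is to disable verification: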
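```python
import requests
import urllib3

# Silence the InsecureRequestWarning that verify=False would otherwise print
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
```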
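This way the request runs without printing a warning. Keep in mind that verify=False turns off certificate checking entirely. You can instead pass the path to a CA bundle via verify, and use the cert parameter to supply a client certificate (a file path, or a (cert, key) tuple).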
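Instead of verify=False, verify can also be given the path of a CA bundle. When verify=True (the default), Requests checks server certificates against a bundled CA file whose location requests.certs.where() reports. The sketch below prints that path and, as a hypothetical, points a session at a custom bundle; no request is sent, and /path/to/ca-bundle.pem is a placeholder.

```python
import os

import requests
from requests import certs

# CA bundle used by default when verify=True (normally provided by certifi)
default_bundle = certs.where()
print(default_bundle, os.path.isfile(default_bundle))

# Hypothetical: trust a custom CA for every request made through this session.
# "/path/to/ca-bundle.pem" is a placeholder; it is not checked until a request is sent.
session = requests.Session()
session.verify = "/path/to/ca-bundle.pem"
print(session.verify)
```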
Proxies are configured with the proxies argument, a dict keyed by URL scheme:

```python
import requests

proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888",
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.text)
```
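If the proxy requires a username and password, include them in the proxy URL. Because the target URL is https://, the dict needs an "https" entry for the proxy to be used:

```python
import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9999",
    "https": "http://user:password@127.0.0.1:9999",  # used for https:// URLs
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.text)
```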
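Requests picks the proxy entry whose key matches the scheme of the target URL, so with a dict that has only an "http" key, an https:// request would bypass the proxy. The sketch below makes this visible offline using requests.utils.select_proxy and requests.utils.get_auth_from_url, which Requests uses internally; they are helpers in requests.utils rather than part of the documented public API.

```python
import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9999",
    "https": "http://user:password@127.0.0.1:8888",
}

# Which proxy would be used for each target URL (nothing is sent)
print(requests.utils.select_proxy("https://www.baidu.com", proxies))
print(requests.utils.select_proxy("http://httpbin.org/get", proxies))

# With only an "http" key, an https:// URL gets no proxy at all
print(requests.utils.select_proxy("https://www.baidu.com", {"http": proxies["http"]}))

# The credentials are taken from the proxy URL itself
print(requests.utils.get_auth_from_url(proxies["https"]))
```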
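If the proxy is a SOCKS proxy, first install the extra dependency with pip install "requests[socks]", then use socks5:// proxy URLs:

```python
import requests

proxies = {
    "http": "socks5://127.0.0.1:9999",
    "https": "socks5://127.0.0.1:8888",
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.text)
```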
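Timeout settings:

```python
import requests

# Raises a Timeout exception if the server does not respond within 1 second
r = requests.get("https://www.taobao.com", timeout=1)
print(r.status_code)

# A request has two phases, connect and read; pass a (connect, read) tuple
# to limit them separately
r = requests.get("https://www.taobao.com", timeout=(5, 11))

# Wait forever: timeout=None, which is also the default when timeout is omitted
r = requests.get("https://www.taobao.com", timeout=None)
r = requests.get("https://www.taobao.com")
```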
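To see a read timeout without relying on a real slow site, the sketch below opens a local TCP socket that accepts connections but never answers, so the connect phase succeeds and the read phase times out. The OS picks the port, and trust_env=False keeps any proxy environment variables out of this local test.

```python
import socket
import time

import requests

# A listening socket that never calls accept() or replies: the kernel completes
# the TCP handshake, so connecting works, but no HTTP response ever arrives.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False        # ignore HTTP(S)_PROXY etc. for this local test

start = time.monotonic()
try:
    session.get(f"http://127.0.0.1:{port}/", timeout=(2, 0.5))  # (connect, read)
    outcome = "response"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
elapsed = time.monotonic() - start
server.close()
print(outcome, f"after {elapsed:.1f}s")
```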
If a site requires authentication, use the requests.auth module.
Method 1: HTTPBasicAuth:

```python
import requests
from requests.auth import HTTPBasicAuth

response = requests.get("http://120.27.34.24:9001/", auth=HTTPBasicAuth("user", "123"))
print(response.status_code)
```
Method 2: pass a (username, password) tuple, which Requests treats as HTTP Basic auth:

```python
import requests

response = requests.get("http://120.27.34.24:9001/", auth=("user", "123"))
print(response.status_code)
```
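Both methods produce the same Authorization header: HTTP Basic auth sends "Basic " followed by the base64 encoding of user:password. Preparing the requests offline (nothing is sent to the server) confirms this:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://120.27.34.24:9001/"
# Method 1 and Method 2, prepared but not sent
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "123")).prepare()
shorthand = requests.Request("GET", url, auth=("user", "123")).prepare()

# What Basic auth puts on the wire: base64("user:123")
expected = "Basic " + base64.b64encode(b"user:123").decode("ascii")
print(explicit.headers["Authorization"])
print(explicit.headers["Authorization"] == shorthand.headers["Authorization"] == expected)
```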
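Exception handling:

```python
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:        # the server did not respond in time
    print("timeout")
except ConnectionError:    # DNS failure, refused connection, etc.
    print("connection Error")
except RequestException:   # base class for all other Requests errors
    print("error")
```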
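The except clauses go from specific to general because ReadTimeout and ConnectionError are both subclasses of RequestException; a RequestException clause listed first would catch them. The sketch below checks that hierarchy and triggers a ConnectionError offline by connecting to a local port with nothing listening. It assumes no other process grabs the port between closing the probe socket and connecting; fetch_status is a hypothetical helper that wraps the same except chain.

```python
import socket

import requests
from requests.exceptions import ConnectionError, ReadTimeout, RequestException

# The specific exceptions are subclasses of RequestException, hence the clause order
print(issubclass(ReadTimeout, RequestException), issubclass(ConnectionError, RequestException))


def fetch_status(url, timeout=0.1):
    """Hypothetical helper: same except chain as above, returning a label."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables
        try:
            return str(session.get(url, timeout=timeout).status_code)
        except ReadTimeout:
            return "timeout"
        except ConnectionError:
            return "connection Error"
        except RequestException:
            return "error"


# Grab a free local port, then close it so nothing is listening there
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
free_port = probe.getsockname()[1]
probe.close()

result = fetch_status(f"http://127.0.0.1:{free_port}/")
print(result)
```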
Original article: https://www.cnblogs.com/Iceredtea/p/11285574.html