Request method: GET and POST are the most common; there are also HEAD, PUT, DELETE, OPTIONS, etc.
Request URL: the Uniform Resource Locator; any web page, image, or document can be uniquely identified by its URL.
Request headers: the header information sent with the request, such as User-Agent, Host, and Cookies.
Request body: extra data carried with the request, such as the form data of a form submission.
Response status: there are many status codes, e.g. 200 (success), 301 (redirect), 404 (page not found), 502 (bad gateway).
Response headers: e.g. content type, content length, server information, Set-Cookie, etc.
Response body: the most important part, containing the content of the requested resource, such as the page HTML, an image, or other binary data.
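The request-side pieces above (method, URL, headers, body) can be inspected offline by building a request without sending it; this sketch uses requests' Request/prepare() API, with made-up header and form values, and needs no network:

```python
import requests

# Build a request object without sending it, then inspect its parts.
req = requests.Request(
    'POST', 'http://httpbin.org/post',
    headers={'User-Agent': 'demo-agent'},   # request headers
    data={'name': 'germey'},                # request body (form data)
)
prepared = req.prepare()

print(prepared.method)                   # POST
print(prepared.url)                      # http://httpbin.org/post
print(prepared.body)                     # name=germey (urlencoded form body)
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```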
import requests  # fetch a web page
uheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'}
response = requests.get('http://www.baidu.com', headers=uheaders)
print(response.text)
print(response.headers)
print(response.status_code)

response = requests.get('https://www.baidu.com/img/baidu_jgylogo3.gif')  # fetch an image
res = response.content  # binary content
with open('1.gif', 'wb') as f:
    f.write(res)
Ways to handle the response:
Direct processing
JSON parsing
Regular expressions
BeautifulSoup
PyQuery
XPath
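Two of the options above can be sketched with the standard library alone; the payload and HTML fragment below are made up for illustration:

```python
import json
import re

payload = '{"name": "germey", "age": 22}'              # a JSON API response
html = '<li><h3>Sample title</h3><p>Summary</p></li>'  # an HTML fragment

# JSON parsing: turn the string into a dict.
data = json.loads(payload)
print(data['name'])  # germey

# Regular expression: extract the <h3> text with a non-greedy group.
match = re.search(r'<h3>(.*?)</h3>', html)
print(match.group(1))  # Sample title
```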
Various request methods
import requests
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
import requests
response=requests.get('http://httpbin.org/get')
print(response.text)
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "origin": "115.214.23.142",
  "url": "http://httpbin.org/get"
}
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)

# Equivalent: pass the query parameters as a dict via params
data = {'name': 'germey', 'age': 22}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "origin": "115.214.23.142",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
import requests
response=requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(type(response.json()))
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1'}, 'origin': '115.214.23.142', 'url': 'http://httpbin.org/get'}
<class 'dict'>
import requests
response = requests.get('http://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:  # open() needs a filename and a binary write mode
    f.write(response.content)
import requests
# Zhihu rejects requests without a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'}
response = requests.get('http://www.zhihu.com/explore', headers=headers)
print(response.text)
import requests
data={'name':'germey','age':22}
headers={'User-Agent':''}
response=requests.post('http://httpbin.org/post',data=data,headers=headers)
print(response.json())
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': ''}, 'json': None, 'origin': '115.214.23.142', 'url': 'http://httpbin.org/post'}
response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
<class 'int'> 403
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Wed, 31 Oct 2018 06:25:29 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Tengine', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'X-Via': '1.1 dianxinxiazai180:5 (Cdn Cache Server V2.0), 1.1 PSzjjxdx10wx178:11 (Cdn Cache Server V2.0)'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [301]>]
requests supports many more operations:
File upload
import requests
files={'file':open('1.jpg','rb')}
response=requests.post('http://httpbin.org/post',files=files)
print(response.text)
Getting cookies
import requests
response=requests.get('http://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
Session persistence
import requests
s=requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response=s.get('http://httpbin.org/cookies')
print(response.text)
{"cookies": {"number": "123456789"}}
Certificate verification
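For HTTPS sites whose certificate cannot be verified (e.g. self-signed), requests accepts verify=False; a minimal sketch, where the target URL is only an example and the warning suppression is optional:

```python
import requests
import urllib3

# Suppress the InsecureRequestWarning that requests emits when
# certificate verification is disabled.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

try:
    # verify=False skips SSL certificate verification entirely;
    # verify='path/to/ca.pem' would instead trust a specific CA bundle.
    response = requests.get('https://www.12306.cn', verify=False, timeout=5)
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print('request failed:', e)
```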
Proxy settings
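Proxies are passed as a dict mapping scheme to proxy address; the host and port below are placeholders for your own proxy, not a real server:

```python
import requests

# Placeholder proxy address -- replace with a real proxy host:port.
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'http://127.0.0.1:9743',
    # With requests[socks] installed, SOCKS proxies also work:
    # 'http': 'socks5://user:password@127.0.0.1:9742',
}

try:
    response = requests.get('http://httpbin.org/get', proxies=proxies, timeout=3)
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print('proxy request failed:', e)
```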
Timeout settings
import requests
from requests.exceptions import ReadTimeout

try:
    response = requests.get('https://www.taobao.com', timeout=1)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
Authentication
import requests
# The URL was missing here; httpbin's basic-auth endpoint is used as an
# illustrative target -- it accepts exactly the credentials user/123.
r = requests.get('http://httpbin.org/basic-auth/user/123', auth=('user', '123'))
print(r.status_code)
Exception handling
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get('http://httpbin.org/get', timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('connect error')
except RequestException:
    print('Error')
import requests  # make HTTP requests pretending to be a browser
from bs4 import BeautifulSoup  # parse an HTML string into an object with .find / .find_all

response = requests.get('https://www.autohome.com.cn/news/')
response.encoding = 'gbk'  # the site is GBK-encoded
soup = BeautifulSoup(response.text, 'html.parser')
div = soup.find(name='div', attrs={'id': 'auto-channel-lazyload-article'})
li_list = div.find_all(name='li')
for li in li_list:
    title = li.find(name='h3')
    if not title:
        continue
    p = li.find(name='p')
    a = li.find(name='a')
    print(title.text)  # article title
    print(a.attrs.get('href'))  # article link; attrs is a dict of attributes
    print(p.text)  # summary
    img = li.find(name='img')  # image
    src = img.get('src')
    src = 'https:' + src
    print(src)
    file_name = src.rsplit('/', maxsplit=1)[1]
    ret = requests.get(src)
    with open(file_name, 'wb') as f:
        f.write(ret.content)  # binary content
Original source: https://www.cnblogs.com/qiuyicheng/p/10753117.html