
Web Scraping with the Requests Library

Date: 2019-04-05 00:40:53

Official documentation: http://www.python-requests.org/en/master/

1. Introduction

import requests

resp = requests.get("https://www.baidu.com/")
print(type(resp))        # <class 'requests.models.Response'>
print(resp.status_code)  # 200
print(type(resp.text))   # <class 'str'>
# print(resp.text)       # uncomment to print the page source
print(resp.cookies)

Other request methods:

import requests
requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")

2. Requests

GET requests

Basic usage:

import requests
resp = requests.get("http://httpbin.org/get")
print(resp.text)

GET request with parameters:

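# Option 1: write the query string directly into the URL
resp = requests.get("http://httpbin.org/get?name=pd&age=18")
print(resp.text)
# Option 2: pass the parameters as a dict and let Requests encode them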
params = {
    "name": "pd",
    "age": 18
}
resp = requests.get("http://httpbin.org/get", params=params)
print(resp.text)
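
To see which URL Requests builds from `params`, the request can be prepared locally without being sent. This is a minimal sketch that reuses the httpbin URL and the parameters from the example above and assumes the `requests` package is installed; nothing goes over the network.

```python
import requests

params = {"name": "pd", "age": 18}

# Build and prepare the request locally; no network traffic happens here.
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()

# The dict is URL-encoded and appended to the URL as the query string.
print(prepared.url)  # http://httpbin.org/get?name=pd&age=18
```

Both options therefore end up requesting the same URL; the dict form is usually easier to maintain because Requests handles the encoding.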

Parsing JSON:

resp = requests.get("http://httpbin.org/get")
print(resp.json())        # equivalent to json.loads(resp.text)
print(type(resp.text))    # <class 'str'>
print(type(resp.json()))  # <class 'dict'>
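
`resp.json()` raises an error when the body is not valid JSON, for example when a server answers with an HTML error page. The sketch below wraps the call defensively; `fetch_json` and its `timeout` default are hypothetical names chosen for illustration, not part of Requests.

```python
import requests


def fetch_json(url, timeout=10):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    try:
        return resp.json()
    except ValueError:
        # The decode errors raised by resp.json() are ValueError subclasses.
        return None
```

Calling `fetch_json("http://httpbin.org/get")` should return a dict like the one printed above, provided the service is reachable.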

Fetching binary data:

resp = requests.get("https://github.com/favicon.ico")
print(type(resp.text))     # <class 'str'>
print(type(resp.content))  # <class 'bytes'>
print(resp.text)           # decoded text; unreadable for binary data such as an icon
print(resp.content)        # raw bytes; use this for non-text responses

with open("favicon.ico", "wb") as f:
    f.write(resp.content)
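
Reading `resp.content` loads the whole body into memory, which is fine for a favicon but not for large files. One possible alternative is to stream the body in chunks. The helper name `download` and the chunk size are assumptions made for this sketch.

```python
import requests


def download(url, path, chunk_size=8192, timeout=10):
    """Stream the response body to `path` and return the number of bytes written."""
    written = 0
    # stream=True defers downloading the body until it is iterated over.
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

For example, `download("https://github.com/favicon.ico", "favicon.ico")` would save the same file as the snippet above.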

Adding request headers:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
}
resp = requests.get("https://www.zhihu.com/explore", headers=headers)
print(resp.text)
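
The reason for setting `User-Agent` is that Requests otherwise identifies itself as a script, which some sites reject. The sketch below prints the default value and confirms that a custom header ends up on the prepared request; the shortened User-Agent string is only for illustration, and nothing is sent over the network.

```python
import requests

# The User-Agent that Requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # starts with "python-requests/"

# A shortened browser-like value, used here only for illustration.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
prepared = requests.Request(
    "GET", "https://www.zhihu.com/explore", headers=headers
).prepare()

# Header lookups on a prepared request are case-insensitive.
print(prepared.headers["user-agent"])
```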

POST requests

Basic usage:

data = {"name": "pd", "age": 18}
resp = requests.post("http://httpbin.org/post", data=data)
print(resp.text)
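
The `data=` argument sends a form-encoded body. Requests also accepts `json=`, which serializes the dict as JSON instead. The following sketch compares the two by preparing the requests locally, using the same httpbin URL and payload as above; nothing is actually sent.

```python
import json

import requests

payload = {"name": "pd", "age": 18}
url = "http://httpbin.org/post"

# data= produces a form-encoded body.
form = requests.Request("POST", url, data=payload).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # name=pd&age=18

# json= serializes the dict to JSON and sets the matching Content-Type.
as_json = requests.Request("POST", url, json=payload).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))         # the original dict, round-tripped
```

Which form to use depends on what the target endpoint expects; httpbin echoes form data under `form` and JSON bodies under `json`.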

Adding request headers:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
}
data = {"name": "pd", "age": 18}
resp = requests.post("http://httpbin.org/post", data=data, headers=headers)
print(resp.text)

3. Responses

Response attributes:

resp = requests.get("https://www.jianshu.com")
print(type(resp.status_code), resp.status_code)  # <class 'int'> 200
print(type(resp.headers), resp.headers)
print(type(resp.cookies), resp.cookies)          # <class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie locale=zh-CN for www.jianshu.com/>]>
print(type(resp.url), resp.url)                  # <class 'str'> https://www.jianshu.com/
print(type(resp.history), resp.history)          # <class 'list'> []
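
`resp.headers` is not a plain dict: it is a `CaseInsensitiveDict`, so header names can be looked up in any capitalization. The sketch below builds one by hand for illustration, with a made-up header value, rather than fetching a real page.

```python
from requests.structures import CaseInsensitiveDict

# resp.headers has this type; this instance is constructed manually for the demo.
headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=utf-8"})

print(headers["content-type"])      # text/html; charset=utf-8
print(headers.get("CONTENT-TYPE"))  # text/html; charset=utf-8
print("Set-Cookie" in headers)      # False
```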

Checking the status code:

# Compare against a named constant from requests.codes ...
resp = requests.get("https://www.jianshu.com")
if resp.status_code == requests.codes.forbidden:
    print("403 Forbidden")

# ... or against the numeric value directly.
response = requests.get("http://www.baidu.com")
if response.status_code == 200:
    print("Request successful")
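
Instead of comparing status codes by hand, `raise_for_status()` turns any 4xx or 5xx response into an exception. A minimal sketch of that pattern follows; the helper name `fetch` and the decision to return `None` on failure are assumptions for illustration.

```python
import requests


def fetch(url, timeout=10):
    """Return the response, or None if the request failed or returned 4xx/5xx."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()  # raises requests.HTTPError on error statuses
    except requests.RequestException as exc:
        # RequestException is the base class for HTTPError, Timeout, ConnectionError, ...
        print(f"request failed: {exc}")
        return None
    return resp
```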
100: (continue,),
101: (switching_protocols,),
102: (processing,),
103: (checkpoint,),
122: (uri_too_long, request_uri_too_long),
200: (ok, okay, all_ok, all_okay, all_good, \\o/, ✓),
201: (created,),
202: (accepted,),
203: (non_authoritative_info, non_authoritative_information),
204: (no_content,),
205: (reset_content, reset),
206: (partial_content, partial),
207: (multi_status, multiple_status, multi_stati, multiple_stati),
208: (already_reported,),
226: (im_used,),

# Redirection.
300: (multiple_choices,),
301: (moved_permanently, moved, \\o-),
302: (found,),
303: (see_other, other),
304: (not_modified,),
305: (use_proxy,),
306: (switch_proxy,),
307: (temporary_redirect, temporary_moved, temporary),
308: (permanent_redirect, resume_incomplete, resume,),

# Client Error.
400: (bad_request, bad),
401: (unauthorized,),
402: (payment_required, payment),
403: (forbidden,),
404: (not_found, -o-),
405: (method_not_allowed, not_allowed),
406: (not_acceptable,),
407: (proxy_authentication_required, proxy_auth, proxy_authentication),
408: (request_timeout, timeout),
409: (conflict,),
410: (gone,),
411: (length_required,),
412: (precondition_failed, precondition),
413: (request_entity_too_large,),
414: (request_uri_too_large,),
415: (unsupported_media_type, unsupported_media, media_type),
416: (requested_range_not_satisfiable, requested_range, range_not_satisfiable),
417: (expectation_failed,),
418: (im_a_teapot, teapot, i_am_a_teapot),
421: (misdirected_request,),
422: (unprocessable_entity, unprocessable),
423: (locked,),
424: (failed_dependency, dependency),
425: (unordered_collection, unordered),
426: (upgrade_required, upgrade),
428: (precondition_required, precondition),
429: (too_many_requests, too_many),
431: (header_fields_too_large, fields_too_large),
444: (no_response, none),
449: (retry_with, retry),
450: (blocked_by_windows_parental_controls, parental_controls),
451: (unavailable_for_legal_reasons, legal_reasons),
499: (client_closed_request,),

# Server Error.
500: (internal_server_error, server_error, /o\\, ✗),
501: (not_implemented,),
502: (bad_gateway,),
503: (service_unavailable, unavailable),
504: (gateway_timeout,),
505: (http_version_not_supported, http_version),
506: (variant_also_negotiates,),
507: (insufficient_storage,),
509: (bandwidth_limit_exceeded, bandwidth),
510: (not_extended,),
511: (network_authentication_required, network_auth, network_authentication),
Status code reference: each name listed above is available as an attribute of requests.codes.
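
The names in the listing can be used interchangeably wherever a numeric status code is expected. A short sketch, assuming only that `requests` is installed:

```python
import requests

# Every alias for a code resolves to the same integer.
print(requests.codes.ok)         # 200
print(requests.codes.all_good)   # 200
print(requests.codes.not_found)  # 404

# Upper-case spellings and item access work as well.
print(requests.codes.NOT_FOUND)  # 404
print(requests.codes["teapot"])  # 418
```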

4. Advanced usage

Original article: https://www.cnblogs.com/believepd/p/10657681.html
