Web crawler: an automated program that sends requests to websites and extracts data from them.
Basic workflow:
Request:
Response:
Parsing methods:
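The three steps listed above (send a Request, receive a Response, parse the content) can be sketched with the standard library alone. This is a minimal sketch under assumptions: `fetch`, `TitleParser` and `extract_title` are hypothetical names, and `html.parser` is just one possible parsing approach, chosen here because it ships with Python. The demo parses an inline HTML string so it runs without network access.

```python
import urllib.request
from html.parser import HTMLParser


def fetch(url: str, timeout: float = 5.0) -> str:
    """Request + Response: send a GET request and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Parsing: collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__":
    # Offline demo; with network access, try extract_title(fetch("http://www.baidu.com")).
    sample = "<html><head><title> Hello </title></head><body></body></html>"
    print(extract_title(sample))  # Hello
```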
urllib: Python's built-in HTTP request library
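Besides passing a plain URL string to `urlopen`, urllib lets you build a `urllib.request.Request` object first, which is the usual way to attach headers such as a User-Agent. The sketch below only constructs and inspects the request (nothing is sent); the target URL and header value are assumptions chosen for illustration.

```python
import urllib.request

# Hypothetical target URL and header value, chosen only for illustration.
URL = "http://httpbin.org/get"

req = urllib.request.Request(
    URL,
    headers={"User-Agent": "Mozilla/5.0 (example crawler)"},
    method="GET",
)

# Request stores header names via str.capitalize(), so "User-Agent" becomes "User-agent".
print(req.full_url)                  # http://httpbin.org/get
print(req.get_method())              # GET
print(req.get_header("User-agent"))  # Mozilla/5.0 (example crawler)

# Sending it would be: urllib.request.urlopen(req, timeout=5)
```

Note the capitalization detail: looking the header up as `"User-Agent"` would return `None`, because of how `Request` normalizes header names.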
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))  # response body as text
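The object returned by `urlopen` offers more than `read()`: it also exposes the status code and response headers. To keep this sketch runnable offline, it assumes a tiny local `http.server` in place of www.baidu.com (`HelloHandler` and `fetch_local` are hypothetical names), and it opens the URL through `build_opener` with an empty `ProxyHandler` instead of `urlopen`, so that a system-wide proxy setting cannot intercept the local request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local stand-in for a real website, so the example runs offline."""

    def do_GET(self):
        body = "<html><title>local</title></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_local():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    # Empty ProxyHandler: ignore any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url, timeout=5) as response:
            status = response.status
            content_type = response.getheader("Content-Type")
            text = response.read().decode("utf-8")
        return status, content_type, text
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, text = fetch_local()
    print(status)        # 200
    print(content_type)  # text/html; charset=utf-8
    print(text)
```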
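import socket
import urllib.error
import urllib.parse
import urllib.request

# Form data for a POST request must be bytes.
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
try:
    # response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
    response = urllib.request.urlopen('http://httpbin.org/post', data=data, timeout=1)
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    # A timeout while connecting is wrapped in URLError, with the original exception in e.reason.
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
    else:
        print(e.reason)
except socket.timeout:
    # A timeout while waiting for or reading the response can be raised directly.
    print('TIME OUT')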
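The example above relies on three details that can be checked offline: what `urlencode` produces, the fact that supplying `data` turns the request into a POST, and the `isinstance(e.reason, socket.timeout)` test. The sketch below exercises each one without sending anything; the extra `lang` field and the hypothetical helper `is_timeout` are assumptions added for illustration.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

# urlencode turns a dict into an application/x-www-form-urlencoded string;
# bytes(...) then encodes it, because urlopen's data argument must be bytes.
form = {"word": "hello", "lang": "zh cn"}
query = urllib.parse.urlencode(form)
data = bytes(query, encoding="utf8")
print(query)  # word=hello&lang=zh+cn

# Supplying data switches the request method from GET to POST.
get_req = urllib.request.Request("http://httpbin.org/get")
post_req = urllib.request.Request("http://httpbin.org/post", data=data)
print(get_req.get_method(), post_req.get_method())  # GET POST


def is_timeout(err: urllib.error.URLError) -> bool:
    """Hypothetical helper: True when the URLError wraps a socket timeout."""
    return isinstance(err.reason, socket.timeout)


print(is_timeout(urllib.error.URLError(socket.timeout("timed out"))))  # True
print(is_timeout(urllib.error.URLError("unknown host")))                # False
```

Keep in mind that only timeouts raised while the request is being sent are wrapped in `URLError`; a timeout that happens later, while waiting for or reading the response, can surface as a bare `socket.timeout` instead.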
To be tidied up later...
Source: https://www.cnblogs.com/liqiongming/p/11588865.html