Web Crawler Basics

Date: 2017-02-04 01:01:01

1. URL: A URL is the address of a web page. The browser displays it in the Location or URL box near the top of the window.

2. Every transfer protocol has a default port; the default port for HTTP is 80.
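
The two points above can be checked with the standard library. This is a minimal sketch: the `DEFAULT_PORTS` table and the `effective_port` helper are names made up for illustration, and the table lists only a few well-known protocols.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for this sketch; only a few well-known defaults.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # The URL names a port explicitly, e.g. http://host:8080/
        return parts.port
    # Otherwise fall back to the protocol's default port (None if unknown).
    return DEFAULT_PORTS.get(parts.scheme)


if __name__ == "__main__":
    parts = urlsplit("http://www.baidu.com/index.html")
    print(parts.scheme, parts.hostname, parts.path)
    print(effective_port("http://www.baidu.com/index.html"))
```

When the URL carries no explicit port, `urlsplit(...).port` is `None`, which is why the fallback table is needed.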

Downloading web page data

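import urllib.request     # import the request module
response = urllib.request.urlopen("http://www.baidu.com")
# open the site; the returned object is assigned to the variable response
html = response.read()  # read the body (bytes) into the variable html
html = html.decode("utf-8")  # decode the bytes as UTF-8 text
print(html)  # print the content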
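
The snippet above assumes that the page is UTF-8 and that the request succeeds. A slightly more defensive variant might look like the following sketch; the function name `fetch_text`, the 10-second timeout, and the UTF-8 fallback are assumptions chosen for illustration, not part of the original code.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10, fallback_charset="utf-8"):
    """Download a URL and return its body as text, or None on failure."""
    try:
        # The with-block closes the connection when we are done.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Use the charset announced in the Content-Type header, if any.
            charset = response.headers.get_content_charset() or fallback_charset
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so both are handled here.
        print("request failed:", exc)
        return None
    # errors="replace" avoids a crash if a few bytes do not match the charset.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_text("http://www.baidu.com")` would then return the page text, or `None` if the request fails.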

Downloading a simple image, as a browser would

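# download a picture of a cat

import urllib.request

response = urllib.request.urlopen("http://placekitten.com/g/600/600")
cat_img = response.read()  # the image arrives as raw bytes

# 'wb' opens the file for binary writing, since the image is not text
with open('cat.jpg', 'wb') as f:
    f.write(cat_img)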
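
The code above sends urllib's default headers, so the server can tell that the request does not come from a browser. One common way to imitate a browser is to set a `User-Agent` header on a `Request` object. The sketch below assumes this is what "as a browser would" refers to; the `BROWSER_UA` string and both function names are made up for illustration.

```python
import urllib.request

# Hypothetical User-Agent string; any real browser's value could be used.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"


def build_browser_request(url):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def download_image(url, path, timeout=10):
    """Fetch url with browser-like headers and save the bytes to path."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as f:  # binary mode: image data is not text
        f.write(data)
    return len(data)
```

Calling `download_image("http://placekitten.com/g/600/600", "cat.jpg")` would save the image and return the number of bytes written.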

# Youdao Translate
import urllib.request
import urllib.parse
import json

message = input("Enter the text to translate: ")
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=http://www.baidu.com/link"
data = {}
data['type'] = "AUTO"
data['doctype'] = "json"
data['keyfrom'] = "fanyi.web"
data['typoResult'] = "true"
data['ue'] = "UTF-8"
data['xmlVersion'] = "1.8"
data['i'] = message  # send the user's input rather than a hard-coded string
data = urllib.parse.urlencode(data).encode('utf-8')  # form data must be bytes

response = urllib.request.urlopen(url, data)  # passing data makes this a POST request
html = response.read().decode('utf-8')

target = json.loads(html)  # parse the JSON reply into Python objects
print("Translation: %s" % (target['translateResult'][0][0]['tgt']))
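
The two data-handling steps in the translation script, encoding the form and reading the JSON reply, can be tried without any network access. In the sketch below, the helper names and the `sample_reply` string are hypothetical; the reply only mirrors the shape that the indexing `['translateResult'][0][0]['tgt']` implies and is not a captured server response.

```python
import json
import urllib.parse


def build_form(text):
    """Encode the form fields as bytes, as urlopen expects for POST data."""
    fields = {"i": text, "doctype": "json"}
    return urllib.parse.urlencode(fields).encode("utf-8")


def extract_translation(reply_text):
    """Pull the translated string out of a JSON reply, or None if absent."""
    reply = json.loads(reply_text)
    try:
        return reply["translateResult"][0][0]["tgt"]
    except (KeyError, IndexError, TypeError):
        # The reply did not have the expected shape.
        return None


if __name__ == "__main__":
    print(build_form("你好"))
    # Hypothetical reply, shaped only to match the indexing used above.
    sample_reply = '{"translateResult": [[{"src": "你好", "tgt": "hello"}]]}'
    print(extract_translation(sample_reply))
```

Returning `None` for an unexpected reply keeps the script from crashing with a `KeyError` if the service changes its response format.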

Source: http://www.cnblogs.com/bixiaopengblog/p/6338022.html
