Python Web Scraping: Basic Usage of the urllib Library

Date: 2020-01-23 22:20:16

Requesting a URL to fetch the page source

import urllib.request
url = "http://www.baidu.com"
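# urlopen returns a response object; the with-block closes it when done
with urllib.request.urlopen(url) as response:
    # read() returns the response body as bytes
    data = response.read()
# print(data)
# Decode the bytes into a string
str_data = data.decode("utf-8")
print(str_data)
# Save the result to a file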
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(str_data)
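
The snippet above hard-codes UTF-8 as the page encoding. As a minimal sketch, the same steps can be wrapped in a helper that prefers the charset declared in the Content-Type header and only falls back to UTF-8 when none is given. The names `fetch_text` and `default_charset`, and the 10-second timeout, are assumptions made for illustration and do not come from the original post.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Fetch url and return the body decoded to str (hypothetical helper)."""
    # The with-block closes the response even if read() raises
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
        # Charset from the Content-Type header, or None if it is not declared
        charset = response.headers.get_content_charset()
    # Fall back to the assumed default when the server names no charset
    return data.decode(charset or default_charset)
```

Calling `fetch_text("http://www.baidu.com")` should return the page as a string that can then be written to `baidu.html` as before.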

GET requests with parameters

import urllib.request

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
    # Concatenate the base URL and the search term
    final_url = url + wd
    # Send the request
    response = urllib.request.urlopen(final_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")

Written this way, the call raises a UnicodeEncodeError.
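
The failure can be reproduced without touching the network. The sketch below assumes that the HTTP layer encodes the request line as ASCII, and simply applies that encoding to the URL directly. `ascii_encoding_error` is a hypothetical helper introduced for this illustration.

```python
def ascii_encoding_error(url):
    """Return the UnicodeEncodeError raised for url, or None (hypothetical helper)."""
    try:
        # Assumption: this mirrors the ASCII encoding step applied to the request line
        url.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_encoding_error("http://www.baidu.com/s?wd=美女")
# Show the exception type and the codec's explanation
print(type(err).__name__, err.reason)
```

A URL made up only of ASCII characters passes through the same check and returns `None`.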

The cause is that the URL contains Chinese characters, which ASCII cannot represent, so they have to be percent-encoded first:

import urllib.request
import urllib.parse
import string

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
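    # Concatenate the base URL and the search term
    final_url = url + wd
    # Percent-encode the non-ASCII characters; safe=string.printable
    # leaves every printable ASCII character untouched
    encode_new_url = urllib.parse.quote(final_url, safe=string.printable)
    # Send the request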
    response = urllib.request.urlopen(encode_new_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")
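
To see what the escaping produces, the sketch below prints the encoded URL instead of sending it. It also shows a different technique from the one used in the post: building the query string with `urllib.parse.urlencode`, which escapes spaces and reserved characters in the value as well. `quote_full_url` and `build_search_url` are hypothetical names, and the `base` parameter is an assumption for illustration.

```python
import string
import urllib.parse


def quote_full_url(url):
    """Percent-encode only the non-ASCII characters of a full URL."""
    return urllib.parse.quote(url, safe=string.printable)


def build_search_url(wd, base="http://www.baidu.com/s"):
    """Build the query string with urlencode (hypothetical helper)."""
    # urlencode also escapes spaces, '&' and '=' inside the value
    return base + "?" + urllib.parse.urlencode({"wd": wd})


print(quote_full_url("http://www.baidu.com/s?wd=美女"))
print(build_search_url("美女"))
print(build_search_url("python 爬虫"))
```

With `safe=string.printable`, a space in the search term is left as it is, because the space is itself a printable ASCII character. `urlencode` turns it into `+`, which may be the safer choice when the term comes from user input.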

Source: https://www.cnblogs.com/wbyixx/p/12231527.html
