Python web scraping: basic usage of the BeautifulSoup library

Posted: 2018-10-23 14:23:33

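# Python 2 example: urllib2 and urllib.urlencode only exist in Python 2
import urllib
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {}
values['name'] = 'Michael Foord'
values['location'] = 'Northampton'
values['language'] = 'Python'

data = urllib.urlencode(values)  # encode the dict as a key=value&... string (the same format as a GET query string)
req = urllib2.Request(url, data)  # passing it as the data argument makes urllib2 send a POST request
response = urllib2.urlopen(req)  # returns a file-like object
the_page = response.read()  # the page's HTML source as a string
soup = BeautifulSoup(the_page, "html.parser")  # build a BeautifulSoup object from the HTML; soup now holds the parsed page source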
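Because urllib2 and urllib.urlencode are Python 2 only, here is a hedged sketch of the same request-building step in Python 3. It uses the standard library's urllib.parse.urlencode and urllib.request.Request. Two details differ from the original: in Python 3 the POST body must be bytes, and this sketch only constructs the request objects (nothing is sent). The URL is the placeholder from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL from the example above; nothing is sent over the network here.
url = "http://www.someserver.com/cgi-bin/register.cgi"
values = {"name": "Michael Foord", "location": "Northampton", "language": "Python"}

# urlencode turns the dict into a "key=value&key=value" string (spaces become "+").
query = urlencode(values)

# Appending the string to the URL, with no body, gives a GET request.
get_req = Request(url + "?" + query)

# Passing it as the data argument (encoded to bytes in Python 3) gives a POST request.
post_req = Request(url, data=query.encode("ascii"))

print(query)
print(get_req.get_method(), post_req.get_method())
# To actually send it: urllib.request.urlopen(post_req).read() returns the page as bytes.
```

Request.get_method() reports "POST" whenever data is set and "GET" otherwise. This matches the comment in the Python 2 code: the encoding is the same, and only where the string goes decides the method.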
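Once the BeautifulSoup object is built, the find() and find_all() methods let you filter a large amount of HTML down to just the parts you want, matching on the tag name and on attributes such as id or class. find() returns the first matching tag (or None if nothing matches), while find_all() returns a list of every match.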
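A minimal sketch of find() and find_all(), assuming the third-party beautifulsoup4 package (imported as bs4) is installed. The HTML snippet is made up for illustration. class_ is written with a trailing underscore because class is a Python keyword. Passing an attrs dict is an equivalent way to filter on attributes.

```python
from bs4 import BeautifulSoup

# A small hypothetical HTML snippet standing in for a downloaded page.
html = """
<html><body>
  <div id="main">
    <p class="intro">Hello</p>
    <p class="note">First note</p>
    <p class="note">Second note</p>
  </div>
  <p class="note">Outside the div</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
main = soup.find("div", id="main")
intro = soup.find("p", class_="intro")  # class_ because "class" is a Python keyword

# find_all() returns a list of every match; calling it on `main` only searches inside that div.
notes_in_main = main.find_all("p", class_="note")
all_notes = soup.find_all("p", attrs={"class": "note"})  # attrs dict is an equivalent form

print(intro.get_text())
print([p.get_text() for p in notes_in_main])
print(len(all_notes))
print(soup.find("table"))  # there is no <table> in the page, so this prints None
```

Calling find_all() on a tag rather than on soup narrows the search to that tag's descendants. In this snippet, that is why notes_in_main has two entries while all_notes has three.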
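for line in soup.find_all('a'):  # iterate over every <a> tag in the page
    url_name = line.get('href')  # the a tag's URL from its href attribute (None if it has none)
    Title = line.get_text().strip()  # the a tag's text, with surrounding whitespace removed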
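As a self-contained sketch of that link-extraction loop, assuming beautifulsoup4 is installed and using a made-up list of links, the following collects (href, text) pairs. Tag.get() returns None for a missing attribute, whereas indexing with line["href"] would raise KeyError. That makes get() the safer choice when some a tags carry no href.

```python
from bs4 import BeautifulSoup

# Hypothetical list of links; the first has extra whitespace around its text, as real pages often do.
html = """
<ul>
  <li><a href="/post/1">
        First post
      </a></li>
  <li><a href="/post/2">Second post</a></li>
  <li><a name="anchor">No href here</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for line in soup.find_all("a"):
    url_name = line.get("href")      # None instead of a KeyError when href is missing
    Title = line.get_text().strip()  # text inside the tag, surrounding whitespace removed
    if url_name is not None:         # skip <a> tags without an href
        links.append((url_name, Title))

print(links)
```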

Source: http://blog.51cto.com/weadyweady/2307779
