Web scraping practice: fetching a page for the first time and processing it with BeautifulSoup

Date: 2017-02-14 11:49:07

Example 1: Get every tag with a given name

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def getTag(url, tager):
    # Fetch the page; give up if the server answers with an HTTP error.
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    bsObj = BeautifulSoup(html.read(), "html.parser")
    print(tager)  # show which tag name is being searched for
    # find_all returns a list of every matching tag; the list is empty
    # (never None) when the page contains no such tag.
    tags = bsObj.find_all(tager)
    return tags if tags else None


title = getTag("http://www.pythonscraping.com/pages/page1.html", "title")
if title is None:
    print("Title could not be found")
else:
    print(title)
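
The original listing calls the parsed document like a function, `bsObj(tager)`. In Beautiful Soup that is shorthand for `find_all`, so the result is a list of tags rather than a single tag. The sketch below illustrates this offline; `SAMPLE_HTML` is a made-up snippet standing in for a downloaded page, not the content of the URL above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a real download.
SAMPLE_HTML = """
<html>
  <head><title>A Useful Page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

bsObj = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Calling the object is equivalent to calling find_all with the same argument.
paragraphs = bsObj("p")
same_paragraphs = bsObj.find_all("p")

# A tag name that does not occur yields an empty list, not None.
missing = bsObj("table")

print([p.get_text() for p in paragraphs])
print(paragraphs == same_paragraphs)
print(len(missing))
```

Because an empty list is not `None`, a caller that tests the result with `is None` needs the function to convert "no matches" into `None` explicitly, or to test for emptiness instead.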

Example 2: Get only a single tag

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def getTitle(url):
    # Fetch the page; give up if the server answers with an HTTP error.
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    bsObj = BeautifulSoup(html.read(), "html.parser")
    # bsObj.title is the first <title> tag, or None if the page has none.
    return bsObj.title


title = getTitle("http://www.pythonscraping.com/pages/page1.html")
if title is None:
    print("Title could not be found")
else:
    print(title)
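
Attribute access such as `bsObj.title` returns the first matching tag, or `None` when no such tag exists, which is why the `is None` check works in this example. A minimal offline sketch follows; the helper name `title_text` and both markup strings are illustrative assumptions, not part of the original post.

```python
from bs4 import BeautifulSoup

def title_text(markup):
    """Return the stripped text of the first <title> tag, or None if absent."""
    bsObj = BeautifulSoup(markup, "html.parser")
    title = bsObj.title
    if title is None:
        return None
    # get_text() returns the text inside the tag without the markup.
    return title.get_text().strip()

# Hypothetical inputs: one document with a title, one without.
with_title = "<html><head><title> A Useful Page </title></head><body></body></html>"
without_title = "<html><body><p>No title here.</p></body></html>"

print(title_text(with_title))
print(title_text(without_title))
```

Printing the tag itself, as the example above does, shows the full `<title>...</title>` markup; `get_text()` is one way to keep only the text.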

Source: http://www.cnblogs.com/xiaoyaowuming/p/6396558.html
