Python__bs4 module

Date: 2021-05-08 00:45:08

1 - Import the module

from bs4 import BeautifulSoup

2 - Create the object

with open('./test.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')
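
The same step can be tried without a file on disk. The following is a minimal sketch, assuming a made-up HTML string (`html_doc`) in place of `./test.html`, and swapping in Python's built-in `html.parser` so that it runs even where the lxml package is not installed. Passing `'lxml'` as in the notes should behave the same way for markup this simple.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here in place of ./test.html
html_doc = """
<html><head><title>demo</title></head>
<body><div class="song"><p>first</p></div></body></html>
"""

# BeautifulSoup accepts either a string or an open file handle.
# "html.parser" is an assumption made for portability; the notes use "lxml".
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)
```

Either way the result is a `BeautifulSoup` object, and every lookup in the later sections starts from it.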

3 - Locating elements

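(1) By tag name (returns the first tag with that name):
    1) div_tag = soup.div

(2) By attribute:
    1) find (returns only the first matching tag): div_tag = soup.find('div', class_='song')
    2) find_all (returns a list of all matching tags; findAll is the older spelling of the same method): div_tags = soup.find_all('div', class_='song')
    Note: the keyword is class_ with a trailing underscore, because class is a reserved word in Python.

(3) By CSS selector (returns a list of all matching tags):
    1) a_tags = soup.select('#feng')
    2) Hierarchy selectors (> matches a direct child, one level down; a space matches a descendant at any depth):
        li_tags = soup.select('.tang > ul > li')
        li_tags = soup.select('.tang li')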
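
The lookups above can be compared side by side. This is a minimal sketch, assuming a small invented document (`html_doc`) that reuses the `song`, `tang`, and `feng` names from the notes, and assuming the built-in `html.parser` in place of lxml. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class and id names from the notes.
html_doc = """
<div class="song"><p>one</p></div>
<div class="song"><p>two</p></div>
<div class="tang">
  <ul>
    <li><a id="feng" href="http://example.com/feng">Feng</a></li>
    <li>plain<ol><li>nested</li></ol></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_div = soup.div                              # first <div> in the document
first_song = soup.find("div", class_="song")      # first match only
all_songs = soup.find_all("div", class_="song")   # list of every match
feng_links = soup.select("#feng")                 # CSS selector, always a list
direct_items = soup.select(".tang > ul > li")     # direct children only
any_items = soup.select(".tang li")               # any depth below .tang

print(len(all_songs), len(direct_items), len(any_items))
```

With this sample, the child selector skips the `<li>` nested inside the `<ol>`, while the descendant selector includes it, so the two `select` calls return lists of different lengths.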

4 - Extracting data

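a_tag = soup.find_all('a', id='feng')[0]

print(a_tag.string)     # text directly inside the tag (None if the tag has more than one child)
print(a_tag.text)       # all text inside the tag, including text of nested tags
print(a_tag['href'])    # value of an attribute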
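
The difference between `.string` and `.text` only shows up when a tag has nested content. The following is a minimal sketch, assuming an invented snippet (`html_doc`) with one simple link and one `<div>` that mixes text with a child tag. The `box` element and the `title` attribute lookup are hypothetical additions for illustration, and `html.parser` is again assumed in place of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a link with plain text, and a div with mixed content.
html_doc = (
    '<p><a id="feng" href="http://example.com/feng">Feng</a></p>'
    '<div id="box">outer <span>inner</span></div>'
)
soup = BeautifulSoup(html_doc, "html.parser")

a_tag = soup.find("a", id="feng")
box = soup.find("div", id="box")

link_string = a_tag.string        # single text child, so .string returns it
link_href = a_tag["href"]         # indexing raises KeyError if the attribute is missing
link_title = a_tag.get("title")   # .get() returns None for a missing attribute

box_string = box.string           # more than one child, so .string is None
box_text = box.text               # concatenated text of all descendants

print(link_string, link_href, link_title)
print(box_string, repr(box_text))
```

When an attribute may be absent, `tag.get(name)` is the safer choice; `tag[name]` suits cases where a missing attribute should be treated as an error.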


Original post: https://www.cnblogs.com/zhangyh-blog/p/14741929.html
