Web scraping: a collection of problems encountered with find() and find_all()

Posted: 2020-11-17 09:48:29

from bs4 import BeautifulSoup
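lxml: parse the HTML with lxml, e.g. BeautifulSoup(html, 'lxml')  # note: html5lib is the most fault-tolerant parser
find: returns the first matching tag
find_all: returns all matching tags as a list
limit: caps the number of tags that find_all returns
attrs: the tag's attributes, held in a dictionary
string: the tag's single non-tag string; it is None when the tag has more than one child
strings: all non-tag strings under the tag, returned as a generator
stripped_strings: all non-tag strings under the tag with surrounding whitespace stripped and whitespace-only strings skipped, returned as a generator
get_text: all non-tag strings under the tag, joined into a single string
contents and children both return a tag's direct children, strings included: contents returns a list, children returns an iterator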
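
The attributes listed above can be tried out on a small document. The following is a minimal sketch: the sample markup is made up for illustration, and the built-in "html.parser" is used only so that the snippet runs without installing lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original post.
html = '<div id="box"><p class="a">first</p><p>second <b>bold</b></p></div>'
soup = BeautifulSoup(html, "html.parser")  # "lxml" also works if it is installed

first = soup.find("p")                # first matching tag
all_p = soup.find_all("p")            # list-like ResultSet of every match
one_p = soup.find_all("p", limit=1)   # at most one match

first_attrs = first.attrs             # {'class': ['a']}
first_string = first.string           # 'first'
second_string = all_p[1].string       # None: this tag has two children

second_strings = list(all_p[1].strings)             # ['second ', 'bold']
second_stripped = list(all_p[1].stripped_strings)   # ['second', 'bold']
second_text = all_p[1].get_text()                   # 'second bold'

box = soup.find(id="box")
box_contents = box.contents           # list of direct children
box_children = list(box.children)     # same items, produced lazily
```

Note that string is None for the second paragraph because it holds both text and a nested tag; get_text is usually the safer choice when the structure is not known in advance.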

Problem 1: .text

soup.find_all().text

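Error message:
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

find_all() returns a ResultSet (a list of tags), not a single tag, so .text cannot be called on it. .text only works on an individual tag.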
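
The error and two ways around it can be reproduced as follows. This is a minimal sketch with made-up markup: either read .text from each tag in the ResultSet, or call find() when only the first tag is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = "<p>one</p><p>two</p><p>three</p>"
soup = BeautifulSoup(html, "html.parser")

tags = soup.find_all("p")

# Accessing .text on the ResultSet itself raises AttributeError.
error_name = None
try:
    tags.text
except AttributeError as exc:
    error_name = type(exc).__name__

# Fix 1: read .text from each tag in the list.
texts = [tag.text for tag in tags]

# Fix 2: use find() when a single tag is enough.
first_text = soup.find("p").text
```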

Goal: return the content of the second tag

Method 1: use limit to cap the number of tags returned

p = soup.find_all("p", limit=2)[1]  # limit stops after two matches; [1] takes the second one from the list
print(p.text)
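
Indexing with [1] raises IndexError when the page has fewer than two matching tags. One way to guard against that is sketched below; nth_tag is a hypothetical helper written for this note, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def nth_tag(soup, name, n):
    """Return the n-th (1-based) tag called `name`, or None if there are fewer."""
    matches = soup.find_all(name, limit=n)  # stop searching after n matches
    return matches[n - 1] if len(matches) >= n else None

# Hypothetical sample markup for illustration.
soup = BeautifulSoup("<p>one</p><p>two</p>", "html.parser")

second = nth_tag(soup, "p", 2)
third = nth_tag(soup, "p", 3)   # None: only two <p> tags exist
second_text = second.text if second is not None else ""
```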

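Method 2: find the container by its class, then get the p tag inside it

d = soup.find(class_="wy_contMain fontSt")  # the element whose class attribute is exactly "wy_contMain fontSt"
p = d.find("p")  # the first p tag inside that element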
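
A scoped search like this can be made more defensive, since find() returns None when nothing matches. The sketch below assumes a made-up page layout; only the class names come from the post. It also shows select_one with a CSS selector as an alternative that does not depend on the order of the class names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the post.
html = """
<p>navigation</p>
<div class="wy_contMain fontSt"><p>intro</p><p>body</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ string containing a space matches the exact class attribute value.
d = soup.find(class_="wy_contMain fontSt")

# Guard against None before searching inside the container.
p = d.find("p") if d is not None else None
scoped_text = p.text if p is not None else ""

# CSS selector alternative: class order does not matter here.
same_p = soup.select_one(".fontSt.wy_contMain p")
```

Searching inside the container skips the unrelated p tag that appears earlier on the page, which is why this can be more reliable than counting tags from the top of the document.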

Source: https://www.cnblogs.com/cloudchild/p/13992035.html
