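Besides find() and find_all(), the library provides a number of similar methods. I won't cover them in detail, since their parameters and usage are almost the same. In the last four, "next" and "previous" are defined in terms of .next_element and .previous_element, that is, document order rather than sibling order.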
Signature: find_parents(name, attrs, string, limit, **kwargs)
Signature: find_parent(name, attrs, string, **kwargs)
Signature: find_next_siblings(name, attrs, string, limit, **kwargs)
Signature: find_next_sibling(name, attrs, string, **kwargs)
Signature: find_previous_siblings(name, attrs, string, limit, **kwargs)
Signature: find_previous_sibling(name, attrs, string, **kwargs)
Signature: find_all_next(name, attrs, string, limit, **kwargs)
Signature: find_next(name, attrs, string, **kwargs)
Signature: find_all_previous(name, attrs, string, limit, **kwargs)
Signature: find_previous(name, attrs, string, **kwargs)
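To make the differences between these methods concrete, here is a small sketch. It assumes the "three sisters" sample document from the official documentation and the built-in "html.parser" backend; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample document borrowed from the official docs (an assumption of this sketch).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")
first_link = soup.find("a", id="link1")

# Upward: the nearest enclosing <p>, then the names of all ancestors.
parent_p = first_link.find_parent("p")
ancestor_names = [tag.name for tag in first_link.find_parents()]

# Sideways: later siblings of the link that are <a> tags.
next_sibling_ids = [a["id"] for a in first_link.find_next_siblings("a")]

# Forward in document order (not limited to siblings): the next <p> tag.
next_p = first_link.find_next("p")

# Backward in document order: the closest preceding <title> tag.
previous_title = first_link.find_previous("title")

print(parent_p["class"], ancestor_names, next_sibling_ids)
print(next_p.get_text(), previous_title.string)
```

Note that find_next("p") skips the paragraph that contains the link, because that paragraph starts before the link in document order; it returns the following paragraph instead.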
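BeautifulSoup also supports CSS selectors through select(), and the syntax is largely the same as ordinary CSS. My own CSS is only beginner level, so I won't explain much here: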
from bs4 import BeautifulSoup

# `soup` is assumed to be the parsed "three sisters" sample document
# used throughout the official documentation.
soup.select("title")
# [<title>The Dormouse's story</title>]

soup.select("p:nth-of-type(3)")
# [<p class="story">...</p>]

soup.select("body a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("html head title")
# [<title>The Dormouse's story</title>]

soup.select("head > title")
# [<title>The Dormouse's story</title>]

soup.select("p > a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("p > a:nth-of-type(2)")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

soup.select("p > #link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("body > a")
# []

# What the examples above show: `>` matches direct children only,
# while a space matches descendants at any depth.

# Sibling combinators: `~` matches all following siblings, `+` only the adjacent one.
soup.select("#link1 ~ .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("#link1 + .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Find tags by CSS class:
soup.select(".sister")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Find tags by id:
soup.select("#link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("a#link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Match any one of several selectors:
soup.select("#link1,#link2")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Attribute values can be matched too (exact, prefix ^=, suffix $=, substring *=):
soup.select('a[href="http://example.com/elsie"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select('a[href^="http://example.com/"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href$="tillie"]')
# [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href*=".com/el"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

# This one confused me at first: [lang|=en] matches a lang value that is
# exactly "en" or that starts with "en-".
multilingual_markup = """
<p lang="en">Hello</p>
<p lang="en-us">Howdy, y'all</p>
<p lang="en-gb">Pip-pip, old fruit</p>
<p lang="fr">Bonjour mes amis</p>
"""
multilingual_soup = BeautifulSoup(multilingual_markup, "html.parser")
multilingual_soup.select('p[lang|=en]')
# [<p lang="en">Hello</p>,
#  <p lang="en-us">Howdy, y'all</p>,
#  <p lang="en-gb">Pip-pip, old fruit</p>]

# To get only the first match, use select_one():
soup.select_one(".sister")
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
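The snippets above rely on a `soup` object defined elsewhere, so here is a self-contained sketch of the two points that are easiest to get wrong: child versus descendant combinators, and the `|=` attribute operator. The markup is made up for illustration, and "html.parser" is an assumed choice of parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written only for this illustration.
markup = """
<body>
<p class="story">
<a class="sister" id="link1" href="http://example.com/elsie">Elsie</a>
<a class="sister" id="link2" href="http://example.com/lacie">Lacie</a>
</p>
<p lang="en">Hello</p>
<p lang="en-us">Howdy</p>
<p lang="english">Not a language subtag</p>
<p lang="fr">Bonjour</p>
</body>
"""

soup = BeautifulSoup(markup, "html.parser")

# A space selects descendants at any depth: both links sit inside <body>.
descendant_links = soup.select("body a")

# `>` selects direct children only: the links are children of <p>, not <body>.
child_links = soup.select("body > a")

# `|=` matches a value equal to "en" or starting with "en-",
# so "english" and "fr" are both excluded.
english = [p.get_text() for p in soup.select('p[lang|="en"]')]

print(len(descendant_links), child_links, english)
```

The `lang="english"` paragraph is there to show that `|=` is not a plain prefix match; for that, `^=` would be the operator to use.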
Reading the official BeautifulSoup documentation: searching the HTML tree (2)
Original post: http://www.cnblogs.com/nzhl/p/5591765.html