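Why use Beautiful Soup
A detailed walkthrough is in Cui Qingcai's tutorial: https://cuiqingcai.com/1319.html
In short, Beautiful Soup is a Python library whose main job is extracting data from web pages. The official description reads as follows: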
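Crawlers often use regular expressions to match the parts of a page they want. A pattern that is even slightly off can silently miss data or, in the worst case, leave the program stuck in what looks like an endless loop. Beautiful Soup is a more robust tool for the job: it makes it easy to extract the content of HTML or XML tags.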
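To make the contrast concrete, here is a minimal sketch. The HTML snippet and the regular expression are made up for illustration and are not from the tutorial. The pattern assumes `href` is the only attribute on each link, so it misses the second link, while the parser-based version finds both.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page fragment; the second <a> has an extra attribute on a new line.
html = """
<ul>
  <li class="item"><a href="/a">First</a></li>
  <li class="item active"><a href="/b"
      title="second">Second</a></li>
</ul>
"""

# A hand-written pattern that assumes href="..." is immediately followed by ">".
pattern = re.compile(r'<a href="([^"]+)">([^<]+)</a>')
regex_links = pattern.findall(html)

# Beautiful Soup parses the markup, so attribute order and line breaks do not matter.
soup = BeautifulSoup(html, "html.parser")
soup_links = [(a["href"], a.get_text(strip=True)) for a in soup.find_all("a")]

print("regex:", regex_links)
print("soup: ", soup_links)
```

The regex could of course be patched to handle this case, but each new variation in the markup needs another patch; the parser handles them without changes.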
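Beautiful Soup provides a handful of simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolbox that parses a document and hands you the data you want to extract. Because it is simple, a complete application takes very little code.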
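A short sketch of those three operations, using a made-up document (the HTML string and variable names are illustrative only):

```python
from bs4 import BeautifulSoup

html = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='intro'>Hello <b>world</b></p><p>Second</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk the tree through tag names and parent links.
title_text = soup.title.string
parent_name = soup.b.parent.name

# Searching: find_all returns every match, find returns the first one.
paragraphs = soup.find_all("p")
intro = soup.find("p", class_="intro")

# Modifying: replace a tag's text and change an attribute in place.
soup.b.string = "Beautiful Soup"
intro["class"] = "greeting"

print(title_text, parent_name, len(paragraphs))
print(intro.get_text())
```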
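Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it on its own. In that case you only need to state the original encoding.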
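The following sketch shows one way to state the original encoding, assuming the input is GBK-encoded bytes with no charset declaration (the sample text is made up). It uses the `from_encoding` argument and the `original_encoding` attribute from the Beautiful Soup API.

```python
from bs4 import BeautifulSoup

# Bytes in GBK with no <meta charset> declaration (illustrative input).
raw = "<p>中文内容</p>".encode("gbk")

# Tell Beautiful Soup the source encoding explicitly.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

text = soup.p.string               # decoded to a Unicode str
used = soup.original_encoding      # the encoding Beautiful Soup settled on
utf8_bytes = soup.encode("utf-8")  # output serialized as UTF-8 bytes

print(text, used)
```

`from_encoding` only applies when the markup is passed as bytes; a `str` is already Unicode and needs no decoding.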
Beautiful Soup has become a Python parsing tool as capable as lxml and html5lib, and it lets you choose flexibly between different parsing strategies or raw speed.
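In practice Beautiful Soup sits on top of a parser that you name when building the soup. The sketch below assumes lxml and html5lib may or may not be installed; the helper `make_soup` and its preference order are hypothetical, chosen for illustration. It falls back to Python's built-in `html.parser`, which is always available.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is installed.

    Hypothetical helper: lxml is tried first for speed, html5lib next for
    browser-like leniency, then the standard library parser.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser is missing.
            continue
    raise RuntimeError("no usable parser found")

soup, parser_name = make_soup("<p>Unclosed paragraph")
print(parser_name, soup.p.get_text())
```

Different parsers can build slightly different trees from invalid markup, so pinning one parser explicitly keeps results reproducible across machines.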
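During installation I ran into a problem similar to the one described in the linked tutorial, so I followed the author's solution.
First run anaconda search -t conda beautifulsoup4 to list the available versions. My output was as follows: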
(wangli) D:\Anaconda3\envs\wangli> anaconda search -t conda beautifulsoup4
Using Anaconda API: https://api.anaconda.org
Packages:
Name | Version | Package Types | Platforms | Builds
------------------------- | ------ | --------------- | --------------- | ----------
IzODA/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0, py36_0
NOAA-ORR-ERD/beautifulsoup4 | 4.3.2 | conda | win-64, osx-64 | 0, py34_0, py33_0, py27_0
    : Screen-scraping library
ODSP-TEST/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0, py36_0
RahulJain/beautifulsoup4 | 4.4.1 | conda | win-64 | py27_0
Trentonoliphant/beautifulsoup4 | 4.3.2 | conda | win-32, win-64 | py34_0, py33_0, py26_0, py27_0
    : http://www.crummy.com/software/BeautifulSoup/bs4/
aarch64_gbox/beautifulsoup4 | 4.5.3 | conda | linux-aarch64 | py36_0
aetrial/beautifulsoup4 | | conda | linux-64, osx-64 | py35_0, py27_0
akode/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
    : Screen-scraping library
alefnula/beautifulsoup4 | 4.1.3 | conda | osx-64 | py34_0
    : UNKNOWN
anaconda/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py37_0, py37_1, py36h6ea3382_0, py36_1, py36_0, py35hb75f182_1, py27h3f86ba9_1, py27_1, py27_0, py27hc287451_1, py27h9416283_1, py27hdc1f29e_0, py35h61fcdcc_1, py36h49b8c8c_1, py38_0, py36h4361f19_1, py27h8bb5803_1, py35h94b83b4_1, py35h50ea147_0, py34_0, py35_0, py36h72d3c9f_1, py36hd4cc5e8_1, py35h442a8c9_1, py35_1
    : Python library designed for screen-scraping
anacondams/beautifulsoup4 | 4.5.1 | conda | linux-64, win-64 | py35_0
    : Python library designed for screen-scraping
archiarm/beautifulsoup4 | 4.7.0 | conda | linux-aarch64 | py27_1000, py36_1000, py37_1000
    : Python library designed for screen-scraping
asmeurer/beautifulsoup4 | 4.2.1 | conda | osx-64 | py26_1, py33_0, py33_1, py27_1, py27_0
    : http://www.crummy.com/software/BeautifulSoup/bs4/
auto/beautifulsoup4 | 4.3.2 | conda | linux-64, linux-32, osx-64 | py27_0
    : Screen-scraping library
c4aarch64/beautifulsoup4 | 4.6.3 | conda | linux-aarch64 | py37_0
    : Python library designed for screen-scraping
c4armv7l/beautifulsoup4 | 4.7.1 | conda | linux-armv7l | py37_1001
    : Python library designed for screen-scraping
cdat-forge/beautifulsoup4 | 4.8.1 | conda | linux-64, osx-64 | py27_0
    : Python library designed for screen-scraping
conda-forge/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, linux-aarch64, osx-64, win-64 | py37_0, py36_1001, py36_1000, py37_1000, py37_1001, py27_0, py36_0, py38_0, py36h9f0ad1d_1, py37hc8dfbb8_1, py36hc560c46_1, py27_1001, py27_1000, py38h32f6830_1, py35_0
    : Python library designed for screen-scraping
conner_org/beautifulsoup4 | 4.7.1 | conda | linux-64 | py27_1
    : Python library designed for screen-scraping
daf/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
    : Screen-scraping library
draikes/beautifulsoup4 | 4.4.1 | conda | win-64 | py27_0
    : UNKNOWN
ericmjl/beautifulsoup4 | 4.4.0 | conda | linux-64, osx-64 | py34_0
    : Screen-scraping library
free/beautifulsoup4 | 4.6.0 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py36_0, py34_0, py35_0, py27_0
    : Python library designed for screen-scraping
iilab/beautifulsoup4 | 4.3.2 | conda | linux-64, osx-64 | py34_0
    : Screen-scraping library
ijstokes/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
jetson-tx2/beautifulsoup4 | 4.6.0 | conda | noarch | py_0
    : Python library designed for screen-scraping
jjhelmus/beautifulsoup4 | | conda | linux-aarch64 | py37_0
    : Python library designed for screen-scraping
jmatsushita/beautifulsoup4 | 4.3.2 | conda | linux-64 | py34_0
    : Screen-scraping library
josh/beautifulsoup4 | 4.3.2 | conda | win-64 | py34_0
    : Screen-scraping library
main/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py37_0, py37_1, py36h6ea3382_0, py36_1, py27hdc1f29e_0, py35hb75f182_1, py27h3f86ba9_1, py27_1, py27_0, py27hc287451_1, py27h9416283_1, py36_0, py35h61fcdcc_1, py36h49b8c8c_1, py38_0, py36h4361f19_1, py27h8bb5803_1, py35h94b83b4_1, py35h50ea147_0, py35_0, py36h72d3c9f_1, py36hd4cc5e8_1, py35h442a8c9_1, py35_1
    : Python library designed for screen-scraping
manmadescience/beautifulsoup4 | 4.3.2 | conda | win-64 | py34_0
    : Screen-scraping library
moghimis/beautifulsoup4 | 4.3.2 | conda | linux-32 | py27_0
    : Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree
moustik/beautifulsoup4 | 4.5.0 | conda | linux-64 | py27_0
ngould/beautifulsoup4 | 4.2.1 | conda | osx-64 | py27_0
    : http://www.crummy.com/software/BeautifulSoup/bs4/
pdrops/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
    : Screen-scraping library
prkrekel/beautifulsoup4 | 4.3.2 | conda | win-64 | py27_0
    : Screen-scraping library
prometeia/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, linux-aarch64, win-64, osx-64 | py37_0, py36_0, py38_0, py27_0
    : Python library designed for screen-scraping
rmcgibbo/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
    : Screen-scraping library
rodgomesc/pip-beautifulsoup4 | 4.5.3 | conda | noarch | 0
    : Screen-scraping library Built for Android and iOS apps using enaml-native.
rogerramos/beautifulsoup4 | 4.6.0 | conda | linux-64 | py27_3
    : Screen-scraping library
rpi/beautifulsoup4 | 4.6.3 | conda | linux-armv6l, linux-armv7l, noarch | py27_1, py27_0, py36_1, py36_0, py_0, py35_0, py35_1
    : Python library designed for screen-scraping
rpi64/beautifulsoup4 | 4.6.3 | conda | linux-aarch64 | py36_0
    : Python library designed for screen-scraping
rsmulktis/beautifulsoup4 | 4.5.3 | conda | linux-armv7l | py34_0
    : Screen-scraping library
sayth/beautifulsoup4 | 4.4.0 | conda | win-64 | py34_3
    : This is a simple meta-package
sundarv/beautifulsoup4 | 4.5.3 | conda | win-64 | py36_0
    : Python library designed for screen-scraping
sunpy/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, linux-aarch64, osx-64, win-64 | py37_0, py36_1001, py36_1000, py37_1000, py37_1001, py27_0, py36_0, py38_0, py36h9f0ad1d_1, py37hc8dfbb8_1, py36hc560c46_1, py27_1001, py27_1000, py38h32f6830_1, py35_0
    : Python library designed for screen-scraping
syllabs_admin/beautifulsoup4 | 4.6.0 | conda | linux-64 | py27h490011d_0
tbalaburkina/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0
test_org_002/beautifulsoup4 | 4.5.3 | conda | [] | py36_0, py27_0, py35_0, py34_0
travis/beautifulsoup4 | 4.3.2 | conda | linux-64, osx-64 | py27_0
    : Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
ulmo/beautifulsoup4 | 4.3.2 | conda | linux-64, win-32, osx-64, linux-32, win-64 | py27_0
    : Screen-scraping library
wakari1/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
ziebel/beautifulsoup4 | 4.4.0 | conda | linux-64 | py34_0
    : Screen-scraping library
zoeith/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
    : Screen-scraping library
Found 54 packages
Run 'anaconda show <USER/PACKAGE>' to get installation details
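I chose conda-forge/beautifulsoup4 and entered the following on the command line:
conda install -c https://conda.anaconda.org/conda-forge beautifulsoup4
Note that the channel (conda-forge) and the package name (beautifulsoup4) are separated by a space, not by "/".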
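Once the install finishes, a quick check like the sketch below can confirm that the package is importable from the active environment. This step is not in the original post; it only assumes that the install succeeded and uses the built-in `html.parser`, so no extra parser is needed.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version; the exact number depends on the channel you used.
print("beautifulsoup4 version:", bs4.__version__)

# Parse a trivial document to confirm the library works end to end.
check = BeautifulSoup("<p>ok</p>", "html.parser")
print(check.p.string)
```

If the import fails, the package was most likely installed into a different conda environment than the one currently activated.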
Original post: https://www.cnblogs.com/liliwang/p/12637666.html