jsoup Tutorial

Date: 2021-09-16 13:15:18

Fetching HTML from a URL for parsing

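// get() sends a GET request and parses the response; it throws IOException on failure
Document doc = Jsoup.connect("http://www.baidu.com/").get();
String title = doc.title(); // text of the page's <title> element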

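Jsoup.connect("xxx") returns an org.jsoup.Connection object.
Calling get() or post() on the Connection sends the request. Before sending it,
we can use the Connection to set request details such as headers, cookies, the timeout, and a proxy, so that the request imitates a browser.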

Document doc = Jsoup.connect("http://example.com")
  .data("query", "Java")
  .userAgent("Mozilla")
  .cookie("auth", "token")
  .timeout(3000)
  .post();
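
The request settings described above can be inspected before anything is sent. The following is a minimal sketch, assuming jsoup is on the classpath; the class name ConnectionSetupDemo, the helper buildRequest, and the header and cookie values are made up for illustration. It only builds the Connection and reads the settings back, so it does not touch the network.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

class ConnectionSetupDemo {
    // Hypothetical helper: configures a request but does not send it.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")            // sent as the User-Agent header
                .header("Accept-Language", "en")     // any other request header
                .cookie("auth", "token")             // illustrative cookie value
                .timeout(3000)                       // timeout in milliseconds
                .method(Connection.Method.GET);      // GET or POST
    }

    public static void main(String[] args) {
        Connection conn = buildRequest("http://example.com/");
        Connection.Request req = conn.request();

        // Read the configured values back from the pending request.
        System.out.println(req.method());
        System.out.println(req.timeout());
        System.out.println(req.header("User-Agent"));
        System.out.println(req.cookie("auth"));

        // To actually send the request, call conn.execute() (or get()/post()),
        // which performs network I/O and can throw IOException.
    }
}
```

Keeping configuration in one helper like this makes it easy to reuse the same headers and timeout for several URLs.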

Once we have the Document object, the next step is to parse it and extract the elements we want.

Document provides a rich set of methods for retrieving specific elements.

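◇ Finding elements with DOM-style methods

　　getElementById(String id): finds the element with the given id (returns a single Element)
　　getElementsByTag(String tagName): finds elements by tag name
　　getElementsByClass(String className): finds elements by class name
　　getElementsByAttribute(String key): finds elements that have the given attribute
　　getElementsByAttributeValue(String key, String value): finds elements that have the given attribute with the given value
　　getAllElements(): returns all elements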
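
A short sketch of these lookup methods, run against an inline HTML string so that no network access is needed. The sample markup and the class name DomLookupDemo are assumptions made for this illustration, not part of the original article.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class DomLookupDemo {
    // Hypothetical sample page used only for this example.
    static final String HTML =
        "<html><head><title>Demo</title></head><body>"
        + "<div id=\"main\" class=\"box\">"
        + "<p class=\"note\">one</p><p>two</p>"
        + "<a href=\"/a\" data-kind=\"nav\">A</a>"
        + "</div></body></html>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // getElementById returns one Element, or null when the id is absent.
        Element main = doc.getElementById("main");
        // The getElementsBy... methods return an Elements collection.
        Elements paragraphs = doc.getElementsByTag("p");
        Elements notes = doc.getElementsByClass("note");
        Elements withHref = doc.getElementsByAttribute("href");
        Elements navLinks = doc.getElementsByAttributeValue("data-kind", "nav");

        System.out.println(main.className());         // box
        System.out.println(paragraphs.size());        // 2
        System.out.println(notes.first().text());     // one
        System.out.println(withHref.size());          // 1
        System.out.println(navLinks.first().text());  // A
    }
}
```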

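◇ Finding elements with CSS- or jQuery-like selectors

　　This uses the following method of the Element class:

　　public Elements select(String cssQuery)

　　Pass in a CSS- or jQuery-like selector string to find the matching elements.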

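import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a elements that have an href attribute
Elements pngs = doc.select("img[src$=.png]"); // images whose src ends with .png

Element masthead = doc.select("div.masthead").first(); // first div with class "masthead", or null if none matches

Elements resultLinks = doc.select("h3.r > a"); // a elements that are direct children of an h3 with class "r"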
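
To make the selector behavior concrete, here is a minimal sketch that runs the same kinds of queries against an inline HTML string. The markup, the class name SelectorDemo, and the helper parse() are assumptions for illustration. It also contrasts the child combinator ("h3.r > a") with the descendant form ("h3.r a"), and shows that select can be called on any Element to limit the search to that element's subtree.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SelectorDemo {
    // Hypothetical sample markup used only for this example.
    static final String HTML =
        "<div class=\"masthead\">Site</div>"
        + "<h3 class=\"r\"><a href=\"/first\">First</a>"
        + "<span><a href=\"/nested\">Nested</a></span></h3>"
        + "<img src=\"logo.png\"><img src=\"photo.jpg\">";

    static Document parse() {
        // The second argument is the base URI used to resolve relative links.
        return Jsoup.parse(HTML, "http://example.com/");
    }

    public static void main(String[] args) {
        Document doc = parse();

        System.out.println(doc.select("a[href]").size());         // 2
        System.out.println(doc.select("img[src$=.png]").size());  // 1
        System.out.println(doc.select("div.masthead").first().text()); // Site

        // "> a" matches direct children only; a plain space matches any descendant.
        System.out.println(doc.select("h3.r > a").size());  // 1
        System.out.println(doc.select("h3.r a").size());    // 2

        // select on an Element searches only inside that element.
        Element heading = doc.select("h3.r").first();
        Elements insideHeading = heading.select("a");
        System.out.println(insideHeading.size());            // 2

        // absUrl resolves a relative href against the base URI.
        Element firstLink = doc.select("h3.r > a").first();
        System.out.println(firstLink.absUrl("href"));        // http://example.com/first
    }
}
```

Note that first() returns null when nothing matches, so real code should check the result before using it.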

https://www.jianshu.com/p/fd5caaaa950d

https://www.yiibai.com/jsoup/jsoup-quick-start.html


Original article: https://www.cnblogs.com/luweiweicode/p/15267503.html
