A Brief Introduction to Jsoup, a Java-Based HTML Parser
Jsoup is a Java-based HTML parser that can directly parse a URL or a piece of HTML text.
Maven dependency:
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.9.2</version>
</dependency>
Features:
Jsoup tolerates malformed HTML such as unclosed tags. For example, <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>.
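The unclosed-tag handling can be checked with a few lines of code. The following is a minimal sketch, assuming jsoup is on the classpath; the class name UnclosedTagsDemo and the helper countParagraphs are hypothetical names introduced only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class, not part of jsoup.
public class UnclosedTagsDemo {

    // Counts the <p> elements jsoup builds from the given markup.
    static int countParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p").size();
    }

    public static void main(String[] args) {
        // Neither <p> tag is closed in the input.
        String html = "<p>Lorem <p>Ipsum";

        // The parser closes the first paragraph when the second one starts,
        // so two separate <p> elements are found.
        System.out.println(countParagraphs(html)); // prints 2

        // Print the repaired body markup to inspect the result.
        System.out.println(Jsoup.parse(html).body().html());
    }
}
```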
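A Document can be parsed from an HTML string, a body fragment, or a URL. Once a Document has been parsed, its elements can be found with selectors.
HTML string: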
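import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Attribute values use plain ASCII single quotes so they nest inside the Java string.
String html = "<html><head><meta charset='UTF-8'><title>three.js</title></head><body>"+
"<script type='text/javascript' src='js/Three/three.js'></script>"+
"<script></script></body></html>";
Document doc = Jsoup.parse(html);
System.out.println(doc);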
output:
<html>
<head>
<meta charset="UTF-8" />
<title>three.js</title>
</head>
<body>
<script type="text/javascript" src="js/Three/three.js"></script>
<script></script>
</body>
</html>
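Body fragments and selectors are mentioned above but not shown. The following is a minimal sketch of both, assuming jsoup is on the classpath; the class name SelectorDemo and the helper scriptSources are hypothetical names used only for illustration. It uses Jsoup.parseBodyFragment to parse markup that has no html or head wrapper, and the CSS selector script[src] to pick only the script elements that carry a src attribute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of jsoup.
public class SelectorDemo {

    // Returns the src attribute of every <script> element that has one.
    static List<String> scriptSources(String bodyHtml) {
        // parseBodyFragment wraps the fragment in a full Document.
        Document doc = Jsoup.parseBodyFragment(bodyHtml);

        // "script[src]" matches <script> elements with a src attribute.
        Elements scripts = doc.select("script[src]");

        List<String> sources = new ArrayList<>();
        for (Element script : scripts) {
            sources.add(script.attr("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        String fragment =
                "<script type='text/javascript' src='js/Three/three.js'></script>" +
                "<script></script>";

        // Only the first script has a src attribute, so one entry is printed.
        System.out.println(scriptSources(fragment));
    }
}
```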
URL: here we fetch the Google home page, look up its sign-in button (button id: gb_70), and then get the button's text.
Document google = Jsoup.connect("https://www.google.com.hk/").get();
Element login = google.getElementById("gb_70");
System.out.println(login.text());
output: 登入
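The lookup above assumes the page still contains an element with that id. getElementById returns null when no element matches, so a page change would cause a NullPointerException. The following is a hedged sketch of a more defensive variant, assuming jsoup is on the classpath; the class name SafeLookupDemo, the helper textById, the user-agent string, and the timeout value are all assumptions made for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.Optional;

// Hypothetical demo class, not part of jsoup.
public class SafeLookupDemo {

    // Returns the text of the element with the given id, or empty if it is absent.
    static Optional<String> textById(Document doc, String id) {
        Element element = doc.getElementById(id);
        return element == null ? Optional.empty() : Optional.of(element.text());
    }

    public static void main(String[] args) throws IOException {
        Document google = Jsoup.connect("https://www.google.com.hk/")
                .userAgent("Mozilla/5.0") // assumed value; some sites vary markup by client
                .timeout(10_000)          // assumed value, in milliseconds
                .get();

        // Falls back to a message instead of throwing if the id no longer exists.
        System.out.println(textById(google, "gb_70").orElse("(element gb_70 not found)"));
    }
}
```

The helper takes an already parsed Document, so it can be exercised against local HTML without any network access.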
Source: https://www.cnblogs.com/chenjy1225/p/9661350.html