How to Scrape a Website's Source Code

Posted: 2021-09-06 03:59:37
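import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Fetches the page at urlInfo and returns its HTML source as a string.
private static String getHtml(String urlInfo) throws Exception {
    URL url = new URL(urlInfo);
    HttpURLConnection httpUrl = (HttpURLConnection) url.openConnection();
    httpUrl.setConnectTimeout(30000); // timeout for connecting to the host (milliseconds)
    httpUrl.setReadTimeout(30000);    // timeout for reading data from the host (milliseconds)
    try {
        // Debug output: the Content-Encoding header, or null if the server sent none
        System.out.println(httpUrl.getContentEncoding());
        InputStream is = httpUrl.getInputStream();
        if ("gzip".equalsIgnoreCase(httpUrl.getContentEncoding())) {
            // The body is gzip-compressed; decompress it while reading
            is = new GZIPInputStream(is);
        }
        // The charset is hard-coded to gb2312; change it if the page uses a different encoding
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "gb2312"))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so put one back
                sb.append(line).append('\n');
            }
            return sb.toString().trim();
        }
    } finally {
        // Release the connection even if reading fails
        httpUrl.disconnect();
    }
}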
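
The method above always decodes the response as gb2312, which garbles pages served in another encoding. One possible refinement, sketched here as an assumption rather than as part of the original post, is to read the charset from the response's Content-Type header (available through `HttpURLConnection.getContentType()`) and fall back to a default when the header does not name one. The class `CharsetSniffer` and the method `charsetFrom` are hypothetical names introduced only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original post.
final class CharsetSniffer {

    // Returns the charset named in a Content-Type header value such as
    // "text/html; charset=utf-8", or the fallback if none is usable.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip the "charset=" prefix and any surrounding quotes
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetFrom("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

In `getHtml`, the result could be passed to the `InputStreamReader` constructor in place of the `"gb2312"` string, for example `new InputStreamReader(is, CharsetSniffer.charsetFrom(httpUrl.getContentType(), Charset.forName("gb2312")))`. Pages that declare their encoding only in an HTML `<meta>` tag would still need separate handling.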

Original: https://www.cnblogs.com/xing-nb/p/15226831.html