A Simple Web Crawler in Node.js

Date: 2020-06-25 15:27:00

We will use Node.js's built-in https module to implement a simple crawler:

What is a crawler? It is a program that fetches data from web pages and saves it locally. For example, we can crawl images, HTML documents, and so on.

Here is a simple implementation that shows how to crawl a web page:

const https = require("https")
const fs = require("fs")

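// Fetch the 小滴课堂 (xdclass) home page with the get method.
// Note: the page is served over HTTPS, so we use the https module rather than http.
https.get("https://xdclass.net/#/index", res => {
    // Decode the response body as UTF-8 text
    res.setEncoding('utf8');
    // Accumulates the page content
    let html = '';
    // Listen for the response's data event and append each chunk to html
    res.on('data', chunk => {
        html += chunk;
    });
    // The end event fires once the whole response has been received
    res.on('end', () => {
        console.log(html);
        // Write the page content to index.txt with fs.writeFile,
        // which creates the file if it does not exist
        fs.writeFile('./index.txt', html, (err) => {
            if (err) throw err;
            console.log("Written successfully");
        });
    });
}).on('error', err => {
    // Report request failures such as DNS or connection errors
    console.error(err);
});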
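
The script above stops at saving the raw HTML. As one possible next step toward crawling images, here is a minimal sketch that wraps the request in a Promise and then pulls image addresses out of the downloaded text. The helper names `clientFor`, `fetchPage`, and `extractImageUrls` are hypothetical and not part of the original article. The regular expression is a deliberate simplification that only handles quoted `src` attributes; a real HTML parser would be more robust.

```javascript
const http = require("http");
const https = require("https");

// Hypothetical helper: pick the core module that matches the URL's protocol.
function clientFor(pageUrl) {
    return new URL(pageUrl).protocol === "https:" ? https : http;
}

// Hypothetical helper: download a page and resolve with its body as a string.
// Assumption: any status other than 200 is treated as a failure (redirects are not followed).
function fetchPage(pageUrl) {
    return new Promise((resolve, reject) => {
        clientFor(pageUrl).get(pageUrl, res => {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                reject(new Error("Unexpected status code: " + res.statusCode));
                return;
            }
            res.setEncoding("utf8");
            let html = "";
            res.on("data", chunk => { html += chunk; });
            res.on("end", () => resolve(html));
        }).on("error", reject);
    });
}

// Hypothetical helper: collect the src of every <img> tag,
// resolved to an absolute URL against the page address.
function extractImageUrls(html, pageUrl) {
    const pattern = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
    const urls = [];
    for (const match of html.matchAll(pattern)) {
        urls.push(new URL(match[1], pageUrl).href);
    }
    return urls;
}
```

A call might look like `fetchPage("https://xdclass.net/").then(html => console.log(extractImageUrls(html, "https://xdclass.net/")))`. Note that a page rendered in the browser by JavaScript may contain few or no `<img>` tags in the HTML the server returns.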


Original: https://www.cnblogs.com/fqh123/p/13191833.html
