
Importing Python-Scraped Data into MySQL

Posted: 2020-03-13 22:31:06

1. Create the Scrapy project and the spider, as sketched below
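The project skeleton can be generated with Scrapy's command-line tool (the spider module itself is written by hand in step 3, under the project's spiders/ directory, since its name matches the project name):

scrapy startproject top250
cd top250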

2. Configure settings.py

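The screenshot of settings.py from the original post did not survive; below is a minimal sketch of the settings this walkthrough needs. The pipeline path matches the class defined in step 6; the USER_AGENT value and ROBOTSTXT_OBEY = False are assumptions commonly needed when crawling Douban, not taken from the original.

# top250/settings.py (excerpt)
BOT_NAME = "top250"
SPIDER_MODULES = ["top250.spiders"]
NEWSPIDER_MODULE = "top250.spiders"

# assumption: Douban tends to reject Scrapy's default user agent
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# assumption: skip robots.txt checking for this demo crawl
ROBOTSTXT_OBEY = False

# register the MySQL pipeline from step 6 (the number is its run order)
ITEM_PIPELINES = {
    "top250.pipelines.Top250Pipeline": 300,
}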

 

3. Write the spider

#!/usr/bin/python3
# -*- coding: UTF-8 -*-
import scrapy
from top250.items import Top250Item

class Top250Spider(scrapy.Spider):
    name = "top250"
    # movie.douban.com hosts the Top 250 pages; listing it here keeps
    # the offsite middleware from filtering requests to that host
    allowed_domains = ["movie.douban.com"]
    start_urls = ["https://movie.douban.com/top250"]

    def parse(self, response):
        # each //li/div/div[2] block is one movie entry on the list page
        sites = response.xpath("//li/div/div[2]")
        for site in sites:
            item = Top250Item()
            # extract_first() returns a single string rather than a list,
            # which is what the MySQL pipeline expects
            item["title"] = site.xpath("div[1]/a/span[1]/text()").extract_first()
            item["link"] = site.xpath("div[1]/a/@href").extract_first()
            item["dc"] = site.xpath("div[2]/p[2]/span/text()").extract_first()
            yield item  # yield hands each item straight to the pipeline
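Before running the full crawl, the XPath expressions can be sanity-checked in Scrapy's interactive shell (the URL and expression mirror the spider above):

scrapy shell "https://movie.douban.com/top250"
>>> response.xpath("//li/div/div[2]/div[1]/a/span[1]/text()").extract_first()  # first movie title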

4. Define the item fields in items.py

# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy

class Top250Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    link = scrapy.Field()
    dc = scrapy.Field()

5. Create the MySQL database and table, e.g. as sketched below
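The post does not show the DDL itself; here is a minimal sketch using pymysql, with the names taken from the pipeline in step 6 (database db2, table top250, columns title/link/dc). The column types and the id key are assumptions.

import pymysql

# connect without selecting a database so the database can be created first
connect = pymysql.connect(host="localhost", user="root", password="", charset="utf8")
cursor = connect.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS db2 DEFAULT CHARACTER SET utf8")
cursor.execute("""
    CREATE TABLE IF NOT EXISTS db2.top250 (
        id INT AUTO_INCREMENT PRIMARY KEY,   -- assumed surrogate key
        title VARCHAR(255),
        link VARCHAR(255),
        dc VARCHAR(255)
    ) DEFAULT CHARSET=utf8
""")
cursor.close()
connect.close()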

6. Define the pipeline in pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql

class Top250Pipeline(object):
    def __init__(self):
        # one connection for the whole crawl
        self.connect = pymysql.connect(host="localhost", user="root", password="",
                                       db="db2", charset="utf8", use_unicode=True)
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        # parameterized INSERT; pymysql does the quoting and escaping
        sql = "INSERT INTO top250(title, link, dc) VALUES (%s, %s, %s)"
        self.cursor.execute(sql, (item["title"], item["link"], item["dc"]))
        self.connect.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.connect.close()

# The commented-out block below was used when exporting to JSON instead of MySQL:
#
#     import json
#     def process_item(self, item, spider):
#         with open('items.json', 'a') as f:
#             json.dump(dict(item), f, ensure_ascii=False)
#             f.write(',\n')

7. In a cmd window inside the top250 project directory, run scrapy crawl top250
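To confirm the crawl actually landed rows in MySQL, a quick check (a minimal sketch reusing the connection parameters from step 6):

import pymysql

connect = pymysql.connect(host="localhost", user="root", password="", db="db2", charset="utf8")
cursor = connect.cursor()
cursor.execute("SELECT title, link, dc FROM top250 LIMIT 5")
for row in cursor.fetchall():
    print(row)
cursor.close()
connect.close()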


Original post: https://www.cnblogs.com/lianghaiming/p/12489269.html
