
Storing scrapy output in MySQL

spider.py — a CrawlSpider that follows Sina article links matching a dated doc- URL pattern and extracts each page's title and keywords:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from Cwpjt.items import CwpjtItem


class FulongSpider(CrawlSpider):
    name = 'fulong'
    allowed_domains = ['sina.com.cn']
    start_urls = ['http://sina.com.cn/']
    # Example of an article URL the rule below should match:
    # http://news.sina.com.cn/c/2017-05-09/doc-ifyeycte9324112.shtml
    rules = (
        Rule(LinkExtractor(allow=('.*?/[0-9]{4}.[0-9]{2}.[0-9]{2}.doc-.*?shtml',),
                           allow_domains=('sina.com.cn',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = CwpjtItem()
        # Page title and the keywords <meta> tag from the document head.
        i['name'] = response.xpath('/html/head/title/text()').extract()
        i['kws'] = response.xpath('/html/head/meta[@name="keywords"]/@content').extract()
        return i
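The allow pattern is a plain regex, so it can be checked outside scrapy. A minimal standalone sanity check, using only the pattern and the example article URL quoted above:

import re

# The rule's allow pattern and the example URL from the spider above.
pattern = '.*?/[0-9]{4}.[0-9]{2}.[0-9]{2}.doc-.*?shtml'
url = 'http://news.sina.com.cn/c/2017-05-09/doc-ifyeycte9324112.shtml'

# The unescaped dots match the '-' and '/' separators around the date,
# so the pattern is looser than it looks, but it does match article pages.
print(bool(re.search(pattern, url)))  # True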

pipelines.py — the pipeline opens a pymysql connection when the spider starts, inserts each item's title and keywords into the hehe table, and closes the connection when the spider finishes:

import pymysql


class CwpjtPipeline(object):
    def __init__(self):
        # Connection parameters are the original post's local test setup;
        # charset='utf8' is added so Chinese titles survive the round trip.
        self.conn = pymysql.connect(host='127.0.0.1', user='root',
                                    passwd='123456', db='mydb', charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # XPath extraction returns lists; take the first match of each field.
        name = item['name'][0]
        kws = item['kws'][0]
        # Parameterized query; pymysql handles the quoting and escaping.
        sql = "insert into hehe(title,kws) VALUES(%s,%s)"
        self.cursor.execute(sql, (name, kws))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
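The pipeline assumes the mydb database and the hehe table already exist. A one-off setup sketch using the same credentials as the pipeline; the column names come from the INSERT above, but the column types are an assumption, since the original post never shows the schema:

import pymysql

# One-off setup: create the database and table the pipeline writes to.
conn = pymysql.connect(host='127.0.0.1', user='root',
                       passwd='123456', charset='utf8')
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS mydb DEFAULT CHARACTER SET utf8")
cursor.execute(
    "CREATE TABLE IF NOT EXISTS mydb.hehe ("
    "  id INT AUTO_INCREMENT PRIMARY KEY,"
    "  title VARCHAR(255),"
    "  kws VARCHAR(255)"
    ") DEFAULT CHARACTER SET utf8"
)
conn.commit()
conn.close()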

items.py — the item declares the two fields the spider fills in:

import scrapy


class CwpjtItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    kws = scrapy.Field()
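For process_item to be called at all, the pipeline still has to be enabled in the project's settings.py; the Cwpjt.pipelines module path below is an assumption based on the project name in the spider's import:

# settings.py (excerpt)
ITEM_PIPELINES = {
    'Cwpjt.pipelines.CwpjtPipeline': 300,  # 0-1000; lower values run earlier
}

With the table created and the pipeline registered, the crawl is started with scrapy crawl fulong.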

Source: https://www.cnblogs.com/duanlinxiao/p/10827264.html
