首页 > 编程语言 > 详细

【Python爬虫】股票数据定向爬虫

时间:2020-07-12 13:59:08      阅读:186      评论:0      收藏:0      [点我收藏+]

技术分享图片

 

 爬取网站:

http://quote.eastmoney.com/center/gridlist.html

https://stockapp.finance.qq.com/mstats/

技术分享图片

 

 

import requests
from bs4 import BeautifulSoup
import traceback
import re
 
def getHTMLText(url, code="utf-8"):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except:
        return ""
 
def getStockList(lst, stockURL):
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, html.parser) 
    a = soup.find_all(a)
    for i in a:
        try:
            href = i.attrs[href]
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue
 
def getStockInfo(lst, stockURL, fpath):
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html=="":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, html.parser)
            stockInfo = soup.find(div,attrs={class:stock-bets})
 
            name = stockInfo.find_all(attrs={class:bets-name})[0]
            infoDict.update({股票名称: name.text.split()[0]})
             
            keyList = stockInfo.find_all(dt)
            valueList = stockInfo.find_all(dd)
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
             
            with open(fpath, a, encoding=utf-8) as f:
                f.write( str(infoDict) + \n )
                count = count + 1
                print("\r当前进度: {:.2f}%".format(count*100/len(lst)),end="")
        except:
            count = count + 1
            print("\r当前进度: {:.2f}%".format(count*100/len(lst)),end="")
            continue
 
def main():
    stock_list_url = https://quote.eastmoney.com/stocklist.html
    stock_info_url = https://gupiao.baidu.com/stock/
    output_file = D:/BaiduStockInfo.txt
    slist=[]
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)
 
main()

 

【Python爬虫】股票数据定向爬虫

原文:https://www.cnblogs.com/HGNET/p/13275081.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!