
Scraping Luojia-1 (珞珈一号) satellite data with Python

Date: 2019-03-28 12:23:10

First, log in to the Luojia-1 data system and query for the data you want.

 


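Use the browser's Inspect Element tool to get the page source that contains the download information.

Copy the HTML source of the table on the far right of the page to the clipboard for later use.

Download the data with Python: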

 

# -*- coding: utf-8 -*-



import requests
import os
# import urllib.request
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd 


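def saveFile(url, fileName):
    """Stream the file at url to data/<fileName>, showing a progress bar."""
    os.makedirs("data", exist_ok=True)  # make sure the output directory exists
    r = requests.get(url, stream=True)
    chunkSize = 256
    # Content-Length may be missing; tqdm then runs without a known total
    total = int(r.headers.get("Content-Length", 0)) or None
    with open(os.path.join("data", fileName), "wb") as f, \
            tqdm(unit="B", total=total, desc="downloading..." + fileName) as pbar:
        for chunk in r.iter_content(chunk_size=chunkSize):
            if chunk:  # filter out keep-alive new chunks
                pbar.update(len(chunk))
                f.write(chunk)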


html = '''paste the table's HTML source here'''

# get download URL and file name

soup = BeautifulSoup(html, "html.parser")
tbody = soup.findAll("tbody")[0]
trs = tbody.findAll("tr")

data = []
for tr in trs:
    tds = tr.findAll("td")[-4:]
    temp = []

    # the first three cells become the weixing (satellite), chuanganqi (sensor) and time columns
    for td in tds[:-1]:
        temp.append(td.text)

    a = tds[-1].findAll("a")[-1]

    # download URL
    href = "http://59.175.109.173:8888" + a["href"]

    temp.append(href)

    data.append(temp)

dataSet = pd.DataFrame(data,columns = ["weixing","chuanganqi","time","url"])

# file name
dataSet.loc[:,"fileName"] = dataSet.loc[:,"weixing"] + dataSet.loc[:,"chuanganqi"] + dataSet.loc[:,"time"] + "-" + dataSet.index.map(str) + ".tar.gz"




# download


for i in tqdm(range(dataSet.shape[0])):
    # if i<start:
    #     continue

    # if i > 200:
    #     continue
    row = dataSet.loc[i,:]
    fileName = row["fileName"]
    url = row["url"]
    saveFile(url,fileName)
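
The commented-out `start` check in the loop above suggests that long runs sometimes had to be restarted by hand. A minimal sketch of one alternative, assuming finished files are left in `data/` under their `fileName`: filter out names that already exist on disk before downloading. The helper `pending_files` and its `out_dir` parameter are hypothetical and not part of the original script.

```python
import os


def pending_files(file_names, out_dir="data"):
    """Return the names in file_names that are not yet present in out_dir.

    Hypothetical helper (not in the original script): it only checks that a
    file with the same name exists, not that the download was complete.
    """
    return [name for name in file_names
            if not os.path.exists(os.path.join(out_dir, name))]
```

With this, `saveFile` could be called only for rows whose `fileName` appears in `pending_files(dataSet["fileName"])`. A partially written file from an interrupted download would also be skipped, so it should be deleted before rerunning.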

 

 

 


Original: https://www.cnblogs.com/wybert/p/10613873.html
