首页 > 数据库技术 > 详细

【转载】 导入GoogleClusterData到MySQL

时间:2019-05-21 14:35:47      阅读:136      评论:0      收藏:0      [点我收藏+]

原文地址:

https://www.cnblogs.com/instant7/p/4159022.html

 

 

 

---------------------------------------------------------------------------------------------

 

 

 

 

  本篇随笔记录如何导入google-cluster-data-2011-1-2的

 

job_events和task_events到MySQL

 

 

 

1. 下载数据

download_job_events:

import urllib2

url = https://commondatastorage.googleapis.com/clusterdata-2011-2/
f = open(C:\\SHA256SUM)
l = f.readlines()
f.close()
for i in l:
    if i.count(job_events)>0:
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split(/)[1]
        print downloading, fileName
        data = urllib2.urlopen(url+fileAddr).read()
        print saving, fileName
        fileDown = open(C:\\job_events\\+fileName, wb)
        fileDown.write(data)
        fileDown.close()

 

 

(ps:   由于上面的代码为python2.7的,现在一般使用python3的,于是给出python3版本的代码如下:

#encoding:UTF-8

from urllib import request

url = https://commondatastorage.googleapis.com/clusterdata-2011-2/
f = open(C:\\SHA256SUM)
l = f.readlines()
f.close()
for i in l:
    if i.count(job_events)>0:
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split(/)[1]
        print(downloading, fileName)
        data = request.urlopen(url+fileAddr).read()
        print(saving, fileName)
        fileDown = open(C:\\job_events\\+fileName, wb)
        fileDown.write(data)
        fileDown.close()

 

 

 

 

 

 

download_task_events:

import urllib2

url = https://commondatastorage.googleapis.com/clusterdata-2011-2/
f = open(C:\\SHA256SUM)
l = f.readlines()
f.close()
for i in l:
    if i.count(task_events)>0:
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split(/)[1]
        print downloading, fileName
        data = urllib2.urlopen(url+fileAddr).read()
        print saving, fileName
        fileDown = open(C:\\task_events\\+fileName, wb)
        fileDown.write(data)
        fileDown.close()

 

 

(ps:   由于上面的代码为python2.7的,现在一般使用python3的,于是给出python3版本的代码如下:

#encoding:UTF-8

from urllib import request

url = https://commondatastorage.googleapis.com/clusterdata-2011-2/
f = open(C:\\SHA256SUM)
l = f.readlines()
f.close()
for i in l:
    if i.count(task_events)>0:
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split(/)[1]
        print(downloading, fileName)
        data = request.urlopen(url+fileAddr).read()
        print(saving, fileName)
        fileDown = open(C:\\task_events\\+fileName, wb)
        fileDown.write(data)
        fileDown.close()

)

 

 

 

注意:这次用的数据是

clusterdata-2011-2

不同于之前重画GoogleCLusterData中的

clusterdata-2011-1

 

 

 

 

 

 

 

2. 解压缩

由于不能直接导入压缩包里的数据到mysql,故先将它们解压缩

unzip_job_events:

import gzip
import os

fileNames = os.listdir(C:\\task_events)

for l in fileNames:
    print now at: + l
    f = gzip.open(C:\\job_events\\+l)
    fOut = open(C:\\job_events_unzip\\+l[:-3], w)
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()
    #raw_input()

 

 

 

python3 版本

import gzip
import os

fileNames = os.listdir(C:\\job_events)

for l in fileNames:
    print( now at: + l )
    f = gzip.open(C:\\job_events\\+l)
    fOut = open(C:\\job_events_unzip\\+l[:-3], wb)
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()
    #raw_input()

 

 

 

 

 

 

 

unzip_task_events:

import gzip
import os

fileNames = os.listdir(C:\\task_events)

for l in fileNames:
    print now at: + l
    f = gzip.open(C:\\task_events\\+l)
    fOut = open(C:\\task_events_unzip\\+l[:-3], w)
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()

 

python3 版本:

import gzip
import os

fileNames = os.listdir(C:\\task_events)

for l in fileNames:
    print( now at: + l )
    f = gzip.open(C:\\task_events\\+l)
    fOut = open(C:\\task_events_unzip\\+l[:-3], wb)
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()
    #raw_input()

 

 

 

 

 

 

3. 建数据库

create_job_events:

create table job_events(
time bigint,
missing_info int,
job_id bigint,
event_type int,
user text,
scheduling_class int,
job_name text,
logical_job_name text)
engine = myisam;

 

 

 

 

create_task_events:

create table task_events(
time bigint,
missing_info int,
job_id bigint,
task_index bigint,
machine_id bigint,
event_type int,
user text,
scheduling_class int,
priority int,
cpu_request float,
memory_request float,
disk_space_request float,
difference_machine_restriction boolean
)engine = myisam;

 

注意:由于数据量非常大,这里一定要选择myisam作为engine。

 

 

 

 

 

 

4. 导入数据

由于数据中有部分为空的值,需要先设定mysql使其能够导入空值。

具体方法为:

在mysql的控制台输入

SET @@GLOBAL.sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";

之后就可以开始导入数据了。

注意!!以下代码在导入类似2.3e-10的数据会产生严重问题,具体为导入的数据在MySQL中变为负数,而且绝对值不小!!!

 

 

 

 

loadJobEvents2MySQL.py

import os
import MySQLdb

fileNames = os.listdir(C:\\task_events_unzip)

conn=MySQLdb.connect(host="localhost",user="root",passwd="123456",db="googleclusterdata",charset="utf8")
cursor = conn.cursor()
cursor.execute(truncate job_events)

for f in fileNames:
    print now at: + f
    order = "load data infile ‘C:/job_events_unzip/%s‘ into table job_events fields terminated by ‘,‘ lines terminated by ‘\n‘" %f
    print order
    cursor.execute(order)
    conn.commit()

 

 

 

 

 

 

loadTaskEvents2MySQL.py

import os
import MySQLdb

fileNames = os.listdir(C:\\task_events_unzip)

conn=MySQLdb.connect(host="localhost",user="root",passwd="123456",db="googleclusterdata",charset="utf8")
cursor = conn.cursor()
cursor.execute(truncate task_events)

for f in fileNames:
    print now at: + f
    order = "load data infile ‘C:/task_events_unzip/%s‘ into table task_events fields terminated by ‘,‘ lines terminated by ‘\n‘" %f
    print order
    cursor.execute(order)
    conn.commit()

 

 

注意:这里需要相应的修改密码和使用的数据库名(db)

 

 
 

 

 

 

 

---------------------------------------------------------------------------------------------

 

【转载】 导入GoogleClusterData到MySQL

原文:https://www.cnblogs.com/devilmaycry812839668/p/10899567.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!