Python in Practice: Data Processing (2) - Converting Data to a Specific Format

Date: 2014-09-22 23:17:54

Here is the current situation:

1. One of my folders contains many data files with names like these:

part-m-0000

part-m-0001

part-m-0002

part-m-0003

...

2. The data inside each file looks like this:

"460030730101160","3","0","0","0","2013/8/31 0:21:42"
"460036745672363","3","0","0","0","2013/8/31 0:21:31"
"460030250931114","3","1307","1","0","2013/8/31 0:21:40"
"460030250942643","3","0","0","0","2013/8/31 0:21:40"
"460036650411006","3","1021","1","0","2013/8/31 0:21:39"
"000000000009674","8","0","0","0","2013/8/31 0:12:28"
"000000000005661","8","0","0","0","2013/8/31 0:12:29"
"460030731390121","3","0","0","0","2013/8/31 21:54:00"
"460030256111396","3","0","0","0","2013/8/31 21:54:00"
"460030207447762","3","0","0","0","2013/8/31 21:53:58"
"460030250939916","3","0","0","0","2013/8/31 21:53:58"
"460030957972011","3","1613","0","0","2013/8/31 21:53:51"
"460030237206739","3","0","0","0","2013/8/31 21:53:59"
...
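As a side note, every field in this format is a double-quoted string separated by ASCII commas, which is the quoting style Python's standard csv module reads by default. The following is a minimal sketch, assuming Python 3; the parse_rows helper is a hypothetical name used only for illustration, and the two rows are copied from the sample above.

```python
import csv
import io

# Two rows copied from the sample data above
sample = (
    '"460030730101160","3","0","0","0","2013/8/31 0:21:42"\n'
    '"460030957972011","3","1613","0","0","2013/8/31 21:53:51"\n'
)

def parse_rows(text):
    """Return each line as a list of string fields with the quotes removed."""
    return list(csv.reader(io.StringIO(text)))

rows = parse_rows(sample)
print(rows[0])
```

The reader removes the quotes itself, so each row comes back as a plain list of six strings, with the timestamp as the last element.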

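Now I need to strip the quotes from around the values and extract the hour from the timestamp in the last column. Here is how I handled it in Python:

1. Walk through all the files in the current folder whose names start with 'part';

2. For each file, read it line by line and split each line on ",";

3. For each field, keep only the part between the quotes; for the last field, the timestamp, keep only the hour, which means checking whether the hour has 1 or 2 digits;

4. Write one output line for every line read.

Here is the actual code: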
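To make steps 2 and 3 concrete, here is a small sketch of the transformation applied to a single line. The function name convert_line is hypothetical, and the sketch assumes that the timestamp always has the form date, one space, then time, as in the sample data. The "-1" fallback for a malformed timestamp is the same placeholder the author's script uses.

```python
def convert_line(line):
    """Strip the quotes from every field and reduce the timestamp to its hour."""
    fields = [f.strip('"') for f in line.strip().split(',')]
    # The last field looks like 2013/8/31 0:21:42: a date, a space, then a time
    date_and_time = fields[-1].split(' ')
    if len(date_and_time) == 2 and ':' in date_and_time[1]:
        hour = date_and_time[1].split(':')[0]  # "0" or "21", either width
    else:
        hour = "-1"  # placeholder for a missing or malformed timestamp
    return ",".join(fields[:-1] + [hour])

print(convert_line('"460030250931114","3","1307","1","0","2013/8/31 0:21:40"'))
# 460030250931114,3,1307,1,0,0
print(convert_line('"460030957972011","3","1613","0","0","2013/8/31 21:53:51"'))
# 460030957972011,3,1613,0,0,21
```

Splitting the time on ":" is one way to avoid depending on the exact character position of the hour.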

# coding: utf-8
import os

out_dir = "./data_handled/"
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)  # the output folder must exist before we write into it

for name in os.listdir("./"):  # files in the current folder only
    if not (name.startswith("part") and os.path.isfile(name)):
        continue
    filepath = "./" + name  # the file to read
    print(filepath)
    newfilepath = out_dir + name[7:]  # the file to write, e.g. part-m-0000 -> 0000
    with open(filepath) as infile, open(newfilepath, 'w') as newfile:
        for line in infile:
            fields = line.rstrip("\r\n").split(',')
            # Remove the surrounding quotes from every field except the last
            out = [field.strip('"') for field in fields[:-1]]
            # The last field looks like "2013/8/31 0:21:42"; keep only the hour
            parts = fields[-1].strip('"').split(' ')
            if len(parts) == 2 and ':' in parts[1]:
                hour = parts[1].split(':')[0]  # handles 1-digit and 2-digit hours
            else:
                hour = "-1"  # malformed or missing timestamp
            out.append(hour)
            newfile.write(",".join(out) + "\n")
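The hour can also be obtained by parsing the timestamp with the standard library instead of working on the string directly. The sketch below is one possible alternative, not the author's method: hour_of is a hypothetical helper, and the format string assumes every timestamp follows the year/month/day hour:minute:second layout seen in the sample data.

```python
from datetime import datetime

def hour_of(timestamp):
    """Return the hour as a string, or "-1" if the timestamp does not parse."""
    try:
        parsed = datetime.strptime(timestamp, "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return "-1"  # same placeholder the script uses for bad rows
    return str(parsed.hour)

print(hour_of("2013/8/31 0:21:42"))   # 0
print(hour_of("2013/8/31 21:54:00"))  # 21
print(hour_of("not a timestamp"))     # -1
```

Parsing this way also rejects impossible values such as an hour of 25, at the cost of being slower than plain string splitting on large files.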


Original article: http://blog.csdn.net/michael_kong_nju/article/details/39482903
