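On GitHub, others have already published Python libraries for converting between these two file formats, for example liac-arff: https://github.com/renatopp/liac-arff. But when a program is already fairly complex, pulling in that many external files feels cumbersome. These arff libraries also raise an error when the number of attributes and the number of values in a row do not match. So, with the support of a senior labmate, I wrote two simple conversion functions with reference to Stack Overflow. (It took more than 5 hours... I need to be more efficient in future.)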
arff2txt():
Converts an arff file to txt format:
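def arff2txt(filename):
    arr = []
    with open(filename) as arff_file:
        for line in arff_file:
            line = line.strip()
            # Skip header lines (@...), comment lines (%...) and blank lines.
            # The original version instead deleted the first collected row,
            # presumably a blank line; skipping blanks here covers that case.
            if not line or line.startswith(('@', '%')):
                continue
            arr.append(line.split(','))

    lines = []
    for child in arr:
        del child[10]  # drop the 11th field, the trailing {...} that txt2arff appends
        # Map the class label (10th field) to 1 for True and 0 otherwise.
        child[9] = 1 if child[9] == "True" else 0
        lines.append('\t'.join(map(str, child)))

    result = '\n'.join(lines)
    print(result)
    with open('./generatedtxt.txt', 'w') as txtfile:
        txtfile.write(result)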
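The same row handling can be tried without touching the file system. The following is only a minimal sketch: `arff_text_to_tsv` and its `label_index` parameter are hypothetical names introduced for illustration, and the three-attribute sample is made up rather than taken from the real data set. It assumes the label is the last real attribute and that no value contains a comma, since a plain `split(',')` does not understand quoted strings.

```python
def arff_text_to_tsv(arff_text, label_index=9):
    """Convert the data rows of an ARFF document (a string) to tab-separated text.

    Hypothetical helper: the name and label_index are illustrative only.
    """
    out = []
    for raw in arff_text.splitlines():
        line = raw.strip()
        # Skip blank lines, ARFF header lines (@...) and comments (%...).
        if not line or line.startswith(("@", "%")):
            continue
        fields = [f.strip() for f in line.split(",")]
        # Keep the columns up to and including the label; this drops a
        # trailing {...} field if the row has one.
        fields = fields[:label_index + 1]
        fields[label_index] = "1" if fields[label_index] == "True" else "0"
        out.append("\t".join(fields))
    return "\n".join(out)


# Made-up sample with three attributes, so the label sits at index 2.
sample = """@relation ExceptionRelation
@attribute ID string
@attribute Thrown numeric
@attribute class-att {True,False}
@data
% a comment line
a1,0,True,{5}
a2,1,False,{1}
"""
print(arff_text_to_tsv(sample, label_index=2))
```

Slicing up to the label, rather than deleting a fixed index, means a row without the trailing braces is handled the same way as one with them. A row with fewer fields than `label_index + 1` would still raise an `IndexError`.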
txt2arff():
Converts a txt file to arff format:
def txt2arff(filename, value):
    # value is written in braces after every row labelled True; other rows get {1}.
    # (Weka reads a trailing {n} on a data row as an instance weight.)
    with open('./generatedarff.arff', 'w') as fp:
        fp.write('''@relation ExceptionRelation
@attribute ID string
@attribute Thrown numeric
@attribute SetLogicFlag numeric
@attribute Return numeric
@attribute LOC numeric
@attribute NumMethod numeric
@attribute EmptyBlock numeric
@attribute RecoverFlag numeric
@attribute OtherOperation numeric
@attribute class-att {True,False}
@data
''')
        with open(filename) as f:
            contents = f.readlines()
        for content in contents:
            # Each input line holds ten tab-separated fields; the last is the 0/1 label.
            lines = content.split('\t')
            lines = [line.strip() for line in lines]
            if lines[9] == '1':
                lines[9] = "True"
                lines.append('{' + str(value) + '}')
            else:
                lines[9] = "False"
                lines.append('{1}')
            array = ','.join(lines)
            fp.write("%s\n" % array)
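Because the header above is hard-coded, the function only fits this one set of attributes. As a rough sketch of how the header and the per-row formatting could be separated out, the helpers below build the header from a list of (name, type) pairs and format a single row. `build_arff_header`, `format_arff_row` and `label_index` are hypothetical names introduced here, not part of the original code, and the three-attribute example is made up.

```python
def build_arff_header(relation, attributes):
    """Return ARFF header text for a relation name and (name, type) pairs."""
    lines = ["@relation %s" % relation]
    for name, attr_type in attributes:
        lines.append("@attribute %s %s" % (name, attr_type))
    lines.append("@data")
    return "\n".join(lines) + "\n"


def format_arff_row(fields, value, label_index):
    """Turn one list of txt fields into an ARFF data row.

    Mirrors the rule used in txt2arff: label '1' becomes True followed by
    {value}; anything else becomes False followed by {1}.
    """
    fields = [f.strip() for f in fields]
    if fields[label_index] == "1":
        fields[label_index] = "True"
        fields.append("{%s}" % value)
    else:
        fields[label_index] = "False"
        fields.append("{1}")
    return ",".join(fields)


# Made-up example with three attributes, so the label sits at index 2.
header = build_arff_header(
    "ExceptionRelation",
    [("ID", "string"), ("Thrown", "numeric"), ("class-att", "{True,False}")],
)
row = format_arff_row("a1\t0\t1\n".split("\t"), 5, 2)
print(header + row)
```

Keeping the attribute list as data makes it easier to check that each row has as many values as the header declares before anything is written.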
Original post: http://www.cnblogs.com/cuiyunGAO/p/3920263.html