Our collaboration combined pair coding with separate coding. The initial design discussion, the division of features, and the setup of the GitHub repository were done together: my partner built the framework, we assigned each other the functions to implement, and we then coded separately with the source managed through GitHub. Whenever a question came up about the framework, a key function, or a target feature, we discussed it and resolved it by pair coding.
(My one complaint about my partner: he isn't funny enough when we talk.)
We used the coverage package for regression testing:
```
coverage run coverage_test.py
coverage report
```

The results are as follows:
```
Name               Stmts   Miss  Cover
--------------------------------------
coverage_test.py      36      0   100%
modes.py              94      0   100%
utils.py              68      0   100%
--------------------------------------
TOTAL                198      0   100%
```
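For anyone reproducing this, the coverage CLI can also render an annotated per-line report (a standard subcommand, not something shown in the original post):

```
coverage html    # writes htmlcov/index.html with line-by-line hit marks
```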
We used Python's cProfile for performance analysis. Based on the timing profile of the first draft we made two rounds of optimization. Each profile below was dumped to a stats file and printed as a top-10 listing sorted by internal time (a sketch of how follows). First, my partner's analysis and work:
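A minimal sketch of how such a listing is produced with cProfile and pstats; `main()` here is a hypothetical entry point standing in for the project's real one:

```python
import cProfile
import pstats

cProfile.run("main()", "profile.stats")   # profile the program, dump raw stats to a file

stats = pstats.Stats("profile.stats")
stats.sort_stats("time").print_stats(10)  # "Ordered by: internal time", top 10 entries
```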
Before optimization:
```
Tue Oct 30 20:14:19 2018    profile.stats

         697390 function calls (690360 primitive calls) in 0.650 seconds

   Ordered by: internal time
   List reduced from 2079 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    22391    0.141    0.000    0.141    0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
     1375    0.061    0.000    0.061    0.000 {built-in method nt.stat}
    22391    0.060    0.000    0.074    0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
        1    0.045    0.045    0.382    0.382 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
    27395    0.039    0.000    0.039    0.000 {method 'split' of 're.Pattern' objects}
      306    0.023    0.000    0.023    0.000 {built-in method marshal.loads}
    12/11    0.020    0.002    0.023    0.002 {built-in method _imp.create_dynamic}
      306    0.017    0.000    0.027    0.000 <frozen importlib._bootstrap_external>:914(get_data)
    27798    0.011    0.000    0.062    0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
1067/1064    0.010    0.000    0.039    0.000 {built-in method builtins.__build_class__}
```

The most expensive entry is the list comprehension in modes.py. My partner found that he had stored the stop words in a list rather than a set, which makes every membership lookup a linear scan.
The offending line (the <listcomp> at modes.py:102 above) was:

```python
pre_list = [word for word in pre_list if word not in stop_words]
```
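The post doesn't show the fix itself, but it amounts to changing the container the stop words are loaded into; a minimal sketch, with a hypothetical file name:

```python
# Hypothetical fix: load the stop words into a set instead of a list.
# "word not in stop_words" is O(1) per lookup on a set, O(k) on a list.
with open("stopwords.txt", encoding="utf-8") as f:   # assumed file name
    stop_words = set(f.read().split())

pre_list = [word for word in pre_list if word not in stop_words]
```

After the change, the results were as follows: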
```
Tue Oct 30 20:23:31 2018    profile.stats

         697516 function calls (690485 primitive calls) in 0.510 seconds

   Ordered by: internal time
   List reduced from 2094 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1379    0.060    0.000    0.060    0.000 {built-in method nt.stat}
    22391    0.058    0.000    0.072    0.000 C:\Users\v-yizzha\Desktop\WordFrequency\utils.py:14(get_phrases)
        1    0.040    0.040    0.234    0.234 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:83(mode_p)
    27395    0.037    0.000    0.037    0.000 {method 'split' of 're.Pattern' objects}
      304    0.023    0.000    0.023    0.000 {built-in method marshal.loads}
    12/11    0.018    0.002    0.020    0.002 {built-in method _imp.create_dynamic}
      308    0.018    0.000    0.028    0.000 <frozen importlib._bootstrap_external>:914(get_data)
    22391    0.011    0.000    0.011    0.000 C:\Users\v-yizzha\Desktop\WordFrequency\modes.py:102(<listcomp>)
1067/1064    0.010    0.000    0.039    0.000 {built-in method builtins.__build_class__}
    27798    0.010    0.000    0.058    0.000 C:\Users\v-yizzha\AppData\Local\Continuum\anaconda3\envs\nltk\lib\re.py:271(_compile)
```

The <listcomp> time dropped by 0.13 s (tottime 0.141 s to 0.011 s), and total runtime fell from 0.650 s to 0.510 s. A very effective change!
Next are my own profiling and changes:
Before my changes:
```
Thu Nov 1 18:20:35 2018    proflie.status

         1714748 function calls (1701302 primitive calls) in 1.118 seconds

   Ordered by: internal time
   List reduced from 3945 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    22391    0.179    0.000    0.238    0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
     3163    0.111    0.000    0.111    0.000 {built-in method nt.stat}
   100/78    0.059    0.001    0.085    0.001 {built-in method _imp.create_dynamic}
      741    0.052    0.000    0.052    0.000 {built-in method marshal.loads}
        1    0.041    0.041    0.455    0.455 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
    27395    0.040    0.000    0.040    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
   105354    0.035    0.000    0.035    0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
      743    0.035    0.000    0.054    0.000 <frozen importlib._bootstrap_external>:830(get_data)
    992/1    0.032    0.000    1.119    1.119 {built-in method builtins.exec}
        1    0.030    0.030    0.065    0.065 {built-in method _collections._count_elements}
```

The results show that the biggest cost is the get_phrases helper, which extracts n-word phrases from a sentence. Analyzing the previous source code:
```python
while len(pre_list) >= n:
    target_phrase = []
    for i in range(n):
        if not_word(pre_list[i]):
            for j in range(i + 1):
                pre_list.pop(0)
            break
        else:
            target_phrase.append(pre_list[i])
    if len(target_phrase) == n:
        target_str = target_phrase[0]
        for i in range(n - 1):
            target_str += " " + target_phrase[i + 1]
        result.append(target_str)
        pre_list.pop(0)
return result
```

This version builds an extra temporary list for every window and performs many unnecessary pop operations (each pop(0) shifts the entire remaining list), so I optimized it as follows:
```python
for j in range(len(pre_list) + 1 - n):
    target_phrase = ""
    for i in range(n):
        if not_word(pre_list[i + j]):
            j += i  # note: rebinding the loop variable has no effect on a Python
                    # for loop; the break alone keeps a phrase from spanning a
                    # non-word token
            break
        elif target_phrase == "":
            target_phrase += pre_list[i + j]
        else:
            target_phrase += ' ' + pre_list[i + j]
        if i == n - 1:
            result.append(target_phrase)
```

The results are shown below:
```
Thu Nov 1 18:22:38 2018    proflie.status

         1187845 function calls (1174399 primitive calls) in 0.972 seconds

   Ordered by: internal time
   List reduced from 3945 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3163    0.109    0.000    0.109    0.000 {built-in method nt.stat}
    22391    0.095    0.000    0.118    0.000 C:\Users\v-qiyao\Documents\WordFrequency\utils.py:14(get_phrases)
   100/78    0.055    0.001    0.081    0.001 {built-in method _imp.create_dynamic}
      741    0.052    0.000    0.052    0.000 {built-in method marshal.loads}
        1    0.040    0.040    0.336    0.336 C:\Users\v-qiyao\Documents\WordFrequency\modes.py:83(mode_p)
    27395    0.039    0.000    0.039    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
   105544    0.036    0.000    0.036    0.000 C:\Users\v-qiyao\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\probability.py:127(__setitem__)
      743    0.034    0.000    0.053    0.000 <frozen importlib._bootstrap_external>:830(get_data)
        1    0.033    0.033    0.068    0.068 {built-in method _collections._count_elements}
    992/1    0.030    0.000    0.973    0.973 {built-in method builtins.exec}
```

The get_phrases running time dropped by about 0.08 s (tottime 0.179 s to 0.095 s); a significant improvement.
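For reference, the same window scan can be written more compactly with slicing and join. This is only an equivalent sketch, not the version benchmarked above, and the signature and not_word helper are assumptions since the post never shows the full function:

```python
import re

WORD_RE = re.compile(r"^[A-Za-z]+$")

def not_word(token):
    # stand-in for the project's not_word helper (definition not shown in the post)
    return WORD_RE.match(token) is None

def get_phrases(pre_list, n, result):
    # Slide a window of n consecutive tokens; emit the window as a phrase
    # unless it contains a non-word token.
    for j in range(len(pre_list) + 1 - n):
        window = pre_list[j:j + n]
        if not any(not_word(tok) for tok in window):
            result.append(" ".join(window))
    return result
```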
With these changes, running with `-n 10 -p 2 -v verbs.txt` now takes only 0.27 s. We use the nltk library for the list-to-dict counting and sorting step, and the cProfile output shows most of the remaining time in built-in functions. Tests on large files show the runtime growing roughly as O(n log n). An earlier version performed many redundant file operations and was very slow; we fixed that immediately by restructuring the code logic, so it was never saved as a commit.
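The counting cost the profiles attribute to nltk's probability.py and _collections._count_elements is consistent with nltk's FreqDist, which subclasses collections.Counter. A minimal sketch of that list-to-sorted-counts step; the exact call site isn't shown in the post, and phrase_list is an assumed name:

```python
from nltk import FreqDist

phrase_list = ["a b", "c d", "a b"]         # assumed: the output of get_phrases
freq = FreqDist(phrase_list)                # list -> dict-like counts (Counter subclass)
for phrase, count in freq.most_common(10):  # sorted by descending frequency
    print(phrase, count)
```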
Original post: https://www.cnblogs.com/yqsyqs/p/9892189.html