
Comprehensive Exercise: Word Frequency Statistics

Date: 2018-03-28 23:06:01
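# Read the lyrics file.
with open("peng.txt", "r", encoding="utf-8") as f:
    song = f.read()

# Punctuation to strip, and common words to leave out of the count.
sep = ''',.?—!"'''
exclude = {'the', 'and', 'i', 'in', "i'm", 'a', 'of', 'an', 'on', 'to', 'with'}

# Replace each punctuation mark with a space so it does not stick to a word.
for c in sep:
    song = song.replace(c, ' ')

# Lowercase the text and split it into a word list.
swl = song.lower().split()

# Count each distinct word that is not excluded.
swd = {}
sws = set(swl) - exclude
for w in sws:
    swd[w] = swl.count(w)

# Sort by count, descending.
fl = list(swd.items())
fl.sort(key=lambda x: x[1], reverse=True)

for i in fl:
    print(i)

# Write the top 20 words (or fewer, if the text has fewer distinct words).
with open("result.txt", "w", encoding="utf-8") as f:
    for word, n in fl[:20]:
        f.write(word + "  " + str(n) + "\n")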
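
The counting loop above calls list.count once per distinct word, which rescans the whole word list each time. As a possible alternative (not the original author's code), the same steps can be sketched with collections.Counter from the standard library. The function name top_words and the sample sentence are made up for illustration.

```python
from collections import Counter


def top_words(text, exclude, punctuation, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Replace punctuation with spaces so it does not stick to words.
    for c in punctuation:
        text = text.replace(c, " ")
    # Lowercase, split, and drop excluded words in a single pass.
    words = [w for w in text.lower().split() if w not in exclude]
    # Counter tallies every word in one scan of the list.
    return Counter(words).most_common(n)


# Hypothetical sample input, standing in for the contents of peng.txt.
sample = "The cat and the hat. The cat sat!"
for word, count in top_words(sample, {"the", "and"}, ",.?!"):
    print(word, count)
```

most_common(n) simply returns fewer pairs when the text has fewer than n distinct words, so no separate length check is needed.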


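import jieba

# Read the novel text.
with open('weicheng.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Punctuation to strip, and tokens to leave out of the count.
# Chinese text often uses full-width marks; add those here if the file contains them.
p = ''',。‘’“”:;()!?、 '''
a = {
    '的', '\n', '\u3000',
    '曰', '之', '不', '人', '一', '大', '马', '来', '有', '于', '下', '此',
}
for i in p:
    text = text.replace(i, '')

# Segment the text into words; jieba.lcut already returns a list.
t = jieba.lcut(text)
print(t)

# Count each distinct token that is not excluded.
# Counting in the token list t, not in the raw string, avoids
# matching a word inside a longer word.
count = {}
wl = list(set(t) - a)
print(wl)
for w in wl:
    count[w] = t.count(w)

# Sort by count, descending.
cl = list(count.items())
cl.sort(key=lambda x: x[1], reverse=True)
print(cl)

# Append the top 20 words (or fewer) to the output file.
with open('wcCount.txt', 'a', encoding='utf-8') as f:
    for word, n in cl[:20]:
        f.write(word + ':' + str(n) + '\n')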
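
One detail worth checking when counting segmented Chinese text: str.count on the raw string counts substring matches, so a short word is also counted wherever it appears inside a longer word, while list.count on the token list counts whole tokens only. A small sketch of the difference follows. The tokens are hand-written for illustration and only stand in for jieba.lcut output; they are not real segmentation results.

```python
# Hypothetical hand-segmented tokens, standing in for jieba.lcut output.
tokens = ["方鸿渐", "笑", "了", "方", "先生", "也", "笑"]
text = "".join(tokens)

# str.count matches substrings, so it also finds "方" inside "方鸿渐".
substring_count = text.count("方")

# list.count compares whole tokens, so only the standalone "方" matches.
token_count = tokens.count("方")

print(substring_count, token_count)  # prints: 2 1
```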


 


Original post: https://www.cnblogs.com/phoenlix/p/8666515.html
