Python Chinese Word Frequency and Hot-Word Counting: A Brief Analysis (with Starter Source Code)

Date: 2020-02-03 00:18:30

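The jieba library has three segmentation modes: accurate mode, full mode, and search engine mode.

- Accurate mode: splits the text into the most precise segmentation, with no redundant words.
- Full mode: scans out every possible word in the text, so the output contains redundant, overlapping words.
- Search engine mode: builds on accurate mode and splits long words a second time.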
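To make the difference between the three modes concrete, here is a minimal pure-Python sketch. It is a toy, not jieba's implementation: `TOY_DICT`, `full_mode`, `accurate_mode`, and `search_mode` are hypothetical names invented for this illustration, and the greedy longest-match rule is an assumed stand-in for jieba's real dictionary and statistical model.

```python
# Toy illustration only: jieba uses a large dictionary plus a statistical
# model, not the greedy matching below. All names here are hypothetical.
TOY_DICT = {"我", "来到", "北京", "清华", "清华大学", "华大", "大学"}


def full_mode(text, vocab):
    """Return every dictionary word found at every position (overlaps allowed)."""
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if text[i:j] in vocab
    ]


def accurate_mode(text, vocab):
    """Greedy longest match: each character ends up in exactly one token."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


def search_mode(text, vocab):
    """Accurate mode, plus the two-character dictionary words inside long tokens."""
    tokens = []
    for token in accurate_mode(text, vocab):
        if len(token) > 2:
            for k in range(len(token) - 1):
                piece = token[k:k + 2]
                if piece in vocab:
                    tokens.append(piece)
        tokens.append(token)
    return tokens


sentence = "我来到北京清华大学"
print("accurate:", "/".join(accurate_mode(sentence, TOY_DICT)))
print("full:    ", "/".join(full_mode(sentence, TOY_DICT)))
print("search:  ", "/".join(search_mode(sentence, TOY_DICT)))
```

In this toy, accurate mode covers the sentence exactly once, full mode also reports the overlapping words inside 清华大学, and search mode keeps the accurate tokens while adding the shorter words found inside the long one.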

Example:


Code:

import jieba

# Read the whole text file as UTF-8; the with-block closes the file afterwards
with open('E:/578095023/FileRecv/寒假作业/test.txt', encoding="utf-8") as file:
    txt = file.read()

# words = jieba.lcut(txt)                # accurate mode
# words = jieba.lcut(txt, cut_all=True)  # full mode
words = jieba.lcut_for_search(txt)       # search engine mode

# Count every token except single-character ones
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the 20 most frequent words (fewer if the text has fewer distinct words)
for word, count in items[:20]:
    print(word, count)
    # print('{0:<10}{1:>5}'.format(word, count))  # aligned-column alternative
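The counting and sorting steps can also be written with `collections.Counter` from the standard library. The sketch below assumes the text has already been segmented into a list of tokens (for example by one of the jieba calls in the code); `top_words` and its parameters are hypothetical names chosen for this illustration.

```python
from collections import Counter


def top_words(words, n=20, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters."""
    counts = Counter(w for w in words if len(w) >= min_len)
    # most_common sorts by count, highest first, and copes with fewer than n items
    return counts.most_common(n)


# A small hand-made token list stands in for real segmenter output here.
sample = ["清华", "大学", "清华", "我", "大学", "清华", "的"]
for word, count in top_words(sample, n=2):
    print(word, count)
```

Single-character tokens are dropped by the `min_len` filter, which mirrors the `len(word) == 1` check in the loop-based version.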

Original article: https://www.cnblogs.com/smartisn/p/12254250.html
