Counting word frequencies in a Word document with Python

Posted: 2020-03-11 00:26:11

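How do you count word frequencies in a Word document? First use the docx module (the python-docx package) to write the document's text to a txt file, then use the jieba module to segment that text into words and count how often each one occurs. Simple, isn't it?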

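# March 10, 2020
# Elizabeth
from docx import Document
import jieba  # Chinese word segmentation module


# Helper: write the text of a Word document to a txt file
def to_txt(docx_path, txt_path):
    document = Document(docx_path)
    # Write as UTF-8 so the file can be read back with the same encoding
    with open(txt_path, 'w', encoding='utf-8') as txt:
        for paragraph in document.paragraphs:
            # One paragraph per line, so words are not joined across paragraphs
            txt.write(paragraph.text + '\n')
    return txt_path


if __name__ == '__main__':
    path0 = '/Users/fangluping/Desktop/数据分析笔试试题/笔试题目-V1.0.docx'
    # Use the same txt path for writing and for reading
    txt_path = '/Users/fangluping/Desktop/数据分析笔试试题/词频统计.txt'
    to_txt(path0, txt_path)  # convert the Word document to txt

    # Segment the text into words
    with open(txt_path, 'r', encoding='utf-8') as f:
        txt = f.read()
    words = jieba.lcut(txt)

    # Count each word, skipping single-character tokens
    # (mostly punctuation, particles, and line breaks)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)

    # Print the 10 most frequent words (fewer if the document has fewer)
    for word, count in items[:10]:
        print("{0:<10}{1:>5}".format(word, count))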
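The counting and sorting step can also be written with collections.Counter from the standard library. The following is only a sketch of that alternative, not part of the original script: the helper name top_words and its parameters n and min_len are made up for illustration, and the token list is a hand-written sample standing in for the output of jieba.lcut.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens as (word, count) pairs.

    Hypothetical helper: tokens shorter than min_len after stripping
    whitespace are skipped, mirroring the len(word) == 1 check above.
    """
    counts = Counter(t for t in tokens if len(t.strip()) >= min_len)
    # most_common sorts by count, highest first
    return counts.most_common(n)


if __name__ == '__main__':
    # Hand-written sample tokens; in the script these would come from jieba.lcut(txt)
    sample = ['数据', '分析', '的', '数据', '笔试', '\n', '数据', '分析']
    for word, count in top_words(sample, n=3):
        print("{0:<10}{1:>5}".format(word, count))
```

Counter.most_common returns at most n entries, so a short document with fewer than n distinct words does not need special handling.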

Original: https://blog.51cto.com/14534896/2477002
