
Computing n-gram distance: a Python implementation [repost]

Date: 2019-08-13 13:49:16

Reposted from: https://flystarhe.github.io/docs-2014/algorithm/similarity-more/readme/

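def Ngram_distance(str1, str2, n=2):
    # Pad both strings with n-1 spaces on each side, so that the n-grams
    # also record which character starts and which character ends the string.
    tmp = ' ' * (n-1)
    str1 = tmp + str1 + tmp
    str2 = tmp + str2 + tmp
    # Collect the distinct character n-grams of each padded string.
    set1 = set([str1[i:i+n] for i in range(len(str1)-(n-1))])
    set2 = set([str2[i:i+n] for i in range(len(str2)-(n-1))])
    setx = set1 & set2  # n-grams shared by both strings
    len1 = len(set1)
    len2 = len(set2)
    lenx = len(setx)
    # Distance: number of n-grams that occur in only one of the two strings.
    num_dist = len1 + len2 - 2*lenx
    # Similarity in [0, 1] (relies on Python 3 true division).
    num_sim = 1 - num_dist / (len1 + len2)
    return set1, set2, {'dist': num_dist, 'sim': num_sim}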

print(Ngram_distance('girl', 'girlfriend'))

Output:

({'gi', 'ir', 'rl', 'l ', ' g'},
{'gi', 'en', 'd ', 'ir', 'lf', 'ie', 'rl', 'fr', 'ri', ' g', 'nd'}, {'dist': 8, 'sim': 0.5})
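The numbers in this output can be checked by hand: 'girl' yields 5 padded bigrams, 'girlfriend' yields 11, and 4 are shared (' g', 'gi', 'ir', 'rl'), so dist = 5 + 11 - 2*4 = 8 and sim = 1 - 8/16 = 0.5. The sketch below is not from the original post; it recomputes the same values with Python set operators, using a hypothetical helper named `ngram_set`. It relies on two identities: the distance is the size of the symmetric difference of the two sets, and the similarity equals 2*|A & B| / (|A| + |B|), which is the Sørensen-Dice coefficient.

```python
def ngram_set(text, n=2):
    """Hypothetical helper: distinct character n-grams of `text`,
    padded with n-1 spaces on each side (same padding as Ngram_distance)."""
    pad = ' ' * (n - 1)
    padded = pad + text + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


a = ngram_set('girl')
b = ngram_set('girlfriend')

# Symmetric difference: n-grams that occur in exactly one of the two sets.
dist = len(a ^ b)
# Algebraically equal to 1 - dist / (len(a) + len(b)).
sim = 2 * len(a & b) / (len(a) + len(b))

print(sorted(a & b))  # [' g', 'gi', 'ir', 'rl']
print(dist, sim)      # 8 0.5
```

Because both quantities are computed on sets, repeated n-grams within one string are counted once. Like the original function, this sketch divides by zero if both sets are empty, which happens only for two empty strings with n=1.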

 


Original post: https://www.cnblogs.com/BlueBlueSea/p/11345117.html
