Naive Bayes

Posted: 2018-01-20 23:43:46

# "Naive": each feature (word) is assumed to occur independently of the words next to it
# Every feature is given equal weight
from numpy import *

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0,1,0,1,0,1]    #1 is abusive, 0 not
    return postingList,classVec
# Build the vocabulary: the unique words across all documents
def createVocabList(dataSet):
    vocabSet = set()  # start with an empty set
    for document in dataSet:
        vocabSet |= set(document)
    return list(vocabSet)

# Mark which vocabulary words appear in the input document (set-of-words model)
def setOfWordsVec(vocabSet, inputSet):
    returnVec = [0] * len(vocabSet)  # one slot per vocabulary word
    for word in inputSet:
        if word in vocabSet:
            returnVec[vocabSet.index(word)] = 1
        else:
            print("the word: %s is not in the Vocabulary!" % word)
    return returnVec

def main():
    listOPosts,listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    print (myVocabList)
    print(setOfWordsVec(myVocabList, listOPosts[0]))
main()
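
The 0/1 vectors built above are the kind of input a naive Bayes classifier trains on. The following is a minimal sketch of that next step, not code from the original post: the names toVec, trainNaiveBayes and classifyNaiveBayes are hypothetical, and the add-one smoothing and the use of log probabilities are assumptions of this sketch. Under the independence assumption stated in the opening comments, the score of a class c for a document is log P(c) plus the sum of log P(word | c) over the words present.

```python
import math

posts = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
         ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
         ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
         ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
         ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
         ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
labels = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 not

def toVec(vocab, words):
    # Hypothetical helper: 0/1 vector, silently ignoring unknown words
    present = set(words)
    return [1 if w in present else 0 for w in vocab]

def trainNaiveBayes(trainMatrix, trainLabels):
    # Estimate log P(class) and log P(word | class) from 0/1 word vectors
    numWords = len(trainMatrix[0])
    logPrior, logLik = {}, {}
    for c in sorted(set(trainLabels)):
        rows = [v for v, y in zip(trainMatrix, trainLabels) if y == c]
        logPrior[c] = math.log(len(rows) / len(trainMatrix))
        # Add-one smoothing (assumption): every word count starts at 1
        counts = [1] * numWords
        for v in rows:
            for i, x in enumerate(v):
                counts[i] += x
        total = sum(counts)
        logLik[c] = [math.log(n / total) for n in counts]
    return logPrior, logLik

def classifyNaiveBayes(vec, logPrior, logLik):
    # Independence assumption: per-word log probabilities simply add up
    scores = {c: logPrior[c] + sum(x * lp for x, lp in zip(vec, logLik[c]))
              for c in logPrior}
    return max(scores, key=scores.get)

vocab = sorted({w for post in posts for w in post})
matrix = [toVec(vocab, post) for post in posts]
logPrior, logLik = trainNaiveBayes(matrix, labels)

print(classifyNaiveBayes(toVec(vocab, ['stupid', 'garbage']), logPrior, logLik))        # prints 1
print(classifyNaiveBayes(toVec(vocab, ['love', 'my', 'dalmation']), logPrior, logLik))  # prints 0
```

Summing logs instead of multiplying raw probabilities avoids floating-point underflow on longer documents, and the add-one start keeps a word unseen in one class from forcing that class's probability to zero.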

Original post: https://www.cnblogs.com/littlepear/p/8322251.html
