The data packages used below can be downloaded here: http://www.nltk.org/nltk_data/
Lemmatization (finding a word's base form)
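import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # fetch the WordNet data the lemmatizer needs

lemmatizer = WordNetLemmatizer()  # reduces a word to its base form (lemma)
# The second argument is the part of speech: 'v' = verb, 'n' = noun
print(lemmatizer.lemmatize('gathering', 'v'))
print(lemmatizer.lemmatize('gathering', 'n'))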
Output:
gather
gathering
https://kite.com/python/docs/nltk.word_tokenize
Tokenization:
import nltk
from nltk.tokenize import word_tokenize

# fetch the Punkt tokenizer data (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')

sentence = "At eight o'clock on Thursday morning, Arthur didn't feel very good."
print(word_tokenize(sentence))
# ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', ',', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
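In the tokenized output above, "didn't" is split into "did" and "n't", the comma and period become separate tokens, and "o'clock" stays whole. The sketch below imitates that behaviour with two regular expressions so the splitting rules are easier to see. It is only an illustration under simplifying assumptions: `simple_tokenize` is a hypothetical helper, not NLTK's implementation, and it covers far fewer cases than `word_tokenize` (for example, it would wrongly split decimals such as 3.14).

```python
import re

# Hypothetical helper for illustration only; this is NOT how NLTK is implemented.
def simple_tokenize(text):
    # Split the "n't" clitic off its host word: "didn't" -> "did n't".
    text = re.sub(r"(\w)(n't)\b", r"\1 \2", text)
    # Surround basic punctuation with spaces so each mark becomes its own token.
    text = re.sub(r"([,.!?])", r" \1 ", text)
    # Whitespace splitting now yields the tokens.
    return text.split()

sentence = "At eight o'clock on Thursday morning, Arthur didn't feel very good."
print(simple_tokenize(sentence))
```

For real text, prefer `word_tokenize`, which handles many more contractions, quotes, and abbreviations.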
Original post: https://www.cnblogs.com/BlueBlueSea/p/13154590.html