
Natural Language Processing Terms

Date: 2014-01-30 10:49:07

Definitions are from Wikipedia.

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining.
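As a rough illustration of the definition above, here is a minimal tokenizer sketch in Python. The pattern and the name `tokenize` are my own assumptions, not part of the Wikipedia definition; real tokenizers handle contractions, URLs, and non-English scripts far more carefully.

```python
import re

# Assumed pattern for illustration: a run of word characters,
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Break a string into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

if __name__ == "__main__":
    # The resulting list is what a parser or text-mining step would consume.
    print(tokenize("Hello, world!"))
```

Note that this naive pattern splits a contraction such as "It's" into three tokens, which may or may not be what a later processing step wants.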

 

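Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. It is related to, but not the same as, part-of-speech (POS) tagging, which only assigns a word class such as noun or verb to each token and is often used as a step before parsing.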
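To make "according to the rules of a formal grammar" concrete, the sketch below parses a toy grammar of additions with a small recursive-descent parser. The grammar, the tuple-based tree, and the name `parse` are all assumptions chosen for illustration; natural-language parsers use much richer grammars and algorithms.

```python
import re

def parse(text):
    """Parse text with an assumed toy grammar:

        expr := term ('+' term)*
        term := NUMBER | '(' expr ')'

    Returns a nested tuple tree and raises SyntaxError on invalid input.
    """
    # Simplified token step: characters outside the grammar are ignored here.
    tokens = re.findall(r"\d+|[+()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def term():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        node = term()
        while peek() == "+":
            advance()  # consume '+'
            node = ("add", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

if __name__ == "__main__":
    print(parse("1 + (2 + 3)"))
```

Each grammar rule maps to one function, so the call structure mirrors the grammar, and input that does not fit the rules (for example `1 +`) is rejected with a `SyntaxError`.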

 

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.
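A naive sketch of sentence and word segmentation in Python follows. The function names and the splitting rules are assumptions for illustration only: this approach breaks on abbreviations such as "Dr." and does not work for languages written without spaces between words, such as Chinese.

```python
import re

def split_sentences(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Naive rule: words are separated by whitespace."""
    return sentence.split()

if __name__ == "__main__":
    for sentence in split_sentences("It rained. We stayed in! Why not?"):
        print(sentence, "->", split_words(sentence))
```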

 

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
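The following is a small lexer sketch for a made-up expression language, to show characters being turned into typed tokens. The token kinds (`NUMBER`, `IDENT`, `OP`) and the name `lex` are hypothetical; the technique of joining named regex groups into one pattern is a common idiom with Python's `re` module.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    value: str

# Assumed token specification for a toy language; order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),      # whitespace is discarded
    ("MISMATCH", r"."),    # any other character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source: str) -> Iterator[Token]:
    """Yield (kind, value) tokens for each lexeme in the source string."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group())

if __name__ == "__main__":
    for token in lex("x = 42 + y"):
        print(token)
```

Unlike the plain tokenization of natural-language text, each token here carries a kind as well as its text, which is what a programming-language parser typically expects as input.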


Source: http://www.cnblogs.com/wintor12/p/3536431.html
