I. Built-in Elasticsearch analyzers
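# Standard Analyzer - the default analyzer; splits on word boundaries and lowercases
# Simple Analyzer - splits on non-letter characters (symbols are dropped) and lowercases
# Stop Analyzer - lowercases and removes stop words (the, a, is)
# Whitespace Analyzer - splits on whitespace; does not lowercase
# Keyword Analyzer - no tokenization; the whole input is emitted as a single token
# Pattern Analyzer - splits with a regular expression, default \W+ (runs of non-word characters)
# Language Analyzers - analyzers for more than 30 common languages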
1. Standard Analyzer
2. Simple Analyzer
3. Whitespace Analyzer
4. Stop Analyzer
5. Keyword Analyzer
6. Pattern Analyzer
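The splitting rules listed above can be approximated locally to build intuition before querying a cluster. The following is a rough Python sketch, not Elasticsearch's implementation: the function names are made up for illustration, the stop-word set is assumed to match the default English list, and the regular expressions only mimic the documented behavior on plain ASCII text such as the sample sentence.

```python
import re

# Assumption: this mirrors the default English stop-word list used by the stop analyzer.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
})


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase (digits and symbols are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_analyzer(text, stop_words=ENGLISH_STOP_WORDS):
    """Same as simple_analyzer, followed by stop-word removal."""
    return [token for token in simple_analyzer(text) if token not in stop_words]


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()


def keyword_analyzer(text):
    """No tokenization: the whole input becomes one token."""
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    """Split on a regular expression (default: runs of non-word characters) and lowercase."""
    # re.ASCII keeps \W limited to ASCII word characters for this sketch.
    return [token for token in re.split(pattern, text.lower(), flags=re.ASCII) if token]
```

Calling these on "2 running Quick brown-foxes leap over lazy dogs in the summer evening." shows the main differences: only the whitespace and keyword variants keep "Quick" capitalized and "brown-foxes" intact, only the simple and stop variants drop the leading "2", and only the stop variant removes "in" and "the".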
# Sample text: 2 running Quick brown-foxes leap over lazy dogs in the summer evening
# Compare the output of the different analyzers

# standard
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# simple
GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# stop
GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# whitespace
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# keyword
GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# pattern
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# english
GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

# icu_analyzer on Chinese text (requires the analysis-icu plugin)
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "他说的确实在理"
}

# standard on the same Chinese text, for comparison
POST _analyze
{
  "analyzer": "standard",
  "text": "他说的确实在理"
}

# icu_analyzer on a second Chinese sentence
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "这个苹果不大好吃"
}
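The same _analyze calls can also be issued from a script instead of the console. The sketch below uses only the Python standard library; the base URL http://localhost:9200 and the helper names build_analyze_request and analyze are assumptions for illustration, and a secured cluster would need authentication that is not shown here.

```python
import json
import urllib.request


def build_analyze_request(analyzer, text, base_url="http://localhost:9200"):
    """Build a POST request for the _analyze API (base_url is an assumed local node)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, base_url="http://localhost:9200"):
    """Send the request and return only the token strings from the response."""
    request = build_analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The response carries a "tokens" array; each entry has a "token" field.
    return [entry["token"] for entry in payload["tokens"]]
```

With a node running, looping over the analyzer names and calling analyze(name, text) for each prints the token lists side by side, which makes the differences easier to compare than separate console runs.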
II. Chinese word segmentation: ICU Analyzer
# Test by specifying the analyzer directly (icu_analyzer requires the analysis-icu plugin)
GET _analyze
{
  "analyzer": "icu_analyzer",
  "text": "你好中国"
}
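To see why Chinese text calls for a dedicated analyzer, it helps to contrast two strategies. The standard analyzer emits one token per Han character, while dictionary-aware analyzers try to emit whole words. The Python sketch below illustrates the contrast with a toy forward-maximum-matching segmenter. This is a swapped-in teaching technique, not the algorithm used by icu_analyzer, and the two-word dictionary and function names are hypothetical.

```python
def per_character_tokens(text):
    """One token per character, roughly what the standard analyzer yields for pure Han text."""
    return [ch for ch in text if not ch.isspace()]


def forward_max_match(text, dictionary, max_len=4):
    """Toy dictionary segmenter: at each position take the longest dictionary word.

    Falls back to a single character when no dictionary word matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical two-word dictionary, for illustration only.
TOY_DICTIONARY = {"你好", "中国"}
```

For the input "你好中国", per_character_tokens returns four single-character tokens, whereas forward_max_match with TOY_DICTIONARY returns the two words 你好 and 中国. Real analyzers rely on much larger dictionaries and more careful disambiguation.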
2. Other Chinese word segmentation plugins
Original: https://www.cnblogs.com/zd1994/p/12650261.html