1. Testing Elasticsearch tokenization
Elasticsearch ships with several analyzers (reference: https://www.jianshu.com/p/d57935ba514b). Take the following sentence as an example:
Set the shape to semi-transparent by calling set_trans(5)
(1) standard analyzer: the standard analyzer (this is the default)
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
(2) simple analyzer: the simple analyzer. It splits on any non-letter character and lowercases the tokens
set, the, shape, to, semi, transparent, by, calling, set, trans
(3) whitespace analyzer: the whitespace analyzer. It splits on whitespace only; case, underscores and other punctuation are left unchanged
Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
(4) language analyzer: language-specific analyzers, for example the English analyzer, which removes stop words and stems the tokens
set, shape, semi, transpar, call, set_tran, 5
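The difference between the whitespace and simple analyzers can be reproduced locally without a running node. The following is only a rough Python approximation of the two rules described above, not Elasticsearch's own implementation; the function names `whitespace_tokens` and `simple_tokens` are made up for this sketch.

```python
import re

SENTENCE = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_tokens(text):
    # Split on runs of whitespace only; case and punctuation stay untouched.
    return text.split()


def simple_tokens(text):
    # Keep runs of letters, break on everything else, then lowercase.
    # [^\W\d_] is "word character that is neither a digit nor an underscore".
    return [m.lower() for m in re.findall(r"[^\W\d_]+", text)]


print(whitespace_tokens(SENTENCE))
print(simple_tokens(SENTENCE))
```

The two printed lists should match the whitespace and simple results listed above. The standard and English analyzers involve Unicode segmentation and stemming, so they are not imitated here.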
Standard analyzer: standard analyzer
http://localhost:9200/_analyze?analyzer=standard&pretty=true&text=test测试
Tokenization result:
{ "tokens" : [ { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 }, { "token" : "测", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 1 }, { "token" : "试", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 2 } ] }
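The `start_offset` and `end_offset` fields point back into the original text, which is easy to check in a few lines. This is a small sketch that parses the response shown above and slices the input string with those offsets; the helper name `tokens_with_slices` is hypothetical. It assumes the offsets line up with Python string indices, which holds for this input but may not hold for characters outside the Basic Multilingual Plane.

```python
import json

# Response copied from the standard analyzer result above.
RESPONSE = """
{ "tokens" : [
  { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
  { "token" : "测", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 1 },
  { "token" : "试", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 2 }
] }
"""


def tokens_with_slices(text, response_json):
    # Pair each token with the piece of the input its offsets refer to.
    body = json.loads(response_json)
    return [
        (t["token"], text[t["start_offset"]:t["end_offset"]])
        for t in body["tokens"]
    ]


for token, piece in tokens_with_slices("test测试", RESPONSE):
    print(token, piece)
```

Each token equals the slice it points to, which shows that the standard analyzer emits every Chinese character as a separate token.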
Simple analyzer: simple analyzer
http://localhost:9200/_analyze?analyzer=simple&pretty=true&text=test_测试
Result:
{ "tokens" : [ { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "word", "position" : 0 }, { "token" : "测试", "start_offset" : 5, "end_offset" : 7, "type" : "word", "position" : 1 } ] }
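The requests above pass `analyzer` and `text` in the query string. As far as I know, newer Elasticsearch versions expect these in a JSON request body instead, so if the URL form is rejected, a body-based request is worth trying. Below is a minimal sketch using only the Python standard library; `build_analyze_request` and `fetch_tokens` are hypothetical helper names, and `http://localhost:9200` is assumed from the URLs above.

```python
import json
import urllib.request


def build_analyze_request(analyzer, text, base_url="http://localhost:9200"):
    # Put analyzer and text in a JSON body instead of the query string.
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze?pretty=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tokens(analyzer, text):
    # Sends the request; this needs a running Elasticsearch node.
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        body = json.load(resp)
    return [t["token"] for t in body["tokens"]]
```

With a node running locally, `fetch_tokens("simple", "test_测试")` should return the same two tokens as the result above.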
Original article: https://www.cnblogs.com/tonglin0325/p/10088021.html