首页 > 其他 > 详细

Elasticsearch学习笔记——分词

时间:2018-12-08 16:49:10      阅读:188      评论:0      收藏:0      [点我收藏+]

1.测试Elasticsearch的分词

Elasticsearch有多种分词器(参考:https://www.jianshu.com/p/d57935ba514b)

Set the shape to semi-transparent by calling set_trans(5)

(1)standard analyzer:标准分词器(默认是这种)
set,the,shape,to,semi,transparent by,calling,set_trans,5

(2)simple analyzer:简单分词器
set, the, shape, to, semi, transparent, by, calling, set, trans

(3)whitespace analyzer:空白分词器。大小写,下划线等都不会转换
Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

(4)language analyzer:(特定语言分词器,比如说English英语分瓷器)
set, shape, semi, transpar, call, set_tran, 5

标准分词器 : standard analyzer
http://localhost:9200/_analyze?analyzer=standard&pretty=true&text=test测试

分词结果

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "测",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "试",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    }
  ]
}

简单分词器 : simple analyzer

http://localhost:9200/_analyze?analyzer=simple&pretty=true&text=test_测试

 结果

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "测试",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    }
  ]
}

 

Elasticsearch学习笔记——分词

原文:https://www.cnblogs.com/tonglin0325/p/10088021.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!