
Installing the IK Chinese Analyzer for Elasticsearch 7.13.4

Rubble · Published 34 minutes ago

Downloading the IK analyzer

IK analyzer on GitHub


Baidu Netdisk mirror, extraction code: fnq0


Unzip the package into the plugins/ik directory under the Elasticsearch home, then start the service with ./bin/elasticsearch.
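Alternatively, the plugin can be installed with the bundled elasticsearch-plugin tool instead of unzipping by hand. The release URL below follows the project's usual naming scheme on GitHub and should be verified against the actual releases page before use:

```
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.13.4/elasticsearch-analysis-ik-7.13.4.zip
```

Either way, the plugin version must match the Elasticsearch version exactly, or the node will refuse to start.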

Testing

Use Kibana's Dev Tools to test the segmentation:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "王者荣耀"
}
```


Result with the standard analyzer (one token per character):


```json
{
  "tokens" : [
    {
      "token" : "王",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "者",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "荣",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "耀",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    }
  ]
}
```


```
GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}
```


Result with the ik_smart analyzer (whole words):


```json
{
  "tokens" : [
    {
      "token" : "王者",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "荣耀",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
```
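To compare the two IK modes, the same text can also be run through ik_max_word, which emits the finest-grained segmentation and, for longer phrases, overlapping sub-words as well. The exact tokens depend on the bundled dictionary version, so no output is shown here:

```
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "王者荣耀"
}
```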

Using the analyzer

A common pattern is to index with ik_max_word (fine-grained, better recall) and search with ik_smart (coarse-grained, better precision):


```
# create the index
PUT /indexik

# mapping: specify an analyzer per field
PUT /indexik/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    }
  }
}

# inspect the mapping
GET /indexik/_mapping

# add test documents
POST /indexik/_doc
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /indexik/_doc
{"content":"公安部:各地校车将享最高路权"}

POST /indexik/_doc
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}

POST /indexik/_doc
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

POST /indexik/_doc
{"content":"中华人民共和国"}

# search
GET /indexik/_search
{
  "query": {
    "match": {
      "content": "中国"
    }
  }
}

# delete the index
DELETE /indexik
```


Because the query matches whole words rather than single characters, the results are more relevant than with the default analyzer:


```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.79423964,
    "hits" : [
      {
        "_index" : "indexik",
        "_type" : "_doc",
        "_id" : "3ObjKnsBEq3c_HSrSrAr",
        "_score" : 0.79423964,
        "_source" : {
          "content" : "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
        }
      },
      {
        "_index" : "indexik",
        "_type" : "_doc",
        "_id" : "3ebjKnsBEq3c_HSrUbCx",
        "_score" : 0.79423964,
        "_source" : {
          "content" : "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        }
      }
    ]
  }
}
```

Making the IK configuration take effect

Pitfalls

Adding the following setting to elasticsearch.yml causes an error on startup:


```yaml
# ik analyzer
index.analysis.analyzer.default.type: ik_max_word
```


```
Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments. In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgrade. Indices created in the future should use index
templates to set default values.
```


Please ensure all required values are updated on all indices by executing:


```
curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.analysis.analyzer.default.type" : "ik_max_word"
}'
```


Since Elasticsearch 5.x, index-level settings may no longer be configured at the node level; they must be changed through the /${index}/_settings API or set per index at creation time.

Setting a default analyzer on the index:

```
PUT indexik
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_smart"
        }
      }
    }
  }
}
```
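As a quick sanity check (not from the original post), calling _analyze against the index without naming an analyzer should now go through the index's default analyzer:

```
GET /indexik/_analyze
{
  "text": "王者荣耀"
}
```

If the default took effect, this returns whole-word tokens such as 王者 and 荣耀 instead of single characters.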

Configuring the IK analyzer

IKAnalyzer.cfg.xml


```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- local extension dictionaries -->
  <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
  <!-- local extension stop-word dictionaries -->
  <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
  <!-- remote extension dictionary -->
  <entry key="remote_ext_dict">location</entry>
  <!-- remote extension stop-word dictionary -->
  <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
```
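The local dictionary files referenced by ext_dict are plain UTF-8 text with one term per line, resolved relative to the directory containing IKAnalyzer.cfg.xml. A minimal custom/mydict.dic might look like this (the entries are illustrative):

```
王者荣耀
蓝瘦香菇
```

Local dictionaries are loaded at startup, so editing them requires a restart; remote dictionaries are polled periodically and hot-reloaded when the URL's Last-Modified or ETag response header changes.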

