ElasticSearch 自定义相似度插件 - 根据命中数排序
作者:alexgaoyh
- 2023-04-13 河南
本文字数:2692 字
阅读完需:约 9 分钟
自定义相似度算法(只考虑词频/命中数)
在使用 Elasticsearch 的时候,针对排序结果,有些时候只关注对应的词出现的次数,相当于只考虑词频,这个时候就可以使用当前的插件。 当前插件继承了 TFIDFSimilarity 类, TfSimilarity 只考虑了词频,并将其注册到插件中。 实现结果如下,前两个代码段落分别是 mapping setting 配置文件,第三个代码段是请求,第四个代码段是结果。 详细查看第四个代码段落的 _score 得分,发现 _score 的值等于请求参数'效果'在文本中出现的次数,至此证明当前插件有效。
"studentIntro": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart", "similarity": "tf_similarity"},
复制代码
{ "similarity": { "tf_similarity": { "type": "tf_similarity" } }}
复制代码
{ "size": 100, "query": { "bool": { "should": [ { "match": { "studentIntro": { "query": "效果" } } } ] } }}
复制代码
{ "took": 2, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 6, "relation": "eq" }, "max_score": 2.0, "hits": [ { "_index": "student", "_type": "_doc", "_id": "26", "_score": 2.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "26", "studentName": "洋柿子", "studentNo": "GAO202209", "studentAge": 11, "studentBirth": "2023-04-13 03:45:02", "studentIntro": "我是效果,我要看看分词的效果", "studentTags": "D", "gradePoint": 2.35 } }, { "_index": "student", "_type": "_doc", "_id": "4", "_score": 1.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "4", "studentName": "洋柿子", "studentNo": "GAO202209", "studentAge": 11, "studentBirth": "2023-04-13 03:44:58", "studentIntro": "我是洋柿子,我要看看分词的效果", "studentTags": "D", "gradePoint": 2.35 } }, { "_index": "student", "_type": "_doc", "_id": "3", "_score": 1.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "3", "studentName": "圣女果", "studentNo": "123GAO56", "studentAge": 13, "studentBirth": "2023-04-13 03:44:58", "studentIntro": "我是圣女果,我要看看分词的效果", "studentTags": "C", "gradePoint": 2.25 } }, { "_index": "student", "_type": "_doc", "_id": "1", "_score": 1.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "1", "studentName": "西红柿", "studentNo": "YI123GAO", "studentAge": 12, "studentBirth": "2023-04-13 03:44:57", "studentIntro": "我是西红柿,我要看看分词的效果", "studentTags": "A", "gradePoint": 2.23 } }, { "_index": "student", "_type": "_doc", "_id": "2", "_score": 1.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "2", "studentName": "番茄", "studentNo": "GAO456YH", "studentAge": 12, "studentBirth": "2023-04-13 03:44:58", "studentIntro": "我是番茄,我要看看分词的效果", "studentTags": "B", "gradePoint": 2.33 } }, { "_index": "student", "_type": "_doc", "_id": "6", "_score": 1.0, "_source": { "_class": "com.pap.es.domain.Student", "id": "6", "studentName": "效果", "studentNo": "20191202", "studentAge": 14, "studentBirth": "2023-04-13 03:44:59", "studentIntro": "效果", "studentTags": "A,B,C", "gradePoint": 2.13 } } ] }}
复制代码
使用方法
clone 当前代码,修改 pom.xml 文件中对应的 elasticsearch 版本;
mvn clean package 打包;
防止到 elasticsearch 目录的 plugins 文件夹下,并重启 ES;
相关链接
划线
评论
复制
发布于: 刚刚阅读数: 5
版权声明: 本文为 InfoQ 作者【alexgaoyh】的原创文章。
原文链接:【http://xie.infoq.cn/article/8011a2ba4d1fd7bde85a21b09】。
本文遵守【CC-BY 4.0】协议,转载请保留原文出处及本版权声明。
alexgaoyh
关注
DevOps 2013-12-08 加入
https://gitee.com/alexgaoyh










评论