Elasticsearch mapping: complex data types

escray
Published: February 17, 2021

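Elasticsearch mapping complex data types: arrays, null_value, and object. The commentary comes from 中华石杉's Bilibili course "Elasticsearch 顶尖高手系列课程核心知识篇"; the quoted passages come from the official Elasticsearch documentation.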

multi-value field


{ "tags": [ "tag1", "tag2" ]}


A multi-value field is indexed the same way as a single string value, but the values in the array must not mix data types.


Arrays


In Elasticsearch, there is no dedicated array data type. Any field can contain zero or more values by default, however, all values in the array must be of the same data type.


an array of arrays: [1, [2, 3]] which is the equivalent of [1, 2, 3]
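The flattening of nested arrays can be pictured with a small Python sketch. It only models the behavior described above and is not Elasticsearch code; the function name `flatten_values` is made up for illustration.

```python
from typing import Any, List


def flatten_values(value: Any) -> List[Any]:
    """Recursively flatten nested lists into one flat list of values."""
    if isinstance(value, list):
        flat: List[Any] = []
        for item in value:
            flat.extend(flatten_values(item))
        return flat
    # A single (non-array) value behaves like a one-element array.
    return [value]


print(flatten_values([1, [2, 3]]))  # [1, 2, 3]
```

A single value and a one-element array come out the same, which matches the idea that any field can hold zero or more values.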


Arrays of objects do not work... you cannot query each object independently of the other objects in the array. If you need to ... you should use the nested data type.


When adding a field dynamically, the first value in the array determines the field type.


PUT my-index-000012
# The tags field is dynamically added as a string field.
# The lists field is dynamically added as an object field.
PUT my-index-000012/_doc/1
{
  "message": "some array in this document...",
  "tags": ["elasticsearch", "wow"],
  "lists": [
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}
# The second document contains no arrays, but can be indexed into the same fields.
PUT my-index-000012/_doc/2
{
  "message": "no arrays in this document...",
  "tags": "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}
# The query looks for elasticsearch in the tags field and matches both documents.
GET my-index-000012/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch"
    }
  }
}


Multi-value fields and the inverted index


The fact that all field types support multi-value fields out of the box is a consequence of the origins of Lucene. Lucene was designed to be a full text search engine. In order to be able to search for individual words within a big block of text, Lucene tokenizes the text into individual terms, and adds each term to the inverted index separately.


This means that even a simple text field must be able to support multiple values by default.
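A toy inverted index makes this easier to see. The sketch below is a simplified model, not Lucene's implementation: the tokenizer (lowercase, split on non-word characters) and the names `tokenize` and `build_inverted_index` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Union


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word-like terms (simplified)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(
    docs: Dict[int, Union[str, List[str]]]
) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        # A single string and an array of strings are handled the same way.
        values = value if isinstance(value, list) else [value]
        for text in values:
            for term in tokenize(text):
                index[term].add(doc_id)
    return dict(index)


index = build_inverted_index({
    1: ["elasticsearch", "wow"],  # multi-value field
    2: "elasticsearch",           # single value
})
print(sorted(index["elasticsearch"]))  # [1, 2]
```

Because every term is added to the index separately, an array of values needs no special treatment: it simply contributes more terms for the same document.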

empty field


null, [], [null]


null_value


A null value cannot be indexed or searched. When a field is set to null (or an empty array [] or an array of null values [null]), it is treated as though that field has no values.


The null_value parameter allows you to replace explicit null values with the specified value so that it can be indexed and searched.


# Replace explicit null values with the term NULL.
PUT my-index-000013
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}
PUT my-index-000013/_doc/1
{
  "status_code": null
}
# An empty array does not contain an explicit null, and so won't be replaced with the null_value.
PUT my-index-000013/_doc/2
{
  "status_code": []
}
# A query for NULL returns document 1, but not document 2.
GET my-index-000013/_search
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}


The null_value needs to be the same data type as the field.


The null_value only influences how data is indexed, it doesn't modify the _source document.
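The rules above can be summarized in a short Python model. This is a sketch of the documented behavior, not Elasticsearch's implementation; `indexed_terms` is a hypothetical helper, and treating a null inside an array like a top-level explicit null is an assumption of this model.

```python
from typing import Any, List, Optional


def indexed_terms(value: Any, null_value: Optional[Any] = None) -> List[Any]:
    """Return the terms that would be indexed for one field value (toy model)."""
    values = value if isinstance(value, list) else [value]
    terms: List[Any] = []
    for item in values:
        if item is None:
            # An explicit null is indexed only if a null_value is configured.
            # (Assumption: nulls inside arrays are handled the same way.)
            if null_value is not None:
                terms.append(null_value)
        else:
            terms.append(item)
    return terms


source = {"status_code": None}
print(indexed_terms(source["status_code"], null_value="NULL"))  # ['NULL']
print(indexed_terms([], null_value="NULL"))                     # []
print(indexed_terms(None))                                      # []
print(source)  # {'status_code': None}
```

The last line shows the point about _source: the replacement affects only what gets indexed, while the original document still holds null.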

object field type


JSON documents are hierarchical in nature: the document may contain inner objects which, in turn, may contain inner objects themselves:


# The outer document is also a JSON object.
# It contains an inner object called manager,
# which in turn contains an inner object called name.
PUT my-index-000010/_doc/2
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": {
      "first": "John",
      "last": "Smith"
    }
  }
}


An explicit mapping for the above document could look like this:


# properties: the properties in the top-level mappings definition.
# manager: an inner object field.
# manager.name: an inner object field within the manager field.
PUT my-index-000011
{
  "mappings": {
    "properties": {
      "region": {
        "type": "keyword"
      },
      "manager": {
        "properties": {
          "age": { "type": "integer" },
          "name": {
            "properties": {
              "first": { "type": "text" },
              "last": { "type": "text" }
            }
          }
        }
      }
    }
  }
}


An example from 中华石杉's course "Elasticsearch 顶尖高手系列课程核心知识篇":


PUT employee/_doc/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
GET employee/_mapping
# Response:
{
  "employee" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "properties" : {
            "city" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "country" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "province" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "join_date" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}



The address field is an object type:


{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}


Inside Elasticsearch, the document is stored in the following flattened form:


{
    "name":              [jack],
    "age":               [27],
    "join_date":         [2017-01-01],
    "address.country":   [china],
    "address.province":  [guangdong],
    "address.city":      [guangzhou]
}
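The flattening of inner objects into dotted field names can be sketched in Python as follows. This is an illustrative model only; `flatten_document` is a hypothetical helper, and it ignores analysis (tokenization) and arrays.

```python
from typing import Any, Dict, List


def flatten_document(doc: Dict[str, Any], prefix: str = "") -> Dict[str, List[Any]]:
    """Flatten nested objects into dotted paths, each holding a list of values."""
    flat: Dict[str, List[Any]] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner object: recurse and prepend the parent path.
            flat.update(flatten_document(value, prefix=f"{path}."))
        else:
            flat[path] = [value]
    return flat


employee = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}
flat_employee = flatten_document(employee)
print(flat_employee["address.city"])  # ['guangzhou']
print("address" in flat_employee)     # False
```

The inner object itself does not survive as a field; only its leaf values do, under names such as address.city.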


Complex data types


{
    "authors": [
        { "age": 26, "name": "Jack White"},
        { "age": 55, "name": "Tom Jones"},
        { "age": 39, "name": "Kitty Smith"}
    ]
}


Under the hood, Elasticsearch turns this row-oriented layout into a column-oriented one:


{
    "authors.age":    [26, 55, 39],
    "authors.name":   [jack, white, tom, jones, kitty, smith]
}
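This column-oriented layout is also why the objects in an array cannot be queried independently of each other. The Python sketch below models it under simplifying assumptions: `flatten_with_arrays` is a hypothetical helper, and names are kept whole rather than tokenized into lowercase terms.

```python
from collections import defaultdict
from typing import Any, Dict, List


def flatten_with_arrays(doc: Dict[str, Any], prefix: str = "") -> Dict[str, List[Any]]:
    """Flatten objects and arrays of objects into dotted paths with value lists."""
    flat: Dict[str, List[Any]] = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Values from every object in the array are merged per field.
                for sub_path, sub_values in flatten_with_arrays(item, f"{path}.").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


book = {
    "authors": [
        {"age": 26, "name": "Jack White"},
        {"age": 55, "name": "Tom Jones"},
        {"age": 39, "name": "Kitty Smith"},
    ]
}
flat_book = flatten_with_arrays(book)
print(flat_book["authors.age"])   # [26, 55, 39]
print(flat_book["authors.name"])  # ['Jack White', 'Tom Jones', 'Kitty Smith']

# No single author is 26 years old AND named "Tom Jones",
# yet both values are present, so a combined condition still matches.
cross_match = 26 in flat_book["authors.age"] and "Tom Jones" in flat_book["authors.name"]
print(cross_match)  # True
```

Once the values are merged per field, the link between an age and its name is gone, which is the problem the nested data type is meant to solve.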


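In these study notes, Arrays and Object can be classified as field data types, while null_value is actually a mapping parameter. The nested field type is also worth studying, but this post is already too long, so I will leave it for another time.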
