Elasticsearch Index Types and Mappings
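These notes are based on the core-knowledge part of 中华石杉's Elasticsearch 顶尖高手系列课程 on Bilibili; the English passages are quoted from Elasticsearch: The Definitive Guide [2.x]. The material is somewhat dated, since current Elasticsearch no longer uses the types described here. I still think the history is worth knowing, and the underlying principles probably have not changed much. Discussion and criticism are welcome.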
Typeless
#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
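Since Elasticsearch 7.0, the index type described in this article is no longer used to distinguish different kinds of data within the same index. Everything uses _doc instead, which brings Elasticsearch back in line with Lucene. The official term is "typeless", and the official blog has a post about it: Goodbye, types. Hello, typeless.
Elasticsearch has been working toward this in planned steps since roughly version 5.0: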
5.0 started enforcing that fields that share the same name across multiple types have compatible mappings
6.0 started preventing new indices from having more than one type and deprecated the _default_ mapping
7.0 deprecated APIs that accept types, introduced new typeless APIs, and removed support for the _default_ mapping.
8.0 will remove APIs that accept types.
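The three typeless endpoints named in the deprecation warning can be sketched as a small path builder. This is only an illustration of the URL shapes: the helper name typeless_index_path and the sample index name products are hypothetical, and nothing here talks to a real cluster.

```python
from typing import Optional


def typeless_index_path(index: str, doc_id: Optional[str] = None,
                        create_only: bool = False) -> str:
    """Build a typeless document-index path (hypothetical helper).

    Mirrors the endpoints listed in the deprecation warning:
    /{index}/_doc/{id}, /{index}/_doc and /{index}/_create/{id}.
    """
    if create_only:
        if doc_id is None:
            raise ValueError("_create needs an explicit document id")
        return f"/{index}/_create/{doc_id}"
    if doc_id is None:
        # No id given: the id is left to Elasticsearch to generate.
        return f"/{index}/_doc"
    return f"/{index}/_doc/{doc_id}"


print(typeless_index_path("products", "1"))        # /products/_doc/1
print(typeless_index_path("products"))             # /products/_doc
print(typeless_index_path("products", "1", True))  # /products/_create/1
```

Note that no type name appears anywhere in these paths; _doc is a fixed part of the endpoint rather than a user-chosen type.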
Types and Mappings
A type in Elasticsearch represents a class of similar documents. A type consists of a name and a mapping. The mapping, like a database schema, describes the fields or properties that documents of that type may have, the datatype of each field and how those fields should be indexed and stored by Lucene.
A document in Lucene consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values. In Lucene, all values are just treated as opaque bytes.
When we index a document in Lucene, the values for each field are added to the inverted index for the associated field. Optionally, the original values may also be stored unchanged so that they can be retrieved later.
Because Lucene has no concept of document types, the type name of each document is stored with the document in a metadata field called _type.
Lucene also has no concept of mappings.
... each Lucene index contains a single, flat schema for all fields. A particular field is either mapped as a string, or a number, but not both.
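To make the "single, flat schema" concrete, here is a minimal sketch that merges per-type mappings into one index-wide schema and rejects a field that is mapped two different ways. It is a toy model, not Elasticsearch code: flatten_mappings and the sample type and field names are assumptions made for illustration.

```python
def flatten_mappings(type_mappings):
    """Merge per-type field mappings into one flat, index-wide schema.

    type_mappings: {type_name: {field_name: datatype}}
    Raises ValueError when two types map the same field name differently.
    """
    flat = {}
    for type_name, fields in type_mappings.items():
        for field, datatype in fields.items():
            if field in flat and flat[field] != datatype:
                raise ValueError(
                    f"field '{field}' is mapped as '{flat[field]}' and as "
                    f"'{datatype}' (type '{type_name}')"
                )
            flat[field] = datatype
    return flat


# Two types with compatible mappings merge cleanly.
compatible = {
    "kitchen": {"name": "string", "price": "number"},
    "lawn_care": {"name": "string", "price": "number", "area": "number"},
}
print(flatten_mappings(compatible))
# {'name': 'string', 'price': 'number', 'area': 'number'}

# The same field name mapped as string and as number cannot coexist.
conflicting = {
    "products": {"created": "string"},
    "logs": {"created": "number"},
}
try:
    flatten_mappings(conflicting)
except ValueError as err:
    print("conflict:", err)
```

This is the constraint that 5.0 began enforcing: fields that share a name across types must have compatible mappings.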
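A type distinguishes groups of similar data within an index. Documents of similar types may still have different fields, and different properties that control how each field is indexed and analyzed. When the underlying Lucene builds the index, every field value is treated as opaque bytes, with no distinction between datatypes.
Lucene has no concept of a type. The type is actually stored as a field of the document, namely _type, and Elasticsearch uses _type to filter and select by type.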
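A rough sketch of that idea follows, with plain dictionaries standing in for Lucene documents. The function names to_lucene_doc and filter_by_type and the sample data are hypothetical; real Elasticsearch performs this filtering inside its query execution, not in application code.

```python
def to_lucene_doc(doc_type, source):
    """Flatten a typed document into field-value pairs plus a _type field."""
    lucene_doc = dict(source)
    lucene_doc["_type"] = doc_type  # the type is just one more field
    return lucene_doc


def filter_by_type(lucene_docs, doc_type):
    """Keep only the documents whose _type metadata field matches."""
    return [d for d in lucene_docs if d["_type"] == doc_type]


# All documents live side by side in the same index, whatever their type.
index = [
    to_lucene_doc("kitchen", {"name": "whisk"}),
    to_lucene_doc("lawn_care", {"name": "rake"}),
    to_lucene_doc("kitchen", {"name": "ladle"}),
]
print([d["name"] for d in filter_by_type(index, "kitchen")])
# ['whisk', 'ladle']
```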
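Multiple types in one index are actually stored together. Within a single index, therefore, two types cannot have a field with the same name but a different datatype or other settings, because Lucene cannot handle that.
The underlying storage looks like this...
Best practice: put types with a similar structure in the same index. Those types should have many fields in common.
If you put two types whose fields are completely different into one index, at least half of the fields of every document are empty in the underlying Lucene index, which causes serious performance problems.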
Types are not as well suited for entirely different types of data. If your two types have mutually exclusive sets of fields, that means half your index is going to contain "empty" values (the fields will be sparse), which will eventually cause performance problems.
Good: kitchen and lawn-care types inside the products index, because the two types are essentially the same schema.
Bad: products and logs types inside the data index, because the two types are mutually exclusive. Separate these into their own indices.
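The "half your index is empty" claim can be checked with simple arithmetic. The sketch below counts, for a flat schema shared by all types, what fraction of (document, field) slots hold no value. It is a simplified counting model with made-up field names and document counts, and it says nothing about how Lucene actually encodes sparse fields.

```python
def empty_field_ratio(type_fields, doc_counts):
    """Fraction of (document, field) slots left empty in the flat schema.

    type_fields: {type_name: iterable of field names used by that type}
    doc_counts:  {type_name: number of documents of that type}
    """
    all_fields = set()
    for fields in type_fields.values():
        all_fields |= set(fields)
    total_slots = sum(doc_counts.values()) * len(all_fields)
    filled = sum(len(set(type_fields[t])) * n for t, n in doc_counts.items())
    return 1 - filled / total_slots


# Good: both types share essentially the same schema.
good = {"kitchen": ["name", "price", "brand"],
        "lawn_care": ["name", "price", "brand"]}
print(empty_field_ratio(good, {"kitchen": 100, "lawn_care": 100}))  # 0.0

# Bad: mutually exclusive field sets.
bad = {"products": ["name", "price"],
       "logs": ["timestamp", "message"]}
print(empty_field_ratio(bad, {"products": 100, "logs": 100}))  # 0.5
```

With two equally sized types and disjoint fields of equal count, exactly half of the slots are empty, which matches the guide's description.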
The type concept in an Elasticsearch index is only worth knowing as background. As of Elasticsearch 7.0, types are deprecated and everything is unified under _doc.
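Copyright notice: this is an original article by InfoQ author escray.
Original link: http://xie.infoq.cn/article/abf52d31f97dd9d109b6e3ecb
This article is licensed under CC-BY 4.0. When reposting, please keep the original source and this copyright notice.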