
Elasticsearch Segment Merging (Merging On-Disk Files)

escray
Published: March 15, 2021

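These are notes on Elasticsearch segment merging, that is, merging on-disk files (segment merge, optimize). The notes come from the core-knowledge part of 中华石杉's Elasticsearch expert course series on Bilibili; the English content comes from Elasticsearch: The Definitive Guide [2.x].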


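A new segment file is created every second, so the number of files grows quickly. Every search has to check all of the segments, which makes searching slow.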
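To make that cost concrete, here is a toy sketch in Python. It is not Elasticsearch or Lucene code: each segment is modeled as a tiny inverted index (a dict from term to document IDs), and the `search` function and sample data are invented for illustration. The point is only that a search has to visit every segment and combine the results, so the work grows with the number of segments.

```python
from typing import Dict, List, Set

# Toy model (an assumption for illustration): a segment is a tiny
# inverted index mapping a term to the IDs of documents containing it.
Segment = Dict[str, Set[int]]


def search(segments: List[Segment], term: str) -> Set[int]:
    """Look up `term` in every segment and union the matches."""
    hits: Set[int] = set()
    for segment in segments:  # one lookup per segment
        hits |= segment.get(term, set())
    return hits


# Three small segments, as if produced by three consecutive refreshes.
segments = [
    {"merge": {1}, "disk": {1}},
    {"merge": {2}},
    {"disk": {3}, "search": {3}},
]
result = search(segments, "merge")
```

With three segments the loop runs three times per search; after a merge into one segment it would run once.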


Merging on-disk files


By default, Elasticsearch runs segment merges in the background. During a merge, documents that were marked as deleted are physically removed for good.


Steps of each merge operation


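  1. Select some segments of similar size and merge them into one larger segment

  2. Flush the new segment to disk

  3. Write a new commit point that includes the new segment and excludes the old segments

  4. Open the new segment for search

  5. Delete the old segments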
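The five steps above can be sketched as a toy model in Python. This is a simplification under stated assumptions, not how Lucene implements merging: `Segment`, `ToyIndex`, and `merge` are hypothetical names, "disk" is just a dict, and the policy for choosing segments of similar size is left out (the caller names the segments to merge).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Segment:
    name: str
    docs: Dict[int, str]                            # doc ID -> content
    deleted: Set[int] = field(default_factory=set)  # IDs marked as deleted


@dataclass
class ToyIndex:
    on_disk: Dict[str, Segment] = field(default_factory=dict)  # segment files
    commit_point: List[str] = field(default_factory=list)      # committed segments
    searchable: List[str] = field(default_factory=list)        # open for search


def merge(index: ToyIndex, names: List[str], new_name: str) -> Segment:
    """Merge the named segments into one, following the five steps."""
    old = [index.on_disk[n] for n in names]

    # Step 1: build one larger segment, skipping documents marked as deleted.
    merged_docs = {
        doc_id: body
        for seg in old
        for doc_id, body in seg.docs.items()
        if doc_id not in seg.deleted
    }
    merged = Segment(new_name, merged_docs)

    # Step 2: flush the new segment to disk.
    index.on_disk[new_name] = merged

    # Step 3: write a new commit point that includes the new segment
    # and excludes the old ones.
    index.commit_point = [n for n in index.commit_point if n not in names] + [new_name]

    # Step 4: open the new segment for search in place of the old ones.
    index.searchable = [n for n in index.searchable if n not in names] + [new_name]

    # Step 5: delete the old segment files.
    for n in names:
        del index.on_disk[n]
    return merged


index = ToyIndex()
for seg in (
    Segment("seg_0", {1: "a", 2: "b"}, deleted={2}),
    Segment("seg_1", {3: "c"}),
    Segment("seg_2", {4: "d"}),
):
    index.on_disk[seg.name] = seg
    index.commit_point.append(seg.name)
    index.searchable.append(seg.name)

merged = merge(index, ["seg_0", "seg_1"], "seg_3")
```

Document 2 was only marked as deleted in `seg_0`; it disappears for real because it is never copied into `seg_3`.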


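POST /my_index/_optimize?max_num_segments=1 forces a merge down to a single segment. Avoid running it manually where possible; the automatic background merging is normally enough.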

Segment Merging

https://www.elastic.co/guide/en/elasticsearch/guide/2.x/merge-process.html


Having too many segments is a problem. Each segment consumes file handles, memory, and CPU cycles... every search request has to check every segment in turn.


Elasticsearch merges segments in the background. Small segments are merged into bigger segments, which in turn are merged into even bigger segments.


This is the moment when those old deleted documents are purged from the filesystem. Deleted documents (or old versions of updated documents) are not copied over to the new bigger segment.


Figure: Two committed segments and one uncommitted segment in the process of being merged into a bigger segment.


  1. While indexing, the refresh process creates new segments and opens them for search.

  2. The merge process selects a few segments of similar size and merges them into a new bigger segment in the background. This does not interrupt indexing and searching.

  3. Once merging has finished, the new segment is flushed to disk.

  4. A new commit point is written that includes the new segment and excludes the old, smaller segments.

  5. The new segment is opened for search.

  6. The old segments are deleted.


Figure: Once merging has finished, the old segments are deleted.


The merging of big segments can use a lot of I/O and CPU, which can hurt search performance if left unchecked. By default, Elasticsearch throttles the merge process so that search still has enough resources available to perform well.

optimize API


The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter.


The optimize API should not be used on a dynamic index - an index that is being actively updated.


A typical use case is logging, where logs are stored in an index per day, week, or month. Older indices are essentially read-only; they are unlikely to change.
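As a rough illustration of telling "older" indices apart from the one still being written, the Python sketch below filters monthly index names. It assumes indices are named `logstash-YYYY-MM`, following the `logstash-2014-10` example in the guide; the function name `old_monthly_indices` and the sample names are hypothetical.

```python
from datetime import date
from typing import List


def old_monthly_indices(names: List[str], today: date,
                        prefix: str = "logstash-") -> List[str]:
    """Return indices named <prefix>YYYY-MM whose month is before today's."""
    current = (today.year, today.month)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            year, month = (int(part) for part in name[len(prefix):].split("-"))
        except ValueError:
            continue  # not a monthly index name; leave it alone
        if (year, month) < current:
            old.append(name)
    return old


candidates = old_monthly_indices(
    ["logstash-2014-09", "logstash-2014-10", "logstash-2014-11", "other-index"],
    today=date(2014, 11, 20),
)
```

Here the November index is excluded because it is still receiving writes; only the earlier months are treated as read-only.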


In the logging case, it can be useful to optimize the shards of an old index down to a single segment each; it will use fewer resources and searches will be quicker.


// Merges each shard in the index down to a single segment
POST /logstash-2014-10/_optimize?max_num_segments=1


Be aware that merges triggered by the optimize API are not throttled at all.
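For readers who script this call, here is a minimal Python sketch that builds the same request with the standard library. The helper name `build_optimize_request` and the `http://localhost:9200` base URL are assumptions for illustration. The sketch uses the `_optimize` path from the 2.x guide; later Elasticsearch releases expose this operation as `_forcemerge`, so check the documentation for your version. The request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_optimize_request(index: str, max_num_segments: int = 1,
                           base_url: str = "http://localhost:9200") -> Request:
    """Build (but do not send) POST /<index>/_optimize?max_num_segments=N."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    query = urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url}/{quote(index, safe='')}/_optimize?{query}"
    return Request(url, method="POST")


request = build_optimize_request("logstash-2014-10")
# To actually send it (assumes a reachable cluster at base_url):
#   from urllib.request import urlopen
#   urlopen(request)
```

Because these merges are not throttled, a script like this is best pointed only at old, read-only indices and run at a quiet time.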

