一只后端汪

Posted 2018-05-29

Shards should not be too large: the larger a shard is, the more failure recovery disrupts the cluster.

The shard is the unit at which Elasticsearch distributes data around the cluster. The speed at which Elasticsearch can move shards around when rebalancing data, e.g. following a failure, will depend on the size and number of shards as well as network and disk performance.

TIP: Avoid having very large shards as this can negatively affect the cluster’s ability to recover from failure. There is no fixed limit on how large shards can be, but a shard size of 50GB is often quoted as a limit that has been seen to work for a variety of use-cases.
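As a rough illustration of the tip above, here is a minimal sketch of the arithmetic for choosing a primary shard count from an expected index size. The function name `suggest_primary_shards` and its parameters are made up for this example and are not part of any Elasticsearch API. The 50 GB default is only the rule of thumb quoted above, not a hard limit.

```python
import math


def suggest_primary_shards(total_gb: float, target_shard_gb: float = 50.0) -> int:
    """Return the smallest shard count that keeps each shard at or below the target size.

    Hypothetical helper: target_shard_gb defaults to the commonly quoted
    50 GB guideline, which is an assumption rather than a fixed limit.
    """
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # An index always has at least one primary shard, even when it is empty.
    return max(1, math.ceil(total_gb / target_shard_gb))


if __name__ == "__main__":
    # 120 GB of data with a 50 GB target needs 3 shards of about 40 GB each.
    print(suggest_primary_shards(120))
```

The result is only a starting point. Network and disk performance, as the quoted passage notes, also affect how quickly shards can be moved.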

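Elasticsearch only appends data; it never updates a document in place. An update simply marks the old document as deleted and indexes a new one, and deleted documents keep consuming resources until their segments are merged. One approach is to split the data into separate indices by time range, for example one index for 2017 and another for 2018. Each index is then smaller, every new year is a chance to adjust the shard count and the index structure, and cold data can be archived year by year without affecting the services that run on hot data.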
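The yearly split described above can be sketched as a small routing helper that maps a document date to an index name and lists the indices a date-range query has to touch. The `events` prefix, the `prefix-YYYY` naming scheme, and both function names are assumptions made for illustration. Nothing here calls Elasticsearch itself.

```python
from datetime import date
from typing import List


def index_for(day: date, prefix: str = "events") -> str:
    """Name of the yearly index that a document dated `day` is written to."""
    # Assumed naming scheme: "<prefix>-<year>", e.g. "events-2018".
    return f"{prefix}-{day.year}"


def indices_for_range(start: date, end: date, prefix: str = "events") -> List[str]:
    """Yearly indices that a query over the range [start, end] has to search."""
    if start > end:
        raise ValueError("start must not be after end")
    return [f"{prefix}-{year}" for year in range(start.year, end.year + 1)]


if __name__ == "__main__":
    print(index_for(date(2018, 5, 29)))
    print(indices_for_range(date(2017, 6, 1), date(2018, 5, 29)))
```

With a scheme like this, a query limited to recent dates only touches the current year's index, and an older index can be archived as a whole once it is no longer written to.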