分片不宜过大,在故障恢复的时候会更大的影响集群。
The shard is the unit at which Elasticsearch distributes data around the cluster. The speed at which Elasticsearch can move shards around when rebalancing data, e.g. following a failure, will depend on the size and number of shards as well as network and disk performance.
TIP: Avoid having very large shards as this can negatively affect the cluster’s ability to recover from failure. There is no fixed limit on how large shards can be, but a shard size of 50GB is often quoted as a limit that has been seen to work for a variety of use-cases.
ES 只新增数据,不更新数据。更新也只是把旧的数据标记删除,再新增新的数据。被删除的数据在段合并前,是会一直占用资源的。有一种思路是按时间区间将数据分成不同的索引存储,比如 2017 年一份索引、2018 年一份索引。这样索引会更小,每次进入下一个年份都有机会调整分片数量和索引结果,冷数据也可以按年进入归档状态,不会影响在热数据上的业务服务。