Retrieve Large Dataset in Elasticsearch

bebound@gmail.com (KK) — Sun, 21 Jun 2020 20:33:00 +0800

It’s easy to get a small dataset from Elasticsearch by using size and from. However, a large dataset cannot be retrieved the same way.
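As a minimal sketch of from/size paging, the helper below builds a search request body for a given page. The function name `build_page_query` and the `match_all` query are illustrative assumptions, not something from a client library. The 10,000 limit mirrors the default value of the `index.max_result_window` setting, beyond which Elasticsearch rejects a from/size request.

```python
# Default value of the index.max_result_window setting in Elasticsearch.
MAX_RESULT_WINDOW = 10_000


def build_page_query(page, page_size=100):
    """Build a from/size search body for a zero-based page number.

    Hypothetical helper for illustration; it only builds the request body
    and does not talk to a cluster.
    """
    offset = page * page_size
    # Elasticsearch rejects requests where from + size exceeds the window.
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {MAX_RESULT_WINDOW}"
        )
    return {"from": offset, "size": page_size, "query": {"match_all": {}}}
```

For example, `build_page_query(2, 50)` gives `from` 100 and `size` 50, while `build_page_query(100, 100)` raises because it would reach past the default window.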

Deep Paging Problem

As we know, Elasticsearch data is organised into indexes, each of which is a logical namespace, while the actual data is stored in physical shards. Each shard is an instance of Lucene. There are two kinds of shards: primary shards and replica shards. A replica shard is a copy of a primary shard, kept in case nodes or shards fail. By distributing the documents of an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy and scalability. Before version 7.0, Elasticsearch created 5 primary shards per index by default, with one replica shard for each primary shard; since 7.0 the default is a single primary shard with one replica.
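To make the shard arithmetic concrete, here is a small sketch that builds the settings body for creating an index and counts the resulting shard copies. `number_of_shards` and `number_of_replicas` are the real index settings; the helper names `index_settings` and `total_shards` are assumptions made for this example.

```python
def index_settings(primaries, replicas_per_primary):
    """Settings body for creating an index (hypothetical helper)."""
    return {
        "settings": {
            "number_of_shards": primaries,
            # number_of_replicas counts replica copies per primary shard.
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shards(primaries, replicas_per_primary):
    """Total shard copies in the cluster: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


body = index_settings(5, 1)
count = total_shards(5, 1)  # 5 primaries + 5 replicas
```

With 5 primary shards and one replica each, the index is backed by 10 shards in total, each of them a separate Lucene instance.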
