<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Elasticsearch on KK's Blog (fromkk)</title><link>https://fromkk.com/tags/elasticsearch/</link><description>Recent content in Elasticsearch on KK's Blog (fromkk)</description><generator>Hugo</generator><language>en</language><managingEditor>bebound@gmail.com (KK)</managingEditor><webMaster>bebound@gmail.com (KK)</webMaster><lastBuildDate>Sun, 10 Aug 2025 18:44:06 +0800</lastBuildDate><atom:link href="https://fromkk.com/tags/elasticsearch/index.xml" rel="self" type="application/rss+xml"/><item><title>Retrieve Large Dataset in Elasticsearch</title><link>https://fromkk.com/posts/retrieve-large-dataset-in-elasticsearch/</link><pubDate>Sun, 21 Jun 2020 20:33:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/retrieve-large-dataset-in-elasticsearch/</guid><description>&lt;p&gt;It&amp;rsquo;s easy to get a small dataset from Elasticsearch by using &lt;code&gt;size&lt;/code&gt; and &lt;code&gt;from&lt;/code&gt;. However, it&amp;rsquo;s impractical to retrieve a large dataset in the same way.&lt;/p&gt;
&lt;h2 id="deep-paging-problem"&gt;Deep Paging Problem&lt;/h2&gt;
&lt;p&gt;Elasticsearch organises data into indexes. An index is a logical namespace, while the data itself is stored in physical shards, each of which is an instance of Lucene. There are two kinds of shards: primary shards and replica shards. A replica shard is a copy of a primary shard, kept in case a node or shard fails. By distributing the documents of an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch provides redundancy and scalability. Before version 7.0, Elasticsearch created &lt;strong&gt;5&lt;/strong&gt; primary shards per index by default (since 7.0 the default is 1), with one replica shard for each primary shard.&lt;/p&gt;</description></item></channel></rss>