Dealing with large data sets and big data can often call for new techniques to optimize performance.  When you have a few million documents or rows, you can do things differently than when you have to deal with tens of millions of rows or documents in either a standard SQL database or a NoSQL database like Couchbase.  One thing we noticed at CKH is that in very large document stores, the OFFSET and LIMIT style pagination on both N1QL queries and Views in Couchbase was getting slow as the OFFSET grew larger.  This becomes a problem when you want to build efficient, consistent paging into your system.  Not to worry, Couchbase provides a solution.

The concept of paging through a result set is not a difficult one.  Using OFFSET and LIMIT is the easiest way to write database queries for pagination. Together, OFFSET and LIMIT, make the pagination clause of the SELECT statement.  Let's take a look at how that would look in N1QL.

This query would return the first 10 hotels in the “travel-sample” bucket ordered by country and city.  To get the next 10, we simply increment the previous OFFSET by the size of our page (10).

The issue comes in at scale when that OFFSET value becomes very large.

This is because as the OFFSET increases, the indexer must get all the previous documents as well in order to know where to start and end its result set.   So, how do we solve this issue?  The answer is to tell the indexer where to start for each page by a unique key and instead of using an OFFSET at all, just pass in the last key of the previous page.  Let's take a look at how that can be done.

First, we will need an index that includes our unique key.  In Couchbase, the document ID is unique so we can use it as our key.

Now that we have an index to cover our query, we can make it again to get our pages, but using the unique ID as the starting place of each page.  For page one we can start our ID as 0.

And the next page we can now just start at the end of the previous page ID

Both of these queries will return with the same consistent speed. The same will hold true no matter the number of documents we have to page through.

This same technique can also be used for View queries.  View queries require a unique key as well as their value in the key/value pair result set, so you can use the key to paginate through the results setting the startKey on the View Query to the previous page.

For more information about the why and the how of this technique, have a look at this excellent article written by Keshav Murthy.

We have found this technique invaluable in large data sets when we need to perform operations across the whole dataset, or need to page through higher pages in a result set.  Hopefully, you can use this to improve your performance as well.  If you are looking for someone to performance tune your applications or want help optimizing Couchbase, let us know, we would be glad to help.