Keying data in NoSQL

Published by Curt Gratz at March 11, 2021

Keying in NoSQL

Most NoSQL databases are also key/value style databases that store your blob of data, be it JSON or something else, under a specific key. They are designed to retrieve data by that key very quickly. This makes them great solutions in all sorts of situations, provided you key your data correctly.

Let’s look at a few patterns you can use to key your data in a key/value style database. You can look at the following patterns and examples and decide which work best for your use cases.

Predictable Keys
Counter-ID
GUID/UUID
Lookup Pattern

A few things before we get started. One practice I like when generating keys with any of these patterns, even GUID/UUID style keys, is to make sure the key identifies the entity it represents. In my examples you will see me use <entity>:: before each of the keys. This makes it easy both to search by key and to visually group like values together in my database.
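As a minimal sketch of the <entity>:: convention, the snippet below builds prefixed keys and filters a list of keys by entity. The helper names make_key and keys_for_entity are hypothetical and only illustrate the idea; they are not part of any database client.

```python
from typing import Iterable, List

SEPARATOR = "::"  # the separator used in this article's example keys


def make_key(entity: str, identifier: str) -> str:
    """Build a key such as 'user::enderdragon' (hypothetical helper)."""
    if not entity or not identifier:
        raise ValueError("entity and identifier must be non-empty")
    return f"{entity}{SEPARATOR}{identifier}"


def keys_for_entity(keys: Iterable[str], entity: str) -> List[str]:
    """Return only the keys that carry the given entity prefix."""
    prefix = entity + SEPARATOR
    return [key for key in keys if key.startswith(prefix)]


all_keys = [
    make_key("user", "enderdragon"),
    make_key("product", "231-0321404314123"),
    make_key("user", "0b96d14a-5b18-481d-9dac-74bb2ee64112"),
]

# Grouping by entity is a simple prefix match.
user_keys = keys_for_entity(all_keys, "user")
```

Because every key starts with its entity name, grouping or searching by entity comes down to a prefix match, whatever pattern produced the rest of the key.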

Predictable Keys

Let’s start with the Predictable Key pattern. In this pattern you look for something that is commonly known about your data and easy to get at. The basic premise is that you want a value that is unique but already known. Some examples might include email addresses, usernames, SKUs, and ISBNs.

So using the Predictable Key pattern for users one could key the data like

user::ender@dragon.com (using the email address)
user::enderdragon (using a username)

Now when the user logs in, as long as you have the known value (the username or email), you can quickly get at any data stored with this pattern.
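Here is a rough sketch of that login lookup, using a plain Python dict as a stand-in for the key/value store. The store, the function names, and the choice to lower-case the email before building the key are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for a key/value store.
store: Dict[str, dict] = {}


def user_key(email: str) -> str:
    # Assumption: emails are treated case-insensitively, so normalize
    # before building the key to keep it predictable.
    return "user::" + email.strip().lower()


def save_user(email: str, profile: dict) -> None:
    store[user_key(email)] = profile


def find_user(email: str) -> Optional[dict]:
    # A single get by key; no query or index is involved.
    return store.get(user_key(email))


save_user("ender@dragon.com", {"name": "Ender"})
found = find_user("Ender@Dragon.com")
missing = find_user("nobody@example.com")
```

The point is that the value typed at login is enough to rebuild the key, so the read is a single get.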

Another example for products and categories might be

product::231-0321404314123 (using the SKU of the product)
category::animal-hammocks (using the slug of the category)

Any key you can come up with for the use case at hand that is easily known and commonly used makes a good place to use the Predictable Key pattern. Something to watch out for with this pattern is using data in keys that can change. In my category example, if you change the slug often, you have to either rekey your data or leave a pointer at the old key that points to the new one. This is why the Predictable Key pattern is less than ideal in some use cases. This takes us to the next pattern to explore when keying your data.
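To make the rekey-or-pointer trade-off concrete, here is one possible sketch of renaming a category slug: the document moves to the new key and the old key keeps a pointer to it. The dict-based store, the function names, and the hop limit are assumptions for illustration only.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory stand-in for a key/value store.
store: Dict[str, Any] = {
    "category::animal-hammocks": {"slug": "animal-hammocks", "name": "Animal Hammocks"},
}


def rename_slug(old_slug: str, new_slug: str) -> None:
    old_key = "category::" + old_slug
    new_key = "category::" + new_slug
    doc = store.pop(old_key)   # remove the document from its old key
    doc["slug"] = new_slug
    store[new_key] = doc       # rekey: store it under the new key
    store[old_key] = new_key   # pointer: the old key now holds the new key


def get_category(slug: str, max_hops: int = 5) -> Optional[dict]:
    value = store.get("category::" + slug)
    hops = 0
    # A string value is a pointer; follow it until a document is reached.
    while isinstance(value, str) and hops < max_hops:
        value = store.get(value)
        hops += 1
    return value if isinstance(value, dict) else None


rename_slug("animal-hammocks", "pet-hammocks")
via_old_slug = get_category("animal-hammocks")
via_new_slug = get_category("pet-hammocks")
```

Every rename adds a pointer that has to be followed or cleaned up later, which is the cost the paragraph above warns about.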

Counter-ID

The Counter-ID pattern is designed to ensure unique keys by using ever-increasing numbers. Infinity is a long ways away. This pattern is very similar to an IDENTITY column in a relational database. The idea with the Counter-ID pattern is that any new value is keyed with the next largest number. This means that creating a new document takes a pair of operations in your application: increment the counter, then add the document.

Initialize one key as an atomic counter (on app start)
Increment the counter and save the new value
- id = bucket.counter("blog::slug::comment_count", {delta: 1})
Use the id as a component of the key for the new document
- bucket.upsert("blog::slug::c" + id, comment_data)
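The steps above can be sketched in runnable form as follows. InMemoryBucket is a hypothetical stand-in written for this example, not a real client API; its counter and upsert methods only mirror the names in the pseudocode. The lock makes the increment atomic within one process, whereas a real database would need to provide an atomic counter on the server side.

```python
import threading
from typing import Any, Dict


class InMemoryBucket:
    """Hypothetical stand-in for a key/value bucket (not a real client)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def counter(self, key: str, delta: int = 1, initial: int = 0) -> int:
        # The lock makes the read-modify-write atomic inside this process.
        with self._lock:
            value = self._data.get(key, initial) + delta
            self._data[key] = value
            return value

    def upsert(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data[key]


bucket = InMemoryBucket()


def add_comment(slug: str, comment_data: dict) -> str:
    # Step 1: increment the counter and take the new value as the id.
    comment_id = bucket.counter(f"blog::{slug}::comment_count", delta=1)
    # Step 2: use the id as a component of the new document's key.
    key = f"blog::{slug}::c{comment_id}"
    bucket.upsert(key, comment_data)
    return key


first_key = add_comment("keying-data", {"text": "Nice post"})
second_key = add_comment("keying-data", {"text": "Agreed"})
```

Each comment gets the next number in sequence, and the counter document itself always holds the highest id handed out so far.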

This is a particularly helpful pattern for use cases where you also want to know a count quickly. In my example, it is easy to know the number of comments on a blog post from the value of the counter. It is also handy for quickly ordering documents by insertion. A caveat with the Counter-ID pattern is that you need to make sure your counter increment is atomic, so that another process cannot grab the same number and overwrite your data. Also, if you use the counter value as a count, deleting items will throw it off.
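One possible way around the delete caveat, offered as an assumption and not something this pattern prescribes, is to keep the id counter separate from a live count that is decremented on delete. The sketch below is single-threaded and uses a dict as a hypothetical store; the key names are made up for the example.

```python
from typing import Any, Dict

NEXT_ID = "blog::slug::next_id"        # only ever increases; used for keys
LIVE_COUNT = "blog::slug::live_count"  # tracks comments that still exist

# Hypothetical in-memory stand-in for a key/value store.
store: Dict[str, Any] = {NEXT_ID: 0, LIVE_COUNT: 0}


def add_comment(comment_data: dict) -> str:
    store[NEXT_ID] += 1
    store[LIVE_COUNT] += 1
    key = f"blog::slug::c{store[NEXT_ID]}"
    store[key] = comment_data
    return key


def delete_comment(key: str) -> None:
    if key in store:
        del store[key]
        store[LIVE_COUNT] -= 1  # the id counter is left untouched


keys = [add_comment({"text": f"comment {n}"}) for n in range(3)]
delete_comment(keys[1])
```

After three inserts and one delete, the id counter still reads 3 while the live count reads 2, so new keys never collide with old ones and the count stays accurate.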

GUID/UUID

This pattern uses a Globally Unique Identifier (GUID) or Universally Unique Identifier (UUID) to create a unique key for your values. Some advantages of this pattern are that your keys are unique across applications, and you don’t need the pair of operations the Counter-ID pattern requires for inserts, since GUIDs can be generated offline and are still pretty much guaranteed to be unique. A GUID pattern key might look like

user::0b96d14a-5b18-481d-9dac-74bb2ee64112 (for a unique user document/value)
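A key like this can be generated with Python’s standard uuid module. The sketch below uses a random (version 4) UUID; the function name new_user_key is a hypothetical helper, and version 4 is just one reasonable choice.

```python
import uuid


def new_user_key() -> str:
    """Return a key of the form 'user::<random UUID>' (hypothetical helper)."""
    # uuid4() needs no coordination with the database or other clients,
    # so keys can be created offline.
    return f"user::{uuid.uuid4()}"


key_a = new_user_key()
key_b = new_user_key()
```

No counter document is read or written here, which is why inserts take a single operation with this pattern.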

Some of the common reasons not to use this for every use case are

Larger space use
Can’t order by ID to get the insert order like in the counter-ID pattern
Can’t work out the key from something you already know about the data (such as an email address), so the key has to be stored or found some other way

Lookup

The Lookup pattern offers a bit of the best of both worlds by combining the Predictable Key pattern with either the Counter-ID or GUID/UUID pattern. You create several small key/value pairs that serve as pointers to the main value you are looking for.

Create simple documents that hold referential data (the key) pointing to primary documents
- Primary Document user::abc123-12345-4312
- Lookup Document user::iron@golem.com = user::abc123-12345-4312
Lookup documents aren’t JSON; store the primary key as a plain string so you skip the JSON parsing
- Requires two get operations, first the lookup doc, then the primary doc
  - key = bucket.get("user::iron@golem.com")
  - doc = bucket.get(key)
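The two-step get above can be sketched as follows, with a dict of strings standing in for the store. Primary documents are stored as JSON text and lookup documents as the bare primary key, so only the second get needs parsing. The store and the function names create_user and get_user_by are assumptions for illustration.

```python
import json
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-in: every value is stored as a string.
store: Dict[str, str] = {}


def create_user(email: str, username: str, profile: dict) -> str:
    primary_key = f"user::{uuid.uuid4()}"
    store[primary_key] = json.dumps(profile)  # primary document (JSON)
    # Lookup documents: plain strings that hold the primary key.
    store[f"user::{email}"] = primary_key
    store[f"user::{username}"] = primary_key
    return primary_key


def get_user_by(known_value: str) -> Optional[dict]:
    primary_key = store.get(f"user::{known_value}")  # first get: lookup doc
    if primary_key is None:
        return None
    raw = store.get(primary_key)                     # second get: primary doc
    return json.loads(raw) if raw is not None else None


primary = create_user("iron@golem.com", "irongolem", {"name": "Iron Golem"})
by_email = get_user_by("iron@golem.com")
by_username = get_user_by("irongolem")
```

Both the email and the username resolve to the same primary document, and adding another way to find the user only means writing one more small lookup document.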

What is nice about this pattern is that you can build several lookup key/value pairs to look up the same data quickly for many different use cases. There is obviously a cost to the double lookup and to creating the multiple documents, but this can be offset by the speed of lookups. So in a read-heavy use case, this pattern is very appealing.

Choose the pattern that suits you

There are many more ways to key your data, but these four patterns are the ones I have found most useful. In the end, make sure to choose the pattern that best fits your use case. If you have used other patterns that you think are relevant, drop them in the comments section. I’d love to keep learning.
