November 1, 2019

Sorting Ordered Data in a Key-Value Store

Most key-value databases, including KVdb, store keys in a lexicographic order. What does this mean?

Imagine you save some values to a bucket:

  1. Set peach
  2. Set apple
  3. Set kiwi

What order will the keys be returned when enumerating the bucket?

apple, kiwi, peach as expected. It’s basically dictionary order.

What about numbers?

Aha! Good question:

  1. Set 10
  2. Set 2
  3. Set 1

How are the keys returned?

1, 10, 2

Wait, what?

You’ve probably encountered this when using a sort function that uses lexicographic sort order:

$ node
> [10, 2, 1].sort()
[ 1, 10, 2 ]

It turns out that with numbers, this is usually never what you want. To understand this, we have to grok lexicographical sort order. In a dictionary, the word apple” appears before application” because the letter e” comes before the letter i” in the alphabet. Thus the 5th letter is the first significant difference in the two words.

If we apply this to the numbers 1” and 10”, both have 1” in common, but the first 1” is shorter, so it’s sorted first. Take 10” and 2”: 2” is shorter, but it’s lexicographic order is after the first digit of 10”. Yes, it means we are ordering numbers as if they were strings. How do we ensure that numbers are ordered naturally?

Let’s zero-pad them:

> [‘10’, ‘02’, ‘01’].sort()
[ ‘01’, ‘02’, ‘10’ ]

Naturally, this also works for date and time values that are encoded in a representation like ISO 8601, whose lexicographical and natural sort order are the same, due to the fixed-width format, assuming the year (the YYYY) doesn’t exceed the maximum 4-digit year 9999 😇

For negative numbers and decimal/floating point numbers, it’s better to encode them into a big endian binary format so they maintain the expected sort order.

Now that we understand key enumeration order, we can design our keys in such a way that makes data querying efficient (i.e., reduces the number of queries). Let’s imagine we’re building a leaderboard for a game and want to keep track of players’ scores. A first attempt could be:

player:1:score => 1000

player:2:score => 1350

But you’ll quickly realize if you have a million players, building a leaderboard requires iterating through every player’s score and then picking the top scores. If we apply what we just learned about lexicographic sorting, we can design our keys a bit differently, by setting an upper bound on scores. For the sake of this example, let’s make it 999,999,999 and encode scores using 9 digits:

score:000001000 => {“player_id”: 1, score: 1000}

score:000001350 => {“player_id”: 2, score: 1350}

To fetch the top 50 scores, we’ll enumerate keys in reverse order, so the highest lexicographically sorted key is returned first:

const leaders = await bucket.list({
  prefix: ‘score:’,
  reverse: true,
  limit: 50,
  values: true})

for(const [key, value] of leaders) {
  console.log(`player: ${value.player_id} score: ${player.score}`)
}

OK, but what if two players are tied for the same score? Well, that presents a problem, since only one player can have a unique score at a time. Let’s suffix our keys with the player id to maintain uniqueness:

score:000001350:1 => {“player_id”: 1, score: 1350}

score:000001350:2 => {“player_id”: 2, score: 1350}

Now, you can have two players with the same score and show that in the leaderboard. Fair and square!

Thus you can see how taking advantage of lexicographical ordering of keys makes certain kinds of queries more natural. Typically, you might sort results in your application, but in many cases, you can push this task to the database and have it deal with ordering so your application just consumes the results.


The Web Storage API provides a mechanism for developers to persist key-value data associated with the current page’s origin. It’s a significant improvement to using client-side cookies to persist bits of data, typically allowing storage of up to 5 MB of data. The API is simple:

localStorage.setItem(‘color’, ‘blue’)
localStorage.getItem(‘color’) // ‘blue’
localStorage.removeItem(‘color’)

A few examples of where you could use localStorage:

  • Store UI settings (filters, state of the UI) that don’t strictly need to be persisted on the server
  • Data store for offline apps that don’t need a backend
  • Share state across different IFrames on the same page
  • Cache data for offline use or improve performance

In some cases, you may not want to create an API or persistence layer on the server, but still want to save the contents of localStorage into the cloud. Imagine a web app that has a search UI that lets the user select some filters. From a UX point of view, you may want to persist the state of the filter settings the user has chosen so they don’t need to reselect them every time. This could very well be stored in localStorage only and not complicate server-side logic. But what if the user opens the web app from another device? It would improve the user experience if the filter settings were also available there.

In this example, essentially, what we’d like to do is sync localStorage to the cloud so it can be fetched anytime the frontend needs it. Fortunately, the Web Storage API is simple enough that we can create a wrapper object that is API-compatible with it, so it’s easy to swap out the implementation, and also persist it into an online key-value storage service.

However, one downside of the Web Storage API is that it only offers a synchronous API. Using a remote storage service requires an asynchronous network round-trip, which makes designing a fully-compatible API difficult due to restrictions on synchronous network I/O, particularly in browsers. For this reason, instead of hacking together a synchronous API emulation, let’s implement an asynchronous API that is async-aware and Promises-friendly.

Let’s use our own kvdb.io-commonjs module to build it!

class KVdbWebStorage {
  constructor(bucket) {
    this.bucket = bucket
  }

  // When passed a number n, this method will return the name of the nth key in the storage.
  async key(index) {
    return this.bucket.list({skip: index, limit: 1})
      .then(keyName => keyName)
      .catch(err => null)
  }

  // When passed a key name, will return that key's value.
  async getItem(keyName) {
    return this.bucket.get(keyName)
      .then(keyValue => keyValue)
      .catch(err => null)
  }

  // When passed a key name and value, will add that key to the storage, or update that key's value if it already exists.
  async setItem(keyName, keyValue) {
    return this.bucket.set(keyName, keyValue)
  }

  // When passed a key name, will remove that key from the storage.
  async removeItem(keyName) {
    return this.bucket.delete(keyName)
  }

  // Returns an integer representing the number of data items stored in the Storage object.
  async length() {
    return this.bucket.list()
      .then(keys => keys.length)
      .catch(err => -1)
  }

  // When invoked, will empty all keys out of the storage.
  async clear() {
    const keys = await this.bucket.list()
    const waiting = keys.map(key => this.bucket.delete(key))
    return Promise.all(waiting)
  }
}

Using it is as easy as with localStorage, just instantiate it with your KVdb bucket and API key:

const bucket = KVdb.bucket(‘MY_BUCKET’, ‘MY_SECRET’)
const kvdbStorage = new KVdbWebStorage(bucket)

kvdbStorage.setItem(‘foo’, ‘bar’)
  .then(() => kvdbStorage.getItem(‘foo’))
  .then(value => console.log(‘get: ‘, value))
  .catch(err => console.error(‘error: ‘, err))

While it’s possible to treat localStorage as an object and directly set keys on it, it’s recommended to use the setter and getter methods instead, to allow for asynchronous storage backends such as the one we built above.

In fact, the popular localForage library builds on this approach and exposes a localStorage-like interface with support for different drivers, or storage backends such as localStorage, IndexedDB, and custom ones. The latest version of the kvdb.io-commonjs module has built-in support for localForage, making it easy to use KVdb as the backend:

<script src="localforage.js"></script>
<script src="https://unpkg.com/kvdb.io@v1.0"></script>
<script>
KVdb.installLocalForageDriver(localforage)

localforage.config({
  bucket: KVdb.bucket('MY_BUCKET_ID', 'MY_ACCESS_TOKEN')
})

localforage.setDriver([KVdb.LOCALFORAGE_DRIVER])
  .then(() => localForage.setItem('foo', 'bar'))
  .then(() => localForage.getItem('foo'))
  .then(value => alert('value: ' + value))
</script>

In summary, we’ve seen how to implement a Web Storage API wrapper that is backed by KVdb and how minimally disruptive it is to maintain API compatibility with localStorage. And if you’re already using localStorage and want a vendor-neutral approach to persisting it into the cloud, check out localForage and its API.


October 11, 2019

Why You Should Ditch Your Database for a Key-Value DB

As developers, we like to build, more often than not, solutions to problems we think we have. When starting a new project, the first task is often gathering up the tools, libraries, and services you’ll need to deploy your eventual project and scale it to billions of users. While I take no issue with dreaming big, developers love the latest trendy frameworks and services. The nobody was ever fired for buying IBM adage is still valid today, albeit more aptly if all the other cool developers are doing it, it can’t possibly be wrong!”

Why is it so hard to resist the urge to build a new wheel?

As developers, our job is to build. We design a system inside our heads and the act of coding transforms our understanding of it into instructions. However, much of the discovery process is lost as thoughts turn into code and comments become stale. Take a look at any code you wrote six months ago and you’ll need time to figure out what was going on and what you were thinking when you wrote it—nevermind if it was somebody else’s code.

So then how should we approach building applications if the natural inclination is to want to build large future-proof software projects? The answer lies in the motivation for building. If it’s to get the project in front of users, then it’s paramount to release it as quickly as possible in the form of a minimum viable product (MVP). This is true regardless whether it’s a chargeable product” or not. Even a UI-less library needs an MVP to get real-world use.

When you’re initially working on an app, build quickly, break well-established rules, and shamelessly reuse code and resources. The initial development stage is not the time to think about elegance. It is by its very nature unscalable—if it was well-architected, you have wasted valuable development time on ancillary aspects that don’t yet matter for apps without users.

For many apps, a database is vital, but I’m here to caution you that it doesn’t matter which one you choose; you’ll end up redesigning your data model if and when you scale. The best course of action is to use the simplest possible data store, like text files in the file system. Whether you’re writing a serverless app or not, integrating a key-value database is an easy choice to make.

By distilling your data model into a series of keys and values, you can better understand the relationships in your data design, and in the future, this will carry over to help scale the app, if it ever gets to that; key-value stores tend to be highly scalable beasts.

Of course, one could argue that you should use a relational model from the get-go. While there are complex apps that benefit from one, not every app needs a relational database. A simple key-value store will often suffice. Remember, it’s easier to over-engineer than under-engineer.

So what operations does a key-value database offer?

  • Get a value by its key
  • Set a key and its value
  • Delete a key and its associated value
  • Enumerate keys, starting at some arbitrary position

It doesn’t seem like much, but with these primitives, you can implement any conceivable data store. Key enumeration is what makes everything work: accessing the right portions of the key space using starting and ending keys to navigate through it.

Most apps will need to keep track of registered users and store information like email address, password hash, and other metadata. With a relational database, you’d typically model this with a users table and define columns for each attribute.

Let’s model it instead in the key-value domain:

users:<id> = {
    “email”: “user@example.com”,
    “created_at”: “2019-09-25T02:03:04Z”,
    ...
}

While it may be easy to work with a JSON value, the data isn’t denormalized and makes querying difficult. A more optimzed decomposition for a key-value model would add an additional key for email:

users:<id> = {
    “email”: “user@example.com”,
    “created_at: “2019-09-25T02:03:04Z”,
    ...
}
users:email:<email> = <user id>

Now, when we implement our login system, we can:

  1. Look up the user by email to get the user id
  2. Look up the user by id to get the actual user details

What we’ve essentially done is implement an index like in a traditional database, but what if we go a step further? Let’s duplicate the data so we only have to make one query instead of two:

users:<id> = {
    “email”: “user@example.com”,
    “created_at: “2019-09-25T02:03:04Z”,
    ...
}
users:email:<email> = {
    “email”: “user@example.com”,
    “created_at: “2019-09-25T02:03:04Z”,
    ...
}

A user lookup then just involves a single retrieval of users:email:<email>. While the duplicate values in the database may seem off-putting and even wrong at first, it turns out that key-value stores are exactly optimized for this kind of denormalization!

“There are only two hard things in computer science: cache invalidation and naming things.”

Phil Karlton

With this model, just think how will I query the data? and design your key space schema accordingly. There is a subtlety to naming your keys in a way that makes range queries possible. Say we wanted to query users by the date they joined, we would add a new key-value pair:

users:date:<date>:<user id> = <user id>

Then it becomes trivial to perform a range query using the prefix users:data:<date>: where date is a normalized ISO 8601 date, yielding a natural sort order. (As in the previous examples, you could store the full user details instead of just the user id.)

As you can see, a key-values are a more free-form approach to modeling your data, but let you focus on only the things that matter and are a great alternative to SQL databases. Often times, we end up spending countless hours setting up databases, schemas, and dealing with ORM systems, where we could simply treat data as keys and values. Of course, key-value databases are not a panacea, but you may be surprised how they can be a great fit for most apps.

In an upcoming blog post, we’ll cover how to practically perform these kinds of range queries with a more complete sample app, so stay tuned!

If you’d like to try building an app using only a key-value database and see how simple it is to apply these principles, I encourage you to give KVdb.io a try!



Like what you see? Give KVdb a spin.