What is a wide-column datastore?
A wide-column store (or extensible record stores) is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store.
I love starting simple and gradually easing into the details. Here is an explanation from Szymon Warda that I really like.
Here is how data is represented in a key-value database:
Here is the same idea in a wide-column database:
We can look at wide-column stores as a 2-dimensional key-value store, where the first key is used as a row identifier and the second is used as a column identifier.
|Key-value store||Wide-column store|
|key = rowID, value = data||key = (rowID, columnID), value = data|
That was my ELI5 of what a wide-column database is. Now let’s see more closely at the data model.
They say a picture is a 1000 words. Here are my 1000 words. It shows the smallest components and how they build up to the full row-based data structure.
Multiple rows will look something like this:
Image credit: Akshay Pore
Data is stored in column cells, which gets grouped into column families. A group of families is linked to a row key. Each column family typically contains columns that are used together, so the columns are stored as a contiguous block on disk, enhancing performance.
You could see that a wide-column store is like a mix between a relational database and a document store. It still uses rows and columns like that of a relational database. And the column formats can be different for each row in the same table. This combines the regimented tabular structure of the relational model and the flexible data schema of the document model.
Why did we come up with wide-column databases? As in, why couldn’t we just stick with key-value and relational databases?
Back in the ages, we had the following pain points:
- relational model lacked flexibility due to its schema and structure, resulting in a lack of data compression.
- key-value store had a lack of structure that prevented partial write queries i.e. update column value, add column.
- key-value stores also cannot filter data by value
Wide-column databases have been in development for a while. But so have other databases, and now the distinctive pain points have changed.
Today, a lot of the pain points mentioned above are… debatable
- relational model still lacks flexibility but data compression warrants a whole other discussion.
- tools like Redis offer key-value stores with structure and partial updates. Is it performant though? Is it good enough for a petabyte-and-beyond scale?
- tools like Redis offer filtering by value using secondary indexes. Is it great? Maybe, maybe not.
Pros and Cons
Here are some key benefits of a wide-column datastore
- MASSIVE scalability. Petabyte-scale data. Even beyond petabytes.
- consistent, performant performance on HEAVY write loads
Here are the drawbacks
- inefficient updates
- inefficient joins / aggregations
Situations where it’s ideal
- WRITEs far exceed the READs
- data is rarely updated
- read access is known by a primary key
- there is no need for joins or aggregates
- financial data (time-series)
- temperature monitoring (time-series)
- log data
- analytical data
- IoT sensor data
That’s all. Hope you learned something.