Dan D Kim

Let's share stories

SQL vs NoSQL Databases

2021-05-03 Dan D. Kimsystem design

When it comes to databases, there are two main types we categorize them by: SQL and NoSQL databases.

Also called relational vs non-relational.

This post aims to provide a quick overview of their differences and general guidelines on when to use what.

SQL

SQL stands for Structured Query Language. SQL is the query language used to interact with relational databases. If you have seen queries that read something like SELECT * FROM items, you have seen SQL queries.

Relational databases store data in a tabular structure, where each row is an entity and each column is a data point of that entity. Each table adheres to a specific schema, meaning each row of data needs to conform to the schema.

If you’ve used spreadsheets before, there’s a good chance you are already familiar with the tabular data found in SQL databases.

Unlike NoSQL, there is only one variation of SQL databases - relational. Meaning MySQL, PostgreSQL, MariaDB, etc all conform to the concept of storing data in rows and columns.

NoSQL

NoSQL refers to “anything that is not SQL”. A NoSQL database is a non-relational database, such as document, graph, wide-column, and key-store.

One big difference with NoSQL databases is that they usually don’t enforce the data to conform to a specified schema. You could add user data with the user’s geolocation and viewing screen size to the database, and then add another user but this time with just their full name and nothing else. Your data isn’t forced into a consistent schema. This gives NoSQL databases a lot of flexibility in terms of data handling and makes them really suitable for certain scenarios.

Some people refer to this as schema vs schemaless but that’s a little misleading.

Schemas do exist in some NoSQL engines. It’s just implicit.

A better vocabulary here is that SQL uses schema-on-write while NoSQL uses schema-on-read.

Now, here are 4 popular NoSQL databases you should be aware of.

Document Databases

Data gets stored in document-format, similar to a JSON representation, which gets grouped into collections. Documents are schema-less and can have any kind of structure.

They are used for various general-purpose databases. MongoDB and CouchDB are examples.

For more, here is my post on what I think you should know about document databases for the system design interview.

Graph Databases

Data is stored either as a node (vertices) or lines (edges) of a graph. Great for representing relational data.

Ideal for analyzing and traversing relationships between connected nodes. Neo4J, Infinite Graph are examples.

For more, here is my post on graph databases.

Key-Value Stores

Data is stored in key-value pairs.

They are ideal for large amounts of data with simple lookup queries. Redis is a popular example.

Wide-Column Stores

Similar to relational but instead of “tables”, we have column families.

A column family contains multiple rows, and each row can contain its own set of columns.

As with other schema-less data models, you don’t know the columns of a row until runtime.

And rows don’t need to have the same number of columns.

They are ideal for large datasets. HBase and Cassandra are popular examples.

High-Level Difference

We want to look at querying, storage models, and scalability.

Querying

SQL and NoSQL use completely different query languages. The queries are just different, due to the difference in the storage models.

SQL is a declarative query language that is immensely powerful and useful. It does a great job of defining the constraints, conditions, and transformations of data.

NoSQL databases use different queries based on the specific NoSQL type. Also referred to as UnQL (Unstructured Query Language).

Storage Models

If you ever used a spreadsheet, you may already be familiar with SQL data models. They are just tables where each row represents an entity. Columns are data points about an entity.

Example table in SQL format:

Name Age City School
Dan 28 Calgary Waterloo
Bob 22 Toronto Toronto
Cathy 25 Montreal McGill

As for NoSQL, the data model differs by type. Here is what a document schema may look like:

{
  "name": <string>,
  "age": <integer>,
  "followers": <Schema Document> | [<Schema Document>, ...],
  "createdDate": <integer>,
  "isPremium": <boolean>
}

Scalability

SQL databases are inherently better at vertical scaling, while NoSQL databases are a better fit for horizontal scaling.

Vertical scaling is where you add more resources (CPU, RAM) to a machine. Horizontal scaling is where you add more machines and distribute your load to multiple machines.

However, it is important to note that with the recent advancements in database engines of both types, this distinction is becoming thinner and thinner.

AWS’s DynamoDB provides ACID-compliance.

MySQL provides the flexibility and high availability of NoSQL databases.

MongoDB provides multi-document ACID transactions.

On top of that, NoSQL databases are adapting to be suitable for vertical scaling if needed, and some SQL databases have made special advancements for distributed processing.

Does it scale? is no longer the main question people consider when choosing between SQL and NoSQL.

Scalability today will come from proper effective design and implementations. Each database will have its own set of guidelines on how to best configure your database to scale properly. If you are using a cloud provider like AWS, this might already be taken care of.

When to use what

These are general guidelines. Depending on the complexity of the app, developers will need to do further research to architect a robust design.

Need ACID Compliance?

If you need ACID compliance, SQL is the way to go.

ACID properties ensure that all database transactions remain accurate and consistent, and support the recovery from failures that might occur during processing operations. Source.

SQL databases are ACID compliant (Atomicity, Consistency, Isolation, Durability).

A popular example of using SQL is in financial transactions - each transaction is crucial and all datastores need to be consistently synchronized. All datastores are guaranteed to be reading the latest data with ACID-compliant databases.

Is your data structured and unchanging?

SQL is the way to go in this case.

If you know what the data will exactly look like, and that it won’t change much in the future, you can benefit from SQL’s normalization and other optimization features. These will often result in better performance and more efficient use of resources.

Expect large datasets with little to no structure?

If you need to store large amounts of unstructured data, NoSQL has your back.

NoSQL generally doesn’t set any restrictions or constraints on your data model, which can come in handy when you are not exactly sure how your application data will evolve.

Depending on the use case, you may find a very good fit for one of the various NoSQL databases.

Want distributed computing?

Traditionally, NoSQL would be favored here.

I say traditionally because nowadays, we all use AWS, GCP, and Azure.

Tech giants have come up with products like AWS Aurora and AWS RDS which are relational databases but have powerful distributed computing features.

With AWS, it’s almost like you don’t have to worry about this whole topic.

But if you are implementing some on-site solution for a client without relying on AWS or the like, then NoSQL would be your better bet.

NoSQL databases are just inherently better suited for distributed computing.

Also, you may want to double-check if it’s cheaper to go with a NoSQL product. Say, could AWS DocumentDB be more efficient and cheaper for your application than AWS RDS?

This is where the developer needs to understand their application’s needs.

How will data be read? How will it be written? How frequently? Does it differ by feature? What will peak usage be like? What about average usage? What bandwidth is needed?

These are just a few of the many questions that need to be asked when picking out the right tool.


That’s all. Happy studying!