Dan D Kim

Let's share stories

SQL vs NoSQL Databases

2021-05-03 Dan D. Kimsystem design

When it comes to databases, there are two main types we categorize them by: SQL and NoSQL databases.

Also called relational vs non-relational.

This post aims to provide a quick overview of their differences and general guidelines on when to use what.

SQL

SQL stands for Structured Query Language. SQL is used to interact with relational databases. Relational databases store data in tables, where each row is an entity and each column is a data point of that entity.

If you’ve used spreadsheets before, there’s a good chance you are already familiar with SQL.

Unlike NoSQL, there is only one variation of SQL databases. Meaning MySQL, PostgreSQL, MariaDB, etc all conform to the concept of storing data in rows and columns.

NoSQL

NoSQL refers to non-relational databases, such as document, graph, wide-column, and key-store.

Before I brief you on them, here is one important point about NoSQL.

In NoSQL, the database usually doesn’t enforce the data to conform to a specified schema.

Here’s what I mean:

In SQL, if you want to write to the database, your data needs to conform to the specified schema. Otherwise, your query will fail.

In NoSQL, the data structure is not enforced. One data entry could have a different structure than another.

Some people refer to this as schema vs schemaless but that’s a little misleading.

Schemas do exist in some NoSQL engines. It’s just implicit.

A better vocabulary here is that SQL uses schema-on-write while NoSQL uses schema-on-read.

Now, here are 4 popular NoSQL databases you should be aware of.

Document Databases

Data gets stored in document-format, similar to a JSON representation, which are grouped into collections. Documents are schema-less and can have any kind of structure.

They are used for various general-purpose databases. MongoDB and CouchDB are examples.

Graph Databases

Data is stored either as a node (vertices) or lines (edges) of a graph. Great for representing relational data.

Ideal for analyzing and traversing relationships between connected nodes. Neo4J, Infinite Graph are examples.

Key-Value Stores

Data is stored in key-value pairs.

They are ideal for large amounts of data with simple lookup queries. Redis is a popular example.

Wide-Column Stores

Similar to relational but instead of “tables”, we have column families.

A column family contains multiple rows, and each row can contain its own set of columns.

As with other schema-less data models, you don’t know the columns of a row until runtime.

And rows don’t need to have the same number of columns.

They are ideal for large datasets. HBase and Cassandra are popular examples.

High-Level Difference

We want to look at querying, storage models, and scalability.

Querying

SQL and NoSQL use completely different query languages. The queries are just different, due to the difference in the storage models.

SQL is a declarative query language that is immensely powerful and useful. It does a great job of defining the constraints, conditions, and transformations of data.

NoSQL databases use different queries based on the specific NoSQL type. Also referred to as UnQL (Unstructured Query Language).

Storage Models

If you ever used a spreadsheet, you may already be familiar with SQL data models. They are just tables where each row represents an entity. Columns are data points about an entity.

Example table in SQL format:

Name Age City School
Dan 28 Calgary Waterloo
Bob 22 Toronto Toronto
Cathy 25 Montreal McGill

As for NoSQL, the data model differs by type. Here is what a document schema may look like:

{
  "name": <string>,
  "age": <integer>,
  "followers": <Schema Document> | [<Schema Document>, ...],
  "createdDate": <integer>,
  "isPremium": <boolean>
}

I will make in-depth posts about NoSQL databases in the future. For now, the important thing to understand is that they don’t conform to a schema.

Scalability

SQL databases are inherently better at vertical scaling, while NoSQL databases are a better fit for horizontal scaling.

Vertical scaling is where you add more resources (CPU, RAM) to a machine. Horizontal scaling is where you add more machines and distribute your load to multiple machines.

However, it is important to note that with the recent advancements in database engines of both types, this distinction is becoming thinner and thinner.

AWS’s DynamoDB provides ACID-compliance.

MySQL provides the flexibility and high availability of NoSQL databases.

MongoDB provides multi-document ACID transactions.

On top of that, NoSQL databases are adapting to be suitable for vertical scaling if needed, and some SQL databases have made special advancements for distributed processing.

When to use what

These are general guidelines. Depending on the complexity of the app, developers will need to do further research to architect a robust design.

Need ACID Compliance?

If you need ACID compliance, SQL is the way to go.

ACID properties ensure that all database transactions remain accurate and consistent, and support the recovery from failures that might occur during processing operations. Source.

SQL databases are ACID compliant (Atomicity, Consistency, Isolation, Durability).

A popular example of using SQL is in financial transactions - each transaction is crucial and cannot be lost, hence SQL’s ACID compliance could come in very handy.

Is your data structured and unchanging?

SQL is the way to go in this case.

If you know what the data will exactly look like, and that it won’t change much in the future, you can benefit from SQL’s normalization and other optimization features. These will often result in better performance and more efficient use of resources.

Expect large datasets with little to no structure?

If you need to store large amounts of unstructured data, NoSQL has your back.

NoSQL generally doesn’t set any restrictions or constraints on your data model, which can come in handy when you are not exactly sure how your application data will evolve.

Depending on the use case, you may find a very good fit for one of the various NoSQL databases.

Want distributed computing?

Traditionally, NoSQL would be favored here.

I say traditionally because nowadays, we all use AWS, GCP, and Azure.

Tech giants have come up with products like AWS Aurora and AWS RDS which are relational databases but have powerful distributed computing features.

With AWS, it’s almost like you don’t have to worry about this whole topic.

But if you are implementing some on-site solution for a client without relying on AWS or the like, then NoSQL would be your better bet.

NoSQL databases are just inherently better suited for distributed computing.

Also, you may want to double-check if it’s cheaper to go with a NoSQL product. Say, could AWS DocumentDB be more efficient and cheaper for your application than AWS RDS?

This is where the developer needs to understand their application’s needs.

How will data be read? How will it be written? How frequently? Does it differ by feature? What will peak usage be like? What about average usage? What bandwidth is needed?

These are just a few of the many questions that need to be asked when picking out the right tool.


That’s all. Happy studying!