Non-Relational Databases

⁠

We saw an introduction to non-relational databases in the previous unit. Let’s dig into some more concepts pertaining to these systems in a little more detail.

🧘 Flexible Data Model/Schema-less

In general, non-relational databases work like key-value stores. They do not store data in rows and columns like the way relational databases store data. Rather, the value is usually a data blob, meaning it can have a fuzzy structure that changes from time to time. This way, the data is not tied down to a strict schema.

Let's look at an example. Let's assume, we have a database that stores metadata about a user. Metadata could be first name, last name, location, and email.

⁠

Screenshot 2024-04-29 at 2.35.23 PM.png

⁠

The system allows users to view their data, edit their data and also new users to add their data.

Now, we can imagine that with a system like this, there will always be an opportunity to include more and more metadata. For example, it's possible that a little later down the line we add date of birth, interests, educational background, professional background, etc.

Now if this system was modeled with a relational database, then every time a new field was added, we would have to change the entire schema of the table and apply the schema to the database before it could be used. This could mean slower development time, database downtime, and if it happens often enough, it could really slow things down (both development-wise and performance-wise). Therefore, in this case, a good choice would be to have a key-value store or a document database whose key would be the ID of the user and value would be a JSON block of all the information.

Let's look at some pros and cons of having a schema-less DB:

✅ Pros

Flexibility of data model: We are not bogged down by strict schemas and painful schema changes. Fast evolving systems benefit hugely from this.

Faster writes: Non-relational databases perform better in general during writes. Imagine not having to validate the data schema, data types, foreign keys, etc.

❌ Cons

Increased possibility of corruption: Since the data is simply a blob, it's possible for bad data to get in if not handled by the application code correctly. For example, at a DB level, there is no check for what the JSON keys are.

Less clarity and data organization: If you look at a relational table, just a glance at the schema can tell you what the data would look like. However, in a non-relational DB, that's very hard to tell because in theory, every JSON blob of different data points could have a different set of keys in them.

No support for join queries: Since the data is unstructured, if your system has queries that need to join multiple non-relational tables, the query performance could be poor. In the above example, let's assume you have another key-value store of locations keyed by ID, like this:

Screenshot 2024-04-29 at 2.45.54 PM.png

⁠

Now if the User metadata has a location pointing to this ID (instead of the name string), then for a query like “Give me all users in San Francisco”, you would have to parse through all the users’ information and one by one, match it with the above table to get the result. This would be something a relational database would do much better with joins.

No support for range and comparison queries. Imagine we had a query that wanted to find all users born after a certain date. NoSQL databases cannot optimize this query and will have to scan all the rows to get the result, whereas relational databases would have no problems doing this optimally.

🙋 Question: Does this mean that we can never have validation on the structure of data when we use non-relational databases?

We can, but that will most likely have to be done in the application code. The application should validate the structure while writing or reading from the database and then handle errors accordingly. This is why these systems are also called as “Schema on Read”.

🚏 Transactional Support/BASE

In general, document databases or non-relational databases are meant for single object reads and writes. They are built for blazing fast writes and for scaling horizontally. This means that generally they are not built to support “transactions” across objects or multiple operations. Supporting transactions consumes time and resources. Think about starting a logical transaction and then all the work you need to do for locking and to make sure all the operations within the transaction are complete and if not, all the rollback operations that need to be done, etc. Non-relational databases work best for use cases where transactional support is not needed across objects and operations.

Therefore, non-relational databases do not inherently support the ACID properties that relational databases do.

Instead of ACID, another term has been coined for non-relational databases called BASE. Base stands for basically available, soft state and eventual consistency. Read more about BASE on

🙋 Question: So is it not possible to support transactions over multiple objects in a non-relational database?

No, it is totally possible to have transaction support in a non-relational database. However, since they were not made with an intention to support transactions, they usually don't have “native” support for this. Therefore, typically the way to do that would be to run another layer of middleware on top of the non-relational database that does the transaction management for the application.

OMID is a system that helps support transactions on HBase (which natively does not support multi-object transactions)

Some non-relational databases, like MongoDB, have introduced support for multi-document transactions.

reading

Additional Reading

Relational vs document databases today, Chapter 2. Designing Data intensive Applications.