Most relational database systems satisfy the ACID properties: Atomicity, Consistency, Isolation, and Durability. The CAP theorem states that a distributed data store can guarantee at most two of the following three properties at the same time: consistency, availability, and partition tolerance. Because relational systems favor strong consistency, this underpins the common view that relational database systems are difficult to scale horizontally.
The ever-growing volumes of data in modern society call for database systems that can process huge amounts of data with high performance and horizontal scalability. It is in fields such as Big Data Analytics, Business Intelligence, and social networks that relational database systems fall short (Moniruzzaman and Hossain 2013).
NoSQL is a class of database management systems that do not rely on the traditional relational data model but on models for semi-structured and unstructured data. They are typically used for large data sets that would otherwise run into the performance and scalability limitations of the relational data model (Makris et al. 2016). NoSQL database systems focus on processing large datasets while offering increased scalability, typically at the expense of consistency (the trade-off described by the CAP theorem).
They can typically be classified into four categories: key-value stores, column-family stores, document-stores, and graph databases (Moniruzzaman and Hossain 2013; Makris et al. 2016).
Key-value stores store values associated with keys. Their behavior can be compared to a map implemented by a hash table. They typically support the operations $put(key, value)$, $get(key)$, and $delete(key)$. Most key-value stores operate mostly in memory, which leads to high throughput and low latency and makes them especially suitable for caching. Two popular key-value stores are Memcached and Redis. Both allow distributed storage and nowadays implement a hybrid storage mechanism in which they do not operate only in memory but also allow persistent storage.
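The three operations can be sketched with an in-memory hash map. This is only a minimal illustration of the interface, not how Memcached or Redis work internally; the class name `KeyValueStore` is hypothetical.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Store the value, overwriting any previous value for this key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Return the stored value, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "alice"})
cached = store.get("session:42")
```

All three operations address a single key, which is what makes the model easy to partition across machines and a natural fit for caching.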
An intuitive way to understand column-family stores is to recognize that relational database systems store data by rows, while column-family stores store columns of related data together. A column-family store holds a map of key-value pairs, with an identifier as the key and a set of columns as the value. Each column is a triple of column name, value, and timestamp.
They are typically suitable for large sparse datasets because of efficient partitioning and storage of null values (Makris et al. 2016). Whereas relational database management systems store a null value in each row, column-family stores store a column only when it has a value.
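The structure described above, a map from a row identifier to a set of (name, value, timestamp) columns, can be sketched as nested dictionaries. The class `ColumnFamily` and its methods are hypothetical, and keeping only the newest write per column is an assumption made for this sketch rather than something the text prescribes.

```python
import time
from typing import Any, Dict, Optional, Tuple

# A column is modelled as (value, timestamp); its name is the dict key.
Column = Tuple[Any, float]


class ColumnFamily:
    """Toy column family: row key -> {column name -> (value, timestamp)}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Column]] = {}

    def put(self, row_key: str, column: str, value: Any,
            timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        row = self._rows.setdefault(row_key, {})
        current = row.get(column)
        # Assumption: a write only replaces a column if it is not older.
        if current is None or ts >= current[1]:
            row[column] = (value, ts)

    def get(self, row_key: str, column: str) -> Optional[Any]:
        cell = self._rows.get(row_key, {}).get(column)
        return None if cell is None else cell[0]

    def stored_cells(self) -> int:
        # Only columns that were actually written occupy space.
        return sum(len(row) for row in self._rows.values())


cf = ColumnFamily()
cf.put("user:1", "name", "Alice", timestamp=1.0)
cf.put("user:1", "email", "alice@example.org", timestamp=1.0)
cf.put("user:2", "name", "Bob", timestamp=1.0)  # no email column at all
```

Here `user:2` simply has no `email` entry, so three cells are stored rather than the four a fixed two-column table would need.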
Document-stores store documents containing semi-structured data in formats such as XML, JSON, or BSON. In terms of the relational data model, each key corresponds to a primary key and the associated document to a row. Compared to the previously discussed key-value stores, document-stores typically allow data to be processed and manipulated directly in the document. They are again well suited to sparse data, because a semi-structured document does not need to comply with the structure of other documents.
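As a rough sketch of these two properties, documents with differing structure and manipulation inside a document, the following uses plain dictionaries as JSON-like documents. `DocumentStore` and its methods are hypothetical names and do not mirror the API of MongoDB or CouchDB.

```python
from typing import Any, Dict, List


class DocumentStore:
    """Toy document store: document id -> JSON-like dict."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, document: Dict[str, Any]) -> None:
        # No schema is enforced; each document may have its own fields.
        self._docs[doc_id] = dict(document)

    def update_field(self, doc_id: str, field: str, value: Any) -> None:
        # Change one field in place instead of replacing the whole value.
        self._docs[doc_id][field] = value

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # Query on document content; documents lacking the field are skipped.
        return [d for d in self._docs.values() if d.get(field) == value]


docs = DocumentStore()
docs.insert("e1", {"type": "login", "user_id": 1})
docs.insert("e2", {"type": "purchase", "user_id": 1, "amount": 30})
docs.update_field("e2", "amount", 35)
purchases = docs.find("type", "purchase")
```

A pure key-value store would treat each document as an opaque value, so both the field update and the content query would have to happen in the client.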
MongoDB and CouchDB are two popular document-store database management systems. MongoDB version 3.2 introduced support for combining documents from two collections in a single query. This can be compared to the JOIN functionality of relational database systems (more precisely, a left outer join).
Suppose we want to store events from multiple subsystems, each producing events in JSON format with a different structure, and assume that all events have a user identifier but only some carry additional information. Each event is stored as a document, which corresponds to a row in the relational data model. Now suppose an application wants to retrieve events together with the name of the associated user. Leaving internal implementation details aside, without join support the application must either:
- sequentially read and transfer all users, map all user ids, and finally read and transfer events by user ids;
- sequentially read and transfer all events, map all user ids, and finally read and transfer users by user ids; or
- read all users and events in parallel.
All scenarios are detrimental in terms of transferred volume and latency, because data unnecessarily leaves the database system. In the first two scenarios latency is even worse because the operations are sequential.
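The first scenario can be sketched as a client-side join. The sample data, the field names `user_id` and `name`, and the function `client_side_join` are all hypothetical; in-memory lists stand in for the two collections that would be read over the network.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample data standing in for two remote collections.
USERS: List[Dict[str, Any]] = [
    {"user_id": 1, "name": "Alice"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Carol"},
]
EVENTS: List[Dict[str, Any]] = [
    {"event": "login", "user_id": 1},
    {"event": "purchase", "user_id": 1, "amount": 30},
    {"event": "login", "user_id": 3},
]


def client_side_join(
    users: List[Dict[str, Any]], events: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], int]:
    """Join events with user names in the client; also count transfers."""
    # Step 1: transfer all users and map user id -> name.
    names = {u["user_id"]: u["name"] for u in users}
    # Step 2: transfer the events and attach the matching user name.
    joined = [
        {**e, "user_name": names[e["user_id"]]}
        for e in events
        if e["user_id"] in names
    ]
    transferred = len(users) + len(events)
    return joined, transferred


joined, transferred = client_side_join(USERS, EVENTS)
```

In this toy data set six documents cross the network to produce three joined results, and the document for Bob is transferred without ever being used. A join executed inside the database would only need to return the three results.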
Graph databases store relations, but not in the traditional relational sense. Relations are represented by edges between nodes, and both nodes and edges can carry properties. They are typically used where the relations are more important than the data itself.
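A minimal sketch of this model, assuming directed, labelled edges, might look as follows. `PropertyGraph` and its methods are hypothetical and not taken from any particular graph database.

```python
from typing import Any, Dict, List, Tuple

# An edge is (source id, label, target id, properties).
Edge = Tuple[str, str, str, Dict[str, Any]]


class PropertyGraph:
    """Toy property graph: nodes and directed edges both carry properties."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: List[Edge] = []

    def add_node(self, node_id: str, **properties: Any) -> None:
        self.nodes[node_id] = properties

    def add_edge(self, source: str, label: str, target: str,
                 **properties: Any) -> None:
        # Edges may only connect nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.edges.append((source, label, target, properties))

    def neighbours(self, node_id: str, label: str) -> List[str]:
        # Follow outgoing edges with the given label.
        return [t for s, l, t, _ in self.edges if s == node_id and l == label]


g = PropertyGraph()
g.add_node("alice", name="Alice")
g.add_node("bob", name="Bob")
g.add_edge("alice", "FOLLOWS", "bob", since=2015)
followed = g.neighbours("alice", "FOLLOWS")
```

Queries start from a node and follow edges, which is why this model suits data where the relations matter more than the individual records.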
NoSQL database management systems are becoming more popular because of new requirements that relational database management systems do not meet well. This is typically the case where data volumes are large and where the data does not necessarily follow a relational data model. NoSQL database management systems allow for performant processing and horizontal scalability at the cost of consistency. Even though NoSQL and relational database systems were built for different goals, their functional differences seem to be shrinking as typical NoSQL features make their way into relational database systems and vice versa.
Makris, Antonios, Konstantinos Tserpes, Vassiliki Andronikou, and Dimosthenis Anagnostopoulos. 2016. "A Classification of NoSQL Data Stores Based on Key Design Characteristics." Procedia Computer Science 97:94–103.
Moniruzzaman, ABM, and Syed Akhter Hossain. 2013. "NoSQL Database: New Era of Databases for Big Data Analytics-Classification, Characteristics and Comparison." arXiv Preprint arXiv:1307.0191.