By Andrew Oliver, InfoWorld (US) | Aug 8, 2012
Big data events almost inevitably offer an introduction to NoSQL and why you can't just keep everything in an RDBMS anymore. Right off the bat, much of your audience is in unfamiliar territory. There are several types of NoSQL databases and rational reasons to use them in different situations for different datasets. It's much more complicated than tech industry marketing nonsense like "NoSQL = scale."
Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, aka Brewer's Theorem. The CAP theorem states you can provide only two out of the following three characteristics: consistency, availability, and partition tolerance. Different datasets and different runtime rules cause you to make different trade-offs. Different database technologies focus on different trade-offs. The complexity of the data and the scalability of the system also come into play.
Another reason for this divergence can be found in basic computer science or even more basic mathematics. Some datasets can be mapped easily to key-value pairs; in essence, flattening the data doesn't make it any less meaningful, and no reconstruction of its relationships is necessary. On the other hand, there are datasets where the relationship to other items of data is as important as the items of data themselves.
Relational databases are based on relational algebra, which is more or less an outgrowth of set theory. Relationships based on set theory are effective for many datasets, but where parent-child or distance of relationships are required, set theory isn't very effective. You may need graph theory to efficiently design a data solution. In other words, relational databases are overkill for data that can be effectively used as key-value pairs and underkill for data that needs more context. Overkill costs you scalability; underkill costs you performance.
Key-value pair databases
Key-value pair databases include the current 1.8 edition of Couchbase and Apache Cassandra. These are highly scalable, but offer no assistance to developers with complex datasets. If you essentially need a disk-backed, distributed hash table and can look everything up by identity, these will scale well and be lightning fast. However, if you find that you're looking up a key to get to another key to get to another key to get to your value, you probably have a more complicated case.