Not always SQL
The term NoSQL was first used in 1998 by Carlo Strozzi, though with a slightly different meaning. It was only when a group of professionals met for an IT conference in San Francisco in 2009 that the term acquired the meaning it has today. The professionals needed a brief and unique hashtag for Twitter so that everyone would be referring to the same thing. Eric Evans, then employed at Rackspace, suggested NoSQL ('Not only SQL'). The community accepted it, and it has since been the common denominator for a group of databases that differ from relational databases.
In just a few years, a new heterogeneous group of database systems has captured the world's attention as a more and more obvious alternative to the relational database systems. This attention is fully deserved. NoSQL databases represent an approach to database technology that has the potential to fundamentally change enterprise IT architecture.
Analysts, businesses and IT experts speculate from time to time on how the world’s data volumes are increasing by the second. For example, data management company AIS recently assessed that if you collected all the world's data records from 2012 and burned them on DVDs, the stack would reach to the moon and back five times. In the same vein, a Cisco analysis has estimated that in 2016 overall the world’s data centres will handle 6.6 zettabytes of data per year. This is equivalent to every person on earth streaming approx. 2.5 hours of HD video each day. But regardless of such musings and guesswork, it is a fact that the world's data volumes have increased explosively in recent years. Storing this data is a challenge that we are capable of meeting. But when it comes to extracting meaning from these enormous amounts of data – particularly in real time – the challenge is of a completely different nature.
In the database world, SQL has been the default language since the mid-1980s for storing and retrieving data in relational database systems such as MySQL or Oracle. Data was divided into tables, enabling it to be stored and accessed according to a defined structure. But starting in the early 2000s, the Internet and businesses began generating a different type of data. This data was much less structured, and it resisted being fitted into traditional tables. It included data from Web 2.0 applications and social media, images, geographic information, chats, etc. This change in both data volume and data types led to the emergence of a group of databases, known as NoSQL, which in recent years has gained ever wider acceptance. Google and Amazon were among the first to use NoSQL databases, and many others have since followed in their wake. Today, major companies such as Facebook, Mozilla, Adobe, Foursquare, LinkedIn and Digg all use NoSQL databases. As a sign that NoSQL databases are not just for Internet giants, but increasingly also for 'ordinary' companies, IT research company Gartner included NoSQL databases in its report entitled 'Emerging Technologies Hype Cycle 2012'.
While there is broad consensus that a number of database systems on the market differ fundamentally from relational database systems, there is less agreement on a proper definition of NoSQL databases. Not even British database guru Martin Fowler dares to attempt an actual definition of the concept, since the various NoSQL databases do not have much in common. Instead, Fowler proposes a number of generic characteristics of NoSQL databases:
- They use (as a rule) a non-SQL query language
- Many of them are designed to run on clusters
- Many are open source
- They do not operate with a fixed schema structure
Although the name suggests otherwise, the ‘No’ in NoSQL stands for ‘Not only’, rather than ‘No’. This means that NoSQL databases can use an SQL-like query language, but usually do not. Martin Fowler's second characteristic points to the fact that relational databases and SQL were designed to run on a single machine, while many NoSQL databases are designed to run on large clusters of machines. This enables NoSQL databases to deliver much faster response times under heavy load, because the system can distribute the load across a large number of computers. The third characteristic is that many of the NoSQL databases are open source. Among other things, this means that, with a limited investment, companies can download, implement, and test whether an application and a specific NoSQL database work well together. Finally, NoSQL databases do not operate with a fixed schema structure like relational databases. For example, if a company would like to store a customer's phone number, first and last name, address and city, all this data must be defined in a fixed structure in a relational database. This means that the structure itself must be changed if, for example, the company wants to add the customer's preferred product as an additional field. This is expensive and bothersome. NoSQL databases are designed to allow the addition of data without a pre-defined structure. This means that the application can be changed in real time without fear of downtime.
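The customer example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for a relational database, and a plain list of dictionaries named customers is a hypothetical stand-in for a document collection, not the API of any particular NoSQL product.

```python
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(first_name TEXT, last_name TEXT, phone TEXT, address TEXT, city TEXT)"
)


def insert_with_preferred_product(connection):
    """Store a customer together with a field the original schema lacks."""
    connection.execute(
        "INSERT INTO customer (first_name, last_name, preferred_product) "
        "VALUES (?, ?, ?)",
        ("Ada", "Lovelace", "notebook"),
    )


try:
    insert_with_preferred_product(conn)
    relational_insert_failed = False
except sqlite3.OperationalError:
    # SQLite rejects the unknown column until the schema is changed.
    relational_insert_failed = True

# Only after an explicit schema change does the same insert succeed.
conn.execute("ALTER TABLE customer ADD COLUMN preferred_product TEXT")
insert_with_preferred_product(conn)
rows = conn.execute(
    "SELECT first_name, preferred_product FROM customer"
).fetchall()

# Document side (hypothetical stand-in): each record carries its own fields,
# so a new field needs no schema change at all.
customers = []
customers.append({"first_name": "Grace", "last_name": "Hopper", "city": "Arlington"})
customers.append(
    {"first_name": "Ada", "last_name": "Lovelace", "preferred_product": "notebook"}
)
with_preference = [c["first_name"] for c in customers if "preferred_product" in c]
```

The flexibility comes at a price: the application, not the database, now has to cope with records that lack a given field.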
From ACID to BASE
As already mentioned, there are several different ways to illustrate the difference between relational and NoSQL databases. In addition to the above range of characteristics, two database rule sets can also be compared. The first set of rules is known as ACID (Atomic, Consistent, Isolated, Durable), to which relational databases adhere. This means that a transaction is either carried out completely or not at all (Atomic), that only valid data is added to the database (Consistent), that concurrent transactions do not interfere with each other (Isolated), and that committed transactions are never lost (Durable).
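The Atomic and Consistent rules can be illustrated with a minimal sketch, again assuming SQLite as the relational database. The account table, the transfer function and the names alice and bob are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint defines what counts as valid data (Consistent).
conn.execute(
    "CREATE TABLE account "
    "(name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money in one transaction; return True if it was committed."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with connection:
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative, so nothing is kept,
        # including the credit that had already been executed (Atomic).
        return False


def balances(connection):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(connection.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "alice", "bob", 500)   # more than alice owns
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

In the failed transfer the credit to bob has already run when the debit is rejected, yet neither change is visible afterwards.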
NoSQL databases operate according to a different set of rules known as BASE (Basically Available, Soft state, Eventually consistent). BASE is easiest to explain backwards, beginning with Eventually consistent. An ACID system guarantees data consistency after each transaction; a BASE system guarantees data consistency within a reasonable period of time after each transaction. In other words, there is data consistency in the system, just not immediately. This leads on to the Soft state principle. If the data is not consistent at all times, the system must take a temporary data state, a soft state, into account. Finally, the sum of these two principles means that data availability is given very high priority in a NoSQL system, even if simultaneous failures occur in the database system, operating system or hardware. If parts of the database do not work, other parts of the database take over, so that data can always be accessed.
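The three BASE principles can be made concrete with a toy simulation. The class and method names below (EventuallyConsistentStore, write, read, replicate) are hypothetical and do not correspond to any real NoSQL product; real systems replicate over a network in the background and have to resolve conflicting writes, which this sketch leaves out.

```python
class EventuallyConsistentStore:
    """Toy key-value store with several replicas and delayed replication."""

    def __init__(self, replica_count=3):
        # Each replica is just a dictionary holding its own copy of the data.
        self.replicas = [{} for _ in range(replica_count)]
        # Updates that have not yet reached every replica (the soft state).
        self.pending = []

    def write(self, key, value):
        """Apply the write to replica 0 and queue it for the other replicas."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        """Answer from the chosen replica, even if its copy may be stale."""
        return self.replicas[replica_index].get(key)

    def replicate(self):
        """Deliver all queued updates, in order; replicas then agree again."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("items_in_stock", 5)

fresh = store.read("items_in_stock", replica_index=0)   # sees the write
stale = store.read("items_in_stock", replica_index=2)   # not replicated yet

store.replicate()
converged = store.read("items_in_stock", replica_index=2)
```

Every replica answers reads at all times (Basically Available), the queue of pending updates is the Soft state, and after replicate() has run all replicas return the same value (Eventually consistent).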
A business decision
Companies, consultants and experts can quickly lose their way in the jungle of technological opportunities within the burgeoning world of database systems. The question is: when is a traditional relational database system the best solution for handling the company's data, and when is it better to seek alternatives among the NoSQL systems? The answer to that question should be based on a business decision made in consultation with the IT manager. Take Amazon, for example. Amazon decided very quickly that its business model was to let customers shop promptly at all times. The downside was that customers might occasionally find that a purchased item was not in stock after all, due to data inconsistency. But never mind, said Amazon. We just want to be known for always giving our customers a shopping opportunity. This goal was out of sync with the rule set of an ACID database system, so Amazon developed its own database system, Dynamo, which better supported the always-available approach. It was an IT decision made on the basis of a business decision. Amazon had the financial muscle to build its own customised database architecture. Many companies do not have those resources. But the point remains the same. Choosing a relational database system as a default reaction is a thing of the past.
Advantages of NoSQL databases:
- High scalability
- High schema flexibility
- Suitable for distributed
- Less administration
- Low costs
Disadvantages of NoSQL databases:
- No standardisation
- Technologies still immature
- Limited tooling possibilities
- Eventual consistency is not intuitive to program