Today we will continue with AWS stuff after a short break. You might remember a brief introduction to the database category in Episode 65: The Amazon Web Services Jungle. It’s time to expand upon that a bit, as we did with the compute and storage categories before.
We will dive a bit deeper into four services, each representing a different approach to organized data storage: Relational Database Service, DynamoDB – a NoSQL database, ElastiCache – an in-memory data grid and Redshift – a data warehouse.
RDS – Long time ago…
Amazon RDS, released in 2009, is a service for managing “classic” relational databases, the stuff that usually comes to mind first when we think of a “database”. RDS offers several engines: PostgreSQL, MySQL, MariaDB, Oracle Database, MS SQL Server and Amazon’s own Aurora. The service simplifies setting up a database: we don’t have to prepare a machine, install the database server on it, or care about updates, licensing, disk size, CPU, RAM, scaling, backups, replication, stand-by and many similar issues. It’s all handed to us on a plate.
Aurora, introduced in 2014 as one of the database systems supported by RDS, is based on MySQL. The two are compatible, meaning tools designed for MySQL will work with Aurora, though Aurora still lacks some MySQL features; for example, it supports only the InnoDB storage engine. It performs better than MySQL under most circumstances thanks to tight integration with the low-level, SSD-based storage drivers of the underlying AWS infrastructure.
DynamoDB – To SQL or not to SQL?
Amazon DynamoDB, introduced in 2012, is a proprietary NoSQL database system. Although technically such systems are even older than relational databases, the term NoSQL appeared in the early twenty-first century and rose to popularity along with big data, when companies like Facebook or Google began to store massive numbers of documents, media files and cat pictures. Data structures used in NoSQL databases are different from those in relational databases, making them more suitable in some domains, often sacrificing consistency in exchange for simplicity and better horizontal scaling. There are a few types of NoSQL databases: column, document, key-value, graph and multi-model. DynamoDB can be classified as a key-value type: an associative array with arbitrary objects identified by keys. It is based on the Dynamo principles: incremental scalability, symmetry, decentralization and heterogeneity.
In general, NoSQL is simpler, more flexible and better at handling large amounts of data without relations. Relational databases, on the other hand, are better at transactions with ACID properties (Atomicity, Consistency, Isolation, Durability), at handling rigidly structured data and at complex queries with joins and grouping.
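To make the "associative array with arbitrary objects identified by keys" idea a bit more tangible, here is a minimal sketch in Python. It is only a toy in-memory stand-in, not the DynamoDB API: the class KeyValueTable, its put_item / get_item methods and the sample keys are all made up for illustration.

```python
from typing import Any, Dict, Optional


class KeyValueTable:
    """Toy in-memory key-value table (hypothetical, not the DynamoDB API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, key: str, item: Dict[str, Any]) -> None:
        # Store a copy so later changes to the caller's dict do not leak in.
        self._items[key] = dict(item)

    def get_item(self, key: str) -> Optional[Dict[str, Any]]:
        # Lookups are always by key; there is no schema and no join.
        item = self._items.get(key)
        return dict(item) if item is not None else None


table = KeyValueTable()
# Items under different keys do not need to share the same attributes.
table.put_item("user#1", {"name": "Alice", "cats": 2})
table.put_item("user#2", {"name": "Bob", "tags": ["aws", "nosql"]})

print(table.get_item("user#1"))
print(table.get_item("user#404"))
```

Note that the two stored items have different shapes, which is exactly the flexibility (and the lack of enforced structure) that sets this model apart from a relational table.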
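The "A" in ACID is probably the easiest one to show in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a relational engine; the accounts table, the transfer function and the sample data are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50)],
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT owner, balance FROM accounts").fetchall())


ok = transfer(conn, 1, 2, 30)       # fits within Alice's balance
failed = transfer(conn, 1, 2, 500)  # would overdraw Alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the second transfer the credit to Bob is executed first and only then the debit fails, yet Bob's balance does not change: the whole transaction is undone as a unit.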
ElastiCache – Hold my beer!
Amazon ElastiCache is a service managing one of two available open-source in-memory object store engines, Memcached and Redis, used mainly as an object cache. Amazon’s role here is pretty similar to the one it plays in RDS: it takes care of software configuration, updates, scaling, fail-over etc. An in-memory data store can be used as a database, a cache or a message broker. The main advantage stems from the fact that it keeps data in RAM instead of on hard drives, so it’s much faster. The disadvantage is that if the node goes down, even temporarily, the data is lost. The likelihood of that is much higher than the likelihood of a hard drive failure, which would have the same effect. In order to overcome that, suitable node replication has to be put in place, but it all happens behind the scenes. A more serious consideration is the cost of RAM compared to hard drives. In-memory caching is effective when we have a relatively small set of data that is accessed very often. In that case, thanks to the performance gains, the effective cost of the solution can be several times lower than if the data was accessed from hard drives. Memcached or Redis then? The answer is: most likely Redis. Check out this article on Why Redis beats Memcached for caching for more details.
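The "small set of data, accessed very often" case is usually handled with a read-through (cache-aside) pattern: look in the cache first and go to the slow store only on a miss. Below is a minimal sketch of that idea, assuming a plain Python dict in place of a real Memcached or Redis node; TTLCache, get_or_load and slow_lookup are hypothetical names, not part of any ElastiCache client.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy in-process cache with expiry (stand-in for a real cache node)."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fetch from the slow source and remember it.
        self.misses += 1
        value = loader(key)
        self._data[key] = (now + self._ttl, value)
        return value


def slow_lookup(key: str) -> str:
    # Stand-in for an expensive database query.
    return key.upper()


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("episode-65", slow_lookup)   # miss, calls slow_lookup
second = cache.get_or_load("episode-65", slow_lookup)  # hit, served from memory
print(first, second, cache.hits, cache.misses)
```

Just like with a real in-memory node, everything in this cache disappears when the process stops, which is why the slow store remains the source of truth.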
Redshift – Such intelligence, much business.
Amazon Redshift, introduced in 2013, is a managed data warehouse based on PostgreSQL. In general, data warehouses are used in business intelligence to aggregate, label and organize data from different sources, so that data analysis and reporting can be performed efficiently. Redshift uses a relational database but, unlike the databases managed by RDS, it is column-organized, which increases data retrieval performance and makes massively parallel processing of queries much more efficient than in row-organized databases. This way even complex ad-hoc queries can be executed very fast, making it ideal for business intelligence purposes.
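The difference between row-organized and column-organized storage is easy to picture with plain Python lists. This is only an illustration of the layout idea, not of how Redshift stores data internally; the orders data and variable names are made up.

```python
# Row-organized: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "PL", "amount": 120},
    {"order_id": 2, "country": "DE", "amount": 80},
    {"order_id": 3, "country": "PL", "amount": 50},
]

# Column-organized: one list per column, values aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query such as "total amount" has to walk through every
# whole record in the row layout...
total_from_rows = sum(row["amount"] for row in rows)

# ...while the column layout reads just the single column it needs.
total_from_columns = sum(columns["amount"])

print(columns["amount"], total_from_rows, total_from_columns)
```

With three rows it makes no difference, but on a table with hundreds of columns and billions of rows, reading one column instead of every record is a big part of why aggregations get so much faster.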
Most likely we will take a look at yet another group of AWS services in the future, but I still need to decide which one will be interesting to write about (and hopefully to read about).
Currently I’m in the middle of Geecon 2017 in Kraków, Poland, so the next episode / episodes will most likely be some kind of conference report. Stay tuned!