April is definitely the AWS month in here. We started in episode 64 with an introduction and an overview of the first 3 service groups, then in the two following episodes we went through the next 15 groups at a lightning-fast pace, for a total of 80 individual services. Oh my, that was a lot of links, and I bet something new has appeared in the meantime.
Today we are going to expand a bit upon the storage category, most importantly: S3, Glacier, EFS and EBS. We will talk about what those are exactly and what the options and use cases are, and we will present some tips. I actually missed EBS in the first article, since it does not appear under the storage category in the AWS console, although it is listed there on the AWS webpage. However, let's start with the most commonly known service, S3.
Simple Storage Service
S3 is one of the oldest publicly available AWS services, launched in 2006. It's object-based storage for files up to 5 TB in size. In order to upload one, we have to create a bucket first. Buckets act as top-level folders, and their names are globally unique. Therefore, files can be accessed via a URL consisting of the service address, bucket name and file name. Aside from the HTTP REST API, we can manipulate buckets and files from the AWS console, the command line interface, or programmatically via the AWS SDK for a particular language. For durability, files are distributed among multiple availability zones. When a new file is uploaded, we are guaranteed read-after-write consistency. When a file is modified or deleted, we have eventual consistency, meaning we may see an older version of the file for some time.
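To make the URL structure concrete, here is a minimal sketch of how a path-style S3 object URL is composed. The bucket name, key and region are illustrative only (real bucket names must be globally unique):

```python
# Sketch of composing a path-style S3 object URL.
# Bucket, key and region below are made-up examples.
def s3_object_url(bucket, key, region="us-east-1"):
    """Build a path-style URL: service address + bucket name + file name."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(s3_object_url("my-example-bucket", "photos/cat.jpg"))
# https://s3.us-east-1.amazonaws.com/my-example-bucket/photos/cat.jpg
```

S3 also supports virtual-hosted-style addressing, where the bucket name becomes part of the hostname instead of the path.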
There are a few storage classes within S3, namely:
Standard – the most expensive in terms of cost per amount of data stored. It guarantees 99.999999999% durability (yes, eleven nines), achieved via a ridiculously high number of copies of the data. It's designed for 99.99% availability, while the SLA is 99.9%. This means the chance that the data will be temporarily unavailable is quite real, but the chance that it will be lost forever is practically nonexistent.
Infrequent Access – the cost of storage is lower, however we are charged for each retrieval of data. Durability stays the same; availability and SLA are 99.9% and 99%, respectively. The minimum billable file size is 128 kB, and the minimum storage duration is 30 days.
Reduced Redundancy Storage – durability is only 99.99%, while the rest of the parameters are the same as the Standard class. It's recommended for data that can be easily recreated.
Glacier – considered both an S3 storage class and a separate service, it is a cold data store. It is considerably cheaper than the Standard class and has similar durability, but no availability SLA and a retrieval time between 3 and 5 hours. There is no minimum file size, and the minimum storage duration is 90 days.
We can configure lifecycle policies for moving files automatically between storage classes over time, as well as for deleting them.
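As a sketch, such a lifecycle policy could look like the following dictionary, in the shape expected by boto3's `put_bucket_lifecycle_configuration` call. The prefix, rule ID and day counts are illustrative assumptions, not recommendations:

```python
# Example lifecycle configuration in the format boto3 expects.
# Objects under the (hypothetical) "logs/" prefix move to Infrequent Access
# after 30 days, to Glacier after 90 days, and are deleted after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",            # arbitrary rule name
            "Filter": {"Prefix": "logs/"},   # which objects the rule applies to
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},     # delete after one year
        }
    ]
}

# With real credentials this would be applied roughly like:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"])
```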
EFS vs EBS
Let’s leave S3 and Glacier for now, and look at EFS and EBS. Those two can seem confusing at first.
EFS, or Elastic File System, is a relatively new service. It’s a NAS (Network Attached Storage) operating at the file level, and it can be accessed by multiple EC2 instances simultaneously. The advantages are that the file system is managed, so we don’t have to care about formatting, and that it scales automatically depending on the actual amount of data we put on it.
EBS, or Elastic Block Store, is a SAN (Storage Area Network) that operates at the block level and attaches to a single EC2 instance. It requires formatting (so we can choose whatever file system we like on it) and comes in many variants optimized for different I/O characteristics, such as throughput, latency and number of operations per second. It can also be configured into various RAID setups. The advantages are a lower price and the possibility of low-level fine-tuning of the device for particular needs.
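The difference shows up in how we attach them to an instance. A rough sketch (device name, mount points and the EFS endpoint are placeholders, and the exact mount options come from the AWS documentation for your setup):

```shell
# EBS: the volume appears as a raw block device on ONE instance;
# we format it ourselves and then mount it like a local disk.
# /dev/xvdf is a typical but not guaranteed device name.
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/ebs
sudo mount /dev/xvdf /mnt/ebs

# EFS: already a file system, mounted over NFS; the same endpoint
# can be mounted from MANY instances at once. fs-12345678 is a placeholder.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```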
Some random thoughts on storage uses:
- If our data is accessed very frequently, consider placing it in a database instead of S3.
- Files and folders inside buckets are indexed according to their name prefixes, so diversifying prefixes can yield better performance.
- If our files are very small, placing them in the Infrequent Access storage class might be more expensive than the Standard class, due to the 128 kB minimum billable size.
- Similarly, if the files are short-lived, IA might actually be more expensive due to minimum storage duration of 30 days.
- It’s not possible to append to a file in an S3 bucket; we need to download it, modify it and upload it again.
- Files can be versioned, but each version takes space and we are charged for it. Versioning also somewhat blurs the real amount of space we are using.
- Captain Obvious here: Using data compression might save a lot of money on S3.
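The small-files tip above is easy to verify with some back-of-the-envelope arithmetic. The prices below are illustrative assumptions only (real prices vary by region and change over time); the point is the effect of the 128 kB billing minimum:

```python
# Illustrative per-GB-month prices -- NOT current AWS pricing.
STANDARD_PER_GB = 0.023
IA_PER_GB = 0.0125
IA_MIN_BYTES = 128 * 1024  # IA bills each object as at least 128 kB

def monthly_storage_cost(size_bytes, price_per_gb, min_billed_bytes=0):
    """Monthly storage cost for one object, honoring a billing minimum."""
    billed = max(size_bytes, min_billed_bytes)
    return billed / (1024 ** 3) * price_per_gb

small_object = 16 * 1024  # a 16 kB file
std = monthly_storage_cost(small_object, STANDARD_PER_GB)
ia = monthly_storage_cost(small_object, IA_PER_GB, IA_MIN_BYTES)
print(ia > std)  # True: for this tiny object, IA costs more than Standard
```

For this 16 kB object, IA bills 128 kB at the lower rate, which still comes out more expensive than Standard billing the actual 16 kB.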
I found a very good and detailed article with many more tips here.
AWS storage services are immensely popular, which could be observed during the recent S3 outage at the end of February 2017, when many big websites relying on S3 had issues or went down completely for several hours. As Amazon stated, the cause was actually human error – a typo in the number of servers to be removed in a command. It is somewhat scary when so much of the Internet relies on a single cloud provider.
There is one more service in the category, Storage Gateway, however I decided it’s not that interesting, so if you want to know more about it, the AWS website and Google will be your mates. As we are, more or less, done with the storage overview, in the next episode we will tackle what AWS has to offer in the compute category. Stay tuned, and subscribe!