S3 versioning and lifecycles
Captain's log, stardate d445.y38/AB
As a development agency, we often build the technical architecture for our clients as well. That includes Amazon's S3 storage service, which we've used in many projects, such as a web platform for a musical contest that received thousands of submissions in a very short period of time.
Since we created MarsBased in early 2014, we have been using Amazon S3 extensively to make our clients' lives easier.
For one e-commerce project in particular, we need to store a large number of audio and video files from chats in Amazon S3. We also have to keep them indefinitely because of a functional requirement: the platform produces QR codes linking to those files, and they need to be accessible at all times. For that reason, we enabled "Versioning" on the bucket as a safety net in case we accidentally delete files.
One of the features of Versioning is that deleting a file does not actually remove it. Instead, S3 creates a delete marker on top of the file, which acts like a soft delete. Recovering the file when needed becomes trivial: you remove the delete marker and the previous version is current again.
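To make the delete marker idea more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the real S3 API: the `VersionedBucket` class, its method names, and the example key are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    """One entry in a key's version stack."""
    body: Optional[bytes]  # None means this entry is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.body is None


@dataclass
class VersionedBucket:
    """Toy model of a versioned bucket (hypothetical, not the S3 API)."""
    objects: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        # Every upload becomes the new current version.
        self.objects.setdefault(key, []).append(Version(body))

    def delete(self, key: str) -> None:
        # A plain delete only stacks a delete marker on top.
        self.objects.setdefault(key, []).append(Version(None))

    def get(self, key: str) -> Optional[bytes]:
        versions = self.objects.get(key, [])
        if not versions or versions[-1].is_delete_marker:
            return None  # looks deleted to a normal reader
        return versions[-1].body

    def undelete(self, key: str) -> None:
        # Removing the marker makes the previous version current again.
        versions = self.objects.get(key, [])
        if versions and versions[-1].is_delete_marker:
            versions.pop()


bucket = VersionedBucket()
bucket.put("chat/audio-1.mp3", b"audio bytes")
bucket.delete("chat/audio-1.mp3")
print(bucket.get("chat/audio-1.mp3"))  # None
bucket.undelete("chat/audio-1.mp3")
print(bucket.get("chat/audio-1.mp3"))  # b'audio bytes'
```

Note that the file's bytes stay in the version stack after `delete`, which is exactly why the storage is still billed.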
The downside of keeping the files around instead of removing them is that you are still charged for them, as if they were active files. You're paying for the convenience, so to speak.
Another downside to using Versioning is that old versions accumulate, and the bucket can grow out of control (both in size and in billing!). In our case, for this particular client, we've got circa 10TB of files, and that's while actively hard-deleting files we know will never be used. Imagine if we didn't delete them!
What we want to do, in cases like this, is keep the soft-deleted files for some time (say, a year) and then delete them completely. This can be achieved in S3 by creating a lifecycle expiration rule that applies to previous versions of an object. Here is how it plays out:
- A file is uploaded. The current version of the file is the file itself. Since the rule is configured to act on previous versions, the file is never expired.
- The file is deleted. Since versioning is enabled, a delete marker is created. Now we have two versions: a previous version holding the file, and the current version, which is just the delete marker.
- After a year, that previous version will expire according to the rule and, thus, the file will be permanently deleted. This leaves an orphan delete marker, but Amazon offers an option to automatically and periodically get rid of those.
This way, we have a soft-delete mechanism that keeps files for a year after they are marked for deletion.
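For those who prefer configuration over clicking through the console, the rule described above could be sketched as a Python dictionary. This is my understanding of the shape that boto3's `put_bucket_lifecycle_configuration` expects in its `LifecycleConfiguration` argument. The rule ID, the empty prefix, and the 365-day default (taken from the "a year" example) are assumptions, so double-check them against your own bucket before applying anything.

```python
import json


def soft_delete_rule(retention_days: int = 365) -> dict:
    """Build a lifecycle rule that purges previous versions after a delay."""
    return {
        "ID": "expire-soft-deleted-files",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # assumption: apply to the whole bucket
        # Permanently delete versions that stopped being current N days ago.
        "NoncurrentVersionExpiration": {"NoncurrentDays": retention_days},
        # Clean up delete markers that have no version left behind them.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


lifecycle_configuration = {"Rules": [soft_delete_rule()]}
print(json.dumps(lifecycle_configuration, indent=2))
```

Keeping the retention period as a parameter makes it easy to try a shorter window, for example `soft_delete_rule(retention_days=180)`.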
We also created a lifecycle rule that moves files from "Standard" storage to "Standard Infrequent Access" storage 30 days after creation (which is the minimum time allowed).
This cut the cost of storage roughly in half while still letting us retrieve files immediately. The only caveat is that files smaller than 128KB cannot be moved there, and a significant percentage of our audio files are smaller than that.
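As a rough sketch, the transition rule and the 128KB caveat could look like this in Python. The rule dictionary follows the same lifecycle rule shape that, as far as I know, boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `eligible_share` helper, and the sample file sizes are hypothetical and only there for illustration.

```python
from typing import List

# The 128KB threshold mentioned above, expressed in bytes.
MIN_TRANSITION_BYTES = 128 * 1024


def transition_rule(days: int = 30) -> dict:
    """Build a lifecycle rule that moves files to Standard Infrequent Access."""
    return {
        "ID": "move-to-infrequent-access",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # assumption: apply to the whole bucket
        "Transitions": [{"Days": days, "StorageClass": "STANDARD_IA"}],
    }


def eligible_share(sizes_in_bytes: List[int]) -> float:
    """Fraction of the total bytes stored in files large enough to move."""
    total = sum(sizes_in_bytes)
    if total == 0:
        return 0.0
    eligible = sum(s for s in sizes_in_bytes if s >= MIN_TRANSITION_BYTES)
    return eligible / total


# Made-up sizes: one short audio clip and two larger videos.
sample_sizes = [64 * 1024, 512 * 1024, 2 * 1024 * 1024]
print(transition_rule())
print(f"Share of bytes that can be moved: {eligible_share(sample_sizes):.1%}")
```

Running `eligible_share` over a real listing of file sizes gives a quick idea of how much of the bill the transition can actually touch.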
For the sake of clarity, I'm sharing some screenshots below, with the parameters of the rules:
Transition rule to move files from Standard to Standard Infrequent Access after 30 days (the minimum)
Expiration rule to expire "Previous versions" (that is, the file itself once a delete marker sits on top of it) after 6 months. This also tells Amazon to get rid of orphan delete markers.
Make sure "Enable versioning" is selected!
Hope you found this useful! I will keep posting about my findings!