Simple...Not
The first S in S3 supposedly stands for "simple." That has been a massive failure, although it has gotten better. S3 can do so much that it is hard to wrap your head around it all at once. First, a few esoteric things that need help from other services: you can link an FSx for Lustre file system to the S3 bucket and read and write objects as if they were files on any other NAS (FUSE-based tools do this too). You can also use the Transfer Family to set up an SFTP server right on top of the bucket; all you need to supply is a public SSH key and a username.
But what else? S3 can host objects written by countless AWS services (reports, logs, and audits). S3 can act as a simple web server (HTTP only for custom domains, but there is always an HTTPS endpoint you could use) and can also implement complex redirect rules. S3 can act as the origin for CloudFront to serve files to the public (which is how this website originally worked before the Lambda@Edge conversion).
S3 has at least a dozen permission models - who owns the objects in the bucket, who pays for them, whether that is the same account as the bucket owner, and signed URLs for temporary, gated upload and download. Objects can have metadata attached directly or written to DynamoDB. Objects can be encrypted using several different systems, and will soon be encrypted by default (although this is basically checkbox encryption to satisfy an audit or control requirement, not truly effective encryption, but still, a start). If you want to provide cross-account access, you'll need a bucket policy attached, or you can turn on bucket-level ACLs and provide grants to other principals, either inside AWS or even by email address. You can version the objects, so that deleting one merely adds a delete marker on top of it while previous versions remain indefinitely, sometimes without you even knowing.
Buckets can have lifecycle policies to move objects to different storage tiers to save money (the Intelligent-Tiering archive tier for long-term storage is approximately 85% cheaper than the Standard price). You can trigger notifications to SNS topics or even directly invoke Lambda functions when objects are created, updated, or destroyed, allowing you to manage complex workflows and lifecycles however your business demands. I am probably missing at least half a dozen additional useful features, but you get the idea. It is anything but simple.
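To make the notification piece a little more concrete, here is a minimal sketch of a Lambda handler that reads an S3 event. The function names (`summarize_s3_event`, `handler`) and the bucket and key in the usage note are made up for illustration; the record fields (`eventName`, `s3.bucket.name`, `s3.object.key`) follow the documented S3 event message structure. Treat it as a starting point, not a finished workflow.

```python
from urllib.parse import unquote_plus


def summarize_s3_event(event):
    """Return (event_name, bucket, key) tuples, one per record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode before use.
        key = unquote_plus(s3_info["object"]["key"])
        results.append((record["eventName"], bucket, key))
    return results


def handler(event, context):
    """Hypothetical Lambda entry point: log what happened to each object."""
    summary = summarize_s3_event(event)
    for event_name, bucket, key in summary:
        if event_name.startswith("ObjectCreated:"):
            print(f"created: s3://{bucket}/{key}")
        elif event_name.startswith("ObjectRemoved:"):
            print(f"removed: s3://{bucket}/{key}")
        else:
            print(f"other ({event_name}): s3://{bucket}/{key}")
    return {"processed": len(summary)}
```

Feeding it a single `ObjectCreated:Put` record for the key `reports/2024+q1/summary.csv` in a bucket named `acme-dev-uploads` would log the decoded key `reports/2024 q1/summary.csv`. Note that an update in S3 is just another `ObjectCreated` event, since objects are overwritten rather than modified in place.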
Simple Rules
All that said, here is the summary worth taking away:
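- Encrypt with a customer managed KMS key - it is only $1/mo and worth every penny. You will have permissions issues, because anyone reading or writing objects needs access to the key as well as the bucket. Do not encrypt objects meant to be public this way; anonymous users cannot use your key, so it makes no sense.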
- Version all buckets every time or write down why not
- Always turn on S3 server access logging (which is far cheaper than CloudTrail data events) but recognize that CloudTrail management and data events are also useful. Write the logs to a dedicated bucket (and do not turn on logging for the logging bucket, or it will log its own log deliveries forever).
- Minimize the number of buckets you create but do not mix data across environments in a single bucket
- Always turn on whatever Public Access Block settings you can, and review them often to see if you can check off more
- The folders you see in S3 are fake - there is no folder structure, no hierarchy, it's all convenience to align with your existing mental model of folders on disk.
- S3 handles roughly 3,500 write requests (PUT, COPY, POST, DELETE) and 5,500 read requests (GET, HEAD) per second per prefix. If you go over this, you can get throttled. Avoid this by researching hot partitions, and spread objects across multiple prefixes (a random or hashed prefix works) when you can
- Always use Intelligent-Tiering and set a lifecycle rule to transition objects there after 1 day (lifecycle rules count in whole days). Enable the opt-in archive access tiers on the bucket if you can.
- Check Storage Lens once a month to see that things are in the classes you expect, buckets are growing as you expect, and there are no anomalous errors.
- Always set a lifecycle policy to abort incomplete multipart uploads and expire noncurrent versions of objects after some reasonable time.
- Bucket names have to be globally unique, and once you delete a bucket, someone else can scoop up the name. Choose a naming scheme ahead of time, and make sure you are happy with it and can stick to it. Also realize you will have exceptions to this naming scheme for many reasons; some will seem good, many will seem ridiculous. Do your best. I like "company-environment-purpose".
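The lifecycle rules above (move to Intelligent-Tiering after a day, clean up incomplete multipart uploads and old versions) can be sketched as a single configuration. This is one possible shape, not a recommendation for specific numbers: the function name, the rule ID, and the 90-day and 7-day defaults are assumptions for illustration. The dictionary layout follows the structure S3 expects for a bucket lifecycle configuration.

```python
import json


def build_lifecycle_configuration(noncurrent_days=90, abort_multipart_days=7):
    """Build a lifecycle configuration covering tiering and cleanup.

    The default day counts are placeholders; pick values that fit your data.
    """
    return {
        "Rules": [
            {
                "ID": "tier-and-clean-up",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to the whole bucket
                # Move current objects to Intelligent-Tiering one day after creation.
                "Transitions": [
                    {"Days": 1, "StorageClass": "INTELLIGENT_TIERING"}
                ],
                # Permanently remove old versions once they have been noncurrent this long.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Discard the parts of uploads that never completed.
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_multipart_days
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the configuration so it can be reviewed before it is applied.
    print(json.dumps(build_lifecycle_configuration(), indent=2))
```

With boto3, a dictionary like this would be passed as the `LifecycleConfiguration` argument to `put_bucket_lifecycle_configuration`, alongside the bucket name. Keeping the rule as data in version control also gives you a place to write down why a particular bucket is an exception.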