AWS Quick Notes : Amazon S3
--
S3 is an object storage service offered by AWS that scales virtually without limit.
Objects are stored in buckets, and every bucket name must be globally unique. Buckets are created in a specific region. Each object is addressed by its key, which is the full path within the bucket. For example, in s3://sample-bucket/some-folder/sample.txt the bucket is sample-bucket and the key is some-folder/sample.txt.
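To make the bucket and key split concrete, here is a minimal sketch that separates the two parts of an s3:// path. The function name split_s3_uri is a hypothetical helper for illustration, not part of any AWS library.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://sample-bucket/some-folder/sample.txt")
print(bucket)  # sample-bucket
print(key)     # some-folder/sample.txt
```

Note that "some-folder/" is only a prefix of the key; S3 itself has no real directories.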
The maximum size of an object supported by S3 is 5TB. A single PUT can upload at most 5GB, so multi-part upload must be used for any object > 5GB.
Each S3 object supports additional fields such as metadata, tags and version.
Each S3 object has a URL, but access to the object is denied by default.
S3 supports versioning, which is enabled at the bucket level. Deleting a versioned object adds a delete marker on top of it, so the object and its earlier versions are hidden rather than removed. Deleting the delete marker restores the object and its versions.
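The delete marker behaviour can be illustrated with a toy in-memory model. This is only a sketch of the concept: the class VersionedBucket and its methods are invented for illustration and are not the real S3 API.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the real S3 API)."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, newest last

    def put(self, key, body):
        # Every write adds a new version instead of overwriting.
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A plain delete only stacks a delete marker on top of the history.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def remove_delete_marker(self, key):
        versions = self._versions.get(key, [])
        if versions and versions[-1] is self.DELETE_MARKER:
            versions.pop()

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            return None  # the object is hidden (or never existed)
        return versions[-1]


bucket = VersionedBucket()
bucket.put("notes.txt", "v1")
bucket.put("notes.txt", "v2")
bucket.delete("notes.txt")
print(bucket.get("notes.txt"))  # None
bucket.remove_delete_marker("notes.txt")
print(bucket.get("notes.txt"))  # v2
```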
S3 Encryption
Encryption in transit is supported by utilizing an HTTPS endpoint.
There is also an S3 default encryption option, which can be set from the UI. It ensures that an object uploaded without explicit encryption settings is still encrypted at rest, using one of the server-side options below (SSE-S3 or SSE-KMS).
Encryption at rest can be applied to the whole bucket or to individual objects and versions, using one of the following four options:
- Client Side Encryption
Here the client encrypts with a client library, such as the Amazon S3 Encryption Client. Clients must encrypt and decrypt all data on their own.
- SSE-S3 — AWS S3 owned, handled and managed keys
The keys are handled and managed by Amazon. The object is encrypted server-side with AES-256. The header "x-amz-server-side-encryption": "AES256" tells S3 to use this type of encryption for the object being uploaded.
- SSE-KMS — AWS owned and handled keys managed with KMS
Here AWS still stores and handles the Customer Master Keys (CMKs), but they are managed in KMS. KMS allows for more user control and provides an audit trail of key usage. The header "x-amz-server-side-encryption": "aws:kms" tells S3 to use this type of encryption for the object being uploaded. The key can be either the AWS managed key for S3 or a customer managed key of your own.
- SSE-C — Customer owned and managed encryption keys
Amazon does not store or own the keys. They are managed by the customer. HTTPS must be used, since the encryption key is sent in the request headers of every call. This option is available through the AWS CLI, SDKs and REST API, but not through the console.
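The three server-side options differ mainly in the request headers sent with the upload. The following is a minimal sketch that only builds those header dictionaries; it does not sign or send a request. The function names and the KMS key id in the usage are hypothetical placeholders.

```python
import base64
import hashlib


def sse_s3_headers():
    # SSE-S3: S3 picks and manages the key.
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers(kms_key_id=None):
    # SSE-KMS: optionally name a specific KMS key, else the default is used.
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers


def sse_c_headers(key: bytes):
    # SSE-C: the customer sends the key itself (base64) plus its MD5 digest.
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


print(sse_s3_headers())
print(sse_kms_headers("example-key-id"))  # placeholder key id
print(sorted(sse_c_headers(bytes(32))))   # all-zero key, for demonstration only
```

Because the SSE-C headers carry the raw key, they must only ever travel over HTTPS.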
Security
MFA Delete can prevent accidental deletion in versioned S3 buckets. MFA is then required to permanently delete an object version or to suspend versioning on the bucket. Only the bucket owner (root account) can enable or disable MFA Delete.
Pre-signed URLs: if you need to give time-limited access to objects in S3 (the default is 3600 seconds), pre-signed URLs are the best option. Use the --expires-in argument to change the default value. Users of the URL inherit the permissions of the principal that generated it. Pre-signed URLs can be generated for both a PUT and a GET. The CLI command is aws s3 presign, which produces GET URLs; PUT URLs are generated through an SDK.
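As a small illustration of the CLI call, the sketch below assembles the aws s3 presign command line for a given object and expiry. It only builds and prints the command; actually running it requires the AWS CLI and valid credentials. The helper name presign_command is hypothetical.

```python
import shlex


def presign_command(bucket, key, expires_in=3600):
    """Build the argument list for `aws s3 presign` (hypothetical helper)."""
    return [
        "aws", "s3", "presign",
        f"s3://{bucket}/{key}",
        "--expires-in", str(expires_in),  # lifetime of the URL in seconds
    ]


cmd = presign_command("sample-bucket", "some-folder/sample.txt", expires_in=900)
print(shlex.join(cmd))
# aws s3 presign s3://sample-bucket/some-folder/sample.txt --expires-in 900
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems with unusual key names.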
Block all public access is ON by default. Public access to S3 can also be blocked for the whole account. To host a static website, this has to be turned OFF.
Bucket Policies
Resource based policies
Resource based policies are rules that can be set up from the S3 console.
Each object has an ACL, so access can be set at the object level.
Bucket level ACLs are also available.
S3 Bucket Policies can be used to grant or block public access to buckets, to force objects to be encrypted at upload, and to grant cross-account access. Public access can also be restricted through ACLs; blocking it is very important to prevent data leaks.
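As an example of forcing encryption at upload, the sketch below builds a bucket policy document that denies any PutObject request that does not carry the expected encryption header. The function name and the bucket name are placeholders; the resulting JSON would still need to be attached to the bucket through the console, CLI or an SDK.

```python
import json


def require_encryption_policy(bucket_name, algorithm="AES256"):
    """Bucket policy that rejects uploads without the given SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # The deny applies when the header is missing or different.
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": algorithm
                    }
                },
            }
        ],
    }


policy = require_encryption_policy("sample-bucket")
print(json.dumps(policy, indent=2))
```

Passing algorithm="aws:kms" would require SSE-KMS instead of SSE-S3.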
User Based policies
These are IAM based policies.
An IAM principal can access an S3 object if (the user's IAM permissions allow it OR the resource policy allows it) AND there is no explicit DENY for that user.
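The rule above can be written as a single boolean expression. This is a deliberately simplified sketch of the statement in these notes; real IAM evaluation takes more inputs (for example, cross-account access and permission boundaries), and the function name is hypothetical.

```python
def is_access_allowed(iam_allows, resource_policy_allows, explicit_deny):
    """Simplified access decision: any allow, and no explicit deny."""
    return (iam_allows or resource_policy_allows) and not explicit_deny


print(is_access_allowed(True, False, False))   # True: IAM policy allows
print(is_access_allowed(False, True, False))   # True: bucket policy allows
print(is_access_allowed(True, True, True))     # False: explicit deny wins
print(is_access_allowed(False, False, False))  # False: nothing allows it
```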
S3 supports VPC Endpoints when providing access to instances inside VPCs. This enables the VPC to get access to S3 without going over the open internet.
Logging
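S3 Access Logs can be stored in another S3 bucket. Requests made to the bucket are logged there. Do not use the monitored bucket itself as the log target, since every log write would generate further log entries. The data can be analyzed using Amazon Athena or other data analysis tools.
CloudTrail provides a record of API access.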
Cross Origin Resource Sharing
CORS headers (Access-Control-Allow-Origin, Access-Control-Allow-Methods) need to be enabled on the bucket to support requests from other domains. The header can allow a specific origin or all origins (*).
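On S3 these headers are driven by a CORS configuration attached to the bucket. The sketch below builds such a configuration as a Python dictionary in the JSON shape that S3 accepts; the function name, the example origin and the MaxAgeSeconds value are illustrative assumptions.

```python
import json


def cors_configuration(allowed_origins, allowed_methods=("GET",)):
    """Build a one-rule S3 CORS configuration (hypothetical helper)."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),  # or ["*"] for all
                "AllowedMethods": list(allowed_methods),
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,  # how long browsers may cache the answer
            }
        ]
    }


config = cors_configuration(["https://www.example.com"], ("GET", "PUT"))
print(json.dumps(config, indent=2))
```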
S3 Replication (CRR and SRR)
Cross Region Replication or Same Region Replication are features that allow asynchronous copying within AWS.
Bucket Versioning needs to be enabled in both source and destination.
The buckets can be in two different accounts, and in two different regions.
Rules can be applied for replication so you can limit the scope of what gets replicated.
If replication is activated on an existing bucket, only objects written from that point forward are replicated.
For DELETE operations, it is optional to replicate delete markers. Delete markers created by lifecycle rules are not replicated. Permanent deletes (deletes of a specific version) are not replicated.
There is no chaining of replication. If bucket A replicates to bucket B and bucket B replicates to bucket C, objects written to A are not replicated to C.
Cross Region Replication must be set up separately for every destination region.
Files are updated asynchronously in near real time.
Cross Region Replication may be used for disaster recovery, and for lower-cost, lower-latency analysis of the data in a few different regions.
To make S3 content available globally, especially if it is static content, it may be better to consider CloudFront as the solution.
Performance
Request rate limits apply per prefix.
S3 allows for 3500 PUT/COPY/POST/DELETE and 5500 GET requests per second per prefix in a bucket.
It's best to spread the requests across multiple prefixes if larger throughput is required.
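A quick back-of-the-envelope calculation shows how many prefixes a workload would need under the per-prefix limits quoted above. This is a rough sizing sketch that assumes requests are spread evenly across prefixes; the function name is hypothetical.

```python
WRITES_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE requests per second
READS_PER_PREFIX = 5500   # GET requests per second


def prefixes_needed(reads_per_sec, writes_per_sec):
    """Smallest number of prefixes that keeps both rates under the limits."""
    # Ceiling division with integers: -(-a // b)
    for_reads = -(-reads_per_sec // READS_PER_PREFIX)
    for_writes = -(-writes_per_sec // WRITES_PER_PREFIX)
    return max(1, for_reads, for_writes)


print(prefixes_needed(22000, 7000))  # 4 (reads need 4, writes need 2)
print(prefixes_needed(1000, 500))    # 1
```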
If using SSE-KMS for encryption, you could be impacted by KMS request limits, because each upload and download makes a call to KMS.
Optimization
Multi-Part Upload is recommended for all files >100MB and is mandatory for files >5GB. Since the parts are uploaded in parallel, the upload is faster.
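To show what splitting a file into parts looks like, here is a minimal sketch that plans the byte ranges for a multi-part upload. It only does the arithmetic and performs no upload. The constants reflect the S3 multi-part limits (parts of 5 MiB to 5 GiB, at most 10,000 parts); the function name and the 100 MiB default part size are assumptions for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size, part_size=100 * MIB):
    """Return (part_number, start, end_exclusive) for each part."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    count = -(-object_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, object_size))
        for n in range(count)
    ]


parts = plan_parts(250 * MIB)
print(len(parts))                           # 3
print(parts[-1][2] - parts[-1][1] == 50 * MIB)  # True: last part is smaller
```

Each tuple describes an independent chunk, which is what makes uploading the parts in parallel possible.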
S3 Transfer Acceleration can speed up uploads: objects are first uploaded to an edge location, which then moves the data over the AWS private backbone network into the S3 bucket in the required region.
S3 Byte Range Fetches allow for faster downloads with parallel GET requests for specific byte ranges.
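A byte range fetch is an ordinary GET with an HTTP Range header. The sketch below computes the Range header values needed to download an object in fixed-size chunks; it performs no network calls, and the function name is hypothetical.

```python
def range_headers(object_size, chunk_size):
    """Return the Range header value for each chunk of the object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    headers = []
    for start in range(0, object_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


print(range_headers(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each header can be sent in its own GET request, so the chunks can be fetched in parallel and a failed chunk can be retried on its own.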
S3 Event Notifications
AWS S3 can send events to SNS, SQS and Lambda. For example, if an uploaded object contains a blog post, an event can trigger a Lambda function that summarizes the post and stores the summary document in S3 as well.
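As a sketch of the Lambda side of this flow, the handler below pulls the bucket name and object key out of an S3 event notification. The summarizing step itself is left out, the function names are hypothetical, and the sample event is trimmed to only the fields that are read here.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the event, e.g. spaces become "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    # A real function would fetch each object and write a summary back to S3.
    return {"objects": uploaded_objects(event)}


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "sample-bucket"},
                "object": {"key": "posts/my+first+post.txt"},
            },
        }
    ]
}
print(handler(sample_event, None))
# {'objects': [('sample-bucket', 'posts/my first post.txt')]}
```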
S3 — Requester Pays
With Requester Pays buckets, the requester of the data pays for the request and network transfer costs instead of the bucket owner.
Consistency on S3
Since December 2020, S3 provides strong read-after-write consistency for all buckets.