Simple Storage Service (S3)
Amazon's Object Storage Service (NOT a comprehensive file system)
Essentials
Can serve many purposes when designing highly available, fault tolerant, and secure application architectures.
Bulk (unlimited) static object storage
Various storage classes to optimize cost vs. needed object availability/durability
Object versioning
Access restrictions via S3 bucket policies/permissions
Object management via lifecycle policies
Hosting static files & websites
Origin for CloudFront CDN
File shares and backup/archiving for hybrid networks (via AWS Storage Gateway)
Facts
Objects stay within an AWS region and are replicated across AZs for extremely high availability and durability (99.999999999%, i.e. 11 9's)
Accessing objects from outside the bucket's region incurs extra data transfer charges
Adds latency (the data travels farther)
Create buckets in a region that makes sense for their purpose:
Serve content to customers
Share data with EC2
Read Consistency Rules
ALL regions support read-after-write consistency for PUTS of new objects into S3
New objects are immediately available after "putting" them into S3
Overwrite PUTS and DELETES were historically eventually consistent; since December 2020, S3 provides strong read-after-write consistency for these operations as well.
Components
Buckets
Main storage container of S3; buckets contain a grouping of information and have sub-namespaces that are similar to folders (and are called folders)
Tags can be used to organize buckets (e.g. tag based on the application the bucket belongs to)
Default encryption
Bucket Names:
Each bucket must have a unique name GLOBALLY (across all of AWS)
DNS-compliant
No uppercase letters or underscores
Lowercase letters, numbers, and hyphens (periods for static web hosting)
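The naming rules above can be sketched as a quick validator. This is a simplified check, not the full set of rules AWS enforces (for example, it does not reject IP-address-style names or certain reserved prefixes):

```python
import re

def is_dns_compliant_bucket_name(name: str) -> bool:
    """Simplified check of S3 bucket naming rules.

    Allows lowercase letters, numbers, hyphens, and periods;
    3-63 characters; must start and end with a letter or number.
    """
    if not 3 <= len(name) <= 63:
        return False
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name) is not None

print(is_dns_compliant_bucket_name("my-app-logs"))  # True
print(is_dns_compliant_bucket_name("My_Bucket"))    # False: uppercase/underscore
```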
Bucket Limitations:
Only 100 buckets can be created in an AWS account at a time (soft limit)
Bucket ownership cannot be transferred to another account once a bucket is created.
Objects
Include metadata information:
Set of name-value pairs
Contain information specified by the user, and AWS information such as storage class
Each object must be assigned a storage class, which determines the object's availability, durability and cost
By default, objects are private (even if the bucket is public!)
Objects can:
Be as small as 0 bytes and as large as 5 TB
Have multiple versions (if versioning is enabled)
Be made publicly available via a URL
Automatically transition to a different storage class or be deleted (via lifecycle policies)
Be encrypted
Be organized into "sub-namespaces" called folders
Object Encryption
SSE (Server Side Encryption)
S3 can encrypt the object before saving it to the partitions in the data centers and decrypt it when it is downloaded (AES-256)
SSE-S3 - S3 managed encryption Keys
SSE-C - Customer provided encryption keys
SSE-KMS - S3 uses master key in KMS
Client Side Encryption
Application encrypts/decrypts before/after retrieval
SSL terminated endpoints for the API
Objects are encrypted in transit
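A sketch of how the three SSE modes differ at the request level, expressed as the extra upload parameters (boto3-style names). The KMS key ARN and the raw key bytes below are hypothetical placeholders:

```python
# Extra parameters attached to an object upload for each SSE mode.
# These mirror boto3 put_object kwargs; the ARN/key are placeholders.
sse_s3 = {"ServerSideEncryption": "AES256"}  # SSE-S3: S3-managed keys

sse_kms = {
    "ServerSideEncryption": "aws:kms",       # SSE-KMS: master key held in KMS
    "SSEKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",  # hypothetical
}

# SSE-C: you supply the key with every request; S3 does not store it.
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": b"\x00" * 32,          # 256-bit key (placeholder bytes)
}
```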
Select From
Functionality that allows you to extract records from a CSV or JSON object using SQL expressions! Can also be done programmatically via the API.
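A sketch of what an S3 Select request looks like. The bucket and key names are hypothetical, and the dict mirrors the parameters of the `select_object_content` API call (not executed here):

```python
# SQL that S3 Select evaluates server-side against the object.
SELECT_SQL = "SELECT s.name, s.city FROM S3Object s WHERE s.age > 30"

# Shape of a select_object_content request (sketch; bucket/key hypothetical).
request = {
    "Bucket": "my-bucket",
    "Key": "people.csv",
    "Expression": SELECT_SQL,
    "ExpressionType": "SQL",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"JSON": {}},
}
```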
Folders
For simplicity, S3 supports the concept of "folders"
This is done only as a means of grouping objects
Amazon S3 does this by using the key-name prefixes for objects
Amazon S3 has a flat structure; there is no hierarchy like you would see in a typical file system!
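A small illustration of the flat key space: grouping keys by their first "/"-delimited prefix reproduces what the console presents as folders (and what listing with a delimiter returns):

```python
# S3 keys are flat strings; "folders" are just shared key-name prefixes.
keys = [
    "logs/2024/app.log",
    "logs/2024/db.log",
    "images/cat.png",
]

# Group by the first "/"-delimited prefix, like listing with Delimiter="/".
folders = {}
for key in keys:
    prefix, _, rest = key.partition("/")
    folders.setdefault(prefix + "/", []).append(rest)

print(folders)  # {'logs/': ['2024/app.log', '2024/db.log'], 'images/': ['cat.png']}
```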
Features
Versioning
A feature to manage and store all old/new/deleted versions of an object
By default, versioning is disabled on all buckets/objects
Once it's enabled, you can only "suspend" versioning. It cannot be fully disabled.
Suspending versioning only prevents new versions from being created. All objects with existing versions will maintain their older versions.
Versioning can only be set on the bucket level and applies to ALL objects in the bucket.
Different lifecycle policies can be applied to the current version and previous versions
Versioning and lifecycle policies can both be enabled on a bucket at the same time.
Can be used with lifecycle policies to create a great archiving and backup solution in S3.
Cross Region Replication available when versioning enabled.
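The versioning state is a one-field configuration (the shape accepted by the `put_bucket_versioning` call). Note there is no "Disabled" value, matching the rule above that versioning can only be suspended once turned on:

```python
# VersioningConfiguration bodies for put_bucket_versioning.
# "Enabled" and "Suspended" are the only valid states once versioning
# has been turned on; there is no way back to "Disabled".
enable_versioning = {"Status": "Enabled"}
suspend_versioning = {"Status": "Suspended"}
```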
Storage Classes
Each storage class has varying attributes that dictate:
Storage cost
Object availability
Frequency of access to the object
Standard
Designed for general, all-purpose storage
Default storage
99.999999999% object durability (11 9's)
99.99% object availability
Most expensive
Infrequent Access (S3-IA)
Designed for objects that you do not frequently access, but must be immediately available when accessed.
99.999999999% object durability (11 9's)
99.9% object availability
Less expensive than the standard storage class
30 day minimum
One Zone Infrequent Access (S3 One Zone-IA)
Designed for non-critical, reproducible objects.
99.999999999% object durability (11 9's)
99.5% object availability
Less expensive than S3-IA
30 day minimum
Glacier:
Designed for long-term archival storage (not to be used for backups)
May take several hours for objects stored in Glacier to be retrieved
99.999999999% object durability (11 9's)
Cheapest S3 storage class (very low, roughly $0.004/GB/month)
90 day minimum
Lifecycle Policies
A set of rules that automate the migration of an object's storage class to a different storage class (or deletion), based on specified time intervals.
By default, lifecycle policies are disabled on a bucket/object
Are customizable to meet your company's data retention policies
Great for automating the management of object storage and to be more cost efficient
Separate policies for current version and previous versions
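A lifecycle configuration sketch in the shape accepted by the S3 API (boto3's `put_bucket_lifecycle_configuration`). The rule ID and prefix are hypothetical; the day counts respect the 30- and 90-day minimums of IA and Glacier noted above:

```python
# Lifecycle configuration: transition "logs/" objects to cheaper classes
# over time, expire them after a year, and archive previous versions.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",          # hypothetical rule name
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            # Previous versions get their own schedule:
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
            ],
        }
    ]
}
```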
Event Notifications
S3 event notifications allow you to set up automated communication between S3 and other AWS services when a selected event occurs in an S3 bucket.
Common event notification triggers:
ObjectCreated
Put
Post
Copy
ObjectRemoved
Event notification can be sent to the following AWS services
SNS
Lambda
SQS Queue
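A sketch of a notification configuration wiring ObjectCreated events to a Lambda function and ObjectRemoved events to an SQS queue; both ARNs are hypothetical:

```python
# NotificationConfiguration in the shape accepted by
# put_bucket_notification_configuration; ARNs are hypothetical.
notification = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:make-thumbnail",
            "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:Post"],
        }
    ],
    "QueueConfigurations": [
        {
            "QueueArn": "arn:aws:sqs:us-east-1:111122223333:cleanup-queue",
            "Events": ["s3:ObjectRemoved:*"],
        }
    ],
}
```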
Permissions
All buckets and objects are private by default - only the resource owner has access
The resource owner can grant access to the resource (bucket/objects) through S3 "resource based policies" OR access can be granted through a traditional IAM user policy
Resource based policies (for S3) are:
Bucket policies
Attached only to the S3 bucket (not an IAM user)
The permissions in the policy are applied to all objects in the bucket
The policy specifies what actions are allowed or denied for a particular user of that bucket:
Granting access to an anonymous user
Who (a "principal") can execute certain actions like PUT or DELETE
Restricting access based on IP address (generally for CDN management)
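A bucket policy sketch combining two of the patterns above: anonymous read access to objects, restricted to one IP range. The bucket name is hypothetical and the CIDR uses a documentation range:

```python
# Standard IAM policy document shape; bucket name is hypothetical and
# 203.0.113.0/24 is a reserved documentation IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadFromOneRange",   # hypothetical statement ID
            "Effect": "Allow",
            "Principal": "*",                  # anonymous (any) principal
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }
    ],
}
```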
S3 Access Control Lists
Grant access to users in other AWS accounts or to the public
Both buckets and objects have ACLs
Object ACLs allow us to share an S3 object with the public via a URL link
Static Web Hosting
Low-cost, highly reliable web hosting service for static websites (content that does not change frequently)
When enabled, static web hosting will provide you with a unique endpoint (url) that you can point to any properly formatted file stored in an S3 bucket (HTML, CSS, JS)
Amazon Route53 can also map human-readable domain names to static web hosting buckets, which are ideal for DNS failover solutions
Cross-Origin Resource Sharing (CORS)
A method of allowing a web application located in one domain to access and use resources in another domain.
This allows web applications running JS or HTML5 to load resources stored in S3 buckets (JS, web fonts)
Like a mini CDN for JS libraries or HTML components! CORS must be enabled on the bucket, or the browser will block the request with a cross-origin security error!
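A CORS configuration sketch in the shape accepted by `put_bucket_cors`; the allowed origin is hypothetical:

```python
# CORSConfiguration letting one site GET assets (fonts, JS) from the bucket.
cors = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # hypothetical site
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
        }
    ]
}
```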
Glacier
An archival storage type: offline cold storage
Used for data that is NOT accessed frequently
"Check out" and "check in" jobs can take hours; this is how long it takes for the data to be retrieved and/or changed
Integrates with Amazon S3 lifecycle policies for easy archiving
Very inexpensive and cost effective archival storage solution
Glacier should NOT be used as a backup solution
99.999999999% (11 9s) durability
SSL/TLS Endpoint
Encryption at Rest
Can "Lock" A Vault
"Readying" an object for retrieval takes hours; then you can download it.
Data Retrieval (pricing varies)
Expedited: 1-5 minutes
Standard: 3-5 hours
Bulk: 5-12 hours
Transfer into S3
Single Operation Upload
Traditional method where file is uploaded in one part
Upload up to 5GB in one part, but good practice is to use multipart upload for anything over 100MB
Multipart Upload
Allows you to upload a single object as a set of parts
Allows for uploading parts of a file concurrently
Allows for stopping/resuming file uploads
If transmission of any part fails, you can retransmit that part without affecting other parts
After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object
Required for objects larger than 5GB; strongly suggested for anything over 100MB
Can be used for up to 5TB file size
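The client-side bookkeeping of a multipart upload can be sketched as follows; part numbering starts at 1, and every part except the last must be at least 5 MB:

```python
def split_into_parts(size_bytes: int, part_size: int = 5 * 1024 * 1024):
    """Yield (part_number, offset, length) tuples for a multipart upload.

    Sketch of the client-side math only: parts are uploaded (possibly
    concurrently), then S3 assembles them into one object on completion.
    """
    parts = []
    offset, number = 0, 1
    while offset < size_bytes:
        length = min(part_size, size_bytes - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts

# A 12 MB object in 5 MB parts -> three parts: 5 MB, 5 MB, 2 MB.
print(split_into_parts(12 * 1024 * 1024))
```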
S3 Transfer Acceleration
User uploads through Cloudfront Edge Locations
Enable on a bucket and change endpoint:
mybucket.s3.amazonaws.com -> mybucket.s3-accelerate.amazonaws.com
Snowball
Snowball is a petabyte-scale data transport solution
Snowball uses an AWS provided secure transfer appliance
Quickly move large amounts of data into and out of the AWS cloud
Up to 80TB per device
Transferring 100TB over a dedicated 100Mbps connection takes roughly 100 days, but with Snowball it takes about a week!
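The back-of-the-envelope math behind these transfer times, assuming best-case full utilization of a dedicated link (real links have protocol overhead, so actual times run longer):

```python
def transfer_days(terabytes: float, link_mbps: float) -> float:
    """Days to move `terabytes` over a dedicated link of `link_mbps`,
    assuming full line-rate utilization (best case)."""
    bits = terabytes * 1e12 * 8            # TB -> bits (decimal units)
    seconds = bits / (link_mbps * 1e6)     # Mbps -> bits/second
    return seconds / 86400

print(round(transfer_days(100, 100)))   # 93 days over 100 Mbps (~3 months)
print(round(transfer_days(100, 1000)))  # 9 days over 1 Gbps
```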
Snowball Edge
Snowball plus onboard compute capability
Up to 100TB per device
Can be clustered
S3 API Interface
Lambda Functions
Snowmobile
Exabyte-scale data transfer
100PB per Snowmobile, move in a few weeks plus transport time (could take 20 years over a 1Gbps direct line!)
Storage Gateway
Connects local data center software appliances to cloud based storage such as Amazon S3
VMWare or Hyper-V
Store local data in S3
Encryption at rest and in transit
Volume Gateway
EBS snapshots for Disaster Recovery
Gateway-Cached Volumes
Create storage volumes and mount them as iSCSI devices on on-premise servers
Gateway will store the data written to this volume in Amazon S3 and will cache frequently accessed data on-premise in the storage device
Gateway-Stored Volumes
Store all the data locally (on-premise) in storage volumes
Gateway will periodically take snapshots of the data as incremental backups and stores them on S3
File Gateway
Local NFS
Objects are stored and retrievable in S3
Tape Gateway
Emulates industry-standard iSCSI-based virtual tape libraries
Common backup applications (e.g. Veeam, Veritas, Arcserve, Dell, etc.)