DynamoDB
DynamoDB is a fully managed NoSQL database service. You don't have to manage the underlying software or infrastructure; all you need to care about is how you store and retrieve your data.
Essentials (from CSA)
Fully-managed NoSQL database service provided by AWS
Similar to MongoDB, but home-grown by AWS
Schemaless, uses key-value store
Specify the required throughput capacity, and DynamoDB does the rest
Fully-managed:
All provisioning and scaling of underlying hardware managed
Fully distributed and scales automatically with demand
Fault tolerant and highly available
Fully synchronized data across all AZs within the region of creation
Easily integrates with other AWS services such as Elastic MapReduce (EMR)
Can easily move data to a Hadoop cluster in EMR
Popular use cases:
IoT (storing device metadata)
Gaming (storing session information, leaderboards)
Mobile (storing user profiles, personalization)
Only needs one key: partition key
If partition key is not unique to that table, you can add a sort key
Databases as Services
What Drives Database Evolution?
What Triggers New Database Tech?
Need to handle more data
Need to process data more quickly
Need to process data less expensively
Need to handle data more reliably
Initial Database Evolutions
The human brain
Limited storage, reliability, speed
Writing and paper records
Limited speed
Punch Cards
Magnetic Tapes
Distributed file systems
More expensive but faster
Relational Databases
Focused on normalizing and deduplicating data
Goals for relational databases
Reduce the storage footprint
Ask questions and produce materialized views (SQL)
Storage was expensive
Processing was cheap
Ad hoc queries
Critical Developments
CPU is now expensive
Storage is now cheap
Data explosion from terabytes to petabytes
SQL vs. NoSQL
Relational Databases (SQL)
Normalized and relational
Consolidates Data
Planning, problem solving
Ad hoc queries
Data analysis and interrogation
Supports unknown access patterns
Slower because it has to be more flexible
NoSQL
Denormalized and hierarchical
Instantiates views
Scales horizontally
Transaction speed and performance
Doesn't easily support ad hoc analysis
DynamoDB Essentials
Benefits
Fully managed NoSQL
Running NoSQL at scale is hard. Saves you from managing servers, clusters and shards of traditional NoSQL
Document or key-value store
Designed to be fast at any scale
Low single-digit millisecond latency
AWS IAM access Controls
Suitable to event-driven programming models (aka Serverless)
HIPAA compliance (among other compliance programs)
Global Tables - Multi-region, multi-master
Drawbacks
Rely on AWS to resolve any internal issues
Not SQL - not well suited to data interrogation and analysis
Need to make specific design choices up-front to use effectively in some situations
DynamoDB specific item size limits
Core Components
Tables
Tables are collections of data
Each table contains items
Items
Groups of attributes that are uniquely identifiable among all other items in a table
Each is made of one or more attributes
Similar to rows or records
Attributes
The fundamental data element of DynamoDB
Similar to fields and columns in other dbs
Example item from a table called 'Music'
{
  "Artist": "Anthony James",
  "SongTitle": "Learning for Dayz",
  "AlbumTitle": "A Few of My Favorite Things",
  "Genre": "Punk Rock",
  "Price": "1.99",
  "CriticalRating": "8.1"
}
Primary Keys
Uniquely identifies each item in the table, so that no two items can have the same key.
Types
Simple primary key (partition key)
Made from one attribute called partition key
In this model - no two items can have the same partition key
Items must have a partition key
Composite primary key (partition and sort key)
Made up of two attributes - the partition key and the sort key
The partition key determines the physical partition space the item is stored in
The sort key determines the order for items with the same partition key
Two items can share a partition key or a sort key, but the combination of partition key and sort key must be unique
Items must have both partition and sort key
Primary key attributes must be of type string, number, or binary.
In the above example, "Artist" and "SongTitle" would be the partition and sort key respectively.
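The composite key above can be sketched as a boto3-style CreateTable request. This is a shape sketch only (no AWS call is made), and the throughput values are illustrative:

```python
# Hypothetical boto3-style CreateTable parameters for the 'Music' table.
# Nothing here talks to AWS; the dict just illustrates the request shape.
create_table_params = {
    "TableName": "Music",
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        # Key attributes must be S (string), N (number), or B (binary)
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

With boto3 installed and credentials configured, the same dict could be passed as keyword arguments to the DynamoDB client's create_table call.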
Interacting with DynamoDB
Throughput Capacity
Provisioned throughput is the maximum amount of capacity that an application can consume from a table or index.
When provisioned throughput is exceeded, requests are throttled
Can also use DynamoDB auto scaling to avoid throttling
Increase capacity when needed/decrease periodically
Reserved Capacity can be purchased up-front
Cheaper
Requires steadier workloads
Write Capacity (WCUs)
One WCU = One write per second of up to 1KB
5 WCUs = Write up to 5KB/second
A 10 KB item takes 10 WCUs to write
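The WCU rules above reduce to a small calculation: item size rounds up to whole kilobytes, and each write per second costs that many WCUs. A minimal sketch (the function name is mine, not an AWS API):

```python
import math

def wcus_for_writes(item_size_kb: float, writes_per_second: int = 1) -> int:
    """WCUs needed: one WCU covers one write per second of an item up to 1 KB.
    Larger items consume ceil(size in KB) WCUs per write."""
    return math.ceil(item_size_kb) * writes_per_second

print(wcus_for_writes(10))     # a single 10 KB item per second -> 10 WCUs
print(wcus_for_writes(1, 5))   # five 1 KB writes per second -> 5 WCUs
print(wcus_for_writes(0.5))    # sizes round UP, so a 0.5 KB item -> 1 WCU
```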
Read Capacity Units (RCUs)
One strongly consistent read per second
Or two eventually consistent reads per second
Reads items up to 4KB
Example: reading 80 items of 6 KB each per second, strongly consistent
6 KB / 4 KB = 1.5, rounded up to 2 RCUs per item
80 items × 2 RCUs = 160 RCUs total
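The RCU arithmetic can be sketched the same way: item size rounds up to 4 KB blocks, strongly consistent reads cost one RCU per block per read, and eventually consistent reads cost half. The function name is mine, not an AWS API:

```python
import math

def rcus_for_reads(item_size_kb: float, reads_per_second: int,
                   strongly_consistent: bool = True) -> int:
    """RCUs needed: one RCU = one strongly consistent read per second of an
    item up to 4 KB, or two eventually consistent reads per second."""
    rcus_per_read = math.ceil(item_size_kb / 4)   # round up to 4 KB blocks
    total = rcus_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)              # eventual consistency: half
    return total

# The worked example from the notes: 80 reads/second of 6 KB items
print(rcus_for_reads(6, 80))                             # 160 RCUs (strong)
print(rcus_for_reads(6, 80, strongly_consistent=False))  # 80 RCUs (eventual)
```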
Read Consistency
Eventually Consistent Reads
Usually reflects changes made within 1-2 seconds
May contain stale data
Takes half the RCUs of Strongly Consistent Reads
Used by default, unless otherwise specified
Default for GetItem, Query and Scan
Not suited to apps that need reads guaranteed to reflect most-recent writes
Strongly Consistent Reads
Reflects all writes up to the time of the read
Takes twice as many RCUs as eventually consistent reads
Must be used explicitly when calling APIs and working with SDKs
GetItem, Query and Scan operations can take a ConsistentRead parameter which enables strongly consistent reads
Can rely on recently made writes being reflected in subsequent API calls
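Opting in looks like this in a boto3-style GetItem request (a shape sketch only, reusing the Music item from earlier; no AWS call is made):

```python
# Hypothetical boto3-style GetItem parameters showing the ConsistentRead flag.
# DynamoDB's low-level format wraps each value in a type tag like {"S": ...}.
get_item_params = {
    "TableName": "Music",
    "Key": {
        "Artist": {"S": "Anthony James"},
        "SongTitle": {"S": "Learning for Dayz"},
    },
    "ConsistentRead": True,  # opt in to a strongly consistent read
}
```

Omitting ConsistentRead (or setting it to False) gives the default eventually consistent read at half the RCU cost.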
GetItem, Queries, and Scans
GetItem
Gets an item that matches the primary key
Determines the exact storage location and retrieves the item
Highly efficient
Queries
Use on any table or index with a composite primary key
Find items based on primary key values
Can return all items with partition key or return subset based on sort key
Can also use a query filter to filter the results on any attribute after they are read, not just the sort key
Scans
Returns everything in the table
Sometimes needed to perform bulk data exports
Very inefficient
Avoid when possible
Can also filter scans on any attribute, but only after the items are read and the read capacity is consumed
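The Query/Scan contrast above can be sketched as two boto3-style request shapes (no AWS call is made; the expressions reuse the Music table from earlier):

```python
# Hypothetical boto3-style request shapes contrasting Query and Scan.

# Query: narrows by primary key, so only matching items are read.
query_params = {
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :a AND begins_with(SongTitle, :t)",
    "ExpressionAttributeValues": {
        ":a": {"S": "Anthony James"},
        ":t": {"S": "Learning"},
    },
}

# Scan: reads every item; FilterExpression discards non-matches only AFTER
# they are read, so the full read capacity is still consumed.
scan_params = {
    "TableName": "Music",
    "FilterExpression": "Genre = :g",
    "ExpressionAttributeValues": {":g": {"S": "Punk Rock"}},
}
```

The key difference is where the narrowing happens: a Query's KeyConditionExpression limits what is read, while a Scan's FilterExpression only limits what is returned.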