DynamoDB

DynamoDB is a fully managed NoSQL database. You don't have to manage the underlying applications or infrastructure; all you need to care about is how to store and retrieve your data.

Essentials (from CSA)

Fully-managed NoSQL database service provided by AWS
Similar to MongoDB, but home-grown by AWS
Schemaless key-value store
Specify the required throughput capacity, and DynamoDB does the rest
Fully-managed:
- All provisioning and scaling of underlying hardware managed
- Fully distributed and scales automatically with demand
- Fault tolerant and highly available
  - Data is replicated across multiple AZs within the region where the table was created
Easily integrates with other AWS services such as Elastic MapReduce
- Can easily move data to a Hadoop cluster in Elastic MapReduce
Popular use cases:
- IoT (storing metadata)
- Gaming (storing session information, leaderboards)
- Mobile (storing user profiles, personalization)
Only one key is required: the partition key
- If the partition key alone does not uniquely identify each item in the table, you can add a sort key

Databases as Services

What Drives Database Evolution?
- What Triggers New Database Tech?
  - Need to handle more data
  - Need to process data more quickly
  - Need to process data less expensively
  - Need to handle data more reliably
- Initial Database Evolutions
  - The human brain
    Limited storage, reliability, speed
  - Writing and paper records
    Limited speed
  - Punch Cards
  - Magnetic Tapes
  - Distributed file systems
    More expensive but faster
- Relational Databases
  - Focused on normalizing and deduplicating data
  - Goals for relational databases
    Reduce the storage footprint
    Ask questions and produce materialized views (SQL)
  - Storage was expensive
  - Processing was cheap
  - Ad hoc queries
- Critical Developments
  - CPU is now expensive
  - Storage is now cheap
  - Data explosion from terabytes to petabytes
SQL vs. NoSQL
- Relational Databases (SQL)
  - Normalized and relational
  - Consolidates Data
  - Planning, problem solving
    Ad hoc queries
    Data analysis and interrogation
  - Supports unknown access patterns
  - Slower because it has to be more flexible
- NoSQL
  - Denormalized and hierarchical
  - Instantiates views
  - Scales horizontally
  - Transaction speed and performance
  - Doesn't easily support ad-hoc analysis

DynamoDB Essentials

Benefits

Fully managed NoSQL
- Running NoSQL at scale is hard. DynamoDB saves you from managing the servers, clusters and shards that traditional NoSQL databases require
Document or key-value store
Designed to be fast at any scale
- Low single-digit millisecond latency
AWS IAM access controls
Well suited to event-driven programming models (aka Serverless)
HIPAA compliance (and other compliance programs)
Global Tables - Multi-region, multi-master

Drawbacks

Rely on AWS to resolve any internal issues
Not SQL - not well suited to data interrogation and analysis
Need to make specific design choices up-front to use effectively in some situations
DynamoDB-specific item size limits (400KB maximum per item)

Core Components

Tables
- Tables are collections of data
- Each table contains items
Items
- Groups of attributes that are uniquely identifiable among all other items in a table
- Each is made of one or more attributes
- Similar to rows or records
Attributes
- The fundamental data element of DynamoDB
- Similar to fields and columns in other dbs
Example item from a table called 'Music'
{ "Artist": "Anthony James", "SongTitle": "Learning for Dayz", "AlbumTitle": "A Few of My Favorite Things", "Genre": "Punk Rock", "Price": "1.99", "CriticalRating": "8.1" }
Primary Keys
- Uniquely identifies each item in the table, so that no two items can have the same key.
- Types
  - Simple primary key (partition key)
    Made from one attribute, called the partition key
    In this model, no two items can have the same partition key
    Every item must have a partition key
  - Composite primary key (partition and sort key)
    Made up of two attributes - the partition key and the sort key
    The partition key determines the physical partition space the item is stored in
    The sort key determines the order for items with the same partition key
    Items can share a partition key or a sort key, but the combination of partition key and sort key must be unique
    Every item must have both a partition key and a sort key
  - Key attributes must be of type string, number or binary.
  - In the above example, "Artist" and "SongTitle" would be the partition and sort key respectively.
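The key types above map onto the table definition as follows. This is a minimal sketch that only builds the parameter dictionary for a CreateTable request; it does not call AWS. The helper names `key_schema` and `music_table_definition` are hypothetical, and the capacity values of 5 are placeholder assumptions. In the DynamoDB API, the partition key has key type HASH and the sort key has key type RANGE.

```python
def key_schema(partition_key, sort_key=None):
    """Build a KeySchema list for a simple or composite primary key."""
    # The partition key is always required and uses key type HASH.
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    # The sort key is optional and uses key type RANGE.
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


def music_table_definition():
    """Parameters for a 'Music' table keyed on Artist (partition) and SongTitle (sort)."""
    return {
        "TableName": "Music",
        "KeySchema": key_schema("Artist", "SongTitle"),
        # Only key attributes are declared up front; "S" means string.
        "AttributeDefinitions": [
            {"AttributeName": "Artist", "AttributeType": "S"},
            {"AttributeName": "SongTitle", "AttributeType": "S"},
        ],
        # Placeholder capacity values, chosen only for illustration.
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }
```

Non-key attributes such as "Genre" or "Price" are not declared here, which is what "schemaless" means in practice: only the key attributes are fixed when the table is created.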

Interacting with DynamoDB

Throughput Capacity
- Provisioned throughput is the maximum amount of capacity that an application can consume from a table or index.
- When the provisioned throughput is exceeded, requests get throttled
- Can also use DynamoDB auto scaling to avoid throttling
  - Increases capacity when needed and decreases it periodically
- Reserved Capacity can be purchased up-front
  - Cheaper
  - Requires steadier workloads
Write Capacity (WCUs)
- One WCU = One write per second of up to 1KB
- 5 WCUs = Write up to 5KB/second
- A 10KB item takes 10 WCUs to write
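The write capacity arithmetic above can be sketched as a small helper. The function name `write_capacity_units` is hypothetical; it assumes item sizes are rounded up to the next whole 1KB, as the examples above imply.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second=1):
    """Estimate the WCUs needed to write items of a given size at a given rate."""
    # Each write consumes one WCU per 1KB, rounded up, with a minimum of one.
    units_per_write = max(1, math.ceil(item_size_kb))
    return units_per_write * writes_per_second
```

For example, `write_capacity_units(10)` gives 10 for a single 10KB write per second, and `write_capacity_units(1, writes_per_second=5)` gives 5.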
Read Capacity Units (RCUs)
- One RCU = one strongly consistent read per second of an item up to 4KB
- Or two eventually consistent reads per second
- Example: reading 80 items of 6KB each per second
  - 6KB/4KB = 1.5, rounded up to 2 RCUs per item (strongly consistent reads)
  - 80 items * 2 RCUs = 160 RCUs total
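The read capacity calculation can be sketched the same way. The function name `read_capacity_units` is hypothetical; it assumes item sizes are rounded up to the next 4KB and that eventually consistent reads cost half as much, with the total rounded up to a whole unit.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the RCUs needed to read items of a given size at a given rate."""
    # Each strongly consistent read consumes one RCU per 4KB, rounded up.
    units_per_read = max(1, math.ceil(item_size_kb / 4))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half; round up to a whole unit.
        total = math.ceil(total / 2)
    return total
```

With the numbers from the example, `read_capacity_units(6, 80)` gives 160, and the same workload with `strongly_consistent=False` gives 80.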
Read Consistency
- Eventually Consistent Reads
  - Usually reflects changes made within 1-2 seconds
  - May contain stale data
  - Takes half the RCUs of Strongly Consistent Reads
  - Used by default unless strong consistency is explicitly requested
    Default for GetItem, Query and Scan
  - Not suited to apps that need reads guaranteed to reflect most-recent writes
- Strongly Consistent Reads
  - Reflects all writes up to the time of the read
  - Takes twice as many RCUs as eventually consistent reads
  - Must be used explicitly when calling APIs and working with SDKs
    GetItem, Query and Scan operations can take a ConsistentRead parameter which enables strongly consistent reads
  - Can rely on recently made writes being reflected in subsequent API calls
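As an illustration of the ConsistentRead parameter, the sketch below builds the parameters for a GetItem request against the 'Music' table from the earlier example. The helper name `get_item_request` is hypothetical and the function only builds a dictionary; it does not call AWS. Key values are written in the low-level attribute-value format, where "S" marks a string.

```python
def get_item_request(artist, song_title, strongly_consistent=False):
    """Build GetItem parameters; reads are eventually consistent unless requested otherwise."""
    return {
        "TableName": "Music",
        # The full primary key (partition key and sort key) identifies one item.
        "Key": {
            "Artist": {"S": artist},
            "SongTitle": {"S": song_title},
        },
        # False (the default) costs half the RCUs but may return stale data.
        "ConsistentRead": strongly_consistent,
    }
```

Assuming boto3 is installed and credentials are configured, the dictionary could be passed to the low-level client as `boto3.client("dynamodb").get_item(**get_item_request("Anthony James", "Learning for Dayz", strongly_consistent=True))`.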
GetItem, Queries, and Scans
- GetItem
  - Gets an item that matches the primary key
  - Determines the exact storage location and retrieves the item
  - Highly efficient
- Queries
  - Use on any table or index with a composite primary key
  - Find items based on primary key values
  - Can return all items with partition key or return subset based on sort key
  - Can also use a query filter to filter the results on any attribute after they are read, not just the sort key
- Scans
  - Returns everything in the table
  - Sometimes needed to perform bulk data exports
  - Very inefficient
  - Avoid when possible
  - Scans can also be filtered on any attribute, but the filter is applied only after the items have been read and have consumed read capacity
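To make the difference between the three operations concrete, here is a toy in-memory model. It is not how DynamoDB is implemented; the class `ToyTable` and its methods are hypothetical, and the sort key condition is limited to a prefix match for brevity. It only shows why GetItem and Query touch a single partition while a Scan examines every item before filtering.

```python
class ToyTable:
    """In-memory stand-in for a table keyed on Artist (partition) and SongTitle (sort)."""

    def __init__(self):
        # Maps partition key -> {sort key -> item}.
        self._partitions = {}

    def put_item(self, item):
        # An item with the same partition and sort key replaces the old one.
        self._partitions.setdefault(item["Artist"], {})[item["SongTitle"]] = item

    def get_item(self, artist, song_title):
        # Goes straight to one partition and one sort key.
        return self._partitions.get(artist, {}).get(song_title)

    def query(self, artist, title_prefix=""):
        # Reads a single partition, in sort key order, optionally narrowed by prefix.
        partition = self._partitions.get(artist, {})
        return [partition[t] for t in sorted(partition) if t.startswith(title_prefix)]

    def scan(self, predicate=lambda item: True):
        # Examines every item in every partition; the filter runs after the read.
        examined, matches = 0, []
        for partition in self._partitions.values():
            for item in partition.values():
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined
```

In this model, a scan that filters on "Genre" still counts every item in `examined`, which mirrors the point above that filtered scans consume read capacity for everything they read.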

Last updated 6 years ago