DynamoDB

DynamoDB is a fully managed NoSQL database. You don't have to manage the underlying applications or infrastructure; all you need to care about is how to store and retrieve your data.

Essentials (from CSA)

  • Fully-managed NoSQL database service provided by AWS

  • Similar to MongoDB, but home-grown by AWS

  • Schemaless; uses a key-value store

  • Specify the required throughput capacity, and DynamoDB does the rest

  • Fully-managed:

    • All provisioning and scaling of underlying hardware managed

    • Fully distributed and scales automatically with demand

    • Fault tolerant and highly available

      • Data is replicated across multiple AZs within the region where the table is created

  • Easily integrates with other AWS services such as Elastic MapReduce

    • Can easily move data to a Hadoop Cluster in MapReduce

  • Popular use cases:

    • IoT (storing metadata)

    • Gaming (storing session information, leaderboards)

    • Mobile (storing user profiles, personalization)

  • Only needs one key: partition key

    • If the partition key alone does not uniquely identify items in the table, you can add a sort key

Databases as Services

  • What Drives Database Evolution?

    • What Triggers New Database Tech?

      • Need to handle more data

      • Need to process data more quickly

      • Need to process data less expensively

      • Need to handle data more reliably

    • Initial Database Evolutions

      • The human brain

        • Limited storage, reliability, speed

      • Writing and paper records

        • Limited speed

      • Punch Cards

      • Magnetic Tapes

      • Distributed file systems

        • More expensive but faster

    • Relational Databases

      • Focused on normalizing and deduplicating data

      • Goals for relational databases

        • Reduce the storage footprint

        • Ask questions and produce materialized views (SQL)

      • Storage was expensive

      • Processing was cheap

      • Ad hoc queries

    • Critical Developments

      • CPU is now expensive

      • Storage is now cheap

      • Data explosion from terabytes to petabytes

  • SQL vs. NoSQL

    • Relational Databases (SQL)

      • Normalized and relational

      • Consolidates Data

      • Planning, problem solving

        • Ad hoc queries

        • Data analysis and interrogation

      • Supports unknown access patterns

      • Slower because it has to be more flexible

    • NoSQL

      • Denormalized and hierarchical

      • Instantiates views

      • Scales horizontally

      • Transaction speed and performance

      • Doesn't easily support ad-hoc analysis

DynamoDB Essentials

Benefits

  • Fully managed NoSQL

    • Running NoSQL at scale is hard. DynamoDB saves you from managing the servers, clusters, and shards of a traditional NoSQL deployment

  • Document or key-value store

  • Designed to be fast at any scale

    • Low single-digit millisecond latency

  • AWS IAM access controls

  • Well suited to event-driven programming models (aka serverless)

  • HIPAA compliance (and other compliance programs)

  • Global Tables - Multi-region, multi-master

Drawbacks

  • Rely on AWS to resolve any internal issues

  • Not SQL - not well suited to data interrogation and analysis

  • Need to make specific design choices up-front to use effectively in some situations

  • DynamoDB-specific item size limits (an item can be at most 400 KB)

Core Components

  • Tables

    • Tables are collections of data

    • Each table contains items

  • Items

    • Groups of attributes that are uniquely identifiable among all other items in a table

    • Each is made of one or more attributes

    • Similar to rows or records

  • Attributes

    • The fundamental data element of DynamoDB

    • Similar to fields and columns in other databases

    Example item from a table called 'Music'

    { "Artist": "Anthony James", "SongTitle": "Learning for Dayz", "AlbumTitle": "A Few of My Favorite Things", "Genre": "Punk Rock", "Price": "1.99", "CriticalRating": "8.1" }

  • Primary Keys

    • Uniquely identifies each item in the table, so that no two items can have the same key.

    • Types

      • Simple primary key (partition key)

        • Made from one attribute called partition key

        • In this model - no two items can have the same partition key

        • Items must have a partition key

      • Composite primary key (partition and sort key)

        • Made up of two attributes - the partition key and the sort key

        • The partition key determines the physical partition space the item is stored in

        • The sort key determines the order for items with the same partition key

        • Items can have the same partition OR sort key, but must have unique partition AND sort keys

        • Items must have both partition and sort key

      • Key attributes must be of type string, number, or binary.

      • In the above example, "Artist" and "SongTitle" would be the partition and sort key respectively.
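
The composite-key rule above can be illustrated with a small in-memory sketch. This is not DynamoDB code: `MusicTable` is a hypothetical stand-in that keys items on the (partition key, sort key) pair, and the second song title is made up for illustration. It only shows that two items may share an artist, while a repeated (Artist, SongTitle) pair replaces the earlier item, which is roughly how a default PutItem behaves.

```python
from typing import Dict, Optional, Tuple

# (partition key, sort key); here: (Artist, SongTitle)
Key = Tuple[str, str]


class MusicTable:
    """Hypothetical in-memory model of a table with a composite primary key."""

    def __init__(self) -> None:
        self._items: Dict[Key, dict] = {}

    def put_item(self, item: dict) -> None:
        # Both key attributes are required on every item.
        key = (item["Artist"], item["SongTitle"])
        # An item with the same composite key replaces the existing one.
        self._items[key] = item

    def get_item(self, artist: str, song_title: str) -> Optional[dict]:
        return self._items.get((artist, song_title))

    def __len__(self) -> int:
        return len(self._items)


table = MusicTable()
table.put_item({"Artist": "Anthony James", "SongTitle": "Learning for Dayz", "Genre": "Punk Rock"})
# Same partition key, different sort key: stored as a separate item.
table.put_item({"Artist": "Anthony James", "SongTitle": "Another Song", "Genre": "Punk Rock"})
# Same partition AND sort key: replaces the first item.
table.put_item({"Artist": "Anthony James", "SongTitle": "Learning for Dayz", "Genre": "Jazz"})
```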

Interacting with DynamoDB

  • Throughput Capacity

    • Provisioned throughput is the maximum amount of capacity that an application can consume from a table or index.

    • When the provisioned throughput is exceeded, requests are throttled

    • Can also use DynamoDB auto scaling to avoid throttling

      • Increases capacity when needed and decreases it periodically

    • Reserved Capacity can be purchased up-front

      • Cheaper

      • Requires steadier workloads

  • Write Capacity (WCUs)

    • One WCU = one write per second for an item of up to 1 KB

    • 5 WCUs = write up to 5 KB/second

    • A 10 KB item takes 10 WCUs to write
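
The write-capacity arithmetic above can be sketched as a small helper. The function name `wcus_required` is an assumption made for illustration, and the sketch covers standard writes only, with item sizes rounded up to the next whole KB.

```python
import math


def wcus_required(item_size_kb: float, writes_per_second: int) -> int:
    """Estimate the WCUs needed to write items of a given size at a given rate."""
    # Each write consumes one WCU per 1 KB of item size, rounded up.
    units_per_write = math.ceil(item_size_kb)
    return units_per_write * writes_per_second


ten_kb_item = wcus_required(10, 1)      # one 10 KB item per second
five_small_items = wcus_required(1, 5)  # five 1 KB items per second
```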

  • Read Capacity Units (RCUs)

    • One RCU = one strongly consistent read per second

    • Or two eventually consistent reads per second

    • Each read covers an item of up to 4 KB

    • Example: reading 80 items of 6 KB each per second

      • 6 KB / 4 KB = 1.5, rounded up to 2 RCUs per item (strongly consistent reads)

      • 80 items * 2 RCU = 160 RCUs total
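
The read-capacity example above can be reproduced with a short helper. The name `rcus_required` is hypothetical; the sketch rounds each item up to the next 4 KB block and halves the total for eventually consistent reads, as described in the bullets.

```python
import math


def rcus_required(item_size_kb: float, reads_per_second: int,
                  strongly_consistent: bool = True) -> int:
    """Estimate the RCUs needed to read items of a given size at a given rate."""
    # One strongly consistent read covers up to 4 KB, so round up per item.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


strong = rcus_required(6, 80)                                # the example above
eventual = rcus_required(6, 80, strongly_consistent=False)   # same load, eventually consistent
```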

  • Read Consistency

    • Eventually Consistent Reads

      • Usually reflects changes made within 1-2 seconds

      • May contain stale data

      • Takes half the RCUs of Strongly Consistent Reads

      • Used by default unless strong consistency is explicitly requested

        • Default for GetItem, Query and Scan

      • Not suited to apps that need reads guaranteed to reflect most-recent writes

    • Strongly Consistent Reads

      • Reflects all writes up to the time of the read

      • Takes twice as many RCUs as eventually consistent reads

      • Must be used explicitly when calling APIs and working with SDKs

        • GetItem, Query and Scan operations can take a ConsistentRead parameter which enables strongly consistent reads

      • Can rely on recently made writes being reflected in subsequent API calls
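
As a rough sketch of how the ConsistentRead parameter is set, the helper below builds the request parameters for a GetItem call against the 'Music' table from the earlier example. The function name `build_get_item_request` is hypothetical, and the block only constructs the parameters; it does not contact AWS. With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("dynamodb").get_item(**request)`.

```python
def build_get_item_request(artist: str, song_title: str,
                           strongly_consistent: bool = False) -> dict:
    """Build GetItem parameters for the 'Music' table (illustrative helper)."""
    return {
        "TableName": "Music",
        # The low-level API tags each value with its type; "S" means string.
        "Key": {
            "Artist": {"S": artist},
            "SongTitle": {"S": song_title},
        },
        # False (the default) gives an eventually consistent read;
        # True requests a strongly consistent read at twice the RCU cost.
        "ConsistentRead": strongly_consistent,
    }


request = build_get_item_request("Anthony James", "Learning for Dayz",
                                 strongly_consistent=True)
```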

  • GetItem, Queries, and Scans

    • GetItem

      • Gets an item that matches the primary key

      • Determines the exact storage location and retrieves the item

      • Highly efficient

    • Queries

      • Use on any table or index with a composite primary key

      • Find items based on primary key values

      • Can return all items with partition key or return subset based on sort key

      • Can also use a query filter to filter the results on any attribute after they are read, not just the sort key

    • Scans

      • Reads every item in the table

      • Sometimes needed to perform bulk data exports

      • Very inefficient

      • Avoid when possible

      • Can also filter scans on any attribute, but the filter is applied only after the items have been read and have consumed read capacity
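
The difference between the three operations can be sketched with a toy in-memory model. `ToyTable` and its sample items are hypothetical and do not use the DynamoDB API. The point is only to show how many items each operation has to read: a query reads the items under one partition key, a scan reads everything, and in both cases a filter trims the result after the read.

```python
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, str]
ItemFilter = Callable[[Item], bool]


class ToyTable:
    """Hypothetical model of a table keyed on (Artist, SongTitle)."""

    def __init__(self, items: List[Item]) -> None:
        self._items = {(i["Artist"], i["SongTitle"]): i for i in items}

    def get_item(self, artist: str, song_title: str) -> Optional[Item]:
        # Direct lookup by full primary key: touches exactly one item.
        return self._items.get((artist, song_title))

    def query(self, artist: str, title_prefix: str = "",
              item_filter: Optional[ItemFilter] = None) -> Tuple[List[Item], int]:
        # Reads only items with this partition key, ordered by sort key.
        read = [self._items[k] for k in sorted(self._items)
                if k[0] == artist and k[1].startswith(title_prefix)]
        returned = [i for i in read if item_filter is None or item_filter(i)]
        return returned, len(read)  # (items returned, items read)

    def scan(self, item_filter: Optional[ItemFilter] = None) -> Tuple[List[Item], int]:
        # Reads every item; the filter only trims what is returned.
        read = [self._items[k] for k in sorted(self._items)]
        returned = [i for i in read if item_filter is None or item_filter(i)]
        return returned, len(read)


table = ToyTable([
    {"Artist": "Anthony James", "SongTitle": "Learning for Dayz", "Genre": "Punk Rock"},
    {"Artist": "Anthony James", "SongTitle": "Another Song", "Genre": "Jazz"},
    {"Artist": "Other Artist", "SongTitle": "Some Song", "Genre": "Punk Rock"},
])

def is_punk(item: Item) -> bool:
    return item["Genre"] == "Punk Rock"

query_items, query_read = table.query("Anthony James", item_filter=is_punk)
scan_items, scan_read = table.scan(item_filter=is_punk)
```

In this model the query reads two items to return one, while the scan reads all three to return two, which is why scans get more expensive as a table grows.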
