Databases - DynamoDB

  • AWS proprietary technology, managed NoSQL database
  • Serverless, provisioned capacity, auto scaling, on demand capacity
  • Can replace ElastiCache as a key/value store
    • Advantage is that it’s serverless e.g. no provisioning & maintenance
    • Not sub millisecond (as ElasticSearch) performance but single digit (1-9 millisecond) performance.
  • Availability
    • Multi AZ in 3 data centers by default
    • Highly available
    • Backup and Restore: Point in time restore like RDS without performance impact
      • ❗ On restored table, you need to manually setup the following: ASG, IAM, CloudWatch metrics/alarm, tags, stream & TTL settings.
  • Read & Writes are decoupled: you can provision read and write capacity units
  • Consistency
    • Reads can be eventually consistent or strongly consistent
      • Eventually consistent: accept stale data
      • Strongly consistent: get always newest data
    • (Optionally) Transactions for writes
      • All or nothing type of operations
      • Coordinated Insert, Update & Delete across multiple tables
      • ❗ Include up to 10 unique items or up to 4 MB of data
  • Monitoring through CloudWatch
  • ❗ Can only query on primary key, sort key or indexes
  • 💡 Use cases:
    • Serverless applications development (small documents 100s KB)
    • Distributed serverless cache
    • Doesn’t have SQL query language available
    • Has transactions capability
  • Amazon DMS (Database Migration Service) can be used to migrate to DynamoDB from Mongo, Oracle, MySQL, S3, etc…)
  • You can launch a local DynamoDB on your computer for development purposes.
  • 💡 Best Practices
    • Keep item sizes small
    • If you are storing serial data in DynamoDB that will require actions based on data/time use separate tables for days, weeks, months
    • Store more frequently and less frequently accessed data in separate tables
      • Allows you to set appropriate provisioning levels independently for the tables
    • If possible compress larger attribute values
    • Store objects larger than 400KB in S3 and use pointers (S3 Object ID) in DynamoDB

Security

  • VPC Endpoints available to access DynamoDB without internet
    • VPC Endpoint = an endpoint network interface (ENI)
  • Access fully controlled by IAM
  • Encryption at rest using KMS
  • Encryption in transit using SSL / TTLS

Scaling options

  1. On-demand
    • Auto scales by Application Auto Scaling
      • Scaling policy
        • Whether to scale read or/and write capacity.
        • Minimum and maximum provisioned capacity unit
        • Supports setting target utilization with target tracking.
    • 2.5x more expensive than provisioned capacity (use with care!)
  2. Provisioned RCU (read compute unit) & WCU (write compute unit)
    • They’re decoupled can be increased / decreased individually
    • 💡 Allows you to control costs for predictable workloads.
    • 1 RCU
      • 1 strongly consistent read of 4 KB per second
      • 2 eventually consistent read of 4 KB per second
    • 1 WCU = 1 write of 1 KB per second
    • Throughput can be exceeded temporarily using burst credit
      • If burst credit are empty, you’ll get ProvisionedThroughputException.
      • It’s then advised to do an exponential back-off retry

DynamoDB Tables

  • DynamoDB is made of tables
    • You don’t create database, database is available and you just create tables
  • Each table has a primary key (must be decided at creation time)
    • Primary key values are unique identifiers across all partitions
  • Each table have optional secondary indexes.
    • Efficient access to data with attributes other than the primary key
    • Global secondary index
      • = Partition key (optionally + sort key) that can be different from those on the base table.
      • Index can span all of the data in the base table, across all partitions
      • No size limitations: Has its own provisioned throughput settings for read + write separate from table.
    • Local secondary index
      • = Partition key same as the base table + different sort key.
        • Sort key e.g. timestamp to query for time ranges.
      • Every partition of a local secondary index is scoped to a base table partition that has the same partition key value.
      • ❗ The total size of indexed items for any one partition key value can’t exceed 10 GB.
    • 💡 All indexes requires unique partition key and a sort key
      • ❗ So A limitation is e.g. when you need to perform an indexed lookup of records by time across multiple primary keys:
        • DynamoDB might not be the ideal service for you to use
        • You might need to utilize a separate table (either in DynamoDB or a relational store) to store item metadata that you can perform an indexed lookup against.
  • Each table can have an infinite number of items (=rows)
  • Each item has attributes (can be added over time - can be null)
  • ❗ Maximum size of an item is 400 KB
  • Data types supported are:
    • Scalar types: string, number, binary, boolean, null
    • Document types: list, map
    • Set types: string set, number set, binary set

DynamoDB Streams

  • Changes in DynamoDB (Create, Update, Delete) can end up in a DynamoDB Stream
    • Generates changelogs for every operation
  • Enables driven programming by integrating to AWS Lambda.
    • ❗ Must enable it to be able trigger a lambda.
  • Can implement cross region replication using Streams
    • Streams enable DynamoDB to get a changelog and use that changelog to replicate data across regions
  • Stream has 24 hours of data retention

DAX: DynamoDB Accelerator

  • Seamless cache for DynamoDB, no application rewrite
    • Application talks directly to DynamoDB accelerator with same connection string
  • Writes goes through DAX to DynamoDB
  • Micro second latency for cached reads & queries
  • 📝 Solves Hot Key problem (too many reads)
  • 5 minutes TTL for cache by default
  • ❗ Up to 10 nodes in the cluster
  • Multi AZ (3 nodes minimum recommended for production)
  • Security with encryption at rest with KMS, VPC integration, IAM, CloudTrail…

Global Tables

  • Global service with region specific tables
  • ❗ You must enable first Dynamo Streams
  • Global Tables are multi region, fully replicated, high performance
  • Multi-Master
    • Create multiple read-write instances per region
    • All changes are replicated to all tables.
    • Allows write-scaling.
  • If data is updated from multiple regions
    • last writer wins to ensure eventual consistency

Licenses and Attributions


Speak Your Mind

-->