Huzaifa Asif - AWS DynamoDB Fundamentals

October 26, 2024

1. Introduction

DynamoDB is a NoSQL database service created by Amazon that is known for its high performance. Unlike a relational database, it does not model data as tables joined by relations; instead, each item is stored and looked up by a unique key. This allows data to be stored as a JSON-like document and retrieved quickly by key. DynamoDB is used in mobile applications, gaming, ad technology, and other applications that require a fast data layer.

DynamoDB can be adjusted to meet your read and write capacity needs in Provisioned Capacity mode, or you can use On-Demand mode, which requires little to no capacity planning.

2. List of DynamoDB Key Features

DynamoDB replicates data across three Availability Zones in a given region, utilizing solid-state drives (SSDs) to store three copies of the data.
Tables are composed of rows which are referred to as Items, and each Item is made up of Attributes which are displayed as columns.
Read and write throughput can be scaled up or down as needed with no practical limit on table size, and data is served from solid-state drives (SSDs).
Data stored in DynamoDB can be securely backed up to Amazon S3 for long-term storage.
DynamoDB integrates with other AWS services such as Amazon EMR (Elastic MapReduce), AWS Data Pipeline, and Amazon Kinesis. This lets users take advantage of the scalability and performance of these services to process and manage data quickly and efficiently.
The pricing model for Amazon DynamoDB is pay-per-use, meaning that customers pay only for the throughput and storage they actually use, rather than for resources they don't need.
Security and access control can be managed using the AWS Identity and Access Management (IAM) service. This service helps to create and manage users, groups, and permissions to control access to AWS resources.
For enterprise use, DynamoDB offers a service level agreement (SLA), monitoring tools, and private network access (for example, through VPC endpoints), so that businesses can run smoothly and securely.

3. DynamoDB Data Types

DynamoDB supports the following data types:

3.1. Scalar

- Number (including both integer and floating point)

- String

- Binary

- Boolean

- Null

3.2. Multi-valued

- String Set ["Steph", "Klay", "Kevin"]

- Number Set [23, 12, -3, 34.3]

- Binary Set ["uKASDB", "ASDDSFFF"]
Binary values must be base64-encoded before being sent to DynamoDB. A binary set stores unique binary attributes, which makes it useful for representing a collection of distinct binary values such as images, documents, or other binary data.
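
As a small illustration of the base64 step, the sketch below encodes a collection of raw byte strings the way a binary set would carry them. The helper name encode_binary_set is hypothetical; SDKs typically perform this encoding for you.

```python
import base64

def encode_binary_set(values):
    """Base64-encode each distinct bytes value and return them as sorted strings."""
    unique = set(values)  # a set keeps only distinct values
    return sorted(base64.b64encode(v).decode("ascii") for v in unique)

# The duplicate b"hello" is stored only once.
encoded = encode_binary_set([b"hello", b"world", b"hello"])
print(encoded)  # ['aGVsbG8=', 'd29ybGQ=']
```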

3.3. Document

- List (DynamoDB List data type is a data type that allows users to store an ordered collection of items, similar to an array.)

- Map (DynamoDB Map data type is an unordered collection of key-value pairs that can be used to store and retrieve data)
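
To make the type system more concrete, here is a minimal sketch that converts plain Python values into DynamoDB's typed JSON notation, where each value is tagged with a type descriptor such as S, N, BOOL, NULL, L, M, SS, or NS. The function name to_attribute_value is hypothetical and the code is not the SDK's own serializer; binary sets are left out for brevity.

```python
import base64

def to_attribute_value(value):
    """Convert a Python value into DynamoDB's typed JSON representation."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are transmitted as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
    raise TypeError(f"unsupported value: {value!r}")

item = to_attribute_value(
    {"name": "Steph", "number": 30, "active": True, "nick": None, "tags": ["guard"]}
)
```

Lists and maps nest naturally, because each element is converted recursively.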

4. DynamoDB Table Structure

The partition key, also known as the HASH key, is the simplest form of primary key for identifying items in a table. Its value is used as input to an internal hash function that determines the physical partition in which the item is stored. In a table whose primary key consists only of a partition key, no two items can have the same partition key value.

A partition key combined with a sort key (also known as the RANGE key) forms a composite primary key made up of two attributes. The partition key is again used as input to an internal hash function that determines where the item is physically stored within DynamoDB. All items with the same partition key are stored together, in sorted order by their sort key value.
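
The idea of hashing a key to pick a partition can be sketched as follows. DynamoDB's real hash function is internal and not public, so MD5 and the function name partition_for are stand-ins chosen purely for illustration.

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number using a stand-in hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the partition range.
    return int.from_bytes(digest[:8], "big") % num_partitions

# The same key always lands on the same partition.
first = partition_for("customer#1001", 4)
second = partition_for("customer#1001", 4)
```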

It is possible for two items to have the same partition key value, but they must have different sort key values. In a table with a composite primary key, you can access any item directly by providing the respective partition and sort key values.

A composite primary key provides additional flexibility when querying data. For example, if you supply only the partition key value, DynamoDB will retrieve all items with that key. You can also provide a value for the partition key and a range of values for the sort key to retrieve a subset of items with the same partition key. For example, a table of movies may have a composite primary key composed of the Producer and Title. You can access any movie in the table directly by providing the Producer and Title values for that item. You can also use the Producer value and a range of values for Title to retrieve a subset of movies by that producer.
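
The three access patterns described above (direct lookup, all items for a partition key, and a sort key range) can be mimicked with a small in-memory model. This is only a sketch: the MovieTable class and its methods are hypothetical and do not use the DynamoDB API.

```python
class MovieTable:
    """Toy table keyed by (Producer, Title), mimicking a composite primary key."""

    def __init__(self):
        self._items = {}  # (producer, title) -> item

    def put(self, producer, title, **attrs):
        # Writing the same full key again overwrites the existing item.
        self._items[(producer, title)] = {"Producer": producer, "Title": title, **attrs}

    def get(self, producer, title):
        # Direct access needs both the partition key and the sort key.
        return self._items.get((producer, title))

    def query(self, producer, title_from=None, title_to=None):
        # Partition key is required; the sort key range is optional and inclusive.
        matches = [
            item for (p, t), item in self._items.items()
            if p == producer
            and (title_from is None or t >= title_from)
            and (title_to is None or t <= title_to)
        ]
        return sorted(matches, key=lambda item: item["Title"])

table = MovieTable()
table.put("Producer A", "Alpha", year=2001)
table.put("Producer A", "Bravo", year=2005)
table.put("Producer A", "Charlie", year=2010)
table.put("Producer B", "Delta", year=2012)

all_titles = [m["Title"] for m in table.query("Producer A")]
ranged_titles = [m["Title"] for m in table.query("Producer A", "B", "D")]
```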

5. Secondary Indexes

DynamoDB has two types of secondary index: the Local Secondary Index (LSI) and the Global Secondary Index (GSI).

5.1. LSI (Local Secondary Index)

- Supports both strongly consistent and eventually consistent reads.

- Can only be created together with the base table and cannot be modified or deleted unless also deleting the table.

- Supports composite keys only.

- Maximum of 10GB of data per partition key value.

- Shares capacity units with the base table.

- Must have the same Partition Key (PK) as the base table.

5.2. GSI (Global Secondary Index)

- Offers only eventually consistent reads, but can be created, modified, or deleted at any time.

- Supports both Simple and Composite keys.

- Can use any attribute as its Partition Key (PK) or Sort Key (SK).

- No size restriction per partition.

- Has its own capacity settings, not shared with the base table.
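
The differences between the two index types show up directly in a table definition. The dictionary below is shaped like the parameters of the CreateTable API; the table, attribute, and index names are hypothetical, and the checker function lsi_problems is an illustrative helper rather than part of any SDK.

```python
movies_table = {
    "TableName": "Movies",
    "AttributeDefinitions": [
        {"AttributeName": "Producer", "AttributeType": "S"},
        {"AttributeName": "Title", "AttributeType": "S"},
        {"AttributeName": "Year", "AttributeType": "N"},
        {"AttributeName": "Genre", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "Producer", "KeyType": "HASH"},
        {"AttributeName": "Title", "KeyType": "RANGE"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    # LSI: same partition key as the table, different sort key, shared capacity.
    "LocalSecondaryIndexes": [{
        "IndexName": "ProducerYearIndex",
        "KeySchema": [
            {"AttributeName": "Producer", "KeyType": "HASH"},
            {"AttributeName": "Year", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # GSI: any attribute as partition key, with its own capacity settings.
    "GlobalSecondaryIndexes": [{
        "IndexName": "GenreIndex",
        "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }],
}

def partition_key(key_schema):
    """Return the name of the HASH attribute in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")

def lsi_problems(table):
    """List violations of the LSI rules: same partition key, composite key only."""
    base_pk = partition_key(table["KeySchema"])
    problems = []
    for index in table.get("LocalSecondaryIndexes", []):
        if partition_key(index["KeySchema"]) != base_pk:
            problems.append(f"{index['IndexName']}: partition key differs from base table")
        if len(index["KeySchema"]) != 2:
            problems.append(f"{index['IndexName']}: LSI needs a composite key")
    return problems
```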

6. DynamoDB Read and Write Consistency

DynamoDB can be configured to provide either Eventually Consistent Reads (default) or Strongly Consistent Reads on a per-call basis.

Eventually Consistent Reads respond immediately but may return stale data, because a recent write might not yet have reached every copy. Generally, all copies become consistent within one second.

Strongly Consistent Reads return the most up-to-date version of the data, reflecting all writes that succeeded before the read, because they are always served from the leader copy of the partition. Latency may be higher than with eventually consistent reads, and each read consumes twice the read capacity.
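
The difference between the two read modes can be pictured with a toy model of one leader copy and two follower copies. Everything here is a simplification for illustration: the ReplicatedItem class is hypothetical, replication is triggered by hand, and eventually consistent reads are always served from a follower. In the real API the choice is made per call through the ConsistentRead parameter.

```python
class ReplicatedItem:
    """Toy model: one leader copy and two follower copies of a single value."""

    def __init__(self, value):
        self.leader = value
        self.followers = [value, value]

    def write(self, value):
        # The write is acknowledged by the leader; followers lag behind.
        self.leader = value

    def replicate(self):
        # Followers catch up (normally within about a second).
        self.followers = [self.leader] * len(self.followers)

    def read(self, consistent=False):
        # Strongly consistent reads go to the leader; others may hit a stale follower.
        return self.leader if consistent else self.followers[0]

item = ReplicatedItem("v1")
item.write("v2")
stale = item.read()                  # may still be the old value
fresh = item.read(consistent=True)   # always the latest write
item.replicate()
caught_up = item.read()
```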

7. Capacity Modes

DynamoDB has two capacity modes, Provisioned and On-Demand. You can switch between these modes once every 24 hours.

7.1. Provisioned

Provisioned Throughput Capacity is the maximum amount of read and write activity per second that your application can perform on a table or index. It is best for applications that have predictable or steady traffic and is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs).

Enabling Auto Scaling is recommended in Provisioned capacity mode. This setting allows you to set a minimum and maximum capacity for your DynamoDB table. DynamoDB will automatically adjust the capacity between these values, and will throttle calls that exceed the maximum capacity for an extended period of time.

If your requests exceed the provisioned capacity, DynamoDB throttles them and returns a ProvisionedThroughputExceededException. Throttling means that requests are rejected because the rate of reads or writes is higher than the thresholds that have been set. For example, this can happen if you exceed the provisioned capacity, if partitions are splitting, or if the capacity of a table and the capacity of its index are mismatched.
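
A common way for clients to cope with throttling is to retry with exponential backoff and jitter. The sketch below shows the pattern under stated assumptions: the exception class is a local stand-in for the SDK's throttling error, and call_with_backoff and its parameters are hypothetical names. AWS SDKs already apply a similar retry policy by default.

```python
import random
import time

class ProvisionedThroughputExceeded(Exception):
    """Local stand-in for the SDK's throttling error."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run operation(), retrying throttled calls with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Wait a random time up to base_delay * 2^attempt before retrying.
            sleep(random.uniform(0, base_delay * 2 ** attempt))

# Example: an operation that is throttled twice before succeeding.
calls = {"count": 0}

def flaky_read():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ProvisionedThroughputExceeded()
    return "ok"

delays = []
result = call_with_backoff(flaky_read, sleep=delays.append)
```

Passing sleep as a parameter keeps the example fast to run, since the recorded delays are never actually waited.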

7.2. On-Demand

On-Demand Capacity is the pay-per-request mode, which is especially suited to new or unpredictable workloads. Throughput is limited only by the default upper limits for a table, up to 40K RCUs and 40K WCUs. However, throttling may occur if traffic exceeds double the previous peak within 30 minutes. Note also that On-Demand can become costly, for example under sustained heavy traffic, because every request is billed.

8. Calculating Read and Write Capacity Units

8.1. Calculating Read Capacity Unit (RCU):

A read capacity unit is equivalent to one strongly consistent read per second or two eventually consistent reads per second for an item of up to 4 KB in size.

How to calculate RCUs for strongly consistent reads

- Round the item size up to the nearest multiple of 4 KB

- Divide the rounded size by 4

- Multiply the result by the number of reads per second

Example:

- 20 reads at 40KB per item. (40/4) x 20 = 200 RCUs

- 5 reads at 6KB per item (6KB rounds up to 8KB). (8/4) x 5 = 10 RCUs

How to calculate RCUs for eventually consistent reads

- Round the item size up to the nearest multiple of 4 KB

- Divide the rounded size by 4

- Multiply by the number of reads per second

- Divide the final number by 2

- Round up to the nearest whole number

Example:

- 20 reads at 40KB per item. ( (40/4) x 20 ) / 2 = 100 RCUs

- 15 reads at 10KB per item (10KB rounds up to 12KB). ( (12/4) x 15 ) / 2 = 22.5, which rounds up to 23 RCUs
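
The two sets of steps above can be combined into one small function. This is a sketch of the arithmetic only; the function name read_capacity_units and its parameters are hypothetical.

```python
import math

def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Compute the RCUs needed for a given item size and read rate."""
    blocks = math.ceil(item_size_kb / 4)   # each RCU covers up to 4 KB
    units = blocks * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)       # eventually consistent reads cost half
    return units

print(read_capacity_units(40, 20))                             # 200
print(read_capacity_units(6, 5))                               # 10
print(read_capacity_units(40, 20, strongly_consistent=False))  # 100
print(read_capacity_units(10, 15, strongly_consistent=False))  # 23
```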

8.2. Calculating Write Capacity Unit (WCU):

A write capacity unit is equivalent to one write operation per second, for an item that is up to 1 KB in size.

How to calculate Writes

- Round the item size up to the nearest whole number of KB

- Multiply by the number of writes per second

Example:

- 20 writes at 40KB per item. 40 x 20 = 800 WCUs

- 78 writes at 1KB per item. 1 x 78 = 78 WCUs

- 34 writes at 500 bytes per item (rounds up to 1KB). 1 x 34 = 34 WCUs
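
The write calculation can be sketched the same way. The function name write_capacity_units is hypothetical, and the size is taken in bytes so that sub-kilobyte items such as the 500-byte example round up correctly.

```python
import math

def write_capacity_units(item_size_bytes, writes_per_second):
    """Compute the WCUs needed for a given item size and write rate."""
    size_kb = math.ceil(item_size_bytes / 1024)  # each WCU covers up to 1 KB
    return size_kb * writes_per_second

print(write_capacity_units(40 * 1024, 20))  # 800
print(write_capacity_units(1024, 78))       # 78
print(write_capacity_units(500, 34))        # 34
```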

9. DynamoDB Partitions

DynamoDB automatically splits a table's data into smaller chunks called partitions so that storage and throughput can scale. Partitioning occurs when the data exceeds 10GB, or when the required throughput exceeds what a single partition can serve: 3000 Read Capacity Units or 1000 Write Capacity Units. In addition, DynamoDB may split a partition if it detects a hot partition, to distribute the RCUs and WCUs more evenly across the partitions.

9.1. Block Diagram For Partition

[Figure: block diagram of table partitions]

9.2. Partition Formula
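
A commonly cited approximation, consistent with the 10GB, 3000 RCU, and 1000 WCU thresholds mentioned earlier, takes the larger of a size-based and a throughput-based estimate. The sketch below is an assumption for illustration, not an exact description of DynamoDB's internal behavior, and the function name estimate_partitions is hypothetical.

```python
import math

def estimate_partitions(table_size_gb, rcu, wcu):
    """Estimate the partition count from table size and provisioned throughput."""
    by_size = math.ceil(table_size_gb / 10)              # 10 GB per partition
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)   # 3000 RCU or 1000 WCU per partition
    return max(by_size, by_throughput, 1)                # a table has at least one partition

small = estimate_partitions(table_size_gb=8, rcu=1000, wcu=500)
busy = estimate_partitions(table_size_gb=25, rcu=6000, wcu=1000)
large = estimate_partitions(table_size_gb=50, rcu=3000, wcu=3000)
```

With these inputs, the small table fits in one partition, the busy table needs three on both measures, and the large table is driven by its size rather than its throughput.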

10. DynamoDB Streams

DynamoDB streams provide a log of changes to a table, similar to a transaction log.

A DynamoDB stream is a sequence of changes made to items in an Amazon DynamoDB table. It can be activated to keep track of any modifications to data items in the table.

Your application must connect to a DynamoDB Streams endpoint and issue API requests in order to read and process a stream. Stream records are organized into shards which act as a container for multiple records. Shards are ephemeral and can split into multiple new shards automatically. When a stream is disabled, any open shards will be closed and the data will remain readable for 24 hours. It is important to process the parent shard before the child shard to ensure the stream records are in the correct order. The DynamoDB Streams Kinesis Adapter can handle this automatically.
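
The parent-before-child rule amounts to ordering shards by their lineage. The sketch below assumes shard descriptions shaped like those in a stream description, with a ShardId and an optional ParentShardId; the function name order_shards and the sample IDs are hypothetical.

```python
def order_shards(shards):
    """Return shard IDs so that every parent appears before its children."""
    by_id = {s["ShardId"]: s for s in shards}
    ordered, seen = [], set()

    def visit(shard_id):
        if shard_id in seen:
            return
        parent = by_id[shard_id].get("ParentShardId")
        if parent in by_id:  # the parent may already have expired from the stream
            visit(parent)
        seen.add(shard_id)
        ordered.append(shard_id)

    for shard in shards:
        visit(shard["ShardId"])
    return ordered

shards = [
    {"ShardId": "shard-3", "ParentShardId": "shard-1"},
    {"ShardId": "shard-1"},
    {"ShardId": "shard-2", "ParentShardId": "shard-1"},
]
processing_order = order_shards(shards)
```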

This feature is useful in a range of scenarios, such as sending a welcome message when a new customer is added or notifying members when a message or picture is updated in a group chat. DynamoDB and DynamoDB Streams have separate endpoints, so an application that works with both must connect to each of them.

11. DynamoDB Accelerator

DynamoDB Accelerator (DAX) is a fully managed, in-memory caching system for DynamoDB that runs in a cluster to provide write-through caching.

- Reads are eventually consistent.

- Requests to the cluster are evenly distributed among its nodes.

- DAX can dramatically reduce read response times to microseconds.
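
Write-through caching can be illustrated with a toy cache in front of a dictionary that stands in for the table. This is only a conceptual sketch: the WriteThroughCache class is hypothetical and says nothing about how DAX is implemented.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict that stands in for the table."""

    def __init__(self, table):
        self._table = table
        self._cache = {}
        self.table_reads = 0  # counts reads that had to go to the table

    def put(self, key, item):
        self._table[key] = item  # write to the table first...
        self._cache[key] = item  # ...then keep the cache in step

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no table read
        self.table_reads += 1
        item = self._table.get(key)
        if item is not None:
            self._cache[key] = item
        return item

table = {"a": 1}
cache = WriteThroughCache(table)
cache.get("a")          # miss: read from the table and cache it
cache.get("a")          # hit: served from memory
cache.put("b", 2)       # written to both the table and the cache
table["a"] = 99         # a write that bypasses the cache
stale = cache.get("a")  # still the cached value, hence eventually consistent
```

The last line shows why cached reads are only eventually consistent: a write that does not pass through the cache is not visible until the cached entry is refreshed.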

11.1. When to use DAX

- Applications that need the quickest response times.

- Applications that read items regularly

- Apps that are read-intensive.

11.2. When Not to Use DAX

- Applications that require strongly consistent reads.

- Applications that do not need microsecond read response times

- Write-intensive applications or those with minimal read activity

- If DAX is not a good fit, consider Amazon ElastiCache as an alternative

12. Summary

DynamoDB is a NoSQL database technology created by Amazon for high-performance applications. It replicates data across three Availability Zones and stores three copies of the data using solid-state drives (SSDs). It is infinitely scalable and supports a variety of data types. DynamoDB offers two capacity modes, Provisioned and On-Demand, and has two types of indexes, Local Secondary Index (LSI) and Global Secondary Index (GSI). Additionally, DynamoDB Streams and DynamoDB Accelerator (DAX) are available for specific workloads.
