October 26, 2024
DynamoDB is a NoSQL database service created by Amazon that is renowned for its high performance. Unlike a relational database, it does not model data as tables joined by relations; instead, each item is stored under a unique key. This allows data to be stored as a JSON-like document and retrieved quickly by looking up the key. DynamoDB is used in mobile applications, gaming, ad technology, and other applications that require a fast data layer.
DynamoDB can be adjusted to meet your read and write capacity needs in Provisioned Capacity mode, or you can use On-Demand mode, which requires little to no capacity planning.
DynamoDB supports the following data types:
- Number (including both integer and floating point)
- String
- Binary
- Boolean
- Null
- String Set: ["Steph", "Klay", "Kevin"]
- Number Set: [23, 12, -3, 34.3]
- Binary Set: ["uKASDB", "ASDDSFFF"]
- List: an ordered collection of values, similar to an array.
- Map: an unordered collection of key-value pairs.
Binary values must be base64-encoded before being sent to DynamoDB. A binary set stores unique binary values, which makes it useful for representing a collection of images, documents, or any other binary data.
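To make the data types above more concrete, here is a minimal sketch of how each one is written in DynamoDB's low-level JSON format, where every value is wrapped in a type descriptor (`S`, `N`, `B`, `BOOL`, `NULL`, `SS`, `NS`, `BS`, `L`, `M`). The helper names `to_attribute_value` and `_set_to_attribute_value` are hypothetical and written only for this illustration; AWS SDKs ship their own serializers.

```python
import base64


def _set_to_attribute_value(values):
    """Encode a Python set as a DynamoDB String, Number, or Binary Set."""
    if not values:
        raise ValueError("DynamoDB sets cannot be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # Numbers travel as strings in the wire format.
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": [base64.b64encode(v).decode("ascii") for v in sorted(values)]}
    raise TypeError("set elements must all be str, number, or bytes")


def to_attribute_value(value):
    """Hypothetical helper: wrap a Python value in its DynamoDB type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        return _set_to_attribute_value(value)
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Example item mixing scalar, set, list, and map attributes.
item = {
    "Name": "Steph",
    "Active": True,
    "Teammates": {"Klay", "Kevin"},
    "Scores": [23, 12],
    "Profile": {"City": "Oakland", "Nickname": None},
}
encoded = {name: to_attribute_value(value) for name, value in item.items()}
```

Note that the bool check has to come before the number check in Python, otherwise `True` would be encoded as the number 1.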
Partition Key, also known as the HASH key, can serve as the primary key on its own (a simple primary key). Its value is used as input to an internal hash function, which determines the physical partition in which the item is stored. In a table whose primary key consists only of a partition key, no two items can have the same partition key value.
Partition Key and Sort Key together form a composite primary key made up of two attributes. The partition key is used as input to an internal hash function that determines the physical partition in which the item is stored. All items with the same partition key are stored together, sorted by their sort key value.
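The idea of routing an item by hashing its partition key can be sketched as follows. DynamoDB's real hash function and partition layout are internal and not public, so the MD5 digest, the modulo step, and the function name `partition_for` are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index (illustrative stand-in only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the partition range.
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same key always lands on the same partition, and distinct keys spread out.
keys = [f"user#{i}" for i in range(1000)]
spread = Counter(partition_for(k, 4) for k in keys)
```

The property that matters is determinism: a given key value always resolves to the same partition, which is what makes lookups by key fast.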
It is possible for two items to have the same partition key value, but they must have different sort key values. In a table with a composite primary key, you can access any item directly by providing the respective partition and sort key values.
A composite primary key provides additional flexibility when querying data. For example, if you supply only the partition key value, DynamoDB will retrieve all items with that key. You can also provide a value for the partition key and a range of values for the sort key to retrieve a subset of items with the same partition key. For example, a table of movies may have a composite primary key composed of the Producer and Title. You can access any movie in the table directly by providing the Producer and Title values for that item. You can also use the Producer value and a range of values for Title to retrieve a subset of movies by that producer.
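The three access patterns described above (direct lookup, all items for a partition key, and a sort key range) can be modelled with a small in-memory table. This is a toy sketch, not how DynamoDB is implemented; the class `CompositeKeyTable` and the sample producers and titles are hypothetical.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class CompositeKeyTable:
    """Toy model of a table keyed by (partition key, sort key)."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # partition key -> sorted sort keys
        self._items = {}                     # (partition key, sort key) -> item

    def put_item(self, pk, sk, item):
        if (pk, sk) not in self._items:
            insort(self._sort_keys[pk], sk)  # keep sort keys ordered
        self._items[(pk, sk)] = item

    def get_item(self, pk, sk):
        """Direct access with both key values."""
        return self._items.get((pk, sk))

    def query(self, pk, sk_from=None, sk_to=None):
        """All items for pk, optionally limited to sk_from <= sort key <= sk_to."""
        keys = self._sort_keys.get(pk, [])
        lo = 0 if sk_from is None else bisect_left(keys, sk_from)
        hi = len(keys) if sk_to is None else bisect_right(keys, sk_to)
        return [self._items[(pk, sk)] for sk in keys[lo:hi]]


movies = CompositeKeyTable()
movies.put_item("Producer A", "Gamma", {"Title": "Gamma", "Year": 2003})
movies.put_item("Producer A", "Alpha", {"Title": "Alpha", "Year": 2001})
movies.put_item("Producer A", "Beta", {"Title": "Beta", "Year": 2002})
movies.put_item("Producer B", "Delta", {"Title": "Delta", "Year": 2004})

one_movie = movies.get_item("Producer A", "Beta")
all_by_a = movies.query("Producer A")
titles_b_to_c = movies.query("Producer A", sk_from="B", sk_to="C")
```

Because sort keys are kept in order within each partition key, a range query only has to locate the two boundaries rather than scan every item.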
There are two types of indexes in DynamoDB: Local Secondary Index (LSI) and Global Secondary Index (GSI).
Local Secondary Index (LSI):
- Supports strongly consistent or eventually consistent reads.
- Can only be created together with the base table and cannot be modified or deleted without deleting the table.
- Supports only composite keys.
- Maximum of 10 GB of data per partition key value.
- Shares capacity units with the base table.
- Must have the same Partition Key (PK) as the base table.
Global Secondary Index (GSI):
- Offers only eventually consistent reads.
- Can be created, modified, or deleted at any time.
- Supports both simple and composite keys.
- Can use any attribute as its Partition Key (PK) or Sort Key (SK).
- No size restriction per partition key value.
- Has its own capacity settings, not shared with the base table.
DynamoDB can be configured to provide either Eventually Consistent Reads (default) or Strongly Consistent Reads on a per-call basis.
Eventually Consistent Reads may not reflect the result of a recently completed write, but the response is returned immediately. Generally, all copies of the data become consistent within one second.
Strongly Consistent Reads return the most up-to-date version of the data, reflecting every write that succeeded before the read, because the read is always served by the leader replica of the partition. This ensures that stale data is never returned, although latency may be higher than with eventually consistent reads.
DynamoDB has two capacity modes, Provisioned and On-Demand. You can switch between these modes once every 24 hours.
Provisioned Throughput Capacity is the allocated capacity that your application is able to read or write from a table or index per second. It is best for applications that have predictable or steady traffic and is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs).
Auto Scaling with Provisioned capacity mode should be enabled. This setting allows you to set a minimum and maximum capacity for your DynamoDB table. DynamoDB will automatically adjust the capacity between these values, and will throttle calls that exceed the maximum capacity for an extended period of time.
If your requests exceed the capacity that has been provisioned, DynamoDB throttles them and returns a ProvisionedThroughputExceededException. Throttling means that requests are rejected because the rate of reads or writes is higher than the thresholds that have been set. For example, this can happen if you exceed the provisioned capacity, if partitions are splitting, or if the capacity of a table and the capacity of one of its indexes are mismatched.
On-Demand Capacity is the pay-per-request mode, which is especially suited for new or unpredictable workloads. The throughput is limited only by the default upper limits for a table, up to 40K RCUs and 40K WCUs. However, if traffic grows to more than double the previous peak within 30 minutes, throttling may occur. Note also that On-Demand can become costlier than Provisioned mode for steady, high-volume traffic.
A read capacity unit is equivalent to one strongly consistent read per second or two eventually consistent reads per second for an item of up to 4 KB in size.
How to calculate RCUs for strongly consistent reads
- Round the item size up to the nearest multiple of 4 KB
- Divide the rounded size by 4
- Multiply the result by the number of reads per second
Example:
- 20 reads at 40KB per item. (40/4) x 20 = 200 RCUs
- 5 reads at 6KB per item (rounded up to 8KB). (8/4) x 5 = 10 RCUs
How to calculate RCUs for eventually consistent reads
- Round the item size up to the nearest multiple of 4 KB
- Divide the rounded size by 4
- Multiply by the number of reads per second
- Divide the final number by 2
- Round up to the nearest whole number
Example:
- 20 reads at 40KB per item. ( (40/4) x 20 ) / 2 = 100 RCUs
- 15 reads at 10KB per item (rounded up to 12KB). ( (12/4) x 15 ) / 2 = 22.5, rounded up to 23 RCUs
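The read capacity steps for both consistency modes can be captured in one short function. This is a minimal sketch; the function name `read_capacity_units` and its parameters are hypothetical, and it only reproduces the arithmetic described above.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed for a given item size (KB) and read rate."""
    # Each read costs one unit per 4 KB block, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much, rounded up.
        total = math.ceil(total / 2)
    return total


strong_40kb = read_capacity_units(40, 20)                             # 200
strong_6kb = read_capacity_units(6, 5)                                # 10
eventual_40kb = read_capacity_units(40, 20, strongly_consistent=False)  # 100
eventual_10kb = read_capacity_units(10, 15, strongly_consistent=False)  # 23
```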
A write capacity unit is equivalent to one write operation per second, for an item that is up to 1 KB in size.
How to calculate Writes
- Round the item size up to the nearest whole KB
- Multiply by the number of writes per second
Example:
- 20 writes at 40KB per item. 40 x 20 = 800 WCUs
- 78 writes at 1KB per item. 1 x 78 = 78 WCUs
- 34 writes at 500 BYTES per item. 1 x 34 = 34 WCUs
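The write calculation can be sketched the same way. The function name `write_capacity_units` is hypothetical, and the code assumes 1 KB means 1024 bytes.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed for a given item size (bytes) and write rate."""
    # Each write costs one unit per 1 KB, rounded up, with a minimum of one unit.
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return units_per_write * writes_per_second


wcu_40kb = write_capacity_units(40 * 1024, 20)  # 800
wcu_1kb = write_capacity_units(1024, 78)        # 78
wcu_500b = write_capacity_units(500, 34)        # 34
```

Taking the size in bytes makes the sub-kilobyte case explicit: a 500-byte item still consumes a full write capacity unit.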
DynamoDB automatically splits large tables into smaller chunks of data called Partitions to improve read speeds. Partitioning occurs when the table exceeds 10GB of data or when the table exceeds 3000 Read Capacity Units, or 1000 Write Capacity Units. In addition, DynamoDB may split a partition if it detects a hot partition issue to try and evenly distribute the RCUs and WCUs across the Partitions.
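As a rough illustration of the thresholds above, the following sketch estimates how many partitions a table would need so that no partition exceeds 10 GB, 3000 RCUs, or 1000 WCUs. The formula and the name `estimate_partitions` are assumptions for illustration only; DynamoDB manages partitions internally and does not expose the actual count.

```python
import math


def estimate_partitions(table_size_gb, rcus, wcus):
    """Heuristic partition estimate from the per-partition thresholds."""
    by_size = math.ceil(table_size_gb / 10)
    # Read and write capacity are combined as fractions of one partition's limits.
    by_throughput = math.ceil(rcus / 3000 + wcus / 1000)
    return max(1, by_size, by_throughput)


size_bound = estimate_partitions(table_size_gb=25, rcus=1000, wcus=500)
throughput_bound = estimate_partitions(table_size_gb=5, rcus=6000, wcus=2000)
```

In the first call the table size dominates, and in the second the provisioned throughput does.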
DynamoDB streams provide a log of changes to a table, similar to a transaction log.
A DynamoDB stream is a sequence of changes made to items in an Amazon DynamoDB table. It can be activated to keep track of any modifications to data items in the table.
Your application must connect to a DynamoDB Streams endpoint and issue API requests in order to read and process a stream. Stream records are organized into shards which act as a container for multiple records. Shards are ephemeral and can split into multiple new shards automatically. When a stream is disabled, any open shards will be closed and the data will remain readable for 24 hours. It is important to process the parent shard before the child shard to ensure the stream records are in the correct order. The DynamoDB Streams Kinesis Adapter can handle this automatically.
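The parent-before-child rule can be sketched as an ordering over shard descriptions. The stream description returned by DynamoDB Streams lists shards with a `ShardId` and, for child shards, a `ParentShardId`; the function `order_shards` and the sample shard IDs below are hypothetical and only illustrate the ordering, not a full stream consumer.

```python
def order_shards(shards):
    """Return the shards so that every parent appears before its children."""
    by_id = {shard["ShardId"]: shard for shard in shards}
    ordered = []
    seen = set()

    def visit(shard):
        if shard["ShardId"] in seen:
            return
        # A parent that is no longer listed (already expired) is simply skipped.
        parent = by_id.get(shard.get("ParentShardId"))
        if parent is not None:
            visit(parent)
        seen.add(shard["ShardId"])
        ordered.append(shard)

    for shard in shards:
        visit(shard)
    return ordered


shards = [
    {"ShardId": "shard-2", "ParentShardId": "shard-1"},
    {"ShardId": "shard-3", "ParentShardId": "shard-2"},
    {"ShardId": "shard-1"},
]
processing_order = [shard["ShardId"] for shard in order_shards(shards)]
```

A consumer that reads the shards in this order sees each item's changes in the sequence they were made, which is the work the Kinesis Adapter takes care of automatically.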
This feature is useful in a range of scenarios, such as sending welcome messages to new customers or updating messages or pictures in a group chat. DynamoDB and DynamoDB Streams have separate endpoints, so an application that uses both must maintain a connection to each.
DAX is a fully managed, in-memory caching system for DynamoDB that runs in a cluster to provide write-through caching.
- Reads are eventually consistent.
- Requests to the cluster are evenly distributed among its nodes.
- DAX can dramatically reduce read response times to microseconds
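To show what write-through caching means, here is a toy in-memory model. It is not the DAX client API; the class `WriteThroughCache` and the plain dictionary standing in for the table are hypothetical.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dictionary-backed store."""

    def __init__(self, backing_store):
        self._store = backing_store
        self._cache = {}

    def put(self, key, value):
        # Write-through: the store is updated first, then the cache.
        self._store[key] = value
        self._cache[key] = value

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no trip to the store
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value  # cache miss: populate for later reads
        return value


table = {}
cache = WriteThroughCache(table)
cache.put("user#1", {"name": "Steph"})
hit = cache.get("user#1")

# A write that bypasses the cache leaves the cached copy stale.
table["user#1"] = {"name": "Klay"}
stale = cache.get("user#1")
```

The last two lines illustrate why reads through a cache are only eventually consistent: the cached copy can lag behind the underlying table until it is refreshed.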
DAX is a good fit for:
- Applications that need the quickest response times.
- Applications that read the same items regularly.
- Read-intensive applications.
DAX is not a good fit for:
- Applications that require strongly consistent reads.
- Applications that do not need microsecond read response times.
- Write-intensive applications or those with minimal read activity.
If you don't need DAX, consider ElastiCache as an alternative.
DynamoDB is a NoSQL database service created by Amazon for high-performance applications. It replicates data across three Availability Zones and stores three copies of the data using solid-state drives (SSDs). It scales to very large tables and request rates and supports a variety of data types. DynamoDB offers two capacity modes, Provisioned and On-Demand, and has two types of indexes, Local Secondary Index (LSI) and Global Secondary Index (GSI). Additionally, DynamoDB Streams and DynamoDB Accelerator (DAX) are available for specific workloads.