Amazon DynamoDB is a fully managed NoSQL database service that provides enterprise-grade performance and scalability. You can create database tables that store and retrieve any amount of data. DynamoDB tables are stored on SSD storage and automatically replicated across multiple Availability Zones. Specifically, DynamoDB replicates data across three separate data centers. This article, Amazon DynamoDB Exam Tips, will help you prepare for the AWS Certified Developer Associate exam and the AWS Certified Solutions Architect Associate exam.
DynamoDB also distributes a table’s traffic and data over multiple partitions. You specify the read and write capacity, and DynamoDB provisions the infrastructure needed to support the required throughput levels. You can adjust the read and write capacity after a table has been created.
DynamoDB Components
DynamoDB comprises three core components: Tables, Items and Attributes. A table is a collection of items, each of which is a collection of one or more attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
- Tables – DynamoDB stores data in tables, which are collections of items; for example, a Students table that stores student information
- Items – Each table contains multiple items, where an item is a group of attributes that is uniquely identifiable among all the other items. Items are similar to rows or records in a relational database system
- Attributes – Each item is composed of one or more attributes. Attributes are similar to fields or columns. For example, an item in the Students table could be a student’s record, that has attributes including StudentID, FirstName, LastName etc.
Important Note: In a relational database system, you must pre-define the table name, primary key, list of columns, the data type of each column, and so on. In DynamoDB, you only need to ensure that the table has a primary key. Individual items in DynamoDB can have any number of attributes, but each item is limited to 400KB in size.
When you read an item from DynamoDB, the response may not reflect the results of recently completed writes. This is because DynamoDB maintains multiple copies of the data across multiple Availability Zones and, by default, offers eventually consistent reads, which are cheaper than strongly consistent reads.
- Eventually Consistent Reads – With eventually consistent read operations, the response might not reflect the results of a recently completed write operation. The response might include some stale data. However, the lag is generally no more than one second. If you repeat your read request after a short time, the response should return the latest data.
- Strongly Consistent Reads – You can request strongly consistent reads and DynamoDB will respond with the most up-to-date data, reflecting the updates from all prior write operations that were successful.
If your application must always read the latest data and cannot tolerate any lag, opt for strongly consistent reads. Note, however, that strongly consistent reads do not offer the same read performance as eventually consistent reads and consume twice the read capacity.
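The choice between the two read modes is made per request. The sketch below only builds the parameters for a GetItem call; it does not contact AWS. The helper name `get_item_params`, the Students table and the StudentID value are assumptions made for illustration, while `ConsistentRead` is the request parameter that selects a strongly consistent read.

```python
def get_item_params(table_name, key, strongly_consistent=False):
    """Build the parameters for a GetItem request.

    ConsistentRead defaults to False (eventually consistent). Set it to
    True only when the read must reflect every prior successful write.
    """
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical Students table keyed on StudentID.
key = {"StudentID": {"S": "S-1001"}}
eventual = get_item_params("Students", key)
strong = get_item_params("Students", key, strongly_consistent=True)
print(eventual["ConsistentRead"], strong["ConsistentRead"])
```

Keeping the default as eventually consistent mirrors the cheaper behaviour described above; callers opt in to the stronger guarantee only where they need it.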
Unlike traditional relational databases, where you need to specify the columns, their names and their data types up front, DynamoDB only requires you to specify a primary key to start with. You do not need to define all of an item’s attributes ahead of time; you can add attributes on the fly. This gives you the flexibility to expand the schema as required over time.
When creating a table or secondary index, you must specify the data type of the primary key (partition key and sort key). There are three categories of data types:
- Scalar – Represents exactly one value. There are five scalar types (String, Number, Binary, Boolean and Null), including:
  - String – up to a maximum of 400KB
  - Number – positive or negative, up to 38 digits of precision
  - Binary – up to 400KB in size
- Set Data Types – These are unique lists of one or more scalar values. Each value in a set is unique and all values must be of the same data type. There is no guarantee of order:
  - String Set – Unique list of string attributes
  - Number Set – Unique list of number attributes
  - Binary Set – Unique list of binary attributes
- Document Data Types – Used to represent multiple nested attributes, similar in structure to a JSON document. Data types can be nested within each other up to 32 levels deep. There are two document types, List and Map:
  - List – used to store an ordered list of attributes of different data types
  - Map – used to store an unordered collection of key/value pairs
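To make the three categories concrete, here is a sketch of one item written in the attribute-value notation used by the low-level DynamoDB API, where each value is tagged with a type code (S, N, BOOL, SS, NS, L, M). The student record and the helper `nesting_depth` are assumptions made for illustration; the helper simply counts how deeply List and Map values are nested.

```python
student = {
    "StudentID": {"S": "S-1001"},             # String (scalar)
    "Age": {"N": "21"},                       # Number (scalar), sent as a string
    "Enrolled": {"BOOL": True},               # Boolean (scalar)
    "Clubs": {"SS": ["Chess", "Robotics"]},   # String Set: unique, same type
    "Scores": {"NS": ["88", "92.5"]},         # Number Set
    "Address": {"M": {                        # Map (document type)
        "City": {"S": "Leeds"},
        "Phones": {"L": [{"S": "555-0100"}, {"N": "42"}]},  # List, mixed types
    }},
}


def nesting_depth(attr):
    """Return how many levels of Map/List nesting an attribute value has."""
    type_code, value = next(iter(attr.items()))
    if type_code == "M":
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if type_code == "L":
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars and sets are not nested


print(nesting_depth(student["Age"]), nesting_depth(student["Address"]))
```

A check like this could be used to confirm that a document stays within the 32-level nesting limit before it is written.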
When you create a table, you need to specify a primary key, which uniquely identifies every item in the table. DynamoDB supports two types of primary keys:
Partition Key – The primary key is defined with a single attribute, known as the partition key. DynamoDB uses the partition key’s value to build an unordered hash index, which is used to identify the partition in which the item will be stored.
- Note that if you are using only a partition key, you cannot have two items in the same table with the same partition key value
Partition Key and Sort Key – This is known as a composite primary key and is made up of two attributes, namely the partition (hash) key and the sort (range) key. You can uniquely identify an item if you provide both the partition key and the sort key. Note that you can have multiple items with the same partition key if they have different sort keys.
To illustrate the use of composite keys, consider an online collaboration tool or team chat system. Each user is identified by a partition key value (for example, a user ID) and should be able to post multiple messages under that same value.
Since a user can post multiple messages, each record must still be unique, and the partition key alone cannot guarantee that. A composite key makes sense in this case: multiple records can share the same partition key, while the sort key differs for every record. In this example, the sort key could be the timestamp of when the message was posted.
The other key point to note here is that all items with the same partition key are grouped together in order by the sort key value. This adds a level of efficiency.
- The partition key of an item is also known as its hash attribute. The term derives from the internal hash function that DynamoDB uses to distribute data items evenly across partitions, based on their partition key values.
- The sort key of an item is also known as its range attribute. DynamoDB uses sort keys to store items with the same partition key physically close together, in sorted order by the sort key value.
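The chat example can be modelled with a small in-memory toy. This is not how DynamoDB is implemented; it is a sketch, under the assumption of a user ID partition key and a timestamp sort key, of the two behaviours described above: items sharing a partition key are kept in sort-key order, and writing an item with an existing full key replaces it. The class and attribute names are hypothetical.

```python
import bisect


class ChatTable:
    """Toy in-memory model of a table with a composite primary key."""

    def __init__(self):
        # partition key value -> list of (sort key, item), kept sorted
        self._partitions = {}

    def put(self, user_id, posted_at, text):
        rows = self._partitions.setdefault(user_id, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, posted_at)
        item = {"UserId": user_id, "PostedAt": posted_at, "Text": text}
        if i < len(rows) and rows[i][0] == posted_at:
            rows[i] = (posted_at, item)        # same full key: replace
        else:
            rows.insert(i, (posted_at, item))  # new sort key: keep order

    def query(self, user_id, ascending=True):
        items = [item for _, item in self._partitions.get(user_id, [])]
        return items if ascending else items[::-1]


table = ChatTable()
table.put("alice", "2024-01-02T09:00:00Z", "second")
table.put("alice", "2024-01-01T09:00:00Z", "first")
table.put("bob", "2024-01-01T10:00:00Z", "hi")
table.put("alice", "2024-01-01T09:00:00Z", "first (edited)")
print([m["Text"] for m in table.query("alice")])
```

ISO 8601 timestamps are used as the sort key here because they sort correctly as plain strings.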
Amazon DynamoDB enables you to query the data in a table using an optionally defined alternative key known as a secondary index. There are two types of indexes:
Local Secondary Index – This is an index that has the same partition key as the table, but a different sort key. These can only be created when the table is created. Furthermore, you cannot modify or delete a Local Secondary Index once created.
In the above example of an online team chat tool, you could create a local secondary index with a different sort key, such as the date and time when the user last logged in to the chat application.
Global Secondary Index – This is an index with a partition key and sort key that can both be different from those on the table. Global secondary indexes can be created or deleted on a table at any time.
Secondary indexes enable you to search large tables efficiently rather than relying on scan operations, and they support additional query patterns. A table can have multiple local secondary indexes and multiple global secondary indexes. Also note that secondary indexes are updated whenever an item is modified, which consumes write capacity units.
Note: You can have up to 5 Local Secondary Indexes per table. The limit for Global Secondary Indexes was originally also 5 per table; the current default quota is 20.
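The difference between the two index types shows up in their key schemas. Below is a sketch of a CreateTable request body, written as a plain Python dictionary, for the chat example. The table, index and attribute names are assumptions; the structure (AttributeDefinitions, KeySchema, LocalSecondaryIndexes, GlobalSecondaryIndexes, ProvisionedThroughput) follows the CreateTable API. The helper `key_of` is hypothetical and only extracts a key attribute name.

```python
table_definition = {
    "TableName": "ChatMessages",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "PostedAt", "AttributeType": "S"},
        {"AttributeName": "LastLoginAt", "AttributeType": "S"},
        {"AttributeName": "ChannelId", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},     # partition key
        {"AttributeName": "PostedAt", "KeyType": "RANGE"},  # sort key
    ],
    # Local index: same partition key as the table, different sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "ByLastLogin",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "LastLoginAt", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # Global index: its own partition key and its own throughput.
    "GlobalSecondaryIndexes": [{
        "IndexName": "ByChannel",
        "KeySchema": [
            {"AttributeName": "ChannelId", "KeyType": "HASH"},
            {"AttributeName": "PostedAt", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
}


def key_of(key_schema, key_type):
    """Return the attribute name used for HASH or RANGE in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == key_type)


lsi = table_definition["LocalSecondaryIndexes"][0]["KeySchema"]
gsi = table_definition["GlobalSecondaryIndexes"][0]["KeySchema"]
print(key_of(lsi, "HASH"), key_of(lsi, "RANGE"), key_of(gsi, "HASH"))
```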
Amazon DynamoDB Streams
Applications can be designed to keep track of recent changes and perform some action on the changed records. For example, social media sites send notification messages about your new post to your friends so that they are made aware of your updates.
This kind of change tracking is available in DynamoDB as DynamoDB Streams, which lets you retrieve the list of item changes made over the last 24 hours.
A stream is essentially an ordered flow of information about changes to items in an Amazon DynamoDB table. Once a stream is enabled, DynamoDB captures information about every modification to data items in the table.
You can also use streams to extend the functionality of your application without necessarily modifying the application code. For example, you can read the log of changes from the stream in an additional application, or implement AWS Lambda functions to deliver added functionality.
Additional Key Points to note:
- DynamoDB Streams writes a stream record containing the primary key attributes of each item that was modified
- Stream records appear in the same sequence as the actual modifications. This is achieved by assigning each stream record a sequence number
- DynamoDB writes stream records in near real time, which helps you design applications that consume such streams and take action as content changes
The end application needs to connect to a DynamoDB Streams endpoint and issue API requests to read and process streams.
Each stream consists of stream records. Each stream record represents a single data modification in the DynamoDB table to which the stream belongs.
Stream records are organised into groups, known as shards. Each shard acts as a container for multiple stream records and holds the information required for accessing and iterating through those records. The stream records within a shard are removed automatically after 24 hours.
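As a rough illustration of what a consumer works with, the sketch below uses hand-written stream records for a hypothetical table and puts them back into modification order using their sequence numbers. The field names (`eventName`, `dynamodb`, `Keys`, `SequenceNumber`) follow the stream record format; the data and the helper `in_modification_order` are assumptions, and a real consumer would read records shard by shard from the Streams endpoint rather than from a list.

```python
def in_modification_order(records):
    """Sort stream records by their sequence number (a numeric string)."""
    return sorted(records, key=lambda r: int(r["dynamodb"]["SequenceNumber"]))


# Simulated stream records, deliberately listed out of order.
records = [
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"UserId": {"S": "alice"}}, "SequenceNumber": "300"}},
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"UserId": {"S": "alice"}}, "SequenceNumber": "100"}},
    {"eventName": "MODIFY",
     "dynamodb": {"Keys": {"UserId": {"S": "alice"}}, "SequenceNumber": "200"}},
]

events = [r["eventName"] for r in in_modification_order(records)]
print(events)
```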
DynamoDB enables you to write to and read from the tables you create. You can create, update and delete individual items. In addition, you can use multiple querying options to search for data in your tables. To ensure high availability and low-latency responses, you are required to specify your read and write throughput values when you create a table.
DynamoDB uses this information to reserve sufficient hardware resources and appropriately partitions your data over multiple servers to meet your throughput requirements. When you create a table, you need to specify the following capacity units:
- Read Capacity Units
  - Reads are rounded up to increments of 4KB in size
  - For Eventually Consistent Reads, one read capacity unit provides 2 reads per second for items up to 4KB
  - For Strongly Consistent Reads, one read capacity unit provides 1 read per second for items up to 4KB
- Write Capacity Units
  - One write capacity unit provides 1 write per second for items up to 1KB; writes are rounded up to increments of 1KB
If you have configured your tables with secondary indexes, DynamoDB will consume additional capacity units. For example, if you wanted to add a single 1 KB item to a table, and that item contained an indexed attribute, you would need 2 write capacity units: one for writing to the table and another for writing to the index.
- Remember – One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size
- Remember – One write capacity unit represents one write per second for items up to 1 KB in size.
Read Capacity Requirements
If you have a table and you want to read 100 items per second with strongly consistent reads and your items are 8KB in size, you would calculate the required provisioned capacity as follows:
- 8KB/4KB = 2 capacity units
- 2 read capacity units per item x 100 reads per second = 200 read capacity units
Note: Eventually consistent reads would require half as many, i.e. 100 read capacity units
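The read calculation above can be written as a small helper. This is a sketch of the arithmetic only; the function name `read_capacity_units` and its parameters are assumptions made for illustration.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # round up to 4 KB blocks
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)  # eventually consistent reads cost half
    return total


print(read_capacity_units(8, 100))
print(read_capacity_units(8, 100, strongly_consistent=False))
```

Note the rounding step: a 6KB item is charged as two 4KB blocks, so it needs the same capacity as an 8KB item.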
Write Capacity Requirements
If you have a table and you want to write 50 items per second and your items are 4KB in size, you would calculate the required provisioned capacity as follows:
- 4KB/1KB = 4 capacity units
- 4 write capacity units per item x 50 writes per second = 200 write capacity units
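The same arithmetic for writes, again as a sketch with a hypothetical function name. It covers writes to the table only; any extra capacity consumed by secondary indexes is not modelled here.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    units_per_write = math.ceil(item_size_kb)  # round up to 1 KB blocks
    return units_per_write * writes_per_second


print(write_capacity_units(4, 50))
```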
Important Note: If you exceed the read or write throughput that you have provisioned, your requests are throttled and you will receive an HTTP 400 status code (ProvisionedThroughputExceededException), which indicates that you have exceeded the maximum allowed provisioned throughput for a table or for one or more global secondary indexes.
DynamoDB allows customers to purchase reserved capacity, as described at Amazon DynamoDB Pricing. With reserved capacity, you pay a one-time upfront fee and commit to a minimum usage level over a 1-year or 3-year term. By reserving your read and write capacity units ahead of time, you realise significant cost savings compared to paying the standard rates for provisioned throughput.
You can use a Query or a Scan to search for items in a DynamoDB table. Queries are the primary search operation and find items in a table or a secondary index using primary key attribute values. You need to provide the partition key name and the value to search for, and you can also provide a sort key name and value with a comparison operator to refine the search results.
- Queries will return all data attributes for items with the specified primary key. You can optionally use the ProjectionExpression parameter if you want your query to return only some of the attributes
- When you run a query, your results will come back sorted by the sort key. By default, the sort order is ascending. You can change the order to descending by setting the ScanIndexForward parameter to false
A Scan operation reads every item in a table or secondary index and, by default, returns all data attributes for every item. As tables grow, scan operations slow down. Using the ProjectionExpression parameter, you can configure your scans to return only the attributes that you want rather than all of them.
A single Scan request can retrieve a maximum of 1 MB of data; DynamoDB can optionally apply a filter expression to this data, narrowing the results before they are returned to the user. Query Operations are more efficient than Scan Operations.
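To show how the query parameters fit together, here is a sketch that builds the request body for a Query against the chat example without calling AWS. The helper name, table name and attribute names are assumptions; `KeyConditionExpression`, `ExpressionAttributeValues`, `ProjectionExpression` and `ScanIndexForward` are the Query parameters discussed above.

```python
def recent_messages_query(user_id, since, newest_first=True):
    """Build Query parameters for one user's messages posted since a time."""
    return {
        "TableName": "ChatMessages",
        # Partition key equality is required; the sort key condition is optional.
        "KeyConditionExpression": "UserId = :uid AND PostedAt >= :since",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":since": {"S": since},
        },
        # Return only these attributes instead of the whole item.
        "ProjectionExpression": "PostedAt, MessageText",
        # False returns results in descending sort-key order.
        "ScanIndexForward": not newest_first,
    }


params = recent_messages_query("alice", "2024-01-01T00:00:00Z")
print(params["ScanIndexForward"])
```

Because the partition key is fixed, DynamoDB only reads the matching items, which is why a query like this is cheaper than scanning the table and filtering afterwards.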
Key Security features offered with Amazon DynamoDB:
- Granular control over access rights and permissions for users and administrators
- IAM policies to grant access rights and specify allow and deny operations
- Conditions to restrict access to individual items or attributes
- Applications that require read/write access can be granted temporary or permanent access keys. As a best practice, use IAM roles associated with an EC2 instance to grant the necessary rights to applications rather than storing keys in configuration files
- For mobile applications, you can use web identity federation and the AWS Security Token Service (STS) to issue temporary keys that expire after a short period
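As an illustration of item-level access control, the sketch below assembles an IAM policy document that lets a federated user touch only the items whose partition key equals their own user ID. The account number, region, table name and chosen actions are placeholders; the `dynamodb:LeadingKeys` condition key and the Login with Amazon policy variable are taken from AWS documentation, and the policy as a whole should be treated as a starting point rather than a reviewed production policy.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        # Placeholder account ID, region and table name.
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ChatMessages",
        "Condition": {
            # Only items whose partition key matches the caller's user ID.
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
            },
        },
    }],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```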
Scaling and Partitioning
DynamoDB can scale horizontally exceptionally well when compared to other database engines. It does this by using partitions to meet the storage and performance requirements of your applications.
A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS region. Partition management is handled entirely by DynamoDB as a backend operation.
DynamoDB will store the items of a single table across multiple partitions. The partition in which an item is stored is determined by its partition key. DynamoDB uses the partition key to distribute new items across all available partitions and ensures that items with the same partition key are stored on the same partition.
A single partition can hold approximately 10 GB of data and can support a maximum of 3,000 read capacity units or 1,000 write capacity units.
When you create a new table, the initial number of partitions can be expressed as follows:
( readCapacityUnits / 3,000 ) + ( writeCapacityUnits / 1,000 ) = initialPartitions (rounded up)
For example, if you created a table with 1,000 read capacity units and 250 write capacity units, the initial number of partitions would be:
( 1,000 / 3,000 ) + ( 250 / 1,000 ) = 0.5833 –> 1
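The formula translates directly into code. This sketch reproduces the calculation exactly as stated in this article; the function name is hypothetical, and the per-partition figures of 3,000 read and 1,000 write capacity units are the ones quoted above rather than values queried from the service.

```python
import math


def initial_partitions(read_capacity_units, write_capacity_units):
    """Initial partition count from provisioned throughput, rounded up."""
    return math.ceil(read_capacity_units / 3000 + write_capacity_units / 1000)


print(initial_partitions(1000, 250))
```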
When the number of items in a DynamoDB table increases, additional partitions are added by splitting an existing partition.
- During a split, data is distributed evenly from the old partition to the new partitions
- The old partition’s provisioned throughput capacity is also split equally between the new partitions
A partition split can occur in response to:
- Increased provisioned throughput settings
- Increased storage requirements
DynamoDB provides some flexibility in the per-partition throughput provisioning. When you are not fully utilising a partition’s throughput, DynamoDB retains a portion of your unused capacity for later bursts of throughput usage.
You can use BatchGetItem if your application needs to read multiple items. A single BatchGetItem request can retrieve up to 16 MB of data and contain up to 100 items. In addition, a single BatchGetItem request can retrieve items from multiple tables.
The BatchWriteItem operation lets you put or delete multiple items. A single BatchWriteItem request can write up to 16 MB of data, comprising up to 25 put or delete requests. The maximum size of an individual item is 400 KB. In addition, a single BatchWriteItem request can put or delete items in multiple tables.
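Because one BatchWriteItem request accepts at most 25 put or delete requests, a larger workload has to be split into batches first. The helper below is a minimal sketch of that splitting step only; its name and the sample items are assumptions, and it does not check the 16 MB request limit or the 400 KB item limit.

```python
def chunk_write_requests(requests, batch_size=25):
    """Split a list of put/delete requests into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


# 60 hypothetical put requests in the low-level request format.
puts = [{"PutRequest": {"Item": {"StudentID": {"S": f"S-{n}"}}}} for n in range(60)]
batches = chunk_write_requests(puts)
print([len(b) for b in batches])
```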
In a multi-user configuration, there is often a danger that multiple users will attempt to modify the attribute values of an item at the same time. Each user may not know that another user is also writing to the same item, and hence there is a danger of conflict. If not addressed, valid changes can be overwritten by changes that are incorrect because the users lacked information about each other’s updates.
DynamoDB supports conditional writes for PutItem, DeleteItem and UpdateItem to help address this conflict. With conditional writes, an operation succeeds only if the item attributes meet one or more conditions. If not, it returns an error.
As an example, consider an eCommerce application where the price of an item is currently $10. A conditional write has been configured and two users are about to update the price of the item. The first user updates the price to $15. The operation succeeds because the condition was met, namely that the expected price prior to the update was $10.
The next user attempts to update the price to $12. Because this operation was also expecting the price to be $10, it fails, given that the price is now $15 following the previous update. Note that conditional writes are idempotent. You can send the same conditional write request multiple times, but it will have no further effect on the item after the first time DynamoDB performs the specified update. So, if you experience a network outage while updating an item, you may not know whether the update took place. You can safely retry: if the update had already been applied and the price was set to $15, the retried request is rejected because the condition no longer holds.
To request a conditional PutItem, DeleteItem, or UpdateItem, you specify the condition(s) in the ConditionExpression parameter.
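The price example can be replayed with a small in-memory simulation. This is a sketch of the behaviour, not of the DynamoDB API: the function `conditional_update` and the exception class are hypothetical stand-ins for an UpdateItem call with a ConditionExpression such as `Price = :expected`, which DynamoDB rejects with a ConditionalCheckFailedException when the condition is false.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored value does not match the expected value."""


def conditional_update(item, attribute, expected, new_value):
    """Set item[attribute] to new_value only if it currently equals expected."""
    if item.get(attribute) != expected:
        raise ConditionalCheckFailed(
            f"{attribute} is {item.get(attribute)!r}, expected {expected!r}"
        )
    item[attribute] = new_value


product = {"Id": "P-1", "Price": 10}

# First user: expects $10, sets $15. The condition holds, so this succeeds.
conditional_update(product, "Price", expected=10, new_value=15)

# Second user: also expects $10, but the price is now $15.
try:
    conditional_update(product, "Price", expected=10, new_value=12)
    second_succeeded = True
except ConditionalCheckFailed:
    second_succeeded = False

print(product["Price"], second_succeeded)
```

Retrying the first user's request would fail in the same way as the second user's, which is what makes the conditional write safe to resend.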
With DynamoDB you can use atomic counters to increment or decrement the value of an existing attribute when you perform an UpdateItem operation. All write requests are applied in the order in which they are received. Atomic counter updates are not idempotent, which means that the counter changes each time you call UpdateItem. You therefore risk updating a counter more than once, for example if you retry an update that you suspect failed but that had in fact succeeded. Atomic counters are useful for applications that can tolerate a slight overcount or undercount, but they may not be the right choice for count-sensitive applications such as voting software.
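The retry risk is easy to see in a toy model. The function below is a hypothetical in-memory stand-in for an UpdateItem call with an update expression such as `SET Votes = Votes + :inc`; it applies the increment unconditionally every time it is called, just as an atomic counter does.

```python
def add_to_counter(item, attribute, delta=1):
    """Unconditionally add delta to a numeric attribute and return the result."""
    item[attribute] = item.get(attribute, 0) + delta
    return item[attribute]


poll = {"PollId": "poll-1", "Votes": 0}

add_to_counter(poll, "Votes")  # applied, but suppose the response is lost
add_to_counter(poll, "Votes")  # the client retries: the vote is counted twice

print(poll["Votes"])
```

One vote was cast but the counter reads 2. Where exact counts matter, a conditional write on the previous value is the safer pattern.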
Web Identity Federation
You can use web identity federation for authentication and authorization if your application is targeted at a wider audience. Instead of creating IAM users for your customer base, you can have your users sign in to an Identity provider and obtain a temporary security credential from the AWS Security Token Service (STS)
Web identity federation supports the following identity providers:
- Login with Amazon
- Facebook
- Google
Key Steps to Configuring Web Identity Federation for an App.
- Register your app with the third-party identity provider, e.g. Facebook. You will be assigned an app ID
- Create an IAM role for your identity provider. You will then need to attach an IAM policy to the role. This policy must define the DynamoDB resource required by your app and the permissions your app has on the DynamoDB resource
During the sign-in process, the following steps occur:
- The app calls the identity provider to authenticate the user. The identity provider then returns a web identity token to the app
- The app calls the AWS STS and passes the web identity token using the AssumeRoleWithWebIdentity API, specifying the ARN for the IAM role. AWS then grants access to the resource in accordance with the role’s security policy
- The app calls DynamoDB to access the tables as specified in the policy
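Note: The DurationSeconds value, which specifies how long the temporary security credentials are valid, has a minimum of 15 minutes and a default of 1 hour. The maximum is determined by the maximum session duration configured on the IAM role, which is also 1 hour by default.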