The Nifty AWS DynamoDB Features You Should Probably Be Using

Joonas Laitio
4 min read · Dec 9, 2019


DynamoDB is an increasingly relevant NoSQL persistence solution for those looking for an easy-to-manage database service in the AWS ecosystem that will scale quite well with little effort. It launched with relatively few features and has steadily added new ones here and there, so even if you have used it for a long time there might be some goodies you have missed. Here are a few of the most important ones to bring you up to speed, along with their CloudFormation templating just to illustrate how easy they are to set up.

On-Demand Pricing

Launched: 2018–11–28

Capacity handling used to be cumbersome with DynamoDB. You had to provision your read and write capacity in advance, and you paid for that capacity regardless of whether there was any actual load. Any scaling of that capacity had to be done with Application Auto Scaling, which can be quite a chore to set up, bloating CloudFormation templates to hundreds of lines, and the scaling policies often still don't react fast enough to sudden changes in traffic.

This all changed when DynamoDB On-Demand Pricing was introduced. With a single configuration value you pay only for the requests you actually make, and the table scales automatically. If your load is extremely consistent and predictable you will save some money by sticking with provisioned throughput, but for most use cases this should be the first setting to enable.

CloudFormation:

BillingMode: PAY_PER_REQUEST
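
The same setting can also be flipped outside CloudFormation. Here is a minimal boto3 sketch of switching an existing table to on-demand billing; the table name is hypothetical, and keep in mind that AWS rate-limits how often a table can change billing modes.

import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing table from provisioned capacity to on-demand billing.
dynamodb.update_table(
    TableName="orders",              # hypothetical table name
    BillingMode="PAY_PER_REQUEST",
)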

Data retention with Time-To-Live Specifications

Launched: 2017–02–27

People who are used to relational databases will eventually face the harsh access-pattern reality of DynamoDB: each and every query has to explicitly specify the partition key it targets, and DynamoDB scans (i.e. full table scans) have horrible performance and are generally regarded as bad practice.

One repercussion of this is that you cannot simply create an index on a timestamp field, and then make a query for all records that are older than a certain threshold. You need another approach for handling data retention, the deletion of old records.

Luckily DynamoDB has a built-in mechanism for this: you can designate a certain field as the “time to live” attribute, whose value is a Unix epoch timestamp (in seconds) of the moment when the record should expire. DynamoDB then deletes expired records automatically in the background. AWS only promises deletion within about 48 hours of expiry, however, so don’t go thinking you can expire your caches this way.

CloudFormation:

TimeToLiveSpecification:
  AttributeName: myAttribute
  Enabled: true
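
The attribute itself is just a regular number field on the item. A minimal boto3 sketch of writing a record with a 30-day retention, assuming a hypothetical orders table and partition key, and reusing the myAttribute name from the snippet above:

import time

import boto3

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table name

# Keep this record for roughly 30 days; DynamoDB deletes it some time after expiry.
table.put_item(
    Item={
        "id": "order-123",                                  # hypothetical partition key
        "myAttribute": int(time.time()) + 30 * 24 * 3600,   # expiry as epoch seconds
    }
)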

Backups with Point-In-Time Recovery

Launched: 2018–03–26

Backups are a necessary evil in the database world, and they are seldom very nice to work with. The old solution for DynamoDB backups was snapshots triggered manually or through the SDK, typically from a scheduled task. Point-In-Time Recovery is a feature that enables you to roll the data back to any point in time within the last 35 days, a wonderful way to safeguard against disasters without losing the data accrued since the last snapshot. It’s also easy to set up: you guessed it, a single button click in the console or a single boolean in your infracode. You can still use snapshots for backups that you want to keep longer than 35 days.

The regular caveat of restoring DynamoDB backups still applies to PITR restores: they are always done by creating a new table with the data from the backup. You cannot restore a backup “in-place” to the same table, so you should design your DynamoDB-powered applications in a way that makes it easy to change table names.

CloudFormation:

PointInTimeRecoverySpecification:
  PointInTimeRecoveryEnabled: true
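
If you ever need to run the restore programmatically rather than from the console, boto3 exposes it directly. A minimal sketch with hypothetical table names (the target table must not exist yet):

import boto3

dynamodb = boto3.client("dynamodb")

# Restore into a brand new table: PITR cannot overwrite the source table.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="orders",            # hypothetical source table
    TargetTableName="orders-restored",   # hypothetical target table
    UseLatestRestorableTime=True,        # or pass RestoreDateTime for an exact point in time
)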

Reacting to change with DynamoDB Streams

Launched: 2014–11–10

This is an old one, but it bears repeating because the recent uptick in event-driven architectures brings even more interesting use cases for DynamoDB Streams. If you set up a stream for your DynamoDB table, any modification (insert, update, removal) can trigger a Lambda function with the data from the operation. This is a great way to react to change: write audit and history logs, channel data to other services, push updated data to serverless WebSocket clients, you name it.

An important implementation detail is that these streamed updates are batched, so a single Lambda invocation may get the data for dozens of operations. Take care to implement your stream handler’s error handling so that a failure doesn’t lose data or, since a failed batch is retried in full, lead to duplicate handling of the other events in the batch.

CloudFormation:

StreamSpecification:
  StreamViewType: NEW_AND_OLD_IMAGES
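
On the Lambda side, the handler receives a batch of records. A minimal Python sketch of such a handler, assuming the stream is wired to Lambda with an event source mapping; the actual processing is left as a placeholder, and anything beyond the standard event shape is hypothetical.

import json


def handler(event, context):
    for record in event["Records"]:
        # Each record describes one INSERT, MODIFY or REMOVE operation.
        operation = record["eventName"]
        keys = record["dynamodb"]["Keys"]               # DynamoDB-typed JSON, e.g. {"id": {"S": "order-123"}}
        new_image = record["dynamodb"].get("NewImage")  # absent for REMOVE events
        old_image = record["dynamodb"].get("OldImage")  # absent for INSERT events

        # Do the actual work idempotently: a failed batch is retried in full,
        # so the same record may be delivered more than once.
        print(json.dumps({"operation": operation, "keys": keys}))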
