9 thoughts
last posted Aug. 30, 2014, 8:41 p.m.

DynamoDB is a hosted NoSQL database, part of Amazon AWS. We're using it as the primary data store for our messaging system.

Dynamo pricing is based on three factors: dataset size, provisioned read capacity, and provisioned write capacity. Write capacity is by far the most expensive (approx 3x as expensive as reads).

The provisioned I/O capacity is interesting: as you reach your provisioned capacity, calls to Dynamo begin returning HTTP 400 responses with a Throttling status. This is the cue for your application to back off and retry. I'll come back to throttling shortly.
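That back-off-and-retry loop looks roughly like this (a sketch: the `ThrottlingError` class here is a local stand-in for Dynamo's 400/Throttling response, not part of any AWS SDK):

```python
import time


class ThrottlingError(Exception):
    """Stand-in for Dynamo's HTTP 400 Throttling response."""


def call_with_backoff(operation, max_retries=5, base_delay=0.05):
    """Retry `operation` with exponential backoff when we get throttled."""
    for attempt in range(max_retries):
        try:
            return operation()
        except ThrottlingError:
            # Back off exponentially: 50ms, 100ms, 200ms, ...
            time.sleep(base_delay * (2 ** attempt))
    raise ThrottlingError("still throttled after %d retries" % max_retries)
```

In a real client you'd trigger this off the Throttling status in the 400 response body rather than a local exception class, but the shape of the loop is the same.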

When you start out -- say with no data, and 3000 reads and 1000 writes per second -- all of your data is in a single partition. As you add data or scale up your provisioned capacity, Amazon transparently splits your data into additional partitions to keep up. This is one of the selling points: that you don't have to worry about sharding or rebalancing.

It's not just your data that gets split when you hit the threshold of a partition: it's the provisioned capacity, as well. So if you have your table provisioned for 1000 writes per second and 3000 reads per second, and your data approaches the capacity of a single partition, it will be split into two partitions. Each partition will be allocated 500 writes per second and 1500 reads per second.
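The arithmetic is just an even split across partitions (a sketch of the behavior described above, not anything Amazon exposes as an API):

```python
def per_partition_capacity(reads, writes, partitions):
    """Provisioned capacity is divided evenly across partitions."""
    return reads / partitions, writes / partitions


# The table from the example: 3000 reads/sec, 1000 writes/sec, split in two.
reads, writes = per_partition_capacity(3000, 1000, 2)
# -> 1500 reads/sec and 500 writes/sec per partition
```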

DynamoDB works best with evenly distributed keys and evenly distributed access, so in theory that shouldn't be a problem. In practice it can be: if you try to make 600 writes per second to data that all happens to live in a single partition, you'll be throttled even though you think you have excess capacity.

Provisioning that I/O capacity is important to get right: it's not sufficient to turn the dial all the way to 11 from day 1. That's because Dynamo will also split a partition based on provisioned I/O capacity. A single partition is targeted roughly at the 1000/3000 level, so doubling that to 2000/6000 will also cause a split, regardless of how much data you have.
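Putting the two split triggers together, you can sketch a rough partition-count estimate. The numbers here are the rough per-partition targets mentioned above (about 1000 writes/3000 reads per second, plus an assumed per-partition data ceiling) — illustrative, not a published formula:

```python
import math

# Rough per-partition targets: these are assumptions, not official limits.
READS_PER_PARTITION = 3000
WRITES_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimated_partitions(size_gb, reads, writes):
    """Whichever of data size or provisioned throughput demands more
    partitions wins."""
    by_size = math.ceil(size_gb / GB_PER_PARTITION)
    by_throughput = math.ceil(max(reads / READS_PER_PARTITION,
                                  writes / WRITES_PER_PARTITION))
    return max(int(by_size), int(by_throughput), 1)


# 3000/1000 fits in one partition; 6000/2000 forces a split even with no data.
estimated_partitions(1, 3000, 1000)  # -> 1
estimated_partitions(1, 6000, 2000)  # -> 2
```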

Splits due to provisioned I/O capacity -- particularly when you dramatically increase the capacity for a high volume ingest -- are the source of dilution.

"Dilution" is the euphemism Amazon uses to refer to a situation where the provisioned I/O is distributed across so many partitions that you're effective throughput is "diluted". So why would this happen? Well, remember that a partition can be split when either data size or provisioned I/O increases.

Partitions only split, they are never consolidated.

So if you decide that you want an initial ingest to complete at a much faster rate than your application will sustain in production, and you increase the provisioned I/O to match, you're effectively diluting your future I/O performance by artificially increasing the number of partitions.
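To make the dilution concrete, here's the arithmetic using the same rough per-partition targets as above (illustrative numbers, not official limits):

```python
import math


def partitions_for(reads, writes):
    # Rough per-partition targets from above: 3000 reads / 1000 writes per sec.
    return max(int(math.ceil(max(reads / 3000.0, writes / 1000.0))), 1)


# Crank writes up to 10,000/sec to speed through the initial ingest:
ingest_partitions = partitions_for(3000, 10000)  # 10 partitions

# Later, dial back down to a steady-state 1000 writes/sec. Partitions never
# merge, so that capacity is now spread across all 10 partitions:
steady_writes_per_partition = 1000 / ingest_partitions  # 100 writes/sec each
```

Any single partition can now only absorb a tenth of your steady-state write rate before throttling, even though the table as a whole looks correctly provisioned.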

Whomp whomp.
