|
1 | | -# Write Sharding |
2 | | -One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on. |
| 1 | +# Write Sharding in DynamoDB |
3 | 2 |
|
4 | | -## Examples |
5 | | -Code examples provided demonstrate writing and reading from a DynamoDB table using write sharding. |
| 3 | +## Overview |
6 | 4 |
|
7 | | -## Run it |
8 | | -Python: The script requires you have Python3 and installed modules: boto3, json, and random. |
| 5 | +Write sharding is a technique used to distribute write operations more evenly across multiple partitions in Amazon DynamoDB. This pattern helps prevent hot partitions and throttling by expanding the partition key space, allowing for better throughput and performance. |
9 | 6 |
|
10 | | -DynamoDB: Create a table called "ExampleTable" with a partition key of "pk" and a sort key of "sk". Change the AWS Region to your closest. |
| 7 | +## Why Use Write Sharding? |
11 | 8 |
|
12 | | -% python3 WriteShardingExample.py |
| 9 | +When a DynamoDB table receives a high volume of write operations targeting the same partition key, it can lead to: |
13 | 10 |
|
14 | | -## Disclaimer |
15 | | -Provided as a sample. The script assumes the runtime has an AWS account with appropriate permissions. |
| 11 | +1. **Hot partitions**: Uneven distribution of traffic where some partitions receive significantly more requests than others |
| 12 | +2. **Throttling**: Requests exceeding the provisioned throughput for a specific partition |
| 13 | +3. **Performance degradation**: Slower response times due to partition-level bottlenecks |
| 14 | + |
| 15 | +Write sharding addresses these issues by distributing writes across multiple logical partitions. |
| 16 | + |
| 17 | +## Sharding Techniques |
| 18 | + |
| 19 | +This example demonstrates two common write sharding techniques: |
| 20 | + |
| 21 | +### 1. Random Suffix Sharding |
| 22 | + |
| 23 | +Append a random number to the partition key to distribute items randomly across partitions. |
| 24 | + |
| 25 | +```python |
| 26 | +shard_id = random.randint(0, write_shard_count-1) |
| 27 | +pk = f'{date}.{str(shard_id)}' |
| 28 | +``` |
| 29 | + |
| 30 | +**Pros:** |
| 31 | +- Simple to implement |
| 32 | +- Provides good distribution for write operations |
| 33 | + |
| 34 | +**Cons:** |
| 35 | +- Requires querying all shards when reading data |
| 36 | +- No predictable way to access a specific item without scanning all shards |
| 37 | + |
| 38 | +### 2. Calculated Suffix Sharding |
| 39 | + |
| 40 | +Use a calculation based on an attribute of the item to determine the shard. |
| 41 | + |
| 42 | +```python |
| 43 | +shard_id = int(item_id) % write_shard_count |
| 44 | +pk = f'{date}.{str(shard_id)}' |
| 45 | +``` |
| 46 | + |
| 47 | +**Pros:** |
| 48 | +- Deterministic - same item always goes to the same shard |
| 49 | +- Can retrieve specific items without querying all shards |
| 50 | +- Good for items that need to be accessed individually |
| 51 | + |
| 52 | +**Cons:** |
| 53 | +- May still create hot partitions if the calculation doesn't distribute evenly |
| 54 | +- Requires knowing the attribute used in the calculation when reading |
| 55 | + |
| 56 | +## Reading from Sharded Tables |
| 57 | + |
| 58 | +When using write sharding, reading data typically requires one of these approaches: |
| 59 | + |
| 60 | +1. **Query all shards**: For random suffix sharding, you need to query each shard and combine the results. |
| 61 | + |
| 62 | +```python |
| 63 | +allItems = [] |
| 64 | +for x in range(write_shard_count): |
| 65 | + pk = f"{date}.{str(x)}" |
| 66 | + resp = table.query(KeyConditionExpression=Key('pk').eq(pk)) |
| 67 | + allItems = allItems + resp['Items'] |
| 68 | +``` |
| 69 | + |
| 70 | +2. **Query specific shard**: For calculated suffix sharding, you can query just the shard where the item is stored. |
| 71 | + |
| 72 | +```python |
| 73 | +shard_id = int(item_id) % write_shard_count |
| 74 | +pk = f"{date}.{str(shard_id)}" |
| 75 | +resp = table.query(KeyConditionExpression=Key('pk').eq(pk)) |
| 76 | +``` |
| 77 | + |
| 78 | +## Example Code |
| 79 | + |
| 80 | +The provided Python example demonstrates: |
| 81 | +- Writing items using random suffix sharding |
| 82 | +- Reading items from all shards with random suffixes |
| 83 | +- Writing items using calculated suffix sharding |
| 84 | +- Reading items from a specific shard with a calculated suffix |
| 85 | + |
| 86 | +## Running the Example |
| 87 | + |
| 88 | +### Prerequisites |
| 89 | + |
| 90 | +1. Python 3 with the following modules installed: |
| 91 | + - boto3 |
| 92 | + - json |
| 93 | + - random |
| 94 | + - argparse |
| 95 | + |
| 96 | +2. DynamoDB table: |
| 97 | + - Table name: "ExampleTable" |
| 98 | + - Partition key: "pk" (String) |
| 99 | + - Sort key: "sk" (String) |
| 100 | + |
| 101 | +3. AWS credentials configured with appropriate permissions |
| 102 | + |
| 103 | +### Execution |
| 104 | + |
| 105 | +```bash |
| 106 | +# Run with default settings (us-east-1 region, 2 shards) |
| 107 | +python3 python/WriteShardingExample.py |
| 108 | + |
| 109 | +# Run with custom region and shard count |
| 110 | +python3 python/WriteShardingExample.py --region us-west-2 --shard-count 4 |
| 111 | +``` |
| 112 | + |
| 113 | +### Command-line Arguments |
| 114 | + |
| 115 | +- `--region`: AWS region name (default: us-east-1) |
| 116 | +- `--shard-count`: Number of write shards to use (default: 2) |
| 117 | + |
| 118 | +## Best Practices |
| 119 | + |
| 120 | +1. **Choose an appropriate shard count**: Too few shards won't distribute the load effectively, while too many shards can complicate read operations. |
| 121 | + |
| 122 | +2. **Consider your access patterns**: Choose between random and calculated sharding based on how you'll query the data. |
| 123 | + |
| 124 | +3. **Monitor partition metrics**: Use CloudWatch to monitor partition-level metrics and adjust your sharding strategy as needed. |
| 125 | + |
| 126 | +4. **Combine with other techniques**: Consider using write sharding alongside other DynamoDB best practices like TTL for time-series data or sparse indexes. |
| 127 | + |
| 128 | +## Additional Resources |
| 129 | + |
| 130 | +- [DynamoDB Best Practices for Partition Keys](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html) |
| 131 | +- [DynamoDB Write Sharding Documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html) |
16 | 132 |
|
17 | 133 | ## Contribute |
18 | | -Be the first to enhance this code with a Pull Request. |
| 134 | + |
| 135 | +Contributions to enhance this example are welcome! Please submit a Pull Request with your improvements. |
0 commit comments