Commit fb5ef41

Author: Imhoertha Ojior
Update WriteSharding README.md to document new command-line parameters
1 parent 7488125 commit fb5ef41

1 file changed: 128 additions and 11 deletions
  • schema_design/BuildingBlocks/WriteSharding
@@ -1,18 +1,135 @@
1-
# Write Sharding
2-
One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on.
1+
# Write Sharding in DynamoDB
3 2

4-
## Examples
5-
Code examples provided demonstrate writing and reading from a DynamoDB table using write sharding.
3+
## Overview
6 4

7-
## Run it
8-
Python: The script requires you have Python3 and installed modules: boto3, json, and random.
5+
Write sharding is a technique used to distribute write operations more evenly across multiple partitions in Amazon DynamoDB. This pattern helps prevent hot partitions and throttling by expanding the partition key space, allowing for better throughput and performance.
9 6

10-
DynamoDB: Create a table called "ExampleTable" with a partition key of "pk" and a sort key of "sk". Change the AWS Region to your closest.
7+
## Why Use Write Sharding?
11 8

12-
% python3 WriteShardingExample.py
9+
When a DynamoDB table receives a high volume of write operations targeting the same partition key, it can lead to:
13 10

14-
## Disclaimer
15-
Provided as a sample. The script assumes the runtime has an AWS account with appropriate permissions.
11+
1. **Hot partitions**: Uneven distribution of traffic where some partitions receive significantly more requests than others
12+
2. **Throttling**: Requests rejected because they exceed the throughput a single partition can serve (up to 1,000 write capacity units and 3,000 read capacity units per second)
13+
3. **Performance degradation**: Slower response times due to partition-level bottlenecks
14+
15+
Write sharding addresses these issues by distributing writes across multiple logical partitions.
16+
17+
## Sharding Techniques
18+
19+
This example demonstrates two common write sharding techniques:
20+
21+
### 1. Random Suffix Sharding
22+
23+
Append a random number to the partition key to distribute items randomly across partitions.
24+
25+
```python
26+
shard_id = random.randint(0, write_shard_count-1)
27+
pk = f'{date}.{str(shard_id)}'
28+
```
29+
30+
**Pros:**
31+
- Simple to implement
32+
- Provides good distribution for write operations
33+
34+
**Cons:**
35+
- Requires querying all shards when reading data
36+
- No predictable way to access a specific item without querying every shard
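
As a minimal sketch of the random suffix technique, the snippet above can be wrapped in two small helpers. The function names `random_shard_key` and `all_shard_keys` are hypothetical and do not come from the example script; the key format `date.shard_id` follows the snippet above.

```python
import random


def random_shard_key(date: str, write_shard_count: int) -> str:
    """Build a partition key with a random shard suffix (hypothetical helper)."""
    # randint is inclusive on both ends, so the suffix is in 0..write_shard_count-1.
    shard_id = random.randint(0, write_shard_count - 1)
    return f"{date}.{shard_id}"


def all_shard_keys(date: str, write_shard_count: int) -> list:
    """List every partition key a reader must query for the given date."""
    return [f"{date}.{shard_id}" for shard_id in range(write_shard_count)]
```

A writer would call `random_shard_key` once per item, while a reader needs the full list from `all_shard_keys`, which is the read-side cost listed under Cons.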
37+
38+
### 2. Calculated Suffix Sharding
39+
40+
Use a calculation based on an attribute of the item to determine the shard.
41+
42+
```python
43+
shard_id = int(item_id) % write_shard_count
44+
pk = f'{date}.{str(shard_id)}'
45+
```
46+
47+
**Pros:**
48+
- Deterministic - same item always goes to the same shard
49+
- Can retrieve specific items without querying all shards
50+
- Good for items that need to be accessed individually
51+
52+
**Cons:**
53+
- May still create hot partitions if the calculation doesn't distribute evenly
54+
- Requires knowing the attribute used in the calculation when reading
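
The snippet above assumes `item_id` is numeric. For string identifiers, one possible variation (an assumption, not something the example script does) is to derive the shard from a stable hash. Python's built-in `hash()` is salted per process for strings, so this sketch uses `hashlib` instead. The helper name `calculated_shard_key` is hypothetical.

```python
import hashlib


def calculated_shard_key(date: str, item_id, write_shard_count: int) -> str:
    """Derive a deterministic shard suffix from any item identifier (hypothetical helper)."""
    # SHA-256 gives the same digest in every process, unlike the built-in hash().
    digest = hashlib.sha256(str(item_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it to a shard number.
    shard_id = int.from_bytes(digest[:8], "big") % write_shard_count
    return f"{date}.{shard_id}"
```

Because the same `item_id` always maps to the same suffix, a reader can recompute the key and query a single shard, provided the shard count has not changed since the item was written.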
55+
56+
## Reading from Sharded Tables
57+
58+
When using write sharding, reading data typically requires one of these approaches:
59+
60+
1. **Query all shards**: For random suffix sharding, you need to query each shard and combine the results.
61+
62+
```python
63+
allItems = []
64+
for x in range(write_shard_count):
65+
    pk = f"{date}.{str(x)}"
66+
    resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
67+
    allItems = allItems + resp['Items']
68+
```
69+
70+
2. **Query specific shard**: For calculated suffix sharding, you can query just the shard where the item is stored.
71+
72+
```python
73+
shard_id = int(item_id) % write_shard_count
74+
pk = f"{date}.{str(shard_id)}"
75+
resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
76+
```
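
A single `Query` call returns at most 1 MB of data, so a shard holding more than that needs follow-up calls using `LastEvaluatedKey`. The sketch below shows one way to read every shard with pagination. It assumes `table` is a boto3 `Table` resource (or any object with a compatible `query` method), and it uses the string form of the key condition rather than `Key('pk').eq(pk)` so the snippet has no imports. The function name `query_all_shards` is hypothetical.

```python
def query_all_shards(table, date: str, write_shard_count: int) -> list:
    """Query every shard for a date and follow pagination (hypothetical helper)."""
    items = []
    for shard_id in range(write_shard_count):
        # Fresh arguments per shard so a previous ExclusiveStartKey is never reused.
        kwargs = {
            "KeyConditionExpression": "pk = :pk",
            "ExpressionAttributeValues": {":pk": f"{date}.{shard_id}"},
        }
        while True:
            resp = table.query(**kwargs)
            items.extend(resp.get("Items", []))
            last_key = resp.get("LastEvaluatedKey")
            if not last_key:
                break  # no more pages in this shard
            kwargs["ExclusiveStartKey"] = last_key
    return items
```

The shards are read sequentially here for clarity; with many shards, issuing the queries in parallel is a common alternative.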
77+
78+
## Example Code
79+
80+
The provided Python example demonstrates:
81+
- Writing items using random suffix sharding
82+
- Reading items from all shards with random suffixes
83+
- Writing items using calculated suffix sharding
84+
- Reading items from a specific shard with a calculated suffix
85+
86+
## Running the Example
87+
88+
### Prerequisites
89+
90+
1. Python 3 with the following modules (only boto3 must be installed; json, random, and argparse ship with the standard library):
91+
   - boto3
92+
   - json
93+
   - random
94+
   - argparse
95+
96+
2. DynamoDB table:
97+
   - Table name: "ExampleTable"
98+
   - Partition key: "pk" (String)
99+
   - Sort key: "sk" (String)
100+
101+
3. AWS credentials configured with appropriate permissions
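
If the table does not exist yet, the table prerequisite above can be expressed as arguments for the DynamoDB `CreateTable` API. This is a sketch: the helper name `example_table_definition` is hypothetical, and on-demand billing (`PAY_PER_REQUEST`) is an assumption, since the README does not state a capacity mode.

```python
def example_table_definition(table_name: str = "ExampleTable") -> dict:
    """Return CreateTable arguments matching the prerequisites (hypothetical helper)."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},  # String partition key
            {"AttributeName": "sk", "AttributeType": "S"},  # String sort key
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        # Assumption: on-demand capacity; switch to provisioned capacity if preferred.
        "BillingMode": "PAY_PER_REQUEST",
    }


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("dynamodb", region_name="us-east-1")
# client.create_table(**example_table_definition())
```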
102+
103+
### Execution
104+
105+
```bash
106+
# Run with default settings (us-east-1 region, 2 shards)
107+
python3 python/WriteShardingExample.py
108+
109+
# Run with custom region and shard count
110+
python3 python/WriteShardingExample.py --region us-west-2 --shard-count 4
111+
```
112+
113+
### Command-line Arguments
114+
115+
- `--region`: AWS region name (default: us-east-1)
116+
- `--shard-count`: Number of write shards to use (default: 2)
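
The script's own argument parsing is not shown in this README. A minimal `argparse` sketch that accepts the two documented flags with the documented defaults might look like the following. The names `build_parser` and `positive_int`, and the rejection of non-positive shard counts, are assumptions.

```python
import argparse


def positive_int(value: str) -> int:
    """Parse a shard count and reject values below 1 (assumed validation)."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("shard count must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="DynamoDB write sharding example")
    parser.add_argument("--region", default="us-east-1", help="AWS region name")
    # argparse exposes --shard-count as the attribute shard_count.
    parser.add_argument("--shard-count", type=positive_int, default=2,
                        help="Number of write shards to use")
    return parser
```

Calling `build_parser().parse_args()` with no arguments yields `region="us-east-1"` and `shard_count=2`, matching the documented defaults.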
117+
118+
## Best Practices
119+
120+
1. **Choose an appropriate shard count**: Too few shards won't distribute the load effectively, while too many shards can complicate read operations.
121+
122+
2. **Consider your access patterns**: Choose between random and calculated sharding based on how you'll query the data.
123+
124+
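3. **Monitor for hot keys**: DynamoDB does not publish per-partition metrics in CloudWatch, so watch table-level throttling metrics such as `WriteThrottleEvents` and use CloudWatch Contributor Insights for DynamoDB to find the most accessed and most throttled keys, then adjust your sharding strategy as needed.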
125+
126+
4. **Combine with other techniques**: Consider using write sharding alongside other DynamoDB best practices like TTL for time-series data or sparse indexes.
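
For the first practice, a rough starting point for the shard count can be derived from the per-partition write limit of 1,000 write capacity units per second. The sketch below assumes standard (non-transactional) writes, where one write consumes one WCU per 1 KB of item size, rounded up. The function name and the `target_utilization` headroom parameter are hypothetical. Distinct shard keys are not guaranteed to land on distinct partitions, so treat the result as a lower bound rather than a guarantee.

```python
import math

# Per-partition write limit in write capacity units per second.
PARTITION_WRITE_LIMIT_WCU = 1000


def estimate_shard_count(writes_per_second: float, item_size_bytes: int,
                         target_utilization: float = 1.0) -> int:
    """Estimate the minimum shards needed for one logical key (hypothetical helper)."""
    # A standard write consumes 1 WCU per 1 KB of item size, rounded up.
    wcu_per_write = max(1, math.ceil(item_size_bytes / 1024))
    peak_wcu = writes_per_second * wcu_per_write
    # target_utilization below 1.0 leaves headroom under the per-partition limit.
    usable_wcu = PARTITION_WRITE_LIMIT_WCU * target_utilization
    return max(1, math.ceil(peak_wcu / usable_wcu))
```

For example, 2,500 writes per second of 1,500-byte items consume 5,000 WCU per second, which suggests at least 5 shards, or 10 when targeting 50% utilization.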
127+
128+
## Additional Resources
129+
130+
- [DynamoDB Best Practices for Partition Keys](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html)
131+
- [DynamoDB Write Sharding Documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html)
16 132

17 133
## Contribute
18-
Be the first to enhance this code with a Pull Request.
134+
135+
Contributions to enhance this example are welcome! Please submit a Pull Request with your improvements.
