Commit ffb320c

Merge pull request #169 from iojior/adding-readme
Fixing #126, #128 and #135. Thanks for your contributions Imhoertha
2 parents 7a0694f + c3a749a commit ffb320c

5 files changed

+664
-39
lines changed

Lines changed: 128 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,135 @@
1-
# Write Sharding
2-
One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on.
1+
# Write Sharding in DynamoDB
32

4-
## Examples
5-
Code examples provided demonstrate writing and reading from a DynamoDB table using write sharding.
3+
## Overview
64

7-
## Run it
8-
Python: The script requires you have Python3 and installed modules: boto3, json, and random.
5+
Write sharding is a technique used to distribute write operations more evenly across multiple partitions in Amazon DynamoDB. This pattern helps prevent hot partitions and throttling by expanding the partition key space, allowing for better throughput and performance.
96

10-
DynamoDB: Create a table called "ExampleTable" with a partition key of "pk" and a sort key of "sk". Change the AWS Region to your closest.
7+
## Why Use Write Sharding?
118

12-
% python3 WriteShardingExample.py
9+
When a DynamoDB table receives a high volume of write operations targeting the same partition key, it can lead to:
1310

14-
## Disclaimer
15-
Provided as a sample. The script assumes the runtime has an AWS account with appropriate permissions.
11+
1. **Hot partitions**: Uneven distribution of traffic where some partitions receive significantly more requests than others
12+
2. **Throttling**: Requests exceeding the provisioned throughput for a specific partition
13+
3. **Performance degradation**: Slower response times due to partition-level bottlenecks
14+
15+
Write sharding addresses these issues by distributing writes across multiple logical partitions.
16+
17+
## Sharding Techniques
18+
19+
This example demonstrates two common write sharding techniques:
20+
21+
### 1. Random Suffix Sharding
22+
23+
Append a random number to the partition key to distribute items randomly across partitions.
24+
25+
```python
26+
shard_id = random.randint(0, write_shard_count-1)
27+
pk = f'{date}.{str(shard_id)}'
28+
```
29+
30+
**Pros:**
31+
- Simple to implement
32+
- Provides good distribution for write operations
33+
34+
**Cons:**
35+
- Requires querying all shards when reading data
36+
- No predictable way to access a specific item without querying every shard
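
To make the distribution claim concrete, here is a minimal sketch that simulates random suffix sharding without touching DynamoDB. The helper name `random_sharded_pk` and the sample date are illustrative assumptions, not part of the example script.

```python
import random
from collections import Counter


def random_sharded_pk(date: str, write_shard_count: int) -> str:
    """Return a partition key such as '2021-02-01.3' with a random shard suffix.

    Hypothetical helper for illustration only.
    """
    # randint is inclusive on both ends, hence the "- 1".
    shard_id = random.randint(0, write_shard_count - 1)
    return f"{date}.{shard_id}"


# Simulate 10,000 writes for one date and count how many land on each shard.
counts = Counter(random_sharded_pk("2021-02-01", 4) for _ in range(10_000))
for pk, writes in sorted(counts.items()):
    print(pk, writes)
```

With four shards, each key should receive roughly a quarter of the writes, which is the spreading effect the technique relies on.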
37+
38+
### 2. Calculated Suffix Sharding
39+
40+
Use a calculation based on an attribute of the item to determine the shard.
41+
42+
```python
43+
shard_id = int(item_id) % write_shard_count
44+
pk = f'{date}.{str(shard_id)}'
45+
```
46+
47+
**Pros:**
48+
- Deterministic - same item always goes to the same shard
49+
- Can retrieve specific items without querying all shards
50+
- Good for items that need to be accessed individually
51+
52+
**Cons:**
53+
- May still create hot partitions if the calculation doesn't distribute evenly
54+
- Requires knowing the attribute used in the calculation when reading
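
The snippet above assumes `item_id` is numeric. For string identifiers, one possible approach (an assumption, not what the example script does) is to hash the identifier first. Python's built-in `hash()` is randomized per process for strings, so this sketch uses `hashlib` to keep the shard stable across runs. The helper names are hypothetical.

```python
import hashlib


def calculated_shard(item_id: str, write_shard_count: int) -> int:
    """Map an arbitrary string ID to a stable shard number (hypothetical helper)."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % write_shard_count


def calculated_sharded_pk(date: str, item_id: str, write_shard_count: int) -> str:
    """Build a partition key in the same '<date>.<shard>' shape as above."""
    return f"{date}.{calculated_shard(item_id, write_shard_count)}"


# The same ID always maps to the same partition key.
print(calculated_sharded_pk("2021-02-01", "order-abc-123", 4))
print(calculated_sharded_pk("2021-02-01", "order-abc-123", 4))
```

Note that changing `write_shard_count` remaps most IDs to different shards, so readers and writers need to agree on the count that was used when the item was written.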
55+
56+
## Reading from Sharded Tables
57+
58+
When using write sharding, reading data typically requires one of these approaches:
59+
60+
1. **Query all shards**: For random suffix sharding, you need to query each shard and combine the results.
61+
62+
```python
63+
allItems = []
64+
for x in range(write_shard_count):
65+
    pk = f"{date}.{str(x)}"
66+
    resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
67+
    allItems = allItems + resp['Items']
68+
```
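
A single `Query` call returns at most 1 MB of data and signals further pages through `LastEvaluatedKey`. The following is a minimal sketch of reading all shards with pagination. The function name `query_all_shards` is an assumption, `table` is expected to be a boto3 `Table` resource, and the key condition is written in its string form so the sketch needs no imports.

```python
def query_all_shards(table, date, write_shard_count):
    """Collect every item for `date` across all shards, following pagination.

    Hypothetical helper: `table` is assumed to behave like a boto3 Table resource.
    """
    all_items = []
    for shard_id in range(write_shard_count):
        kwargs = {
            "KeyConditionExpression": "pk = :pk",
            "ExpressionAttributeValues": {":pk": f"{date}.{shard_id}"},
        }
        while True:
            resp = table.query(**kwargs)
            all_items.extend(resp["Items"])
            last_key = resp.get("LastEvaluatedKey")
            if last_key is None:
                break  # no more pages for this shard
            # Resume the query where the previous page stopped.
            kwargs["ExclusiveStartKey"] = last_key
    return all_items
```

Usage would look like `items = query_all_shards(table, "2021-02-01", write_shard_count)`. The shards are queried sequentially here for simplicity; issuing the queries concurrently is a possible refinement.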
69+
70+
2. **Query specific shard**: For calculated suffix sharding, you can query just the shard where the item is stored.
71+
72+
```python
73+
shard_id = int(item_id) % write_shard_count
74+
pk = f"{date}.{str(shard_id)}"
75+
resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
76+
```
77+
78+
## Example Code
79+
80+
The provided Python example demonstrates:
81+
- Writing items using random suffix sharding
82+
- Reading items from all shards with random suffixes
83+
- Writing items using calculated suffix sharding
84+
- Reading items from a specific shard with a calculated suffix
85+
86+
## Running the Example
87+
88+
### Prerequisites
89+
90+
1. Python 3 with the following modules (only boto3 needs to be installed; json, random, and argparse are part of the standard library):
91+
- boto3
92+
- json
93+
- random
94+
- argparse
95+
96+
2. DynamoDB table:
97+
- Table name: "ExampleTable"
98+
- Partition key: "pk" (String)
99+
- Sort key: "sk" (String)
100+
101+
3. AWS credentials configured with appropriate permissions
102+
103+
### Execution
104+
105+
```bash
106+
# Run with default settings (us-east-1 region, 2 shards)
107+
python3 python/WriteShardingExample.py
108+
109+
# Run with custom region and shard count
110+
python3 python/WriteShardingExample.py --region us-west-2 --shard-count 4
111+
```
112+
113+
### Command-line Arguments
114+
115+
- `--region`: AWS region name (default: us-east-1)
116+
- `--shard-count`: Number of write shards to use (default: 2)
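
A shard count of zero or less would break both techniques (`random.randint(0, -1)` raises `ValueError`, and the modulo divides by zero). As a sketch of one way to guard against that, the parser could validate the value up front. The `positive_int` helper is a hypothetical addition and is not in the example script.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1 (hypothetical helper)."""
    number = int(value)  # a ValueError here is reported by argparse as invalid input
    if number < 1:
        raise argparse.ArgumentTypeError(f"{value} is not a positive integer")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Write sharding example")
    parser.add_argument("--region", type=str, default="us-east-1",
                        help="AWS region name (default: us-east-1)")
    parser.add_argument("--shard-count", type=positive_int, default=2,
                        help="Number of write shards to use (default: 2)")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--region", "us-west-2", "--shard-count", "4"])
print(args.region, args.shard_count)
```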
117+
118+
## Best Practices
119+
120+
1. **Choose an appropriate shard count**: Too few shards won't distribute the load effectively, while too many shards can complicate read operations.
121+
122+
2. **Consider your access patterns**: Choose between random and calculated sharding based on how you'll query the data.
123+
124+
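3. **Monitor for hot keys**: CloudWatch reports DynamoDB metrics per table and per index rather than per partition, so watch the throttling metrics and use CloudWatch Contributor Insights for DynamoDB to find the most-accessed keys, then adjust your sharding strategy as needed.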
125+
126+
4. **Combine with other techniques**: Consider using write sharding alongside other DynamoDB best practices like TTL for time-series data or sparse indexes.
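
As a rough starting point for the first practice, the shard count can be estimated from the peak write rate for one logical key. This sketch assumes the documented per-partition limit of 1,000 write capacity units per second and one write unit per 1 KB of item size. The function name, the `target_utilization` headroom factor, and its default are illustrative assumptions, and the result is only an estimate because DynamoDB decides how keys map to physical partitions.

```python
import math


def estimate_shard_count(peak_writes_per_second: float,
                         item_size_kb: float = 1.0,
                         partition_wcu_limit: int = 1000,
                         target_utilization: float = 0.5) -> int:
    """Estimate how many write shards one logical key needs (hypothetical helper)."""
    # A standard write consumes one write unit per 1 KB, rounded up.
    wcu_per_write = math.ceil(item_size_kb)
    required_wcu = peak_writes_per_second * wcu_per_write
    # Leave headroom by planning for only a fraction of the per-partition limit.
    usable_wcu_per_shard = partition_wcu_limit * target_utilization
    return max(1, math.ceil(required_wcu / usable_wcu_per_shard))


print(estimate_shard_count(5000))                    # 1 KB items
print(estimate_shard_count(5000, item_size_kb=2.5))  # larger items need more shards
```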
127+
128+
## Additional Resources
129+
130+
- [DynamoDB Best Practices for Partition Keys](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html)
131+
- [DynamoDB Write Sharding Documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html)
16132

17133
## Contribute
18-
Be the first to enhance this code with a Pull Request.
134+
135+
Contributions to enhance this example are welcome! Please submit a Pull Request with your improvements.

schema_design/BuildingBlocks/WriteSharding/python/WriteShardingExample.py

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,16 +2,25 @@
22

33
from __future__ import print_function # Python 2/3 compatibility
44
import boto3, random, json
5+
import argparse
56

67
from boto3.dynamodb.conditions import Key
78

89
from botocore.exceptions import ClientError
910

10-
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
11+
# Parse command line arguments
12+
parser = argparse.ArgumentParser(description='Write sharding example')
13+
parser.add_argument('--region', type=str, default='us-east-1', help='AWS region name (default: us-east-1)')
14+
parser.add_argument('--shard-count', type=int, default=2, help='Number of write shards to use (default: 2)')
15+
args = parser.parse_args()
16+
17+
# Initialize DynamoDB with the provided region
18+
dynamodb = boto3.resource('dynamodb', region_name=args.region)
1119

1220
table = dynamodb.Table('ExampleTable')
1321

14-
write_shard_count = 2
22+
# Use the provided shard count
23+
write_shard_count = args.shard_count
1524

1625
items = [
1726
{
@@ -118,4 +127,4 @@
118127
pk = "2021-02-01." + str(shard_id)
119128
resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
120129
print("Data from table with calculated write shards")
121-
print(json.dumps(resp['Items'], indent=4, sort_keys=True))
130+
print(json.dumps(resp['Items'], indent=4, sort_keys=True))
Lines changed: 172 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,172 @@
1+
# Complaint Management System Data Modeling with Amazon DynamoDB
2+
3+
## Overview
4+
5+
This document outlines a use case using DynamoDB as a datastore for a complaint management system that efficiently handles customer complaints, agent interactions, and complaint status tracking. The system allows for creating complaints, tracking communications, managing escalations, and monitoring complaint status changes.
6+
7+
## Key Entities
8+
9+
1. Complaint
10+
2. Communication
11+
3. Customer
12+
4. Agent
13+
14+
## Design Approach
15+
16+
We employ a single-table design with a composite primary key and multiple Global Secondary Indexes (GSIs) to support various access patterns.
17+
18+
The following key structures are used:
19+
20+
- Base table
21+
- For a complaint item:
22+
- Partition key (PK)
23+
- Complaint ID (e.g., "Complaint123")
24+
- Sort key (SK)
25+
- "metadata" for complaint details
26+
- For a communication item:
27+
- Partition key (PK)
28+
- Complaint ID (e.g., "Complaint123")
29+
- Sort key (SK)
30+
- "comm#[timestamp]#[comm_id]" for communications
31+
32+
- Examples:
33+
34+
| PK | SK | Sample Attributes |
35+
| ----------- | ----------- | ----------- |
36+
| Complaint123 | metadata | customer_id, severity, complaint_description, current_state |
37+
| Complaint123 | comm#2023-05-01T14:30:00Z#comm456 | agentID, comm_text, complaint_state |
38+
39+
- Global Secondary Indexes:
40+
41+
1. **Customer_Complaint_GSI**
42+
- Partition key: customer_id
43+
- Sort key: complaint_id
44+
45+
- Example:
46+
47+
| customer_id | complaint_id | Sample Attributes |
48+
| ----------- | ----------- | ----------- |
49+
| custXYZ | Complaint123 | PK, SK, severity, current_state |
50+
51+
2. **Escalations_GSI**
52+
- Partition key: escalated_to
53+
- Sort key: escalation_time
54+
55+
- Example:
56+
57+
| escalated_to | escalation_time | Sample Attributes |
58+
| ----------- | ----------- | ----------- |
59+
| AgentB | 2023-05-02T09:15:00Z | PK, SK, severity, customer_id |
60+
61+
3. **Agents_Comments_GSI**
62+
- Partition key: agentID
63+
- Sort key: comm_date
64+
65+
- Example:
66+
67+
| agentID | comm_date | Sample Attributes |
68+
| ----------- | ----------- | ----------- |
69+
| AgentA | 2023-05-01T14:30:00Z | PK, SK, comm_text, complaint_state |
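
The `comm#[timestamp]#[comm_id]` sort key format described above can be sketched as a small helper. The function name `comm_sort_key` is an assumption. The point of the sketch is that ISO 8601 UTC timestamps of fixed width sort lexicographically in time order.

```python
from datetime import datetime, timezone


def comm_sort_key(when: datetime, comm_id: str) -> str:
    """Build an SK like 'comm#2023-05-01T14:30:00Z#comm456' (hypothetical helper).

    Expects a timezone-aware datetime; it is normalized to UTC first.
    """
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"comm#{timestamp}#{comm_id}"


keys = [
    comm_sort_key(datetime(2023, 5, 2, 9, 15, tzinfo=timezone.utc), "comm789"),
    comm_sort_key(datetime(2023, 5, 1, 14, 30, tzinfo=timezone.utc), "comm456"),
]
# Plain string sorting yields chronological order.
for key in sorted(keys):
    print(key)
```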
70+
71+
## Access Patterns
72+
73+
The schema design supports the following access patterns. All are served by GetItem or Query except the severity and state lookup, which requires a Scan with a filter:
74+
75+
| Access pattern | Base table/GSI | Operation | Partition key value | Sort key value | Other conditions/Filters |
76+
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
77+
| Get complaint metadata | Base table | GetItem | PK=\<ComplaintID\> | SK="metadata" | |
78+
| Get all communications for a complaint | Base table | Query | PK=\<ComplaintID\> | begins_with(SK, "comm#") | |
79+
| Get all complaints for a customer | Customer_Complaint_GSI | Query | customer_id=\<CustomerID\> | | |
80+
| Find complaints escalated to an agent | Escalations_GSI | Query | escalated_to=\<AgentID\> | | |
81+
| View agent's communication history | Agents_Comments_GSI | Query | agentID=\<AgentID\> | | |
82+
| Find complaints by severity and state | Base table | Scan | | | Filter on severity and current_state |
83+
| Track complaint state changes | Base table | Query | PK=\<ComplaintID\> | begins_with(SK, "comm#") | Filter on complaint_state changes |
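
For the last access pattern, a filter expression is evaluated one item at a time and cannot compare an item with the one before it, so detecting state changes is likely to happen client-side on the query result. A minimal sketch, assuming the communication items arrive in sort key order and using a hypothetical `state_changes` helper and made-up sample items:

```python
def state_changes(communications):
    """Return (timestamp, old_state, new_state) for each change (hypothetical helper).

    `communications` is assumed to be the Items list of a Query on
    PK=<ComplaintID> with begins_with(SK, "comm#"), in ascending SK order.
    """
    changes = []
    previous_state = None
    for item in communications:
        state = item.get("complaint_state")
        if state is not None and state != previous_state:
            # SK has the shape comm#<timestamp>#<comm_id>.
            timestamp = item["SK"].split("#")[1]
            changes.append((timestamp, previous_state, state))
            previous_state = state
    return changes


sample = [
    {"SK": "comm#2023-05-01T14:30:00Z#comm456", "complaint_state": "assigned"},
    {"SK": "comm#2023-05-01T16:00:00Z#comm457", "complaint_state": "assigned"},
    {"SK": "comm#2023-05-02T09:15:00Z#comm458", "complaint_state": "investigating"},
]
for change in state_changes(sample):
    print(change)
```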
84+
85+
## Data Model Attributes
86+
87+
- **PK**: Partition key - Complaint ID
88+
- **SK**: Sort key - Either "metadata" or communication identifier
89+
- **customer_id**: ID of the customer who filed the complaint
90+
- **complaint_id**: Unique identifier for the complaint
91+
- **comm_id**: Communication identifier
92+
- **comm_date**: Timestamp of the communication
93+
- **complaint_state**: State of the complaint at the time of communication
94+
- **current_state**: Current state of the complaint (waiting, assigned, investigating, resolved)
95+
- **creation_time**: When the complaint was created
96+
- **severity**: Priority level (P1, P2, P3)
97+
- **complaint_description**: Detailed description of the issue
98+
- **comm_text**: Content of the communication
99+
- **attachments**: Set of S3 URLs for attached files
100+
- **agentID**: ID of the agent handling the communication
101+
- **escalated_to**: ID of the agent to whom the complaint was escalated
102+
- **escalation_time**: When the complaint was escalated
103+
104+
## Example Queries
105+
106+
### Get a specific complaint with all its communications
107+
108+
```javascript
109+
// Get complaint metadata
110+
const complaintMetadata = await docClient.get({
111+
TableName: 'Complaint_management_system',
112+
Key: {
113+
PK: 'Complaint123',
114+
SK: 'metadata'
115+
}
116+
}).promise();
117+
118+
// Get all communications for the complaint
119+
const complaintComms = await docClient.query({
120+
TableName: 'Complaint_management_system',
121+
KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
122+
ExpressionAttributeValues: {
123+
':pk': 'Complaint123',
124+
':sk': 'comm#'
125+
}
126+
}).promise();
127+
```
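
Because the metadata item and the communication items share a partition key, a single Query with only `PK = :pk` would return both kinds of item in one round trip. The following Python sketch shows how such a result could be split client-side. The helper name `split_complaint_items` and the sample items are assumptions for illustration.

```python
def split_complaint_items(items):
    """Separate a complaint's metadata item from its communications (hypothetical helper).

    `items` is assumed to be the Items list of a Query on PK=<ComplaintID>
    without a sort key condition.
    """
    metadata = None
    communications = []
    for item in items:
        if item["SK"] == "metadata":
            metadata = item
        elif item["SK"].startswith("comm#"):
            communications.append(item)
    return metadata, communications


sample_items = [
    {"PK": "Complaint123", "SK": "comm#2023-05-01T14:30:00Z#comm456", "comm_text": "Looking into it"},
    {"PK": "Complaint123", "SK": "metadata", "severity": "P1", "current_state": "assigned"},
]
metadata, communications = split_complaint_items(sample_items)
print(metadata["severity"], len(communications))
```

This trades a second request for a little client-side work. The two-call version above remains the simpler choice when only one of the two results is needed.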
128+
129+
### Get all complaints for a customer
130+
131+
```javascript
132+
const customerComplaints = await docClient.query({
133+
TableName: 'Complaint_management_system',
134+
IndexName: 'Customer_Complaint_GSI',
135+
KeyConditionExpression: 'customer_id = :custId',
136+
ExpressionAttributeValues: {
137+
':custId': 'custXYZ'
138+
}
139+
}).promise();
140+
```
141+
142+
### Get all escalated complaints for an agent
143+
144+
```javascript
145+
const escalatedComplaints = await docClient.query({
146+
TableName: 'Complaint_management_system',
147+
IndexName: 'Escalations_GSI',
148+
KeyConditionExpression: 'escalated_to = :agentId',
149+
ExpressionAttributeValues: {
150+
':agentId': 'AgentB'
151+
}
152+
}).promise();
153+
```
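
Since `escalation_time` is the sort key of `Escalations_GSI`, the same query can be narrowed to recent escalations. This Python sketch builds the keyword arguments for a boto3 `Table.query` call. The helper name `escalation_query_kwargs` and the `since` parameter are assumptions, and the call itself is left as a comment so the sketch runs without AWS access.

```python
def escalation_query_kwargs(agent_id, since=None):
    """Build query arguments for Escalations_GSI (hypothetical helper).

    `since` is an optional ISO 8601 UTC timestamp string used as a lower bound.
    """
    condition = "escalated_to = :agentId"
    values = {":agentId": agent_id}
    if since is not None:
        condition += " AND escalation_time >= :since"
        values[":since"] = since
    return {
        "IndexName": "Escalations_GSI",
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
        "ScanIndexForward": False,  # newest escalations first
    }


kwargs = escalation_query_kwargs("AgentB", since="2023-05-01T00:00:00Z")
print(kwargs["KeyConditionExpression"])
# With a boto3 Table resource: resp = table.query(**kwargs)
```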
154+
155+
## Goals
156+
157+
- Efficiently manage customer complaints and related communications
158+
- Track complaint status changes and escalations
159+
- Enable efficient querying by customer, agent, or escalation status
160+
- Ensure scalability using Amazon DynamoDB's single-table design principles
161+
162+
## Schema Design
163+
164+
A comprehensive schema design is included, demonstrating how different entities and access patterns map to the DynamoDB table structure. [ComplaintManagementSchema.json](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/ComplainManagement/ComplaintManagementSchema.json)
165+
166+
## Design Considerations
167+
168+
1. **Single-Table Design**: All complaint data is stored in a single table to minimize latency and simplify operations.
169+
2. **Chronological Sorting**: Communications are returned in chronological order because the SK embeds an ISO 8601 UTC timestamp, which sorts lexicographically in time order.
170+
3. **Flexible Attributes**: The schema accommodates various complaint types and communication formats.
171+
4. **Efficient Querying**: GSIs enable efficient access to data by customer, agent, or escalation status.
172+
5. **Scalability**: The schema is designed to handle a growing number of complaints and communications without performance degradation.
