aws-samples
diff --git a/schema_design/BuildingBlocks/WriteSharding/README.md b/schema_design/BuildingBlocks/WriteSharding/README.md
Lines changed: 128 additions & 11 deletions
diff --git a/schema_design/BuildingBlocks/WriteSharding/python/WriteShardingExample.py b/schema_design/BuildingBlocks/WriteSharding/python/WriteShardingExample.py
Lines changed: 12 additions & 3 deletions
diff --git a/schema_design/SchemaExamples/ComplainManagement/README.md b/schema_design/SchemaExamples/ComplainManagement/README.md
Lines changed: 172 additions & 0 deletions
@@ -1,18 +1,135 @@
-# Write Sharding 
-One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on.
+# Write Sharding in DynamoDB
 
-## Examples
-Code examples provided demonstrate writing and reading from a DynamoDB table using write sharding.
+## Overview
 
-## Run it
-Python: The script requires you have Python3 and installed modules: boto3, json, and random.
+Write sharding is a technique used to distribute write operations more evenly across multiple partitions in Amazon DynamoDB. This pattern helps prevent hot partitions and throttling by expanding the partition key space, allowing for better throughput and performance.
 
-DynamoDB: Create a table called "ExampleTable" with a partition key of "pk" and a sort key of "sk". Change the AWS Region to your closest.
+## Why Use Write Sharding?
 
-% python3 WriteShardingExample.py
+When a DynamoDB table receives a high volume of write operations targeting the same partition key, it can lead to:
 
-## Disclaimer
-Provided as a sample. The script assumes the runtime has an AWS account with appropriate permissions.
+1. **Hot partitions**: Uneven distribution of traffic where some partitions receive significantly more requests than others
+2. **Throttling**: Requests to a single partition exceeding its throughput limit (1,000 write units or 3,000 read units per second), even when the table as a whole has spare capacity
+3. **Performance degradation**: Slower response times due to partition-level bottlenecks
+
+Write sharding addresses these issues by spreading the writes for one logical key across several partition key values, which DynamoDB can then place on different partitions.
+
+## Sharding Techniques
+
+This example demonstrates two common write sharding techniques:
+
+### 1. Random Suffix Sharding
+
+Append a random number to the partition key to distribute items randomly across partitions.
+
+```python
+import random
+
+# date (e.g. '2021-02-01') and write_shard_count are assumed to be defined already
+shard_id = random.randint(0, write_shard_count - 1)
+pk = f'{date}.{shard_id}'
+```
+
+**Pros:**
+- Simple to implement
+- Provides good distribution for write operations
+
+**Cons:**
+- Requires querying all shards when reading data
+- No predictable way to access a specific item without scanning all shards
+
+### 2. Calculated Suffix Sharding
+
+Use a calculation based on an attribute of the item to determine the shard.
+
+```python
+# Assumes item_id is numeric (or a numeric string); date and write_shard_count are defined already
+shard_id = int(item_id) % write_shard_count
+pk = f'{date}.{shard_id}'
+```
+
+**Pros:**
+- Deterministic: the same item always goes to the same shard
+- Can retrieve specific items without querying all shards
+- Good for items that need to be accessed individually
+
+**Cons:**
+- May still create hot partitions if the calculation doesn't distribute evenly
+- Requires knowing the attribute used in the calculation when reading
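+
+The `int(item_id)` calculation only works for numeric IDs. For string IDs, a minimal sketch (the helper name `calculated_shard_pk` is hypothetical, not part of the example script) derives the shard from a SHA-256 hash of the ID. Python's built-in `hash()` is deliberately avoided: for strings it is randomized per process, so a writer and a reader could compute different shards for the same ID.
+
+```python
+import hashlib
+
+
+def calculated_shard_pk(date: str, item_id: str, shard_count: int) -> str:
+    """Return a sharded partition key such as '2021-02-01.3' for item_id.
+
+    Hypothetical helper: the shard comes from a stable hash, so every
+    process maps the same item_id to the same shard.
+    """
+    if shard_count < 1:
+        raise ValueError("shard_count must be at least 1")
+    # SHA-256 is used only for an even, stable spread, not for security
+    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
+    shard_id = int.from_bytes(digest[:8], "big") % shard_count
+    return f"{date}.{shard_id}"
+
+
+# Writers and readers call the same function, so a read can target one shard
+print(calculated_shard_pk("2021-02-01", "order-42", 4))
+```
+
+Changing the shard count later remaps many IDs to different shards, so existing items would have to be rewritten or read using the old count.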
+
+## Reading from Sharded Tables
+
+When using write sharding, reading data typically requires one of these approaches:
+
+1. **Query all shards**: For random suffix sharding, you need to query each shard and combine the results.
+
+```python
+from boto3.dynamodb.conditions import Key
+
+all_items = []
+for shard_id in range(write_shard_count):
+    pk = f"{date}.{shard_id}"
+    resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
+    all_items.extend(resp['Items'])
+```
+
+2. **Query specific shard**: For calculated suffix sharding, you can query just the shard where the item is stored.
+
+```python
+shard_id = int(item_id) % write_shard_count
+pk = f"{date}.{shard_id}"
+resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
+```
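+
+The snippets above read only the first page of results: `Query` returns at most 1 MB of data per call, and the remaining items have to be fetched by passing `LastEvaluatedKey` back as `ExclusiveStartKey`. A minimal sketch of a paginated read across all shards, assuming the `pk`/`sk` attribute names of `ExampleTable`; `query_all_shards` is a hypothetical helper, and `table` is any object with a boto3-style `query(**kwargs)` method, such as `boto3.resource('dynamodb', region_name='us-east-1').Table('ExampleTable')`.
+
+```python
+from typing import Any, Dict, List
+
+
+def query_all_shards(table: Any, date: str, shard_count: int) -> List[Dict[str, Any]]:
+    """Query every '<date>.<shard>' partition, following pagination, and merge the results."""
+    all_items: List[Dict[str, Any]] = []
+    for shard_id in range(shard_count):
+        kwargs: Dict[str, Any] = {
+            "KeyConditionExpression": "#pk = :pk",
+            "ExpressionAttributeNames": {"#pk": "pk"},
+            "ExpressionAttributeValues": {":pk": f"{date}.{shard_id}"},
+        }
+        while True:
+            resp = table.query(**kwargs)
+            all_items.extend(resp["Items"])
+            # LastEvaluatedKey is present only when more pages remain
+            if "LastEvaluatedKey" not in resp:
+                break
+            kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
+    # Each shard is sorted by sk on its own; re-sort so the merged list is too
+    all_items.sort(key=lambda item: item["sk"])
+    return all_items
+```
+
+The shards are queried one after another here for simplicity; they could also be queried concurrently when read latency matters more.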
+
+## Example Code
+
+The provided Python example demonstrates:
+- Writing items using random suffix sharding
+- Reading items from all shards with random suffixes
+- Writing items using calculated suffix sharding
+- Reading items from a specific shard with a calculated suffix
+
+## Running the Example
+
+### Prerequisites
+
+1. Python 3 with boto3 installed (`pip install boto3`). The other modules the script uses (`json`, `random`, `argparse`) are part of the Python standard library.
+
+2. DynamoDB table, in the region you pass with `--region`:
+   - Table name: "ExampleTable"
+   - Partition key: "pk" (String)
+   - Sort key: "sk" (String)
+
+3. AWS credentials configured with permission to write items to and query ExampleTable
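+
+If `ExampleTable` does not exist yet, one way to create it is with boto3's `create_table`. A minimal sketch, assuming on-demand (`PAY_PER_REQUEST`) billing, which the example itself does not require; `example_table_definition` and `create_example_table` are hypothetical helper names. The definition is kept in a plain dict so it can be reviewed without calling AWS.
+
+```python
+def example_table_definition(table_name: str = "ExampleTable") -> dict:
+    """Keyword arguments for create_table: string pk (HASH) and string sk (RANGE)."""
+    return {
+        "TableName": table_name,
+        "KeySchema": [
+            {"AttributeName": "pk", "KeyType": "HASH"},
+            {"AttributeName": "sk", "KeyType": "RANGE"},
+        ],
+        "AttributeDefinitions": [
+            {"AttributeName": "pk", "AttributeType": "S"},
+            {"AttributeName": "sk", "AttributeType": "S"},
+        ],
+        # Assumption: on-demand capacity; use ProvisionedThroughput instead if preferred
+        "BillingMode": "PAY_PER_REQUEST",
+    }
+
+
+def create_example_table(region: str = "us-east-1", table_name: str = "ExampleTable") -> None:
+    """Create the table and wait until it is ACTIVE (requires AWS credentials)."""
+    import boto3  # imported lazily so the definition above can be inspected without boto3
+
+    client = boto3.client("dynamodb", region_name=region)
+    client.create_table(**example_table_definition(table_name))
+    # The 'table_exists' waiter polls DescribeTable until the table is ACTIVE
+    client.get_waiter("table_exists").wait(TableName=table_name)
+```
+
+Call `create_example_table("us-west-2")` once, with the same region you will pass to `--region`, before running the script.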
+
+### Execution
+
+```bash
+# Run with default settings (us-east-1 region, 2 shards)
+python3 python/WriteShardingExample.py
+
+# Run with custom region and shard count
+python3 python/WriteShardingExample.py --region us-west-2 --shard-count 4
+```
+
+### Command-line Arguments
+
+- `--region`: AWS region name (default: us-east-1)
+- `--shard-count`: Number of write shards to use (default: 2)
+
+## Best Practices
+
+1. **Choose an appropriate shard count**: Too few shards won't spread the load enough, while each extra shard adds one more query to every read that has to cover all shards.
+
+2. **Consider your access patterns**: Choose between random and calculated sharding based on how you'll query the data.
+
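+3. **Monitor for hot keys**: Use CloudWatch Contributor Insights for DynamoDB to see the most-accessed and most-throttled keys, watch throttling metrics such as `WriteThrottleEvents`, and adjust your sharding strategy as needed.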
+
+4. **Combine with other techniques**: Consider using write sharding alongside other DynamoDB best practices like TTL for time-series data or sparse indexes.
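+
+For the shard count, a rough lower bound follows from the per-partition limit of 1,000 write units per second, where one write unit covers one standard write per second of an item up to 1 KB. A minimal sketch, assuming the peak write rate and average item size for one logical key are known and writes spread evenly across shards; it ignores burst and adaptive capacity, so treat the result as a starting point. `min_write_shards` is a hypothetical helper name.
+
+```python
+import math
+
+
+def min_write_shards(peak_writes_per_sec: float, item_size_kb: float = 1.0,
+                     partition_wcu_limit: int = 1000) -> int:
+    """Smallest shard count that keeps each sharded key under the per-partition write limit.
+
+    Assumes even spread across shards and standard (non-transactional) writes.
+    """
+    # A standard write consumes 1 WCU per 1 KB of item size, rounded up
+    wcu_per_write = max(1, math.ceil(item_size_kb))
+    required_wcu = peak_writes_per_sec * wcu_per_write
+    return max(1, math.ceil(required_wcu / partition_wcu_limit))
+
+
+# 2,500 writes/s of 3 KB items needs 7,500 WCU, so at least 8 shards
+print(min_write_shards(2500, item_size_kb=3))  # prints 8
+```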
+
+## Additional Resources
+
+- [DynamoDB Best Practices for Partition Keys](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html)
+- [DynamoDB Write Sharding Documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html)
 
 ## Contribute
-Be the first to enhance this code with a Pull Request.
+
+Contributions to enhance this example are welcome! Please submit a Pull Request with your improvements.
@@ -2,16 +2,25 @@
 
 from __future__ import print_function # Python 2/3 compatibility
 import boto3, random, json
+import argparse
 
 from boto3.dynamodb.conditions import Key
 
 from botocore.exceptions import ClientError
 
-dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
+# Parse command line arguments
+parser = argparse.ArgumentParser(description='Write sharding example')
+parser.add_argument('--region', type=str, default='us-east-1', help='AWS region name (default: us-east-1)')
+parser.add_argument('--shard-count', type=int, default=2, help='Number of write shards to use (default: 2)')
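+args = parser.parse_args()
+# Both sharding methods need at least one shard
+if args.shard_count < 1:
+    parser.error('--shard-count must be at least 1')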
+
+# Initialize DynamoDB with the provided region
+dynamodb = boto3.resource('dynamodb', region_name=args.region)
 
 table = dynamodb.Table('ExampleTable')
 
-write_shard_count = 2
+# Use the provided shard count
+write_shard_count = args.shard_count
 
 items = [
     {
@@ -118,4 +127,4 @@
 pk = "2021-02-01." + str(shard_id)
 resp = table.query(KeyConditionExpression=Key('pk').eq(pk))
 print("Data from table with calculated write shards")
-print(json.dumps(resp['Items'], indent=4, sort_keys=True))
+print(json.dumps(resp['Items'], indent=4, sort_keys=True))
@@ -0,0 +1,172 @@
+# Complaint Management System Data Modeling with Amazon DynamoDB
+
+## Overview
+
+This document describes how to use DynamoDB as the datastore for a complaint management system that handles customer complaints, agent interactions, and complaint status tracking. The system supports creating complaints, tracking communications, managing escalations, and monitoring complaint status changes.
+
+## Key Entities
+
+1. Complaint
+2. Communication
+3. Customer
+4. Agent
+
+## Design Approach
+
+We employ a single-table design with a composite primary key and multiple Global Secondary Indexes (GSIs) to support various access patterns.
+
+The following key structures are used:
+
+- Base table
+  - For a complaint item:
+    - Partition key (PK)
+      - Complaint ID (e.g., "Complaint123")
+    - Sort key (SK)
+      - "metadata" for complaint details
+  - For a communication item:
+    - Partition key (PK)
+      - Complaint ID (e.g., "Complaint123")
+    - Sort key (SK)
+      - "comm#[timestamp]#[comm_id]" for communications
+
+  - Examples:
+
+    | PK | SK | Sample Attributes |
+    | ----------- | ----------- | ----------- |
+    | Complaint123 | metadata | customer_id, complaint_id, severity, complaint_description, current_state |
+    | Complaint123 | comm#2023-05-01T14:30:00Z#comm456 | agentID, comm_date, comm_text, complaint_state |
+
+- Global Secondary Indexes:
+
+  1. **Customer_Complaint_GSI**
+     - Partition key: customer_id
+     - Sort key: complaint_id
+     
+     - Example:
+     
+       | customer_id | complaint_id | Sample Attributes |
+       | ----------- | ----------- | ----------- |
+       | custXYZ | Complaint123 | PK, SK, severity, current_state |
+
+  2. **Escalations_GSI**
+     - Partition key: escalated_to
+     - Sort key: escalation_time
+     
+     - Example:
+     
+       | escalated_to | escalation_time | Sample Attributes |
+       | ----------- | ----------- | ----------- |
+       | AgentB | 2023-05-02T09:15:00Z | PK, SK, severity, customer_id |
+
+  3. **Agents_Comments_GSI**
+     - Partition key: agentID
+     - Sort key: comm_date
+     
+     - Example:
+     
+       | agentID | comm_date | Sample Attributes |
+       | ----------- | ----------- | ----------- |
+       | AgentA | 2023-05-01T14:30:00Z | PK, SK, comm_text, complaint_state |
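+
+The key structures above can be sketched as item builders. A minimal sketch in Python (the query examples below use JavaScript); the function names are hypothetical, and timestamps are assumed to be ISO 8601 UTC strings such as `2023-05-01T14:30:00Z`, which sort lexicographically in time order. Attributes a GSI keys on are copied onto the item so it appears in that index, and `escalated_to`/`escalation_time` are set only on escalated complaints, so Escalations_GSI stays sparse.
+
+```python
+from typing import Any, Dict, Optional
+
+
+def complaint_metadata_item(complaint_id: str, customer_id: str, severity: str,
+                            description: str, state: str = "waiting",
+                            escalated_to: Optional[str] = None,
+                            escalation_time: Optional[str] = None) -> Dict[str, Any]:
+    item: Dict[str, Any] = {
+        "PK": complaint_id,
+        "SK": "metadata",
+        "complaint_id": complaint_id,  # Customer_Complaint_GSI sort key
+        "customer_id": customer_id,    # Customer_Complaint_GSI partition key
+        "severity": severity,
+        "complaint_description": description,
+        "current_state": state,
+    }
+    # Only escalated complaints carry these keys, so only they appear in Escalations_GSI
+    if escalated_to and escalation_time:
+        item["escalated_to"] = escalated_to
+        item["escalation_time"] = escalation_time
+    return item
+
+
+def communication_item(complaint_id: str, comm_id: str, timestamp: str,
+                       agent_id: str, text: str, state: str) -> Dict[str, Any]:
+    return {
+        "PK": complaint_id,
+        "SK": f"comm#{timestamp}#{comm_id}",
+        "comm_id": comm_id,
+        "comm_date": timestamp,  # Agents_Comments_GSI sort key
+        "agentID": agent_id,     # Agents_Comments_GSI partition key
+        "comm_text": text,
+        "complaint_state": state,
+    }
+
+
+item = communication_item("Complaint123", "comm456", "2023-05-01T14:30:00Z",
+                          "AgentA", "Investigating the issue", "investigating")
+print(item["SK"])  # prints comm#2023-05-01T14:30:00Z#comm456
+```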
+
+## Access Patterns
+
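+The schema supports the following access patterns. All of them use GetItem or Query except the search by severity and state, which needs a full-table Scan that reads (and consumes capacity for) every item.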
+
+| Access pattern | Base table/GSI | Operation | Partition key value | Sort key value | Other conditions/Filters |
+| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
+| Get complaint metadata | Base table | GetItem | PK=\<ComplaintID\> | SK="metadata" | |
+| Get all communications for a complaint | Base table | Query | PK=\<ComplaintID\> | begins_with(SK, "comm#") | |
+| Get all complaints for a customer | Customer_Complaint_GSI | Query | customer_id=\<CustomerID\> | | |
+| Find complaints escalated to an agent | Escalations_GSI | Query | escalated_to=\<AgentID\> | | |
+| View agent's communication history | Agents_Comments_GSI | Query | agentID=\<AgentID\> | | |
+| Find complaints by severity and state | Base table | Scan | | | Filter on severity and current_state |
+| Track complaint state changes | Base table | Query | PK=\<ComplaintID\> | begins_with(SK, "comm#") | Compare complaint_state across consecutive items in the client |
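+
+A `FilterExpression` evaluates each item on its own, so it cannot tell whether `complaint_state` changed relative to the previous communication; that comparison has to happen in the client after the Query. A minimal sketch, assuming the items arrive in SK (chronological) order, as a Query on `begins_with(SK, "comm#")` returns them; `state_changes` is a hypothetical helper name.
+
+```python
+from typing import Any, Dict, Iterable, List, Optional, Tuple
+
+
+def state_changes(comms: Iterable[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
+    """Return (comm_date, previous_state, new_state) for each communication that changed the state.
+
+    Assumes comms is in chronological (SK) order; the first item only sets the baseline.
+    """
+    changes: List[Tuple[str, str, str]] = []
+    previous: Optional[str] = None
+    for comm in comms:
+        state = comm["complaint_state"]
+        if previous is not None and state != previous:
+            changes.append((comm["comm_date"], previous, state))
+        previous = state
+    return changes
+
+
+comms = [
+    {"comm_date": "2023-05-01T14:30:00Z", "complaint_state": "assigned"},
+    {"comm_date": "2023-05-01T16:00:00Z", "complaint_state": "assigned"},
+    {"comm_date": "2023-05-02T09:15:00Z", "complaint_state": "investigating"},
+]
+print(state_changes(comms))
+# prints [('2023-05-02T09:15:00Z', 'assigned', 'investigating')]
+```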
+
+## Data Model Attributes
+
+- **PK**: Partition key (the complaint ID)
+- **SK**: Sort key, either "metadata" or the communication identifier (`comm#[timestamp]#[comm_id]`)
+- **customer_id**: ID of the customer who filed the complaint
+- **complaint_id**: Unique identifier for the complaint
+- **comm_id**: Communication identifier
+- **comm_date**: Timestamp of the communication
+- **complaint_state**: State of the complaint at the time of communication
+- **current_state**: Current state of the complaint (waiting, assigned, investigating, resolved)
+- **creation_time**: When the complaint was created
+- **severity**: Priority level (P1, P2, P3)
+- **complaint_description**: Detailed description of the issue
+- **comm_text**: Content of the communication
+- **attachments**: Set of S3 URLs for attached files
+- **agentID**: ID of the agent handling the communication
+- **escalated_to**: ID of the agent to whom the complaint was escalated
+- **escalation_time**: When the complaint was escalated
+
+## Example Queries
+
+### Get a specific complaint with all its communications
+
+```javascript
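+// Assumes the AWS SDK for JavaScript v2 (which provides .promise());
+// run the awaits below inside an async function.
+const AWS = require('aws-sdk');
+const docClient = new AWS.DynamoDB.DocumentClient();
+
+// Get complaint metadata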
+const complaintMetadata = await docClient.get({
+  TableName: 'Complaint_management_system',
+  Key: {
+    PK: 'Complaint123',
+    SK: 'metadata'
+  }
+}).promise();
+
+// Get all communications for the complaint
+const complaintComms = await docClient.query({
+  TableName: 'Complaint_management_system',
+  KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
+  ExpressionAttributeValues: {
+    ':pk': 'Complaint123',
+    ':sk': 'comm#'
+  }
+}).promise();
+```
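+
+Because the metadata item and the communication items share a partition key, a single `Query` on `PK = 'Complaint123'` with no sort key condition returns both in one round trip. Since `"comm#..."` sorts before `"metadata"`, the metadata item comes last. A minimal sketch of splitting such a response on the client, in Python rather than the JavaScript above; `split_complaint_items` is a hypothetical helper name.
+
+```python
+from typing import Any, Dict, List, Optional, Tuple
+
+
+def split_complaint_items(
+    items: List[Dict[str, Any]],
+) -> Tuple[Optional[Dict[str, Any]], List[Dict[str, Any]]]:
+    """Separate the 'metadata' item from 'comm#' items returned by one Query on a complaint's PK."""
+    metadata: Optional[Dict[str, Any]] = None
+    comms: List[Dict[str, Any]] = []
+    for item in items:
+        if item["SK"] == "metadata":
+            metadata = item
+        elif item["SK"].startswith("comm#"):
+            comms.append(item)
+    return metadata, comms
+
+
+items = [
+    {"PK": "Complaint123", "SK": "comm#2023-05-01T14:30:00Z#comm456", "comm_text": "Initial response"},
+    {"PK": "Complaint123", "SK": "metadata", "severity": "P1"},
+]
+metadata, comms = split_complaint_items(items)
+print(metadata["severity"], len(comms))  # prints P1 1
+```
+
+Responses over 1 MB are paginated like any other Query, so for a complaint with many communications the metadata item may only arrive on the last page.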
+
+### Get all complaints for a customer
+
+```javascript
+const customerComplaints = await docClient.query({
+  TableName: 'Complaint_management_system',
+  IndexName: 'Customer_Complaint_GSI',
+  KeyConditionExpression: 'customer_id = :custId',
+  ExpressionAttributeValues: {
+    ':custId': 'custXYZ'
+  }
+}).promise();
+```
+
+### Get all escalated complaints for an agent
+
+```javascript
+const escalatedComplaints = await docClient.query({
+  TableName: 'Complaint_management_system',
+  IndexName: 'Escalations_GSI',
+  KeyConditionExpression: 'escalated_to = :agentId',
+  ExpressionAttributeValues: {
+    ':agentId': 'AgentB'
+  }
+}).promise();
+```
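+
+A complaint appears in Escalations_GSI only once its metadata item has both `escalated_to` and `escalation_time`. A minimal sketch of the escalation write as keyword arguments for boto3's `Table.update_item`, in Python; `escalation_update` is a hypothetical helper name, and the `attribute_exists(PK)` condition is an added assumption so that escalating a nonexistent complaint fails instead of creating a stub item.
+
+```python
+from typing import Any, Dict
+
+
+def escalation_update(complaint_id: str, agent_id: str, escalated_at: str) -> Dict[str, Any]:
+    """Keyword arguments for Table.update_item that escalate a complaint to agent_id."""
+    return {
+        "Key": {"PK": complaint_id, "SK": "metadata"},
+        "UpdateExpression": "SET escalated_to = :agent, escalation_time = :ts",
+        # Raises ConditionalCheckFailedException if the complaint does not exist
+        "ConditionExpression": "attribute_exists(PK)",
+        "ExpressionAttributeValues": {":agent": agent_id, ":ts": escalated_at},
+    }
+
+
+# Usage (requires AWS credentials and the table):
+# table = boto3.resource("dynamodb").Table("Complaint_management_system")
+# table.update_item(**escalation_update("Complaint123", "AgentB", "2023-05-02T09:15:00Z"))
+```
+
+To list an agent's newest escalations first, query Escalations_GSI with `ScanIndexForward` set to false.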
+
+## Goals
+
+- Efficiently manage customer complaints and related communications
+- Track complaint status changes and escalations
+- Enable efficient querying by customer, agent, or escalation status
+- Ensure scalability using Amazon DynamoDB's single-table design principles
+
+## Schema Design
+
+The complete schema, showing how each entity and access pattern maps to the DynamoDB table structure, is in [ComplaintManagementSchema.json](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/ComplainManagement/ComplaintManagementSchema.json).
+
+## Design Considerations
+
+1. **Single-Table Design**: All complaint data is stored in a single table to minimize latency and simplify operations.
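+2. **Chronological Sorting**: As long as every SK timestamp uses the same ISO 8601 UTC format (as in `2023-05-01T14:30:00Z`), SK values sort in time order, so a Query on `begins_with(SK, "comm#")` returns communications oldest first.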
+3. **Flexible Attributes**: The schema accommodates various complaint types and communication formats.
+4. **Efficient Querying**: GSIs enable efficient access to data by customer, agent, or escalation status.
+5. **Scalability**: Each complaint has its own partition key, so items and traffic spread across more partition key values as the number of complaints grows.