Commit bc9ef95

Author: Imhoertha Ojior
Enhance PrintDistinctPKs README.md with detailed table data models and language-specific instructions
1 parent fb5ef41 commit bc9ef95

1 file changed (+327, -1 lines)


scripts/PrintDistinctPKs/README.md

Lines changed: 327 additions & 1 deletion

Removed line: `# Update Me!` (placeholder heading), replaced by the README below.

# Print Distinct Partition Keys

This directory contains tools for working with partition keys in DynamoDB tables, including utilities to print distinct partition keys, load random test data, and test the maximum sort key values for each key attribute type.

## Directory Structure

### 1. [Printer](./Printer)
Scripts in multiple programming languages to scan a DynamoDB table and print distinct partition keys.

- **Java**: Implementation in Java
- **Node.js**: Implementation in JavaScript for Node.js
- **Python**: Implementation in Python

These scripts help you analyze the distribution of data across partition keys, which is useful for identifying potential hot partitions and optimizing table design.
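
Listing the distinct keys shows which partitions exist, not how large each one is. To get a rough picture of skew, one option (an addition, not something the Printer scripts are documented to do) is to count the items under each key with a `Query` that sets `Select="COUNT"`. This minimal sketch assumes a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`) and partition key values in DynamoDB's attribute-value form, such as `{"S": "user#1"}`; the function name `count_items_per_pk` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_items_per_pk(
    client: Any,
    table_name: str,
    pk_name: str,
    pk_values: Iterable[Dict[str, Any]],
) -> Counter:
    """Count the items stored under each partition key.

    Assumes `client` behaves like boto3.client("dynamodb") and that each
    entry of `pk_values` is an AttributeValue such as {"S": "user#1"}.
    """
    counts: Counter = Counter()
    for pk_value in pk_values:
        # Unwrap {"S": "user#1"} -> "user#1" so the result is easy to read.
        scalar = next(iter(pk_value.values()))
        request = {
            "TableName": table_name,
            "KeyConditionExpression": "#pk = :pk",
            "ExpressionAttributeNames": {"#pk": pk_name},
            "ExpressionAttributeValues": {":pk": pk_value},
            "Select": "COUNT",  # return only the count, not the items
        }
        while True:
            response = client.query(**request)
            counts[scalar] += response["Count"]
            # A single Query page is size-limited, so follow LastEvaluatedKey.
            last_key = response.get("LastEvaluatedKey")
            if not last_key:
                break
            request["ExclusiveStartKey"] = last_key
    return counts
```

For example, `count_items_per_pk(client, "my-table", "pk", keys).most_common(10)` lists the ten partitions holding the most items (`my-table` and `keys` are placeholders). Item counts measure stored data, not request traffic, so they flag large partitions rather than prove that a partition is hot.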

#### Table Data Model for Printer Scripts

The Printer scripts work with any DynamoDB table that has a composite primary key (partition key and sort key). They read the key names and types from the table's schema at runtime, so the names below are only an example:

```
TableName: <any-table-name>
KeySchema:
  - AttributeName: pk
    KeyType: HASH
  - AttributeName: sk
    KeyType: RANGE
AttributeDefinitions:
  - AttributeName: pk
    AttributeType: S
  - AttributeName: sk
    AttributeType: S
```

The scripts support sort keys of any of the three DynamoDB key attribute types:
- String (S)
- Number (N)
- Binary (B)

Each Printer script:
1. Determines the partition key and sort key names from the table's key schema
2. Identifies the sort key's data type
3. Uses the maximum possible value for that sort key type when scanning
4. Efficiently retrieves only the distinct partition key values
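
The four steps above can be sketched in Python as follows. This is a minimal sketch, not the code in `Printer/python`: it assumes a boto3 low-level client, and it assumes that step 3 works by pairing the partition key just returned with the maximum sort key value and passing that pair as `ExclusiveStartKey` to the next `Scan` with `Limit=1`, so each call skips the rest of the current partition. The maximum values are the ones listed in the LoadMaxValues section below; the names `MAX_SORT_KEY` and `iter_distinct_pks` are hypothetical.

```python
from typing import Any, Dict, Iterator

# Largest sort key value for each key attribute type
# (the values listed in this README's LoadMaxValues section).
MAX_SORT_KEY: Dict[str, Dict[str, Any]] = {
    "S": {"S": chr(0x10FFFF) * 256},  # 256 code points x 4 UTF-8 bytes = 1,024 bytes
    "N": {"N": "9.9999999999999999999999999999999999999E+125"},
    "B": {"B": b"\xff" * 1024},
}


def iter_distinct_pks(client: Any, table_name: str) -> Iterator[Dict[str, Any]]:
    """Yield each distinct partition key value (as an AttributeValue) once.

    Assumes `client` behaves like boto3.client("dynamodb").
    """
    table = client.describe_table(TableName=table_name)["Table"]
    # Step 1: key names from the key schema.
    key_names = {k["KeyType"]: k["AttributeName"] for k in table["KeySchema"]}
    pk_name, sk_name = key_names["HASH"], key_names["RANGE"]
    # Step 2: the sort key's type (S, N, or B).
    attr_types = {
        a["AttributeName"]: a["AttributeType"] for a in table["AttributeDefinitions"]
    }
    max_sk = MAX_SORT_KEY[attr_types[sk_name]]

    start_key = None
    while True:
        request = {
            "TableName": table_name,
            "Limit": 1,  # only the first item of each partition is needed
            "ProjectionExpression": "#pk",
            "ExpressionAttributeNames": {"#pk": pk_name},
        }
        if start_key is not None:
            request["ExclusiveStartKey"] = start_key
        response = client.scan(**request)
        items = response.get("Items", [])
        if items:
            pk_value = items[0][pk_name]
            yield pk_value
            # Step 3: resume after the largest possible sort key in this
            # partition, which skips the partition's remaining items.
            start_key = {pk_name: pk_value, sk_name: max_sk}
        elif "LastEvaluatedKey" in response:
            start_key = response["LastEvaluatedKey"]
        else:
            return  # no items and no continuation key: the scan is finished
```

Usage would look like `for pk in iter_distinct_pks(boto3.client("dynamodb"), "my-table"): print(pk)`, where `my-table` is a placeholder. With this approach the number of `Scan` calls grows roughly with the number of distinct partition keys rather than with the number of items.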

#### Using the Printer Scripts

Each language implementation provides the same functionality, with language-specific setup and execution steps.

##### Prerequisites
- AWS CLI configured with appropriate credentials
- Language-specific dependencies (Java, Node.js, or Python), depending on which scripts you want to use

##### Java Implementation

1. Navigate to the Java directory:
   ```
   cd Printer/java
   ```

2. Build the project using Maven:
   ```
   mvn clean package
   ```

3. Run the application:
   ```
   java -jar target/PrintDistinctPKs-1.0-SNAPSHOT.jar --table-name <your-table-name> --region <your-aws-region>
   ```

4. Alternatively, use Docker:
   ```
   docker build -t print-distinct-pks .

   docker run --rm -it \
     -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
     -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
     -e AWS_DEFAULT_REGION=<your-aws-region> \
     -e DYNAMODB_TABLE_NAME=<your-table-name> \
     print-distinct-pks
   ```

##### Node.js Implementation

1. Navigate to the Node.js directory:
   ```
   cd Printer/nodejs
   ```

2. Install dependencies:
   ```
   npm install
   ```

3. Run the script:
   ```
   node print_distinct_pks.js --region <your-aws-region> --table-name <your-table-name>
   ```

##### Python Implementation

1. Navigate to the Python directory:
   ```
   cd Printer/python
   ```

2. Run the script:
   ```
   python print_distinct_pks.py --region <your-aws-region> --table-name <your-table-name>
   ```

### 2. [RandomLoader](./RandomLoader)
A Python script (`load_random_data.py`) that generates and loads random test data into DynamoDB tables.

Key features:
- Creates tables with different sort key types (string, number, binary)
- Generates random partition keys and sort keys
- Configurable number of items per partition key
- Useful for testing and benchmarking DynamoDB performance

#### Table Data Models for RandomLoader

The RandomLoader script creates three tables with different sort key types:

1. **String Sort Key Table (`sk-str-test-data`)**
   ```
   TableName: sk-str-test-data
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: S
   BillingMode: PAY_PER_REQUEST
   ```

2. **Number Sort Key Table (`sk-num-test-data`)**
   ```
   TableName: sk-num-test-data
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: N
   BillingMode: PAY_PER_REQUEST
   ```

3. **Binary Sort Key Table (`sk-bin-test-data`)**
   ```
   TableName: sk-bin-test-data
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: B
   BillingMode: PAY_PER_REQUEST
   ```
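
The three definitions differ only in the sort key's `AttributeType`, so a loader can build them from one helper. The sketch below assumes a boto3 low-level client and uses its `create_table` call and `table_exists` waiter; the helper names `table_definition` and `create_random_loader_tables` are hypothetical and not taken from `load_random_data.py`.

```python
from typing import Any, Dict

# Table name -> sort key type, as listed in the data models above.
RANDOM_LOADER_TABLES = {
    "sk-str-test-data": "S",
    "sk-num-test-data": "N",
    "sk-bin-test-data": "B",
}


def table_definition(table_name: str, sk_type: str) -> Dict[str, Any]:
    """Build create_table arguments for a pk (S) + sk (sk_type) table."""
    if sk_type not in ("S", "N", "B"):
        raise ValueError(f"sort key type must be S, N, or B, not {sk_type!r}")
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": sk_type},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
    }


def create_random_loader_tables(client: Any) -> None:
    """Create all three tables, then wait until each one exists."""
    for name, sk_type in RANDOM_LOADER_TABLES.items():
        client.create_table(**table_definition(name, sk_type))
    waiter = client.get_waiter("table_exists")
    for name in RANDOM_LOADER_TABLES:
        waiter.wait(TableName=name)
```

Issuing all three `create_table` calls before waiting lets DynamoDB create the tables concurrently instead of one after another.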

Each table is populated with random data:
- Random string partition keys (10 characters)
- Between 1 and 10 items per partition key
- Sort keys appropriate for each table type (string, number, or binary)
- Total of approximately 5,000 items per table
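
The population rules above can be sketched as follows. This is not the code in `load_random_data.py`: the partition key alphabet, the sort key generators (10 lowercase letters, integers from 0 to 1,000,000, or 8 random bytes), and all function names are hypothetical choices. Writing goes through `batch_write_item`, which accepts at most 25 put requests per call and can return `UnprocessedItems` that have to be retried.

```python
import random
import string
import time
from typing import Any, Dict, List

PK_ALPHABET = string.ascii_letters + string.digits  # hypothetical character set


def random_sort_keys(rng: random.Random, sk_type: str, count: int) -> List[Dict[str, Any]]:
    """Return `count` distinct random sort key AttributeValues of one type."""
    values: set = set()
    while len(values) < count:
        if sk_type == "S":
            values.add("".join(rng.choices(string.ascii_lowercase, k=10)))
        elif sk_type == "N":
            values.add(rng.randint(0, 1_000_000))
        elif sk_type == "B":
            values.add(bytes(rng.getrandbits(8) for _ in range(8)))
        else:
            raise ValueError(f"unsupported sort key type {sk_type!r}")
    if sk_type == "N":
        return [{"N": str(v)} for v in values]  # numbers travel as strings
    return [{sk_type: v} for v in values]


def generate_items(sk_type: str, target_items: int = 5000, seed: Any = None) -> List[Dict[str, Any]]:
    """Random 10-character partition keys with 1-10 items each, until the target is reached."""
    rng = random.Random(seed)
    items: List[Dict[str, Any]] = []
    used_pks: set = set()
    while len(items) < target_items:
        pk = "".join(rng.choices(PK_ALPHABET, k=10))
        if pk in used_pks:
            continue  # keep every (pk, sk) pair unique
        used_pks.add(pk)
        for sk in random_sort_keys(rng, sk_type, rng.randint(1, 10)):
            items.append({"pk": {"S": pk}, "sk": sk})
    return items


def batch_write(client: Any, table_name: str, items: List[Dict[str, Any]]) -> None:
    """Write items 25 at a time, retrying UnprocessedItems with backoff."""
    for start in range(0, len(items), 25):
        chunk = items[start:start + 25]
        request = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        delay = 0.05
        while request:
            response = client.batch_write_item(RequestItems=request)
            request = response.get("UnprocessedItems") or {}
            if request:
                time.sleep(delay)  # back off before retrying throttled writes
                delay = min(delay * 2, 2.0)
```

For example, `batch_write(boto3.client("dynamodb"), "sk-num-test-data", generate_items("N", seed=42))` would fill the number table. Keeping every `(pk, sk)` pair unique matters here because a single `BatchWriteItem` request rejects duplicate keys.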

#### Using the RandomLoader
1. Navigate to the RandomLoader directory.
2. Review and modify the configuration variables at the top of `load_random_data.py` as needed.
3. Run the script: `python load_random_data.py --region <your-aws-region>`

### 3. [LoadMaxValues](./LoadMaxValues)
Scripts to test the maximum sort key values for the different key attribute types in DynamoDB (string, number, and binary).

- **Java**: Implementation in Java
- **Node.js**: Implementation in JavaScript for Node.js
- **Python**: Implementation in Python

These scripts are useful for understanding the limits of DynamoDB's data types and for ensuring that your application handles edge cases correctly.
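
To check a value against these limits before writing it, a small validator can encode DynamoDB's documented rules: a string or binary sort key value must be between 1 and 1,024 bytes, and a number may have at most 38 significant digits and a magnitude between 1E-130 and 9.9999999999999999999999999999999999999E+125 (or be zero). This is a hypothetical helper (`sort_key_problem`), not part of the LoadMaxValues scripts; it uses Python's `decimal` module so that 38-digit values are compared exactly.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union

MAX_SORT_KEY_BYTES = 1024
MAX_NUMBER = Decimal("9.9999999999999999999999999999999999999E+125")
MIN_POSITIVE_NUMBER = Decimal("1E-130")
MAX_SIGNIFICANT_DIGITS = 38


def sort_key_problem(key_type: str, value: Union[str, bytes]) -> Optional[str]:
    """Return None if `value` is a valid sort key of `key_type`, else a reason."""
    if key_type == "N":
        try:
            number = Decimal(value)  # exact: construction ignores context precision
        except InvalidOperation:
            return "not a number"
        if not number.is_finite():
            return "not a finite number"
        if number.is_zero():
            return None
        # Leading zeros are never stored; trailing zeros do not count either.
        digits = list(number.as_tuple().digits)
        while digits and digits[-1] == 0:
            digits.pop()
        if len(digits) > MAX_SIGNIFICANT_DIGITS:
            return "more than 38 significant digits"
        # copy_abs() is exact; abs() would round to the 28-digit default context.
        if not MIN_POSITIVE_NUMBER <= number.copy_abs() <= MAX_NUMBER:
            return "magnitude outside 1E-130 .. 9.99...E+125"
        return None
    if key_type == "S":
        size = len(value.encode("utf-8"))
    elif key_type == "B":
        size = len(value)
    else:
        return f"unknown key type {key_type!r}"
    if size == 0:
        return "key values cannot be empty"
    if size > MAX_SORT_KEY_BYTES:
        return f"{size} bytes exceeds the {MAX_SORT_KEY_BYTES}-byte sort key limit"
    return None
```

The `copy_abs()` choice is deliberate: `abs()` applies the default 28-digit context, which would round the maximum number up to 1E+126 and wrongly reject it.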

#### Table Data Models for LoadMaxValues

The LoadMaxValues scripts create three tables to test maximum values for different sort key types:

1. **Maximum String Sort Key Table (`max-str-sk-test-python`)**
   ```
   TableName: max-str-sk-test-python
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: S
   BillingMode: PAY_PER_REQUEST
   ```
   - Tests with maximum string value: 256 repetitions of the maximum Unicode code point, U+10FFFF (4 bytes each in UTF-8, so 1,024 bytes, the maximum sort key size)

2. **Maximum Number Sort Key Table (`max-num-sk-test-python`)**
   ```
   TableName: max-num-sk-test-python
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: N
   BillingMode: PAY_PER_REQUEST
   ```
   - Tests with maximum number value: 9.9999999999999999999999999999999999999E+125 (38 significant digits, the largest number DynamoDB accepts)

3. **Maximum Binary Sort Key Table (`max-bin-sk-test-python`)**
   ```
   TableName: max-bin-sk-test-python
   KeySchema:
     - AttributeName: pk
       KeyType: HASH
     - AttributeName: sk
       KeyType: RANGE
   AttributeDefinitions:
     - AttributeName: pk
       AttributeType: S
     - AttributeName: sk
       AttributeType: B
   BillingMode: PAY_PER_REQUEST
   ```
   - Tests with maximum binary value: 1,024 bytes of 0xFF (the maximum sort key size)

Each table contains a single item with a fixed partition key (`sample-pk-value`) and a sort key set to the maximum value for its data type.
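
Writing that single item per table can be sketched as follows, assuming a boto3 low-level client and that the three tables already exist. The table names, the `sample-pk-value` partition key, and the maximum values come from the data models above; the function names are hypothetical, and reading each item back with `get_item` is an extra check that the LoadMaxValues scripts may not perform. Numbers are compared as `Decimal` values because DynamoDB trims leading and trailing zeros, so a number can come back in a different textual form than the one written.

```python
from decimal import Decimal
from typing import Any, Dict

SAMPLE_PK = {"S": "sample-pk-value"}

# Table name -> maximum sort key value for that table's key type.
MAX_VALUE_TABLES: Dict[str, Dict[str, Any]] = {
    "max-str-sk-test-python": {"S": chr(0x10FFFF) * 256},
    "max-num-sk-test-python": {"N": "9.9999999999999999999999999999999999999E+125"},
    "max-bin-sk-test-python": {"B": b"\xff" * 1024},
}


def same_attribute_value(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Compare two AttributeValues, treating numbers by value, not by text."""
    if "N" in a and "N" in b:
        return Decimal(a["N"]) == Decimal(b["N"])
    return a == b


def load_max_values(client: Any) -> None:
    """Put one item per table, then read it back to confirm the sort key."""
    for table_name, max_sk in MAX_VALUE_TABLES.items():
        key = {"pk": SAMPLE_PK, "sk": max_sk}
        client.put_item(TableName=table_name, Item=key)
        response = client.get_item(TableName=table_name, Key=key, ConsistentRead=True)
        stored_sk = response.get("Item", {}).get("sk", {})
        if not same_attribute_value(stored_sk, max_sk):
            raise RuntimeError(f"{table_name}: sort key did not round-trip")
```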

#### Using the LoadMaxValues Scripts

The LoadMaxValues scripts create tables and test maximum values for different attribute types in DynamoDB. Here are instructions for running the implementations in different languages:

##### Java Implementation

1. Navigate to the Java directory:
   ```
   cd LoadMaxValues/java
   ```

2. Build the project using Maven:
   ```
   mvn clean package
   ```

3. Run the application:
   ```
   java -jar target/load-max-values-1.0.jar --region <your-aws-region>
   ```

4. Alternatively, use Docker:
   ```
   docker build -t load-max-values .

   docker run --rm -it \
     -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
     -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
     -e AWS_DEFAULT_REGION=<your-aws-region> \
     load-max-values
   ```

##### Python Implementation

1. Navigate to the Python directory:
   ```
   cd LoadMaxValues/python
   ```

2. Run the script:
   ```
   python load_max_values.py --region <your-aws-region>
   ```

##### Node.js Implementation

1. Navigate to the Node.js directory:
   ```
   cd LoadMaxValues/nodejs
   ```

2. Install dependencies:
   ```
   npm install
   ```

3. Run the script:
   ```
   node load_max_values.js --region <your-aws-region>
   ```

The scripts create three tables with different sort key types (string, number, binary) and insert one item holding the maximum value for each type.

## Use Cases

1. **Analyze Partition Key Distribution**
   - Identify potential hot partitions
   - Verify that your partition key design distributes data evenly

2. **Generate Test Data**
   - Create test tables with specific characteristics
   - Populate tables with random data for performance testing

3. **Test DynamoDB Limits**
   - Verify how your application handles maximum values
   - Understand the practical limits of different DynamoDB data types
