| 1 | | -# Update Me! |
| 1 | +# Print Distinct Partition Keys |
| 2 | + |
| 3 | +This directory contains tools for working with partition keys in DynamoDB tables, including utilities to print distinct partition keys, load random test data, and test maximum values for different attribute types. |
| 4 | + |
| 5 | +## Directory Structure |
| 6 | + |
| 7 | +### 1. [Printer](./Printer) |
| 8 | +Scripts in multiple programming languages to scan a DynamoDB table and print distinct partition keys. |
| 9 | + |
| 10 | +- **Java**: Implementation in Java |
| 11 | +- **Node.js**: Implementation in JavaScript for Node.js |
| 12 | +- **Python**: Implementation in Python |
| 13 | + |
| 14 | +These scripts help you analyze the distribution of data across partition keys, which is useful for identifying potential hot partitions and optimizing table design. |
| 15 | + |
| 16 | +#### Table Data Model for Printer Scripts |
| 17 | + |
| 18 | +The Printer scripts are designed to work with any DynamoDB table that has a composite key (partition key and sort key). The scripts dynamically determine the key structure from the table's schema: |
| 19 | + |
| 20 | +``` |
| 21 | +TableName: <any-table-name> |
| 22 | +KeySchema: |
| 23 | + - AttributeName: pk |
| 24 | + KeyType: HASH |
| 25 | + - AttributeName: sk |
| 26 | + KeyType: RANGE |
| 27 | +AttributeDefinitions: |
| 28 | + - AttributeName: pk |
| 29 | + AttributeType: S |
| 30 | + - AttributeName: sk |
| 31 | + AttributeType: S |
| 32 | +``` |
| 33 | + |
| 34 | +The scripts support tables with sort keys of any of the three supported DynamoDB key types: |
| 35 | +- String (S) |
| 36 | +- Number (N) |
| 37 | +- Binary (B) |
| 38 | + |
| 39 | +The Printer scripts: |
| 40 | +1. Determine the partition key and sort key names from the table's key schema |
| 41 | +2. Identify the sort key's data type |
| 42 | +3. Use the appropriate maximum value for the sort key type when scanning |
| 43 | +4. Efficiently retrieve only distinct partition key values |
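The four steps above can be sketched in Python as follows. This is a minimal illustration, not the code shipped in `Printer`: the helper name `distinct_partition_keys` and the `MAX_SORT_KEY` table are hypothetical, and `client` is assumed to be a boto3 low-level DynamoDB client (or any object exposing `describe_table` and `scan` with the same shapes). The idea is to read one item, then restart the scan just past the largest possible sort key for that partition key, so the rest of the item collection is skipped.

```python
# Largest sort key value per attribute type, in DynamoDB's low-level
# attribute-value format. The values mirror the maxima listed in this README.
MAX_SORT_KEY = {
    "S": {"S": "\U0010FFFF" * 256},
    "N": {"N": "9.9999999999999999999999999999999999999E+125"},
    "B": {"B": b"\xff" * 1024},
}


def distinct_partition_keys(client, table_name):
    """Yield each distinct partition key value once (hypothetical helper)."""
    # Steps 1 and 2: read key names and the sort key type from the schema.
    table = client.describe_table(TableName=table_name)["Table"]
    names = {k["KeyType"]: k["AttributeName"] for k in table["KeySchema"]}
    pk_name, sk_name = names["HASH"], names["RANGE"]
    types = {
        a["AttributeName"]: a["AttributeType"]
        for a in table["AttributeDefinitions"]
    }
    max_sk = MAX_SORT_KEY[types[sk_name]]

    start_key = None
    while True:
        request = {
            "TableName": table_name,
            "Limit": 1,  # read a single item per request
            "ProjectionExpression": "#pk",
            "ExpressionAttributeNames": {"#pk": pk_name},
        }
        if start_key is not None:
            request["ExclusiveStartKey"] = start_key
        page = client.scan(**request)
        items = page.get("Items", [])
        if items:
            pk_value = items[0][pk_name]
            yield next(iter(pk_value.values()))
            # Steps 3 and 4: resume after the maximum sort key so the
            # remaining items under this partition key are skipped.
            start_key = {pk_name: pk_value, sk_name: max_sk}
        elif "LastEvaluatedKey" in page:
            start_key = page["LastEvaluatedKey"]
        else:
            return  # no items and no continuation key: scan is complete
```

With boto3 installed, this could be driven by `boto3.client("dynamodb", region_name="<your-aws-region>")`. Under these assumptions the sketch issues roughly one `Scan` request per distinct partition key rather than reading every item in the table.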
| 44 | + |
| 45 | +### Using the Printer Scripts |
| 46 | + |
| 47 | +### Prerequisites |
| 48 | +- AWS CLI configured with appropriate credentials |
| 49 | +- The toolchain for the implementation you want to run: Java (with Maven), Node.js, or Python |
| 50 | + |
| 51 | + |
| 52 | +Each language implementation provides the same functionality but with language-specific setup and execution steps: |
| 53 | + |
| 54 | +#### Java Implementation |
| 55 | + |
| 56 | +1. Navigate to the Java directory: |
| 57 | + ``` |
| 58 | + cd Printer/java |
| 59 | + ``` |
| 60 | + |
| 61 | +2. Build the project using Maven: |
| 62 | + ``` |
| 63 | + mvn clean package |
| 64 | + ``` |
| 65 | + |
| 66 | +3. Run the application: |
| 67 | + ``` |
| 68 | + java -jar target/PrintDistinctPKs-1.0-SNAPSHOT.jar --table-name <your-table-name> --region <your-aws-region> |
| 69 | + ``` |
| 70 | + |
| 71 | +4. Alternatively, use Docker: |
| 72 | + ``` |
| 73 | + docker build -t print-distinct-pks . |
| 74 | + |
| 75 | + docker run --rm -it \ |
| 76 | + -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \ |
| 77 | + -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \ |
| 78 | + -e AWS_DEFAULT_REGION=<your-aws-region> \ |
| 79 | + -e DYNAMODB_TABLE_NAME=<your-table-name> \ |
| 80 | + print-distinct-pks |
| 81 | + ``` |
| 82 | + |
| 83 | +#### Node.js Implementation |
| 84 | + |
| 85 | +1. Navigate to the Node.js directory: |
| 86 | + ``` |
| 87 | + cd Printer/nodejs |
| 88 | + ``` |
| 89 | + |
| 90 | +2. Install dependencies: |
| 91 | + ``` |
| 92 | + npm install |
| 93 | + ``` |
| 94 | + |
| 95 | +3. Run the script: |
| 96 | + ``` |
| 97 | + node print_distinct_pks.js --region <your-aws-region> --table-name <your-table-name> |
| 98 | + ``` |
| 99 | + |
| 100 | +#### Python Implementation |
| 101 | + |
| 102 | +1. Navigate to the Python directory: |
| 103 | + ``` |
| 104 | + cd Printer/python |
| 105 | + ``` |
| 106 | + |
| 107 | +2. Run the script: |
| 108 | + ``` |
| 109 | + python print_distinct_pks.py --region <your-aws-region> --table-name <your-table-name> |
| 110 | + ``` |
| 111 | + |
| 112 | +### 2. [RandomLoader](./RandomLoader) |
| 113 | +A Python script (`load_random_data.py`) that generates and loads random test data into DynamoDB tables. |
| 114 | + |
| 115 | +Key features: |
| 116 | +- Creates tables with different sort key types (string, number, binary) |
| 117 | +- Generates random partition keys and sort keys |
| 118 | +- Configurable number of items per partition key |
| 119 | +- Useful for testing and benchmarking DynamoDB performance |
| 120 | + |
| 121 | +#### Table Data Models for RandomLoader |
| 122 | + |
| 123 | +The RandomLoader script creates three tables with different sort key types: |
| 124 | + |
| 125 | +1. **String Sort Key Table (`sk-str-test-data`)** |
| 126 | + ``` |
| 127 | + TableName: sk-str-test-data |
| 128 | + KeySchema: |
| 129 | + - AttributeName: pk |
| 130 | + KeyType: HASH |
| 131 | + - AttributeName: sk |
| 132 | + KeyType: RANGE |
| 133 | + AttributeDefinitions: |
| 134 | + - AttributeName: pk |
| 135 | + AttributeType: S |
| 136 | + - AttributeName: sk |
| 137 | + AttributeType: S |
| 138 | + BillingMode: PAY_PER_REQUEST |
| 139 | + ``` |
| 140 | + |
| 141 | +2. **Number Sort Key Table (`sk-num-test-data`)** |
| 142 | + ``` |
| 143 | + TableName: sk-num-test-data |
| 144 | + KeySchema: |
| 145 | + - AttributeName: pk |
| 146 | + KeyType: HASH |
| 147 | + - AttributeName: sk |
| 148 | + KeyType: RANGE |
| 149 | + AttributeDefinitions: |
| 150 | + - AttributeName: pk |
| 151 | + AttributeType: S |
| 152 | + - AttributeName: sk |
| 153 | + AttributeType: N |
| 154 | + BillingMode: PAY_PER_REQUEST |
| 155 | + ``` |
| 156 | + |
| 157 | +3. **Binary Sort Key Table (`sk-bin-test-data`)** |
| 158 | + ``` |
| 159 | + TableName: sk-bin-test-data |
| 160 | + KeySchema: |
| 161 | + - AttributeName: pk |
| 162 | + KeyType: HASH |
| 163 | + - AttributeName: sk |
| 164 | + KeyType: RANGE |
| 165 | + AttributeDefinitions: |
| 166 | + - AttributeName: pk |
| 167 | + AttributeType: S |
| 168 | + - AttributeName: sk |
| 169 | + AttributeType: B |
| 170 | + BillingMode: PAY_PER_REQUEST |
| 171 | + ``` |
| 172 | + |
| 173 | +Each table is populated with random data: |
| 174 | +- Random string partition keys (10 characters) |
| 175 | +- Between 1 and 10 items per partition key |
| 176 | +- Sort keys appropriate for each table type (string, number, or binary) |
| 177 | +- Total of approximately 5,000 items per table |
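The data shape described above can be sketched as below. This is an assumption-laden illustration rather than the contents of `load_random_data.py`: the function names, the sort key lengths and ranges, and the `target_items` parameter are all hypothetical. It only builds items in DynamoDB's low-level attribute-value format and leaves writing them to the table out of scope.

```python
import random
import string

PK_ALPHABET = string.ascii_letters + string.digits


def random_sort_key(sk_type, rng):
    """Build a random sort key of the requested type (S, N, or B)."""
    if sk_type == "S":
        return {"S": "".join(rng.choices(string.ascii_lowercase, k=12))}
    if sk_type == "N":
        # The low-level API transports numbers as strings.
        return {"N": str(rng.randint(0, 10**9))}
    if sk_type == "B":
        return {"B": bytes(rng.getrandbits(8) for _ in range(12))}
    raise ValueError(f"unsupported sort key type: {sk_type}")


def generate_items(sk_type, target_items=5000, rng=None):
    """Generate roughly target_items items, 1 to 10 per partition key."""
    rng = rng or random.Random()
    items = []
    while len(items) < target_items:
        # 10-character random string partition key.
        pk = "".join(rng.choices(PK_ALPHABET, k=10))
        for _ in range(rng.randint(1, 10)):
            items.append({"pk": {"S": pk}, "sk": random_sort_key(sk_type, rng)})
    return items
```

Because the last partition key may add up to 10 items, the result can slightly exceed `target_items`, which is consistent with an approximate total.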
| 178 | + |
| 179 | + |
| 180 | +### Using the RandomLoader |
| 181 | +1. Navigate to the RandomLoader directory |
| 182 | +2. Review and modify the configuration variables at the top of `load_random_data.py` as needed |
| 183 | +3. Run the script: `python load_random_data.py --region <your-aws-region>` |
| 184 | + |
| 185 | + |
| 186 | +### 3. [LoadMaxValues](./LoadMaxValues) |
| 187 | +Scripts to test the maximum values for different attribute types in DynamoDB. |
| 188 | + |
| 189 | +- **Java**: Implementation in Java |
| 190 | +- **Node.js**: Implementation in JavaScript for Node.js |
| 191 | +- **Python**: Implementation in Python |
| 192 | + |
| 193 | +These scripts are useful for understanding the limits of DynamoDB's data types and ensuring your application handles edge cases correctly. |
| 194 | + |
| 195 | +#### Table Data Models for LoadMaxValues |
| 196 | + |
| 197 | +The LoadMaxValues scripts create three tables to test maximum values for different sort key types: |
| 198 | + |
| 199 | +1. **Maximum String Sort Key Table (`max-str-sk-test-python`)** |
| 200 | + ``` |
| 201 | + TableName: max-str-sk-test-python |
| 202 | + KeySchema: |
| 203 | + - AttributeName: pk |
| 204 | + KeyType: HASH |
| 205 | + - AttributeName: sk |
| 206 | + KeyType: RANGE |
| 207 | + AttributeDefinitions: |
| 208 | + - AttributeName: pk |
| 209 | + AttributeType: S |
| 210 | + - AttributeName: sk |
| 211 | + AttributeType: S |
| 212 | + BillingMode: PAY_PER_REQUEST |
| 213 | + ``` |
| 214 | +   - Tests with the maximum string value: 256 repetitions of the maximum Unicode code point (U+10FFFF), which encode to 1,024 bytes in UTF-8 |
| 215 | + |
| 216 | +2. **Maximum Number Sort Key Table (`max-num-sk-test-python`)** |
| 217 | + ``` |
| 218 | + TableName: max-num-sk-test-python |
| 219 | + KeySchema: |
| 220 | + - AttributeName: pk |
| 221 | + KeyType: HASH |
| 222 | + - AttributeName: sk |
| 223 | + KeyType: RANGE |
| 224 | + AttributeDefinitions: |
| 225 | + - AttributeName: pk |
| 226 | + AttributeType: S |
| 227 | + - AttributeName: sk |
| 228 | + AttributeType: N |
| 229 | + BillingMode: PAY_PER_REQUEST |
| 230 | + ``` |
| 231 | + - Tests with maximum number value: 9.9999999999999999999999999999999999999E+125 |
| 232 | + |
| 233 | +3. **Maximum Binary Sort Key Table (`max-bin-sk-test-python`)** |
| 234 | + ``` |
| 235 | + TableName: max-bin-sk-test-python |
| 236 | + KeySchema: |
| 237 | + - AttributeName: pk |
| 238 | + KeyType: HASH |
| 239 | + - AttributeName: sk |
| 240 | + KeyType: RANGE |
| 241 | + AttributeDefinitions: |
| 242 | + - AttributeName: pk |
| 243 | + AttributeType: S |
| 244 | + - AttributeName: sk |
| 245 | + AttributeType: B |
| 246 | + BillingMode: PAY_PER_REQUEST |
| 247 | + ``` |
| 248 | + - Tests with maximum binary value: 1024 bytes of 0xFF |
| 249 | + |
| 250 | +Each table contains a single item with a fixed partition key ("sample-pk-value") and a sort key set to the maximum value for its data type. |
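As a rough sketch of the single item each table holds, the following builds the three items in DynamoDB's low-level attribute-value format. The helper name `max_value_item` is hypothetical and the actual scripts may construct these values differently; the maxima themselves are the ones listed above.

```python
MAX_CODE_POINT = "\U0010FFFF"  # highest Unicode code point, 4 bytes in UTF-8
PARTITION_KEY = "sample-pk-value"


def max_value_item(sk_type):
    """Return the item for one table (hypothetical helper)."""
    if sk_type == "S":
        sk = {"S": MAX_CODE_POINT * 256}  # 256 * 4 bytes = 1024 bytes
    elif sk_type == "N":
        # Numbers are sent as strings in the low-level API.
        sk = {"N": "9.9999999999999999999999999999999999999E+125"}
    elif sk_type == "B":
        sk = {"B": b"\xff" * 1024}
    else:
        raise ValueError(f"unsupported sort key type: {sk_type}")
    return {"pk": {"S": PARTITION_KEY}, "sk": sk}
```

Assuming a boto3 DynamoDB client, an item could then be written with `client.put_item(TableName="max-str-sk-test-python", Item=max_value_item("S"))`. Both the string and binary values come to exactly 1,024 bytes, which matches DynamoDB's maximum sort key length.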
| 251 | + |
| 252 | +## Use Cases |
| 253 | + |
| 254 | +1. **Analyze Partition Key Distribution** |
| 255 | + - Identify potential hot partitions |
| 256 | + - Verify that your partition key design distributes data evenly |
| 257 | + |
| 258 | +2. **Generate Test Data** |
| 259 | + - Create test tables with specific characteristics |
| 260 | + - Populate tables with random data for performance testing |
| 261 | + |
| 262 | +3. **Test DynamoDB Limits** |
| 263 | + - Verify how your application handles maximum values |
| 264 | + - Understand the practical limits of different DynamoDB data types |
| 265 | + |
| 266 | +### Using the LoadMaxValues Scripts |
| 267 | + |
| 268 | +The LoadMaxValues scripts create tables and test maximum values for different attribute types in DynamoDB. Here are instructions for running the implementations in different languages: |
| 269 | + |
| 270 | +#### Java Implementation |
| 271 | + |
| 272 | +1. Navigate to the Java directory: |
| 273 | + ``` |
| 274 | + cd LoadMaxValues/java |
| 275 | + ``` |
| 276 | + |
| 277 | +2. Build the project using Maven: |
| 278 | + ``` |
| 279 | + mvn clean package |
| 280 | + ``` |
| 281 | + |
| 282 | +3. Run the application: |
| 283 | + ``` |
| 284 | + java -jar target/load-max-values-1.0.jar --region <your-aws-region> |
| 285 | + ``` |
| 286 | + |
| 287 | +4. Alternatively, use Docker: |
| 288 | + ``` |
| 289 | + docker build -t load-max-values . |
| 290 | + |
| 291 | + docker run --rm -it \ |
| 292 | + -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \ |
| 293 | + -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \ |
| 294 | + -e AWS_DEFAULT_REGION=<your-aws-region> \ |
| 295 | + load-max-values |
| 296 | + ``` |
| 297 | + |
| 298 | +#### Python Implementation |
| 299 | + |
| 300 | +1. Navigate to the Python directory: |
| 301 | + ``` |
| 302 | + cd LoadMaxValues/python |
| 303 | + ``` |
| 304 | + |
| 305 | +2. Run the script: |
| 306 | + ``` |
| 307 | + python load_max_values.py --region <your-aws-region> |
| 308 | + ``` |
| 309 | + |
| 310 | +#### Node.js Implementation |
| 311 | + |
| 312 | +1. Navigate to the Node.js directory: |
| 313 | + ``` |
| 314 | + cd LoadMaxValues/nodejs |
| 315 | + ``` |
| 316 | + |
| 317 | +2. Install dependencies: |
| 318 | + ``` |
| 319 | + npm install |
| 320 | + ``` |
| 321 | + |
| 322 | +3. Run the script: |
| 323 | + ``` |
| 324 | + node load_max_values.js --region <your-aws-region> |
| 325 | + ``` |
| 326 | + |
| 327 | +The scripts will create three tables with different sort key types (string, number, binary) and insert items with maximum values for each type. |