A flexible synthetic data generator in Go, designed to produce realistic logs and metrics for testing and development.
Requires Go version 1.24+
- Clone the repository (alternatively download the latest release)
- Copy `config.sample.yaml` and edit as needed.
- Start the data generator with the config file as the only parameter:

```shell
go run cmd/main.go --config ./config.yaml
```
Tip: Use the `--debug` flag for debug-level logs.
The supported configuration options are listed below; check `config.sample.yaml` for reference.

The supported input configuration properties and their environment variable overrides are:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `type` | `ENV_INPUT_TYPE` | Specifies the input data type (e.g., `LOGS`, `METRICS`, `ALB`, `NLB`, `VPC`, `CLOUDTRAIL`, `WAF`). |
| `delay` | `ENV_INPUT_DELAY` | Delay between data points. Accepts values in formats like `5s` (5 seconds) or `10ms` (10 milliseconds). |
| `batching` | `ENV_INPUT_BATCHING` | Time delay between data batches. Accepts a time value similar to `delay`. Defaults to `0` (no batching). |
| `max_batch_size` | `ENV_INPUT_MAX_BATCH_SIZE` | Maximum byte size of a batch. Ignored by default (no max size). |
| `max_data_points` | `ENV_INPUT_MAX_DATA_POINTS` | Maximum number of data points to generate. Ignored by default (no max limit). |
| `max_runtime` | `ENV_INPUT_MAX_RUNTIME` | Maximum duration the load generator will run. Ignored by default (no max limit). |
Notes about input types:

- `LOGS`: ECS (Elastic Common Schema) formatted logs based on zap
- `METRICS`: Generate metrics similar to a CloudWatch metrics entry
- `ALB`: Generate AWS ALB formatted logs with some random content
- `NLB`: Generate AWS NLB formatted logs with some random content
- `VPC`: Generate AWS VPC formatted logs with randomized content
- `CLOUDTRAIL`: Generate AWS CloudTrail formatted logs with randomized content. Data is generated for an AWS S3 data event.
- `WAF`: Generate AWS WAF formatted logs with randomized content
Example:

```yaml
input:
  type: LOGS             # Input type LOGS
  delay: 500ms           # 500 milliseconds between each data point
  batching: 10s          # Emit generated data batched within 10 seconds
  max_batch_size: 10000  # Limit maximum batch size to 10,000 bytes. The output is capped at 1,000 bytes/second max
  max_data_points: 10000 # Exit input after generating 10,000 data points
```

Tip: When `max_batch_size` is reached, the elapsed batching time will be considered before generating new data.
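To illustrate how `max_batch_size` bounds a batch, here is a simplified, size-only sketch of the flush logic (the actual generator also flushes when the `batching` time window elapses; `batchPoints` is an illustrative name, not the project's API):

```go
package main

import "fmt"

// batchPoints groups generated data points into batches no larger than
// maxBatchSize bytes. A single oversized point still forms its own batch.
func batchPoints(points [][]byte, maxBatchSize int) [][]byte {
	var batches [][]byte
	var batch []byte
	for _, p := range points {
		// Flush when adding the next point would exceed the limit.
		if len(batch) > 0 && len(batch)+len(p) > maxBatchSize {
			batches = append(batches, batch)
			batch = nil
		}
		batch = append(batch, p...)
	}
	if len(batch) > 0 {
		batches = append(batches, batch)
	}
	return batches
}

func main() {
	points := [][]byte{
		[]byte("aaaa"), []byte("bbbb"), []byte("cccc"),
		[]byte("dddd"), []byte("eeee"),
	}
	for i, b := range batchPoints(points, 10) {
		fmt.Printf("batch %d: %d bytes\n", i, len(b))
	}
}
```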
The supported output types (environment variable `ENV_OUT_TYPE`) are:
- FILE: Output to a file
- FIREHOSE: Output to a Firehose stream
- CLOUDWATCH_LOG: Output to a CloudWatch log group
- S3: Output to an S3 bucket
The sections below provide output-specific configurations.
| YAML Property | Environment Variable | Description |
|---|---|---|
| `location` | `ENV_OUT_LOCATION` | Output file location. Defaults to `./out`. When batching, the file suffix increments with numbers (e.g., `out_0`, `out_2`). |
Example:

```yaml
output:
  type: FILE
  config:
    location: "./data"
```

| YAML Property | Environment Variable | Description |
|---|---|---|
| `s3_bucket` | `ENV_OUT_S3_BUCKET` | S3 bucket name (required). |
| `compression` | `ENV_OUT_COMPRESSION` | Whether to compress the output. Currently supports `gzip`. |
| `path_prefix` | `ENV_OUT_PATH_PREFIX` | Optional prefix for the bucket entry. Defaults to `logFile-`. |
Example:

```yaml
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: gzip
    path_prefix: "datagen"
```

| YAML Property | Environment Variable | Description |
|---|---|---|
| `stream_name` | `ENV_OUT_STREAM_NAME` | Firehose stream name (required). |
Example:

```yaml
output:
  type: FIREHOSE
  config:
    stream_name: "my-firehose-stream"
```

| YAML Property | Environment Variable | Description |
|---|---|---|
| `log_group` | `ENV_OUT_LOG_GROUP` | CloudWatch log group name. |
| `log_stream` | `ENV_OUT_LOG_STREAM` | Log group stream name. |
Example:

```yaml
output:
  type: CLOUDWATCH_LOG
  config:
    log_group: "MyGroup"
    log_stream: "data"
```

Currently, this project only supports the AWS Cloud Service Provider (CSP). The available configurations are:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `region` | `AWS_REGION` | Region used by exporters. Defaults to `us-east-1`. |
| `profile` | `AWS_PROFILE` | Credential profile used by exporters. Defaults to `default`. |
Example:

```yaml
aws:
  region: "us-east-1"
  profile: "default"
```

Generate ECS-formatted logs every 2 seconds, batch them over 10 seconds, and forward them to an S3 bucket:
```yaml
input:
  type: LOGS
  delay: 2s
  batching: 10s
output:
  type: s3
  config:
    s3_bucket: "testing-bucket"
```

Generate ALB logs with no delay between data points (continuous data generation).
Batching is limited to 10 seconds and the maximum batch size is set to 10 MB, which translates to ~1 MB/second of data load.
S3 files will be in gzip format.
```yaml
input:
  type: ALB
  delay: 0s
  batching: 10s
  max_batch_size: 10000000
output:
  type: s3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate VPC logs, limited to 2 data points, and upload them to S3 in gzip format:
```yaml
input:
  type: VPC
  delay: 1s
  max_data_points: 2
output:
  type: s3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate CLOUDTRAIL logs with a maximum generator runtime of 5 minutes:
```yaml
input:
  type: CLOUDTRAIL
  delay: 10us     # 10 microseconds between data points
  batching: 10s
  max_runtime: 5m # 5 minutes
output:
  type: s3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```