
Commit 531d23a

feat(demos): add eks benchmark demo (#89)
* feat(demos): add eks benchmark demo
* remove extra comment
* use eks profile
* improve automq eks benchmark docs
* add conclusion for benchmark
* refactor: simplefy and consolidate the TF steps.
* Validate and adjust the Terraform code
* Complete and enhance the README.
* Restore the mistakenly deleted code
* Optimize the document following the suggestions.
* Add a description of how to access the observability system.
* add eks access method in readme
* add random resource_suffix
* Optimize the Readme based on actual deployment experience.
* Launch the Grafana dashboard with one click using TF.
* Adjust the README according to the actual creation situation.
* Modify the Terraform configuration file for automatically creating Grafana dashboards
* Add an automation script to modify and create automq configurations.
* Optimize the readme by changing the automq node group to an ondemand instance.
* Docs(benchmark): Improve testing guide and clarify endpoint usage. Refactored the benchmark documentation to enhance clarity and provide better guidance for testers:
  * Specify mandatory `endpoint` parameter.
  * Documented the current throughput limit for client expectation setting.
  * Added steps for traffic repetition.
  * Outlined required validation checks: Dashboard EKS (Step 1) and subsequent checks (Step 2).
  * Optimized path instructions in the README.
1 parent bf809a0 commit 531d23a

File tree: 25 files changed, +1555 / -2 lines changed
Lines changed: 256 additions & 0 deletions
# AutoMQ Quick Setup & Benchmark

Deploying a complete AutoMQ cluster on AWS traditionally involves multiple, complex steps, from setting up the control and data planes to manually configuring a separate observability environment and benchmarking tools.

This project eliminates that complexity. It is designed to provide a seamless, one-click solution using Terraform to automatically provision an entire AutoMQ ecosystem on AWS.

The primary goal is to empower users to effortlessly spin up a fully operational, observable, and testable AutoMQ cluster, drastically reducing setup time and manual configuration.
## Overview

The resources provisioned in this walkthrough include an EKS cluster with three corresponding node groups, roughly 10 EC2 instances, and the AutoMQ Console node.

The project follows a simple, three-step end-to-end flow to go from infrastructure to benchmarking with minimal manual work:
1) Provision with Terraform: bring up the required components — `EKS`, `AutoMQ Console` (BYOC control plane), and the observability stack (Prometheus/Grafana). After this step, the Kubernetes cluster and monitoring environment are ready.

2) Configure the AutoMQ Console and create the cluster: create the required `Profile` and credentials in the Console, then create or connect your `AutoMQ Cluster` (BYOC). Use these values in the subsequent Terraform/Helm configuration to enable connectivity.

3) Run benchmarks via the provided Helm chart: go to `automq-benchmark-chart`, set connection details and workload parameters (topics, partitions, message size, concurrency, etc.), deploy the benchmark Job, and observe throughput and latency in Grafana.
## Architecture

![architecture](./architecture.png)
## Prerequisites

Before using this project, ensure you have:

### Required Tools

- **Terraform** (>= 1.0)
- **kubectl** configured for your EKS cluster
- **Helm** (>= 3.0)
- **AWS CLI** configured with appropriate permissions
### Environment Setup

To ensure all commands execute correctly, set the `BASE_DIR` environment variable to the root directory of this repository:

```bash
export BASE_DIR=$(pwd)
```
### Required Permissions

- EKS cluster management permissions
- EC2 instance and networking permissions
- IAM role management permissions
- S3 bucket access (for AutoMQ data storage)
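A quick, optional sanity check that the required tools and AWS credentials are in place (a sketch; versions only need to satisfy the minimums above):

```bash
terraform version             # expect >= 1.0
kubectl version --client
helm version                  # expect >= 3.0
aws sts get-caller-identity   # confirms the AWS CLI credentials are valid
```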
## Quick Start

### Step 1: Deploy Benchmark Infrastructure
This step provisions and integrates everything via Terraform in `./terraform`:

- EKS cluster (creating and configuring the required `VPC`, subnets, `Security Group`, `IAM`, and related networking/permission resources)
- AutoMQ BYOC Console (deployed in a public subnet of the same VPC, with access and security integrated with the EKS cluster)
- Observability stack (Prometheus/Grafana) installed via the Helm `kube-prometheus-stack` chart for collecting and visualizing benchmark metrics

All necessary cloud resources (including networking and object storage such as `S3`) are newly created and wired up in this step.
1. **Plan the Deployment.** Run `terraform plan` to preview the resources that will be created.

Tip: To control resource naming and avoid conflicts, set `resource_suffix` in `./terraform/variables.tf`.

```bash
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
terraform init
terraform plan
```
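For example, the suffix can also be supplied on the command line instead of editing `variables.tf` (a sketch; assumes `resource_suffix` is a plain string variable):

```bash
# Hypothetical example: keep resource names unique across repeated deployments
terraform plan -var="resource_suffix=demo01"
terraform apply -var="resource_suffix=demo01"
```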
2. **Apply the Deployment.** After reviewing the plan, execute `terraform apply` to begin the deployment. This process may take 25-30 minutes.

```bash
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
terraform apply
```

Enter `yes` at the prompt to confirm.
Upon successful deployment, Terraform will display the following outputs. You can also retrieve them at any time using the `terraform output` command:

| Name | Description |
|-----------------------------------|------------------------------------------------------------|
| `console_endpoint` | The endpoint URL for the AutoMQ BYOC Console. |
| `initial_username` | The initial username for logging into the Console. |
| `initial_password` | The initial password for logging into the Console. |
| `cluster_name` | The name of the created EKS cluster. |
| `node_group_instance_profile_arn` | The IAM Instance Profile ARN used by the EKS node group. |
| `dns_zone_id` | The Route 53 DNS Zone ID created for the BYOC environment. |
| `vpc_id` | The ID of the VPC created for the environment. |
| `env_id` | The ID of the AutoMQ environment. |
| `data_bucket` | The S3 data bucket of the AutoMQ environment. |
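For example, to read individual values later (output names as listed above):

```bash
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
terraform output                          # list all outputs
terraform output -raw console_endpoint    # just the Console URL
terraform output -raw initial_password    # just the initial password
```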
Terraform will bring up the corresponding EKS node groups and the AutoMQ control plane, and create an AutoMQ cluster within EKS.

Please follow the steps below to confirm that all newly created resources are accessible before proceeding to the next step.
#### Access Control Panel

Use `console_endpoint` together with `initial_username`/`initial_password` to log in to the AutoMQ Console.
#### Access EKS Cluster

Access the EKS cluster with the following commands; the placeholders are filled from the Terraform outputs above.

```bash
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
REGION=$(terraform output -raw region)
CLUSTER_NAME=$(terraform output -raw cluster_name)

aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME
```
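To confirm that kubectl access works, list the cluster nodes; the node groups created by Terraform should appear:

```bash
kubectl get nodes -o wide
```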
#### Access Grafana Dashboard

To reach the observability stack, use the commands below to obtain the public endpoint of Grafana. The username is `admin`, and the password can be retrieved with the command below; if you wish to change it, configure it in the `./terraform/monitoring/prometheus.yaml` file.

AutoMQ provides [official Grafana dashboards](https://www.automq.com/docs/automq/observability/dashboard-configuration). In this example, Grafana comes pre-installed with broker, topic, group, and cluster dashboards.

Terraform creates these dashboards in Grafana for you. If you need further guidance, please feel free to [contact the AutoMQ team](https://www.automq.com/contact).

```bash
# Get the public endpoint of Grafana. Please make sure to use the HTTP protocol for access.
kubectl get service prometheus-grafana -n monitoring

# Get the Grafana password
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
```
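If the Grafana endpoint is not reachable from your network, a port-forward is a reasonable fallback (assuming the `kube-prometheus-stack` default of exposing Grafana on service port 80):

```bash
# Forward Grafana to http://localhost:3000
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```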
### Step 2: Deploy AutoMQ Instance

1. Follow [Create a Service Account](https://www.automq.com/docs/automq-cloud/manage-identities-and-access/service-accounts#create-a-service-account) to create a Service Account and obtain the `Client ID` and `Client Secret`. Remember to save these two values, as you will need to enter them in the subsequent installation script.

For this service account, select `EnvironmentAdmin` so it can easily create and manage resources.
2. In the AutoMQ Console, create a Deploy Profile named `eks` for the EKS environment.

The Kubernetes Cluster, bucket name, DNS ZoneId, and Node pool IAM Role ARN are all obtained from the outputs of the previous step.

When creating the `Deploy Profile`, in the second step, `Configure IAM Authorization`, you do not need to perform the first and second sub-steps; simply copy the value of `node_group_instance_profile_arn` from the outputs into the input box.

Reference: [Create a Deploy Profile](https://www.automq.com/docs/automq-cloud/deploy-automq-on-kubernetes/deploy-to-aws-eks#step-12%3A-access-the-environment-console-and-create-deployment-configuration).
3. Fill in the variables in `automq/terraform.tfvars` and apply Terraform to create the AutoMQ cluster with observability integration. Expect to wait approximately 5 to 10 minutes for the cluster to be fully created.

A helper script, `modify-automq-tf-config.sh`, is provided in the root directory of this example; running it automatically populates the required variables for you.

If you need further configuration, you can also refer to the comments and modify `automq/terraform.tfvars` directly.

```bash
$BASE_DIR/modify-automq-tf-config.sh
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/automq
terraform init
terraform plan
terraform apply
```
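Once the apply completes, a quick way to check that the AutoMQ cluster pods are up (assuming the cluster is installed into the `automq` namespace, as the benchmark chart's default bootstrap address `automq-release-kafka.automq.svc.cluster.local` suggests):

```bash
kubectl get pods -n automq
kubectl get svc -n automq   # the Kafka bootstrap service should be listed here
```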
### Step 3: Run Benchmark Tests

This step executes performance tests against your AutoMQ cluster using customizable workloads. The benchmark is designed to simulate Kafka usage patterns and lets you adjust parameters such as throughput, message size, topic configuration, and test duration. These tests generate comprehensive metrics that are automatically collected by your monitoring stack.
To begin, update the `bootstrapServer` parameter to point to the endpoint of your cluster, which can be found in the detailed cluster information from Step 2. With the default settings in `values.yaml`, the benchmark writes 160 messages per second, each 51 KiB in size (without batching), for a write rate of roughly 8 MiB/s.

For larger-scale tests, you can modify the `recordSize` and `sendRate` parameters in `values.yaml`. For more details on the Helm configuration options, refer to the [README](./automq-benchmark-chart/README.md) in the `automq-benchmark` folder.
With the current instance specifications and JVM parameter configurations, the setup can achieve approximately 200 MB/s in a 1:1 production-to-consumption scenario, which should meet the performance testing needs of a 3-10 AKU AutoMQ cluster. If you need to further increase throughput, consider upgrading the machine type of the test node group and adjusting the JVM parameters. For more information, refer to the AutoMQ [blog](https://www.automq.com/blog/how-to-perform-a-performance-test-on-automq) or consult with AutoMQ product experts.
**Expected Result**: Benchmark jobs will run and generate load against the AutoMQ cluster. Performance metrics including throughput, latency, and resource utilization will be collected and visible in Grafana dashboards. You should see data flowing through the system and the performance characteristics of your AutoMQ deployment.
1. **Configure benchmark parameters**:

```bash
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/helm-chart/automq-benchmark
```
2. **Deploy benchmark workload**:

```bash
helm install automq-benchmark . \
  --namespace default \
  --values values.yaml
```
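As an alternative to editing `values.yaml`, individual parameters can be overridden on the command line. This is a sketch: the bootstrap endpoint below is a placeholder, and the parameter names follow the chart's values table. If a release is already installed, uninstall it first with `helm uninstall automq-benchmark`, as noted in the chart README.

```bash
# Hypothetical overrides; replace the bootstrap endpoint with your cluster's endpoint from Step 2.
# Defaults produce 160 msg/s x 52,224-byte records, i.e. roughly 8 MiB/s of write traffic.
helm install automq-benchmark . \
  --namespace default \
  --values values.yaml \
  --set automq.bootstrapServer="<your-automq-bootstrap-endpoint>:9092" \
  --set benchmark.sendRate=320 \
  --set benchmark.recordSize=52224
```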
3. **View results in Grafana**:

After completing the steps above, you can see the corresponding metrics on the Grafana dashboards. Adjust the stress-test parameters against the relevant instance specifications to better understand the performance characteristics of AutoMQ.
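You can also follow the benchmark job's own output while it runs (the job name defaults to `automq-benchmark`, per the chart values):

```bash
# Watch the job and stream its logs; Ctrl+C to stop following.
kubectl get jobs -n default
kubectl logs -n default job/automq-benchmark -f
```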
## Cleanup

To remove all deployed resources:

```bash
# Remove benchmark workload
helm uninstall automq-benchmark

# Remove AutoMQ instance
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/automq
terraform destroy

# Remove EKS and AutoMQ Console
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
terraform destroy
```
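Optionally, verify that nothing is left behind (a sketch; adjust namespaces to your setup):

```bash
helm list -A | grep automq-benchmark || echo "benchmark release removed"
cd $BASE_DIR/cloudservice-setup/aws/eks-benchmark/terraform
terraform state list   # prints nothing once the destroy has completed
```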
Lines changed: 17 additions & 0 deletions
apiVersion: v2
name: automq-benchmark
description: A Helm chart for AutoMQ benchmark testing
type: application
version: 0.1.0
appVersion: "latest"
keywords:
  - automq
  - kafka
  - benchmark
  - performance
home: https://github.com/AutoMQ/automq-labs
sources:
  - https://github.com/AutoMQ/automq-labs
maintainers:
  - name: AutoMQ Team
    email: support@automq.com
Lines changed: 119 additions & 0 deletions
# AutoMQ Benchmark Helm Chart

This Helm chart deploys an AutoMQ benchmark job on a Kubernetes cluster.

## Prerequisites

- Kubernetes 1.16+
- Helm 3.0+
- An AutoMQ cluster running in the same Kubernetes cluster
## Installing the Chart

To install the chart with the release name `automq-benchmark`:

```bash
helm install automq-benchmark ./automq-benchmark-chart
```

To install with custom values:

```bash
helm install automq-benchmark ./automq-benchmark-chart -f custom-values.yaml
```

**Note:** If you need to re-run the benchmark task, first uninstall the existing deployment using `helm uninstall automq-benchmark`, then reinstall the chart.
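For example, to re-run a test with updated parameters (a sketch based on the commands above):

```bash
helm uninstall automq-benchmark
helm install automq-benchmark ./automq-benchmark-chart -f custom-values.yaml
```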
## Uninstalling the Chart

To uninstall/delete the `automq-benchmark` deployment:

```bash
helm uninstall automq-benchmark
```
## Configuration

The following table lists the configurable parameters of the AutoMQ benchmark chart and their default values.

| Parameter | Description | Default |
|--------------------------------|------------------------------------------------|------------------------------------------------------|
| `job.name` | Name of the benchmark job | `automq-benchmark` |
| `job.completions` | Number of successful completions | `1` |
| `job.parallelism` | Number of parallel pods | `1` |
| `job.backoffLimit` | Number of retries before marking job as failed | `3` |
| `job.restartPolicy` | Restart policy for the job | `Never` |
| `image.repository` | AutoMQ image repository | `automqinc/automq` |
| `image.tag` | AutoMQ image tag | `latest` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `automq.username` | AutoMQ username | `user1` |
| `automq.password` | AutoMQ password | `MrCrSQTVoB` |
| `automq.bootstrapServer` | AutoMQ bootstrap server | `automq-release-kafka.automq.svc.cluster.local:9092` |
| `automq.securityProtocol` | Security protocol | `SASL_PLAINTEXT` |
| `automq.saslMechanism` | SASL mechanism | `PLAIN` |
| `benchmark.kafkaHeapOpts` | Kafka heap options | `-Xmx1g -Xms1g` |
| `benchmark.producerConfigs` | Producer configurations | `batch.size=0` |
| `benchmark.consumerConfigs` | Consumer configurations | `fetch.max.wait.ms=1000` |
| `benchmark.topics` | Number of topics | `10` |
| `benchmark.partitionsPerTopic` | Partitions per topic | `128` |
| `benchmark.producersPerTopic` | Producers per topic | `1` |
| `benchmark.groupsPerTopic` | Consumer groups per topic | `1` |
| `benchmark.consumersPerGroup` | Consumers per group | `1` |
| `benchmark.recordSize` | Record size in bytes | `52224` |
| `benchmark.sendRate` | Send rate (messages/sec) | `160` |
| `benchmark.warmupDuration` | Warmup duration in minutes | `3` |
| `benchmark.testDuration` | Test duration in minutes | `3` |
| `resources.requests.cpu` | CPU request | `500m` |
| `resources.requests.memory` | Memory request | `2Gi` |
| `resources.limits.cpu` | CPU limit | `2` |
| `resources.limits.memory` | Memory limit | `4Gi` |
## Example Custom Values

```yaml
# custom-values.yaml
benchmark:
  topics: 20
  partitionsPerTopic: 256
  recordSize: 1024
  sendRate: 1000
  testDuration: 10

resources:
  requests:
    cpu: "1"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"

automq:
  bootstrapServer: "my-automq-cluster:9092"
```
## Monitoring

After the job completes, you can check the results by viewing the job logs:

```bash
kubectl logs job/automq-benchmark
```

To check the job status:

```bash
kubectl get jobs
kubectl describe job automq-benchmark
```
## Troubleshooting

1. **Job fails to start**: Check if the AutoMQ cluster is accessible and credentials are correct.
2. **Pod crashes**: Check resource limits and AutoMQ cluster capacity.
3. **Authentication errors**: Verify username, password, and security settings.

For more information, check the pod logs:

```bash
kubectl logs -l app=automq-benchmark
```
