rtb-12 commented Dec 30, 2025

[CI/CD] Add Victoria Metrics scraping to fuzzy load test workflow

Description

This PR adds VictoriaMetrics (vmagent) integration to the fuzzy load test workflow, enabling automatic metrics collection from merod processes during long-running performance tests. The vmagent lifecycle and dynamic scrape configuration are handled by reusable scripts to reduce duplication and improve maintainability.
Dashboard: https://grafana.apps.dev.p2p.aws.calimero.network/d/merod-gha/merod-gha?orgId=1&from=2025-12-30T14:40:43.543Z&to=2025-12-30T15:16:06.991Z&timezone=browser&var-execution_platform=gha&var-execution_environment=vm&var-merod_name=kv-store-287a1be&var-rate_interval=5m&var-percentile=0.99&var-context_id=$__all

Key changes

  • New scripts (vmagent lifecycle management):

    • scripts/setup-vmagent.sh: Prepares vmagent (download/extract/configure) and emits outputs consumed by the workflow
    • scripts/run-vmagent.sh: Starts vmagent and periodically refreshes the scrape configuration as merod processes start and stop
    • scripts/cleanup-vmagent.sh: Gracefully shuts down vmagent and cleans up temp files
  • Workflow updates (.github/workflows/fuzzy-load-test.yml):

    • Uses the new vmagent scripts for both test jobs (kv-store and kv-store-with-handlers)
    • Workflow triggers: runs on push to master (scoped paths) and on pull_request (scoped to workflow + fuzzy-test configs), plus manual workflow_dispatch
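The graceful shutdown performed by scripts/cleanup-vmagent.sh can be illustrated with a minimal Python sketch. It is not the script itself (its contents aren't shown in this PR); the helper name `stop_gracefully` and the 10-second grace period are assumptions. The pattern is: send SIGTERM, wait a bounded time, and escalate to SIGKILL only if the process is still alive.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Stop a child process such as vmagent and return its exit code.

    Hypothetical helper: SIGTERM first, SIGKILL only as a last resort.
    The default grace period is an assumed value.
    """
    if proc.poll() is not None:
        # Already exited; just report the existing exit code.
        return proc.returncode
    proc.terminate()  # SIGTERM on POSIX: asks the process to shut down cleanly.
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: the process did not exit within the grace period.
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for vmagent: a child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(f"child exited with code {stop_gracefully(child, grace_seconds=5)}")
```

On POSIX, a process ended by SIGTERM reports a negative return code (e.g. -15) from `Popen.wait()`, so a cleanup step may want to treat that as a normal stop rather than a failure.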

Metrics collection

  • Discovery: Automatically discovers merod processes listening on ports 2420–2499
  • Dynamic scraping: Refreshes scrape targets periodically (e.g., every ~30s) as processes start/stop
  • Export: Sends metrics to VictoriaMetrics via vmagent, with labels for test case, commit SHA, branch/ref, and workflow run metadata
  • Artifacts: Uploads vmagent logs/config as workflow artifacts for debugging
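The discovery step can be sketched as a TCP probe of ports 2420–2499 on localhost. The mechanism is an assumption: the PR doesn't say how run-vmagent.sh detects merod (it could just as well use `ss` or `lsof`), and an open port in that range does not by itself prove the listener is merod. `discover_listening_ports` and `to_scrape_targets` are hypothetical names.

```python
import socket
from typing import Iterable, List

# Port range from the PR description; the probing method is an assumption.
MEROD_PORT_RANGE = range(2420, 2500)  # 2420–2499 inclusive


def discover_listening_ports(ports: Iterable[int] = MEROD_PORT_RANGE,
                             host: str = "127.0.0.1",
                             timeout: float = 0.2) -> List[int]:
    """Return the ports in `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


def to_scrape_targets(ports: Iterable[int], host: str = "127.0.0.1") -> List[str]:
    """Format ports as host:port strings for a static_configs `targets` list."""
    return [f"{host}:{port}" for port in ports]


if __name__ == "__main__":
    print(to_scrape_targets(discover_listening_ports()))
```

Re-running this every ~30s and diffing the result against the previous target list is one way to get the "dynamic scraping" behaviour described above.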

Notes

  • Profiling remains supported via the existing workflow flag (enable_profiling) and profiling artifact uploads; vmagent metrics collection runs alongside the fuzzy tests.

Test plan

Documentation update


Note

Adds automated metrics scraping to long-running fuzzy tests via vmagent, with reusable scripts and workflow integration for both kv-store and kv-store-with-handlers.

  • New scripts: scripts/setup-vmagent.sh, scripts/run-vmagent.sh, scripts/cleanup-vmagent.sh to download/start/refresh/reload/stop vmagent and manage auth/bearer token
  • Workflow integration (.github/workflows/fuzzy-load-test.yml): setup vmagent, run tests with collection, cleanup, and upload vmagent logs/config as artifacts; simplify image pull; remove profiling-only decorations in summary
  • Scrape config: static targets based on predictable ports per node pattern (e.g., fuzzy-kv-node, fuzzy-handlers-node), periodic refresh, labeled with commit/branch/run metadata
  • Test tweaks: increase final KV store sync timeout to 180 seconds; remove end-of-test stats/assert blocks from handlers test; minor whitespace/formatting cleanup

Written by Cursor Bugbot for commit 5e7c7a4.
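A static scrape config labelled with run metadata, as summarized above, might be generated along these lines. This is a sketch under assumptions: the job name `merod`, the label names (`commit_sha`, `branch`, `run_id`, `node`) and the node-to-port pairs are hypothetical; only the 30s interval and the idea of commit/branch/run labels come from this PR. `GITHUB_SHA`, `GITHUB_REF_NAME` and `GITHUB_RUN_ID` are standard GitHub Actions variables. JSON is emitted because it is, in practice, valid YAML and avoids a PyYAML dependency.

```python
import json
import os
from typing import Dict, List, Mapping, Tuple


def run_labels(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Shared run metadata from GitHub Actions env vars (label names are hypothetical)."""
    return {
        "commit_sha": env.get("GITHUB_SHA", "")[:7],  # short hash, as used for naming
        "branch": env.get("GITHUB_REF_NAME", ""),
        "run_id": env.get("GITHUB_RUN_ID", ""),
    }


def render_scrape_config(nodes: List[Tuple[str, int]],
                         labels: Dict[str, str],
                         interval: str = "30s") -> str:
    """Build a Prometheus-style scrape config with one static target per node."""
    config = {
        "global": {"scrape_interval": interval},
        "scrape_configs": [
            {
                "job_name": "merod",
                "scrape_interval": interval,
                "static_configs": [
                    # Per-target `node` label plus the shared run metadata.
                    {"targets": [f"127.0.0.1:{port}"], "labels": {**labels, "node": name}}
                    for name, port in nodes
                ],
            }
        ],
    }
    return json.dumps(config, indent=2, sort_keys=True)


def write_if_changed(path: str, content: str) -> bool:
    """Write `content` to `path` only if it differs; return True if the file changed."""
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp, path)  # Atomic rename: readers never see a half-written file.
    return True
```

When `write_if_changed` returns True, the refresh loop would ask vmagent to reload its `-promscrape.config` file, for example by sending it SIGHUP; skipping unchanged writes avoids needless reloads on every refresh tick.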

- Updated fuzzy load test workflows to use short commit hashes for naming.
- Improved logging in run-vmagent.sh to display process IDs for better tracking.
- Enhanced setup-vmagent.sh to safely extract the vmagent binary and verify its presence, including a fallback mechanism for different archive structures.
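The safer extraction with a fallback for different archive layouts can be sketched with Python's standard `tarfile` module. Assumptions: the download is a `.tar.gz`, and the binary is named `vmagent-prod` (the name used in VictoriaMetrics' vmutils release archives, to our knowledge) or plain `vmagent`; `extract_vmagent` is a hypothetical helper. Matching on the basename provides the layout fallback, and writing to a fixed destination avoids trusting paths stored inside the archive.

```python
import os
import stat
import tarfile

# Preferred binary names, in order (assumption; adjust to the actual archive).
CANDIDATE_NAMES = ("vmagent-prod", "vmagent")


def extract_vmagent(archive_path: str, dest_dir: str) -> str:
    """Extract only the vmagent binary from a .tar.gz and return its path."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        match = None
        for name in CANDIDATE_NAMES:
            # Match on basename so flat and nested archive layouts both work.
            match = next((m for m in members if os.path.basename(m.name) == name), None)
            if match is not None:
                break
        if match is None:
            raise FileNotFoundError(f"no vmagent binary found in {archive_path}")
        src = tar.extractfile(match)
        # Write to a fixed path instead of the member's own path (no path traversal).
        dest = os.path.join(dest_dir, "vmagent")
        with src, open(dest, "wb") as out:
            out.write(src.read())
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Verify the binary is present and executable before the workflow relies on it.
    if not (os.path.isfile(dest) and os.access(dest, os.X_OK)):
        raise RuntimeError(f"{dest} is missing or not executable")
    return dest
```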
github-actions bot commented Jan 7, 2026

This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated.

github-actions bot added the Stale label Jan 7, 2026
github-actions bot commented

This pull request has been automatically closed because it has been inactive for more than 7 days. Please reopen and see this PR through its review if it is essential.

github-actions bot closed this Jan 14, 2026
rtb-12 reopened this Jan 15, 2026
- Adjusted timeout settings for KV Store and KV Store with Handlers tests from 60 to 180 seconds.
- Cleaned up comments and removed unnecessary echo statements in the workflow scripts.
- Enhanced the handling of Docker image pulling to avoid failure on pull errors.
- Updated summary generation to remove profiling-specific details, making it more concise.
- Changed the global scrape interval from 5s to 15s.
- Updated job-specific scrape intervals from 5s to 15s for consistency.
github-actions bot removed the Stale label Jan 15, 2026
- Updated global and job-specific scrape intervals from 15s to 30s in VM agent configuration.
- Increased timeout setting for fuzzy tests from 180 to 360 seconds.
rtb-12 and others added 2 commits January 15, 2026 23:59
- Changed default test duration from 5 minutes to 45 minutes in the fuzzy load test workflow.
- Removed outdated TODO comments regarding workflow configuration.
- Cleaned up the fuzzy test YAML by removing unnecessary final handlers verification steps.
- Added execute permissions for setup, run, and cleanup scripts in the fuzzy load test workflow to ensure proper execution during the CI process.
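The execute-permission fix amounts to a `chmod +x` on the three scripts before they run. A Python equivalent is sketched below for illustration; `make_executable` is a hypothetical helper, and unlike `chmod +x` it ignores the umask and sets all three execute bits.

```python
import os
import stat
from typing import Iterable, List

# Script paths from this PR.
VMAGENT_SCRIPTS = (
    "scripts/setup-vmagent.sh",
    "scripts/run-vmagent.sh",
    "scripts/cleanup-vmagent.sh",
)


def make_executable(paths: Iterable[str]) -> List[str]:
    """Add user/group/other execute bits and return the paths now executable."""
    done = []
    for path in paths:
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        if os.access(path, os.X_OK):
            done.append(path)
    return done
```

An alternative is to commit the mode bit itself (e.g. `git update-index --chmod=+x scripts/run-vmagent.sh`), which makes a runtime permission step unnecessary on checkout.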
rtb-12 merged commit 5f19acc into master Jan 20, 2026
11 checks passed
rtb-12 deleted the feat/fuzzy-workflow-metrics-scraping branch January 20, 2026 16:00

4 participants