
Conversation


@JoeXic JoeXic commented Nov 2, 2025

Describe this PR

Add real-time web monitoring dashboard for GAIA validation benchmark with progress tracking and visualization capabilities.

What changed?

  • Added run_gaia_with_monitor.py to run GAIA benchmark with integrated web monitoring
  • Added utils/progress_check/gaia_web_monitor.py - web dashboard for real-time progress tracking
  • Added utils/progress_check/generate_gaia_report.py - report generation utility
  • Updated main.py to support the new monitoring command
  • Web dashboard accessible at http://localhost:8080 during benchmark execution
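
The dashboard described above could be served with nothing more than the Python standard library. The following is a minimal sketch, not the actual `gaia_web_monitor.py` implementation; the `progress` fields and `start_monitor` helper are assumptions for illustration:

```python
# Minimal sketch of a progress dashboard on http://localhost:8080.
# The real gaia_web_monitor.py may differ; field names are assumed.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shared progress state, updated by the benchmark runner as tasks finish.
progress = {"completed": 0, "total": 0, "correct": 0}

class ProgressHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the current progress counters as JSON for any path.
        body = json.dumps(progress).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging so benchmark logs stay readable.
        pass

def start_monitor(port=8080):
    # Run the HTTP server on a daemon thread so it dies with the runner.
    server = HTTPServer(("localhost", port), ProgressHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A frontend polling this endpoint can then render completion status without the user tailing log files.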

Why?

Long benchmarks like GAIA validation take hours to run, and users need a way to:

  • Monitor real-time progress without constantly checking logs
  • Visualize task completion status
  • Track performance metrics during execution
  • Generate comprehensive reports after completion

- Add run-gaia-with-monitor command for running benchmark with real-time monitoring
- Add web dashboard for monitoring benchmark progress (gaia_web_monitor.py)
- Add generate_gaia_report.py to utils/progress_check/ for generating task reports
@JoeXic JoeXic closed this Nov 2, 2025
@JoeXic JoeXic reopened this Nov 2, 2025
@JoeXic JoeXic changed the title feat(monitoring): add real-time web dashboard for GAIA benchmark progress feat(monitoring): add real-time web dashboard for monitoring benchmark progress Nov 10, 2025

JoeXic commented Nov 10, 2025

Describe this PR

Refactor the monitoring system from GAIA-specific to generic benchmark monitoring, supporting the GAIA, FutureX, xbench, and FinSearchComp benchmarks with a real-time web dashboard.

What changed?

Core Changes

  • Replaced run_gaia_with_monitor.py → run_benchmark_with_monitor.py (generic benchmark runner)
  • Replaced utils/progress_check/gaia_web_monitor.py → utils/progress_check/benchmark_monitor.py (generic monitor)
  • Replaced utils/progress_check/generate_gaia_report.py → utils/progress_check/generate_benchmark_report.py (generic report generator)
  • Updated main.py to use the new generic monitoring system
  • Updated utils/progress_check/check_finsearchcomp_progress.py (fixed type annotation)

New Features

  • Auto-detect benchmark type from log folder path
  • Support benchmark-specific metrics:
    • GAIA/FinSearchComp: Correctness evaluation (accuracy)
    • FutureX/xbench: Prediction tracking (prediction rate)
    • FinSearchComp: Task type breakdown (T1/T2/T3) and regional analysis
  • Extract attempt number from log filename for accurate report generation
  • Suppress verbose HTTP logs in web dashboard
  • Automatic port conflict resolution

Documentation

  • Added monitor_guide.md - Web monitoring dashboard guide

Why?

Long benchmark runs (GAIA, FutureX, xbench, FinSearchComp) take hours, and users need a way to:

  • Monitor real-time progress without constantly checking logs
  • Visualize task completion status with benchmark-specific metrics
  • Track performance metrics during execution (accuracy for GAIA, prediction rate for FutureX/xbench)
  • Generate comprehensive reports after completion
  • Use a unified monitoring system across all benchmarks instead of benchmark-specific solutions
