GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
-
Updated
Dec 25, 2025 - Python
GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
Simulate NVIDIA GPUs for testing. 7 behavior profiles, scale to 1000+ GPUs, Docker-ready Prometheus exporter using DCGM
Add a description, image, and links to the dcgm topic page so that developers can more easily learn about it.
To associate your repository with the dcgm topic, visit your repo's landing page and select "manage topics."