Add Monitoring for DocSum on Xeon and Gaudi (Docker only) #2316
Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned Files: None
for more information, see https://pre-commit.ci
Pull Request Overview
This PR adds monitoring capabilities to DocSum deployments on both Intel Xeon and Gaudi platforms. The changes integrate Prometheus, Grafana, and node exporters to provide comprehensive metrics collection and visualization for DocSum services.
Key changes:
- Added monitoring infrastructure with Prometheus, Grafana, and metrics exporters
- Updated test scripts to include monitoring compose files
- Created platform-specific environment configurations and monitoring compose files
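The updated test scripts combine a base compose file with the monitoring compose file. A minimal sketch of how such a command line could be assembled is shown here. `compose.monitoring.yaml` appears in this PR, while the base file name `compose.yaml` and the helper `build_compose_cmd` are assumptions made for illustration, not the PR's actual script contents. The function only prints the command, so nothing is started.

```shell
#!/bin/bash
# Hypothetical helper: print a docker compose command that layers
# every compose file passed as an argument, in order.
build_compose_cmd() {
  cmd="docker compose"
  for f in "$@"; do
    # Each -f adds one compose file; later files extend earlier ones.
    cmd="${cmd} -f ${f}"
  done
  echo "${cmd} up -d"
}

# compose.yaml is an assumed base file name; compose.monitoring.yaml is from this PR.
build_compose_cmd compose.yaml compose.monitoring.yaml
```

Running the printed command from the platform directory (for example `DocSum/docker_compose/intel/cpu/xeon`) would bring up the application services together with the monitoring services, assuming both files exist there.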
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| DocSum/tests/test_compose_tgi_on_xeon.sh | Updated to use Xeon-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_tgi_on_gaudi.sh | Updated to use Gaudi-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_on_xeon.sh | Updated to use Xeon-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_on_gaudi.sh | Updated to use Gaudi-specific environment, removed duplicate configs, and added monitoring |
| DocSum/docker_compose/intel/hpu/gaudi/set_env.sh | New Gaudi-specific environment configuration with monitoring setup |
| DocSum/docker_compose/intel/hpu/gaudi/prometheus.yaml | Prometheus configuration for Gaudi platform monitoring |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/provisioning/datasources/datasource.yml | Grafana datasource configuration |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/provisioning/dashboards/local.yaml | Grafana dashboard provisioning configuration |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/dashboards/download_opea_dashboard.sh | Script to download OPEA dashboard configurations |
| DocSum/docker_compose/intel/hpu/gaudi/compose.monitoring.yaml | Docker compose file for Gaudi monitoring services |
| DocSum/docker_compose/intel/hpu/gaudi/README.md | Updated documentation with monitoring deployment options |
| DocSum/docker_compose/intel/cpu/xeon/set_env.sh | Updated Xeon environment configuration with monitoring setup |
| DocSum/docker_compose/intel/cpu/xeon/prometheus.yaml | Prometheus configuration for Xeon platform monitoring |
| DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/datasources/datasource.yml | Grafana datasource configuration for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/dashboards/local.yaml | Grafana dashboard provisioning for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/grafana/dashboards/download_opea_dashboard.sh | Dashboard download script for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/compose.monitoring.yaml | Docker compose file for Xeon monitoring services |
| DocSum/docker_compose/intel/cpu/xeon/README.md | Updated Xeon documentation with monitoring options |
| ChatQnA/docker_compose/intel/hpu/gaudi/compose.telemetry.yaml | Fixed deprecated node-exporter parameter |
Comments suppressed due to low confidence (1)
DocSum/tests/test_compose_on_gaudi.sh:1
- The `no_proxy` export line was removed but `ip_address` is still referenced. This variable may not be defined, which could cause the command to fail or behave unexpectedly.
#!/bin/bash
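One way to guard against the undefined-variable risk raised in the suppressed comment is to fail early when a required variable is missing. The sketch below is illustrative only: `require_var` is a hypothetical helper that is not part of the PR, and the address is a documentation placeholder.

```shell
#!/bin/bash
# Hypothetical helper: return non-zero if the named variable is unset or empty.
require_var() {
  name="$1"
  # Look up the value of the variable whose name was passed in.
  eval "value=\${${name}:-}"
  if [ -z "${value}" ]; then
    echo "error: ${name} is not set" >&2
    return 1
  fi
}

ip_address="192.0.2.10"  # placeholder address, an assumption for this sketch
require_var ip_address && echo "ip_address=${ip_address}"
```

Calling the check at the top of a test script would turn a silently empty `ip_address` into an explicit error before any command uses it.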
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yao, Qing <qing.yao@intel.com>
Description
Add monitoring for DocSum deployments on Xeon and Gaudi that are deployed with Docker Compose.
Issues
n/a.
Type of change
List the type of change like below. Please delete options that are not relevant.