@joshuayao joshuayao commented Oct 29, 2025

Description

Add monitoring for DocSum deployments on Xeon and Gaudi that use Docker Compose.

Issues

n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
Copilot AI review requested due to automatic review settings October 29, 2025 01:41
github-actions bot commented Oct 29, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

@joshuayao joshuayao added the WIP label Oct 29, 2025
@joshuayao joshuayao added this to OPEA Oct 29, 2025
@joshuayao joshuayao added this to the v1.5 milestone Oct 29, 2025
Copilot AI left a comment

Pull Request Overview

This PR adds monitoring capabilities to DocSum deployments on both Intel Xeon and Gaudi platforms. The changes integrate Prometheus, Grafana, and node exporters to provide comprehensive metrics collection and visualization for DocSum services.

Key changes:

  • Added monitoring infrastructure with Prometheus, Grafana, and metrics exporters
  • Updated test scripts to include monitoring compose files
  • Created platform-specific environment configurations and monitoring compose files
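As an illustration of how a monitoring compose file can be layered on top of a base deployment, here is a minimal sketch. `docker compose` accepts multiple `-f` flags and merges the files in order. The helper name `build_compose_cmd` and the base file name `compose.yaml` are assumptions for illustration; only `compose.monitoring.yaml` is named in this PR. The helper only prints the command, so it can be inspected before anything is started, and it assumes file names without spaces.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): build a "docker compose" command
# line that merges every compose file passed as an argument, in order.
build_compose_cmd() {
    local cmd="docker compose"
    local f
    for f in "$@"; do
        cmd="${cmd} -f ${f}"
    done
    # Print the command instead of running it (dry run).
    echo "${cmd} up -d"
}

# Assumed file names: a base compose.yaml plus the monitoring overlay.
build_compose_cmd compose.yaml compose.monitoring.yaml
```

Running the printed command from the platform directory (for example the Xeon or Gaudi compose directory) would start the base services together with the monitoring services, assuming both files exist there.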

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| DocSum/tests/test_compose_tgi_on_xeon.sh | Updated to use Xeon-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_tgi_on_gaudi.sh | Updated to use Gaudi-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_on_xeon.sh | Updated to use Xeon-specific environment and include monitoring compose file |
| DocSum/tests/test_compose_on_gaudi.sh | Updated to use Gaudi-specific environment, removed duplicate configs, and added monitoring |
| DocSum/docker_compose/intel/hpu/gaudi/set_env.sh | New Gaudi-specific environment configuration with monitoring setup |
| DocSum/docker_compose/intel/hpu/gaudi/prometheus.yaml | Prometheus configuration for Gaudi platform monitoring |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/provisioning/datasources/datasource.yml | Grafana datasource configuration |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/provisioning/dashboards/local.yaml | Grafana dashboard provisioning configuration |
| DocSum/docker_compose/intel/hpu/gaudi/grafana/dashboards/download_opea_dashboard.sh | Script to download OPEA dashboard configurations |
| DocSum/docker_compose/intel/hpu/gaudi/compose.monitoring.yaml | Docker compose file for Gaudi monitoring services |
| DocSum/docker_compose/intel/hpu/gaudi/README.md | Updated documentation with monitoring deployment options |
| DocSum/docker_compose/intel/cpu/xeon/set_env.sh | Updated Xeon environment configuration with monitoring setup |
| DocSum/docker_compose/intel/cpu/xeon/prometheus.yaml | Prometheus configuration for Xeon platform monitoring |
| DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/datasources/datasource.yml | Grafana datasource configuration for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/dashboards/local.yaml | Grafana dashboard provisioning for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/grafana/dashboards/download_opea_dashboard.sh | Dashboard download script for Xeon |
| DocSum/docker_compose/intel/cpu/xeon/compose.monitoring.yaml | Docker compose file for Xeon monitoring services |
| DocSum/docker_compose/intel/cpu/xeon/README.md | Updated Xeon documentation with monitoring options |
| ChatQnA/docker_compose/intel/hpu/gaudi/compose.telemetry.yaml | Fixed deprecated node-exporter parameter |
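
Once the monitoring services are up, a quick way to confirm they respond is to probe their health endpoints. The sketch below is not taken from this PR: the helper name `monitoring_health_url` is hypothetical, and the ports are the upstream defaults for Prometheus (9090) and Grafana (3000), which the compose files here may map differently. The paths `/-/healthy` and `/api/health` are the standard Prometheus and Grafana health endpoints.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print the health-check URL for a
# monitoring service. Ports are the upstream defaults and are an assumption;
# check the published ports in compose.monitoring.yaml before relying on them.
monitoring_health_url() {
    local service="$1"
    local host="${2:-localhost}"
    case "$service" in
        prometheus) echo "http://${host}:9090/-/healthy" ;;
        grafana)    echo "http://${host}:3000/api/health" ;;
        *)
            echo "unknown service: ${service}" >&2
            return 1
            ;;
    esac
}

# Example probe, commented out because it needs the services to be running:
# curl -fsS "$(monitoring_health_url prometheus)"
```

With `curl -f`, a non-2xx response makes the probe exit non-zero, so the same call could serve as a simple readiness check in a test script.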
Comments suppressed due to low confidence (1)

DocSum/tests/test_compose_on_gaudi.sh:1

  • The no_proxy export line was removed but ip_address is still referenced. This variable may not be defined, which could cause the command to fail or behave unexpectedly.
#!/bin/bash
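
One way to guard against the situation described in the suppressed comment is to make sure `ip_address` is defined, and listed in `no_proxy`, before any command references it. The following is a minimal sketch under stated assumptions, not the fix applied in this PR: the helper name `ensure_ip_in_no_proxy` is hypothetical, and the `127.0.0.1` fallback is a placeholder that a real test script would replace with the host's actual address.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): define ip_address if it is unset
# and append it to no_proxy exactly once.
ensure_ip_in_no_proxy() {
    # Placeholder fallback; only used when the caller has not set ip_address.
    ip_address="${ip_address:-127.0.0.1}"
    case ",${no_proxy:-}," in
        *",${ip_address},"*)
            # Already listed, leave no_proxy unchanged.
            ;;
        *)
            # Append, adding a comma only if no_proxy is already non-empty.
            no_proxy="${no_proxy:+${no_proxy},}${ip_address}"
            ;;
    esac
    export ip_address no_proxy
}
```

Calling the helper near the top of a test script keeps later references to `ip_address` from silently expanding to an empty string.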

joshuayao and others added 2 commits October 29, 2025 09:47
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@joshuayao joshuayao removed the WIP label Oct 29, 2025
@joshuayao joshuayao changed the title from "[WIP] Add Monitoring for DocSum on Xeon and Gaudi" to "Add Monitoring for DocSum on Xeon and Gaudi (Docker only)" Oct 29, 2025
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
@joshuayao joshuayao moved this to In review in OPEA Oct 30, 2025
@joshuayao joshuayao merged commit 72f2e01 into main Nov 4, 2025
86 of 97 checks passed
@joshuayao joshuayao deleted the josh/monitoring branch November 4, 2025 00:52
@github-project-automation github-project-automation bot moved this from In review to Done in OPEA Nov 4, 2025
yao531441 pushed a commit that referenced this pull request Nov 4, 2025
Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yao, Qing <qing.yao@intel.com>
5 participants