
Commit 97f38e8

joshuayao, pre-commit-ci[bot], and Copilot authored and committed
Add Monitoring for DocSum on Xeon and Gaudi (Docker only) (#2316)
Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Signed-off-by: Joshua Yao <yi.a.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yao, Qing <qing.yao@intel.com>
1 parent 3d439d5 commit 97f38e8

File tree: 19 files changed, +519 −51 lines changed


ChatQnA/docker_compose/intel/hpu/gaudi/compose.telemetry.yaml

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ services:
     command:
       - '--path.procfs=/host/proc'
       - '--path.sysfs=/host/sys'
-      - --collector.filesystem.ignored-mount-points
+      - --collector.filesystem.mount-points-exclude
       - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
     ports:
       - 9100:9100
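Context for this one-line change: node-exporter deprecated `--collector.filesystem.ignored-mount-points` in favor of `--collector.filesystem.mount-points-exclude`, and newer releases of the unpinned `prom/node-exporter` image no longer accept the old spelling. A quick smoke test after the telemetry stack is up, a sketch assuming the exporter is published on port 9100 as above:

```bash
# Filesystem metrics should list host mounts but none of the excluded Docker paths.
curl -s http://localhost:9100/metrics | grep '^node_filesystem_size_bytes' | head -n 5
```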

DocSum/docker_compose/intel/cpu/xeon/README.md

Lines changed: 47 additions & 13 deletions
@@ -13,13 +13,26 @@ This example includes the following sections:
 
 This section describes how to quickly deploy and test the DocSum service manually on an Intel Xeon platform. The basic steps are:
 
-1. [Access the Code](#access-the-code)
-2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
-3. [Configure the Deployment Environment](#configure-the-deployment-environment)
-4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
-5. [Check the Deployment Status](#check-the-deployment-status)
-6. [Test the Pipeline](#test-the-pipeline)
-7. [Cleanup the Deployment](#cleanup-the-deployment)
+- [Example DocSum deployments on Intel Xeon Processor](#example-docsum-deployments-on-intel-xeon-processor)
+  - [DocSum Quick Start Deployment](#docsum-quick-start-deployment)
+    - [Access the Code and Set Up Environment](#access-the-code-and-set-up-environment)
+    - [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
+    - [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
+      - [Option #1](#option-1)
+      - [Option #2](#option-2)
+    - [Check the Deployment Status](#check-the-deployment-status)
+    - [Test the Pipeline](#test-the-pipeline)
+    - [Cleanup the Deployment](#cleanup-the-deployment)
+  - [DocSum Docker Compose Files](#docsum-docker-compose-files)
+    - [Running LLM models with remote endpoints](#running-llm-models-with-remote-endpoints)
+  - [DocSum Detailed Usage](#docsum-detailed-usage)
+    - [Query with text](#query-with-text)
+    - [Query with audio and video](#query-with-audio-and-video)
+    - [Query with long context](#query-with-long-context)
+  - [Launch the UI](#launch-the-ui)
+    - [Gradio UI](#gradio-ui)
+    - [Launch the Svelte UI](#launch-the-svelte-ui)
+    - [Launch the React UI (Optional)](#launch-the-react-ui-optional)
 
 ### Access the Code and Set Up Environment
 

@@ -28,7 +41,7 @@ Clone the GenAIExample repository and access the ChatQnA Intel Xeon platform Doc
 ```bash
 git clone https://github.com/opea-project/GenAIExamples.git
 cd GenAIExamples/DocSum/docker_compose
-source intel/set_env.sh
+source intel/cpu/xeon/set_env.sh
 ```
 
 > NOTE: by default vLLM does "warmup" at start, to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and e.g. autoscaling) it can be skipped with `export VLLM_SKIP_WARMUP=true`.

@@ -47,13 +60,26 @@ Some HuggingFace resources, such as some models, are only accessible if you have
 
 ### Deploy the Services Using Docker Compose
 
+#### Option #1
+
 To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
 
 ```bash
 cd intel/cpu/xeon/
 docker compose up -d
 ```
 
+#### Option #2
+
+> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be merged with the default `compose.yaml` file.
+
+To deploy with monitoring:
+
+```bash
+cd intel/cpu/xeon/
+docker compose -f compose.yaml -f compose.monitoring.yaml up -d
+```
+
 **Note**: developers should build docker image from source when:
 
 - Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).

@@ -109,17 +135,25 @@ To stop the containers associated with the deployment, execute the following com
 docker compose -f compose.yaml down
 ```
 
+If monitoring is enabled, execute the following command instead:
+
+```bash
+cd intel/cpu/xeon/
+docker compose -f compose.yaml -f compose.monitoring.yaml down
+```
+
 All the DocSum containers will be stopped and then removed on completion of the "down" command.
 
 ## DocSum Docker Compose Files
 
 In the context of deploying a DocSum pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application.
 
-| File                                         | Description                                                                             |
-| -------------------------------------------- | --------------------------------------------------------------------------------------- |
-| [compose.yaml](./compose.yaml)               | Default compose file using vllm as serving framework                                    |
-| [compose_tgi.yaml](./compose_tgi.yaml)       | The LLM serving framework is TGI. All other configurations remain the same as default   |
-| [compose_remote.yaml](./compose_remote.yaml) | Uses remote inference endpoints for LLMs. All other configurations are same as default  |
+| File                                                 | Description                                                                             |
+| ---------------------------------------------------- | ---------------------------------------------------------------------------------------- |
+| [compose.yaml](./compose.yaml)                       | Default compose file using vLLM as the serving framework                                |
+| [compose_tgi.yaml](./compose_tgi.yaml)               | The LLM serving framework is TGI. All other configurations remain the same as default   |
+| [compose_remote.yaml](./compose_remote.yaml)         | Uses remote inference endpoints for LLMs. All other configurations are same as default  |
+| [compose.monitoring.yaml](./compose.monitoring.yaml) | Helper file for monitoring features. Can be used along with any compose file            |
 
 ### Running LLM models with remote endpoints
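A quick way to validate Option #2 before and after bringing the stack up, a minimal sketch assuming the ports published by compose.monitoring.yaml (9090 for Prometheus, 3000 for Grafana):

```bash
cd intel/cpu/xeon/
# Preview the merged service list without starting anything.
docker compose -f compose.yaml -f compose.monitoring.yaml config --services

# After `docker compose ... up -d`, probe the monitoring endpoints.
curl -sf http://localhost:9090/-/ready && echo "Prometheus ready"
curl -sf http://localhost:3000/api/health    # Grafana health check (returns JSON)
```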

DocSum/docker_compose/intel/cpu/xeon/compose.monitoring.yaml (new file)

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+services:
+  prometheus:
+    image: prom/prometheus:v2.52.0
+    container_name: opea_prometheus
+    user: root
+    volumes:
+      - ./prometheus.yaml:/etc/prometheus/prometheus.yaml
+      - ./prometheus_data:/prometheus
+    command:
+      - '--config.file=/etc/prometheus/prometheus.yaml'
+    ports:
+      - '9090:9090'
+    ipc: host
+    restart: unless-stopped
+
+  grafana:
+    image: grafana/grafana:11.0.0
+    container_name: grafana
+    volumes:
+      - ./grafana_data:/var/lib/grafana
+      - ./grafana/dashboards:/var/lib/grafana/dashboards
+      - ./grafana/provisioning:/etc/grafana/provisioning
+    user: root
+    environment:
+      GF_SECURITY_ADMIN_PASSWORD: admin
+      GF_RENDERING_CALLBACK_URL: http://grafana:3000/
+      GF_LOG_FILTERS: rendering:debug
+      no_proxy: ${no_proxy}
+      host_ip: ${host_ip}
+    depends_on:
+      - prometheus
+    ports:
+      - '3000:3000'
+    ipc: host
+    restart: unless-stopped
+
+  node-exporter:
+    image: prom/node-exporter
+    container_name: node-exporter
+    volumes:
+      - /proc:/host/proc:ro
+      - /sys:/host/sys:ro
+      - /:/rootfs:ro
+    command:
+      - '--path.procfs=/host/proc'
+      - '--path.sysfs=/host/sys'
+      - --collector.filesystem.mount-points-exclude
+      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
+    environment:
+      no_proxy: ${no_proxy}
+    ports:
+      - 9100:9100
+    ipc: host
+    restart: always
+    deploy:
+      mode: global
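Once this file is deployed alongside compose.yaml, Prometheus should begin scraping the jobs defined in prometheus.yaml below. A sketch for confirming scrape health (assumes `jq` is installed; only jobs for services that are actually running will report healthy):

```bash
# List every active scrape target and its health as seen by Prometheus.
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"'
```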
DocSum/docker_compose/intel/cpu/xeon/grafana/dashboards/download_opea_dashboard.sh (new file)

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+#!/bin/bash
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+if ls *.json 1> /dev/null 2>&1; then
+  rm *.json
+fi
+
+wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/vllm_grafana.json
+wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/tgi_grafana.json
+wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/docsum_megaservice_grafana.json
+wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/node_grafana.json
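The script deletes any existing `*.json` before re-downloading, so re-running it simply refreshes the dashboards; set_env.sh sources it from the `grafana/dashboards` directory. A sketch of running it standalone:

```bash
cd grafana/dashboards
bash download_opea_dashboard.sh
ls *.json  # expect vllm_grafana.json, tgi_grafana.json, docsum_megaservice_grafana.json, node_grafana.json
```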
DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/… (new file: Grafana dashboard provider config)

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+apiVersion: 1
+
+providers:
+  - name: 'default'
+    orgId: 1
+    folder: ''
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 10 # how often Grafana will scan for changed dashboards
+    options:
+      path: /var/lib/grafana/dashboards
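With this provider config and the `./grafana/dashboards` volume mount from compose.monitoring.yaml, Grafana rescans the dashboard directory every 10 seconds. A sketch to confirm the downloaded dashboards were registered (credentials follow GF_SECURITY_ADMIN_PASSWORD above; `jq` assumed):

```bash
# List titles of all file-provisioned dashboards.
curl -s -u admin:admin "http://localhost:3000/api/search?type=dash-db" | jq -r '.[].title'
```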
DocSum/docker_compose/intel/cpu/xeon/grafana/provisioning/… (new file: Grafana datasource provisioning config)

Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# config file version
+apiVersion: 1
+
+# list of datasources that should be deleted from the database
+deleteDatasources:
+  - name: Prometheus
+    orgId: 1
+
+# list of datasources to insert/update depending
+# on what's available in the database
+datasources:
+  # <string, required> name of the datasource. Required
+  - name: Prometheus
+    # <string, required> datasource type. Required
+    type: prometheus
+    # <string, required> access mode. direct or proxy. Required
+    access: proxy
+    # <int> org id. will default to orgId 1 if not specified
+    orgId: 1
+    # <string> url
+    url: http://$host_ip:9090
+    # <string> database password, if used
+    password:
+    # <string> database user, if used
+    user:
+    # <string> database name, if used
+    database:
+    # <bool> enable/disable basic auth
+    basicAuth: false
+    # <string> basic auth username, if used
+    basicAuthUser:
+    # <string> basic auth password, if used
+    basicAuthPassword:
+    # <bool> enable/disable with credentials headers
+    withCredentials:
+    # <bool> mark as default datasource. Max one per org
+    isDefault: true
+    # <map> fields that will be converted to json and stored in json_data
+    jsonData:
+      httpMethod: GET
+      graphiteVersion: "1.1"
+      tlsAuth: false
+      tlsAuthWithCACert: false
+    # <string> json object of data that will be encrypted.
+    secureJsonData:
+      tlsCACert: "..."
+      tlsClientCert: "..."
+      tlsClientKey: "..."
+    version: 1
+    # <bool> allow users to edit datasources from the UI.
+    editable: true
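Grafana expands `$host_ip` in provisioning files from the container environment, which compose.monitoring.yaml passes through. A sketch to check that the datasource resolved to the right Prometheus URL:

```bash
# Inspect the provisioned datasources via the Grafana HTTP API.
curl -s -u admin:admin http://localhost:3000/api/datasources \
  | jq '.[] | {name, type, url, isDefault}'
```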
DocSum/docker_compose/intel/cpu/xeon/prometheus.yaml (new file)

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+# [IP_ADDR]:{PORT_OUTSIDE_CONTAINER} -> {PORT_INSIDE_CONTAINER} / {PROTOCOL}
+global:
+  scrape_interval: 5s
+  external_labels:
+    monitor: "my-monitor"
+scrape_configs:
+  - job_name: "prometheus"
+    static_configs:
+      - targets: ["opea_prometheus:9090"]
+  - job_name: "vllm"
+    metrics_path: /metrics
+    static_configs:
+      - targets: ["docsum-xeon-vllm-service:80"]
+  - job_name: "tgi"
+    metrics_path: /metrics
+    static_configs:
+      - targets: ["docsum-xeon-tgi-server:80"]
+  - job_name: "docsum-backend-server"
+    metrics_path: /metrics
+    static_configs:
+      - targets: ["docsum-xeon-backend-server:8888"]
+  - job_name: "prometheus-node-exporter"
+    metrics_path: /metrics
+    static_configs:
+      - targets: ["node-exporter:9100"]

DocSum/docker_compose/intel/set_env.sh renamed to DocSum/docker_compose/intel/cpu/xeon/set_env.sh

Lines changed: 14 additions & 5 deletions
@@ -2,15 +2,14 @@
 
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0
-SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
-pushd "${SCRIPT_DIR}/../../.." > /dev/null
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd)
+
+pushd "$SCRIPT_DIR/../../../../../" > /dev/null
 source .set_env.sh
 popd > /dev/null
 
 export host_ip=$(hostname -I | awk '{print $1}') # Example: host_ip="192.168.1.1"
-export no_proxy="${no_proxy},${host_ip}" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
-export http_proxy=$http_proxy
-export https_proxy=$https_proxy
 export HF_TOKEN=${HF_TOKEN}
 
 export LLM_ENDPOINT_PORT=8008

@@ -41,3 +40,13 @@ export NUM_CARDS=1
 export BLOCK_SIZE=128
 export MAX_NUM_SEQS=256
 export MAX_SEQ_LEN_TO_CAPTURE=2048
+
+# Download Grafana configurations
+pushd "${SCRIPT_DIR}/grafana/dashboards" > /dev/null
+source download_opea_dashboard.sh
+popd > /dev/null
+
+# Set network proxy settings
+export no_proxy="${no_proxy},${host_ip},docsum-xeon-vllm-service,docsum-xeon-tgi-server,docsum-xeon-backend-server,opea_prometheus,grafana,node-exporter,$JAEGER_IP" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+export http_proxy=$http_proxy
+export https_proxy=$https_proxy
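Because the script only exports variables (and now also downloads the Grafana dashboards), it must be sourced rather than executed so the settings land in the calling shell, exactly as the README instructs:

```bash
cd GenAIExamples/DocSum/docker_compose
source intel/cpu/xeon/set_env.sh
echo "$no_proxy"  # should now include opea_prometheus, grafana, node-exporter, ...
```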

DocSum/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 38 additions & 12 deletions
@@ -15,13 +15,25 @@ This example includes the following sections:
 
 This section describes how to quickly deploy and test the DocSum service manually on an Intel® Gaudi® platform. The basic steps are:
 
-1. [Access the Code](#access-the-code)
-2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
-3. [Configure the Deployment Environment](#configure-the-deployment-environment)
-4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
-5. [Check the Deployment Status](#check-the-deployment-status)
-6. [Test the Pipeline](#test-the-pipeline)
-7. [Cleanup the Deployment](#cleanup-the-deployment)
+- [Example DocSum deployments on Intel® Gaudi® Platform](#example-docsum-deployments-on-intel-gaudi-platform)
+  - [DocSum Quick Start Deployment](#docsum-quick-start-deployment)
+    - [Access the Code and Set Up Environment](#access-the-code-and-set-up-environment)
+    - [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
+    - [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
+      - [Option #1](#option-1)
+      - [Option #2](#option-2)
+    - [Check the Deployment Status](#check-the-deployment-status)
+    - [Test the Pipeline](#test-the-pipeline)
+    - [Cleanup the Deployment](#cleanup-the-deployment)
+  - [DocSum Docker Compose Files](#docsum-docker-compose-files)
+  - [DocSum Detailed Usage](#docsum-detailed-usage)
+    - [Query with text](#query-with-text)
+    - [Query with audio and video](#query-with-audio-and-video)
+    - [Query with long context](#query-with-long-context)
+  - [Launch the UI](#launch-the-ui)
+    - [Gradio UI](#gradio-ui)
+    - [Launch the Svelte UI](#launch-the-svelte-ui)
+    - [Launch the React UI (Optional)](#launch-the-react-ui-optional)
 
 ### Access the Code and Set Up Environment
 

@@ -30,7 +42,7 @@ Clone the GenAIExample repository and access the DocSum Intel® Gaudi® platform
 ```bash
 git clone https://github.com/opea-project/GenAIExamples.git
 cd GenAIExamples/DocSum/docker_compose
-source intel/set_env.sh
+source intel/hpu/gaudi/set_env.sh
 ```
 
 > NOTE: by default vLLM does "warmup" at start, to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and e.g. autoscaling) it can be skipped with `export VLLM_SKIP_WARMUP=true`.

@@ -49,13 +61,26 @@ Some HuggingFace resources, such as some models, are only accessible if you have
 
 ### Deploy the Services Using Docker Compose
 
+#### Option #1
+
 To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
 
 ```bash
 cd intel/hpu/gaudi/
 docker compose up -d
 ```
 
+#### Option #2
+
+> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be merged with the default `compose.yaml` file.
+
+To deploy with monitoring:
+
+```bash
+cd intel/hpu/gaudi/
+docker compose -f compose.yaml -f compose.monitoring.yaml up -d
+```
+
 **Note**: developers should build docker image from source when:
 
 - Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).

@@ -117,10 +142,11 @@ All the DocSum containers will be stopped and then removed on completion of the
 
 In the context of deploying a DocSum pipeline on an Intel® Gaudi® platform, the allocation and utilization of Gaudi devices across different services are important considerations for optimizing performance and resource efficiency. Each of the example deployments, defined by the example Docker compose yaml files, demonstrates a unique approach to leveraging Gaudi hardware, reflecting different priorities and operational strategies.
 
-| File                                   | Description                                                                                |
-| -------------------------------------- | ------------------------------------------------------------------------------------------ |
-| [compose.yaml](./compose.yaml)         | Default compose file using vllm as serving framework                                       |
-| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default  |
+| File                                                 | Description                                                                                |
+| ---------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| [compose.yaml](./compose.yaml)                       | Default compose file using vLLM as the serving framework                                   |
+| [compose_tgi.yaml](./compose_tgi.yaml)               | The LLM serving framework is TGI. All other configurations remain the same as the default  |
+| [compose.monitoring.yaml](./compose.monitoring.yaml) | Helper file for monitoring features. Can be used along with any compose file               |
 
 ## DocSum Detailed Usage
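Teardown on Gaudi mirrors the Xeon instructions when monitoring is enabled, a sketch assuming the Gaudi directory carries the same compose.monitoring.yaml layout shown above for Xeon:

```bash
cd intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.monitoring.yaml down
# Metric history persists across restarts via the ./prometheus_data and ./grafana_data bind mounts.
```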
