@@ -62,7 +62,7 @@ services:
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- --collector.filesystem.ignored-mount-points
- --collector.filesystem.mount-points-exclude
- "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
ports:
- 9100:9100
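
This hunk renames the node-exporter flag from the deprecated `--collector.filesystem.ignored-mount-points` to its current spelling, `--collector.filesystem.mount-points-exclude`. A quick way to confirm which form the pulled image accepts (a sketch; assumes Docker can pull the same `prom/node-exporter` image used by the compose file):

```bash
# Newer node-exporter releases list --collector.filesystem.mount-points-exclude in their help
# output; older ones only know the deprecated ignored-mount-points form.
docker run --rm prom/node-exporter --help 2>&1 | grep -i 'mount-points'
```
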
60 changes: 47 additions & 13 deletions DocSum/docker_compose/intel/cpu/xeon/README.md
@@ -13,13 +13,26 @@ This example includes the following sections:

This section describes how to quickly deploy and test the DocSum service manually on an Intel Xeon platform. The basic steps are:

1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
5. [Check the Deployment Status](#check-the-deployment-status)
6. [Test the Pipeline](#test-the-pipeline)
7. [Cleanup the Deployment](#cleanup-the-deployment)
- [Example DocSum deployments on Intel Xeon Processor](#example-docsum-deployments-on-intel-xeon-processor)
- [DocSum Quick Start Deployment](#docsum-quick-start-deployment)
- [Access the Code and Set Up Environment](#access-the-code-and-set-up-environment)
- [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
- [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
- [Option #1](#option-1)
- [Option #2](#option-2)
- [Check the Deployment Status](#check-the-deployment-status)
- [Test the Pipeline](#test-the-pipeline)
- [Cleanup the Deployment](#cleanup-the-deployment)
- [DocSum Docker Compose Files](#docsum-docker-compose-files)
- [Running LLM models with remote endpoints](#running-llm-models-with-remote-endpoints)
- [DocSum Detailed Usage](#docsum-detailed-usage)
- [Query with text](#query-with-text)
- [Query with audio and video](#query-with-audio-and-video)
- [Query with long context](#query-with-long-context)
- [Launch the UI](#launch-the-ui)
- [Gradio UI](#gradio-ui)
- [Launch the Svelte UI](#launch-the-svelte-ui)
- [Launch the React UI (Optional)](#launch-the-react-ui-optional)

### Access the Code and Set Up Environment

@@ -28,7 +41,7 @@ Clone the GenAIExample repository and access the ChatQnA Intel Xeon platform Doc
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/DocSum/docker_compose
source intel/set_env.sh
source intel/cpu/xeon/set_env.sh
```

> NOTE: By default, vLLM performs a "warmup" at startup to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and, e.g., autoscaling) it can be skipped with `export VLLM_SKIP_WARMUP=true`.
@@ -47,13 +60,26 @@ Some HuggingFace resources, such as some models, are only accessible if you have
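
Once a token is generated, export it in the shell used for deployment so that Docker Compose can pass it to the services as `HF_TOKEN` (a minimal sketch; the value shown is a hypothetical placeholder, not a real token):

```bash
# Placeholder value -- replace with your own HuggingFace access token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
```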

### Deploy the Services Using Docker Compose

#### Option #1

To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:

```bash
cd intel/cpu/xeon/
docker compose up -d
```
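
A quick sanity check that the containers came up (the detailed checks are described in [Check the Deployment Status](#check-the-deployment-status)):

```bash
docker compose ps
```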

#### Option #2

> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be merged with the default `compose.yaml` file.

To deploy with monitoring:

```bash
cd intel/cpu/xeon/
docker compose -f compose.yaml -f compose.monitoring.yaml up -d
```
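
After the monitoring stack starts, one way to confirm that Prometheus is scraping its targets and that Grafana is reachable is to query their HTTP APIs (a sketch; assumes the default 9090/3000 port mappings from `compose.monitoring.yaml` and the `host_ip` exported by `set_env.sh`):

```bash
# Health of every Prometheus scrape target defined in prometheus.yaml
curl -s http://${host_ip}:9090/api/v1/targets | grep -o '"health":"[^"]*"'

# Grafana liveness endpoint (port 3000 per compose.monitoring.yaml)
curl -s http://${host_ip}:3000/api/health
```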

**Note**: Developers should build the Docker image from source when:

- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
@@ -109,17 +135,25 @@ To stop the containers associated with the deployment, execute the following com
docker compose -f compose.yaml down
```

If monitoring is enabled, execute the following command instead:

```bash
cd intel/cpu/xeon/
docker compose -f compose.yaml -f compose.monitoring.yaml down
```

All the DocSum containers will be stopped and then removed on completion of the "down" command.

## DocSum Docker Compose Files

In the context of deploying a DocSum pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application.

| File | Description |
| -------------------------------------------- | -------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as default |
| [compose_remote.yaml](./compose_remote.yaml) | Uses remote inference endpoints for LLMs. All other configurations are same as default |
| File | Description |
| ---------------------------------------------------- | -------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml)                       | Default compose file using vLLM as the serving framework                                       |
| [compose_tgi.yaml](./compose_tgi.yaml)               | The LLM serving framework is TGI. All other configurations remain the same as the default      |
| [compose_remote.yaml](./compose_remote.yaml)         | Uses remote inference endpoints for LLMs. All other configurations are the same as the default |
| [compose.monitoring.yaml](./compose.monitoring.yaml) | Helper file for monitoring features. Can be used along with any of the compose files           |
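
Because `compose.monitoring.yaml` only adds the monitoring services, it can be layered on top of any of the serving variants. For example, a sketch combining it with the TGI variant:

```bash
docker compose -f compose_tgi.yaml -f compose.monitoring.yaml up -d
```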

### Running LLM models with remote endpoints

59 changes: 59 additions & 0 deletions DocSum/docker_compose/intel/cpu/xeon/compose.monitoring.yaml
@@ -0,0 +1,59 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
prometheus:
image: prom/prometheus:v2.52.0
container_name: opea_prometheus
user: root
volumes:
- ./prometheus.yaml:/etc/prometheus/prometheus.yaml
- ./prometheus_data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yaml'
ports:
- '9090:9090'
ipc: host
restart: unless-stopped

grafana:
image: grafana/grafana:11.0.0
container_name: grafana
volumes:
- ./grafana_data:/var/lib/grafana
- ./grafana/dashboards:/var/lib/grafana/dashboards
- ./grafana/provisioning:/etc/grafana/provisioning
user: root
environment:
GF_SECURITY_ADMIN_PASSWORD: admin
GF_RENDERING_CALLBACK_URL: http://grafana:3000/
GF_LOG_FILTERS: rendering:debug
no_proxy: ${no_proxy}
host_ip: ${host_ip}
depends_on:
- prometheus
ports:
- '3000:3000'
ipc: host
restart: unless-stopped

node-exporter:
image: prom/node-exporter
container_name: node-exporter
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- --collector.filesystem.ignored-mount-points
- "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
environment:
no_proxy: ${no_proxy}
ports:
- 9100:9100
ipc: host
restart: always
deploy:
mode: global
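
The prometheus and grafana services bind-mount `./prometheus_data` and `./grafana_data` from the compose directory; if those directories do not exist, Docker creates them owned by root. A minimal sketch for creating them explicitly beforehand (a hypothetical preparatory step, not something the compose file requires):

```bash
cd intel/cpu/xeon/
mkdir -p prometheus_data grafana_data
```
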
11 changes: 11 additions & 0 deletions DocSum/docker_compose/intel/cpu/xeon/grafana/dashboards/download_opea_dashboard.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Remove any dashboards left over from a previous download
if ls *.json 1> /dev/null 2>&1; then
  rm *.json
fi

wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/vllm_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/tgi_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/docsum_megaservice_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/node_grafana.json
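
This helper is sourced by the updated `set_env.sh` later in this diff. It can also be run on its own; a sketch, assuming the script lives under `intel/cpu/xeon/grafana/dashboards/` as the `set_env.sh` changes suggest:

```bash
cd intel/cpu/xeon/grafana/dashboards
bash download_opea_dashboard.sh
ls *.json   # the four OPEA dashboards should now be present
```
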
@@ -0,0 +1,14 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: 1

providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10 #how often Grafana will scan for changed dashboards
options:
path: /var/lib/grafana/dashboards
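
This provider makes Grafana load every dashboard JSON found under `/var/lib/grafana/dashboards`, which `compose.monitoring.yaml` bind-mounts from `./grafana/dashboards` on the host, i.e. the directory that `download_opea_dashboard.sh` populates. A quick host-side check that the dashboards are in place before startup (a sketch, assuming that layout):

```bash
ls intel/cpu/xeon/grafana/dashboards/*_grafana.json
```
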
@@ -0,0 +1,54 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# config file version
apiVersion: 1

# list of datasources that should be deleted from the database
deleteDatasources:
- name: Prometheus
orgId: 1

# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: Prometheus
# <string, required> datasource type. Required
type: prometheus
# <string, required> access mode. direct or proxy. Required
access: proxy
# <int> org id. will default to orgId 1 if not specified
orgId: 1
# <string> url
url: http://$host_ip:9090
# <string> database password, if used
password:
# <string> database user, if used
user:
# <string> database name, if used
database:
# <bool> enable/disable basic auth
basicAuth: false
# <string> basic auth username, if used
basicAuthUser:
# <string> basic auth password, if used
basicAuthPassword:
# <bool> enable/disable with credentials headers
withCredentials:
# <bool> mark as default datasource. Max one per org
isDefault: true
# <map> fields that will be converted to json and stored in json_data
jsonData:
httpMethod: GET
graphiteVersion: "1.1"
tlsAuth: false
tlsAuthWithCACert: false
# <string> json object of data that will be encrypted.
secureJsonData:
tlsCACert: "..."
tlsClientCert: "..."
tlsClientKey: "..."
version: 1
# <bool> allow users to edit datasources from the UI.
editable: true
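
Grafana expands `$host_ip` in provisioning files from environment variables visible to the Grafana process, which is why `compose.monitoring.yaml` passes `host_ip: ${host_ip}` into the grafana container. One way to confirm the datasource resolved to the expected URL after startup (a sketch, assuming the default admin credentials set in the compose file):

```bash
curl -s -u admin:admin http://${host_ip}:3000/api/datasources
```
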
27 changes: 27 additions & 0 deletions DocSum/docker_compose/intel/cpu/xeon/prometheus.yaml
@@ -0,0 +1,27 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# [IP_ADDR]:{PORT_OUTSIDE_CONTAINER} -> {PORT_INSIDE_CONTAINER} / {PROTOCOL}
global:
scrape_interval: 5s
external_labels:
monitor: "my-monitor"
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["opea_prometheus:9090"]
- job_name: "vllm"
metrics_path: /metrics
static_configs:
- targets: ["docsum-xeon-vllm-service:80"]
- job_name: "tgi"
metrics_path: /metrics
static_configs:
- targets: ["docsum-xeon-tgi-server:80"]
- job_name: "docsum-backend-server"
metrics_path: /metrics
static_configs:
- targets: ["docsum-xeon-backend-server:8888"]
- job_name: "prometheus-node-exporter"
metrics_path: /metrics
static_configs:
- targets: ["node-exporter:9100"]
DocSum/docker_compose/intel/cpu/xeon/set_env.sh
Expand Up @@ -2,15 +2,14 @@

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
pushd "${SCRIPT_DIR}/../../.." > /dev/null

SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd)

pushd "$SCRIPT_DIR/../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

export host_ip=$(hostname -I | awk '{print $1}') # Example: host_ip="192.168.1.1"
export no_proxy="${no_proxy},${host_ip}" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export http_proxy=$http_proxy
export https_proxy=$https_proxy
export HF_TOKEN=${HF_TOKEN}

export LLM_ENDPOINT_PORT=8008
@@ -41,3 +40,13 @@ export NUM_CARDS=1
export BLOCK_SIZE=128
export MAX_NUM_SEQS=256
export MAX_SEQ_LEN_TO_CAPTURE=2048

# Download Grafana configurations
pushd "${SCRIPT_DIR}/grafana/dashboards" > /dev/null
source download_opea_dashboard.sh
popd > /dev/null

# Set network proxy settings
export no_proxy="${no_proxy},${host_ip},docsum-xeon-vllm-service,docsum-xeon-tgi-server,docsum-xeon-backend-server,opea_prometheus,grafana,node-exporter,$JAEGER_IP" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export http_proxy=$http_proxy
export https_proxy=$https_proxy
50 changes: 38 additions & 12 deletions DocSum/docker_compose/intel/hpu/gaudi/README.md
@@ -15,13 +15,25 @@ This example includes the following sections:

This section describes how to quickly deploy and test the DocSum service manually on an Intel® Gaudi® platform. The basic steps are:

1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
5. [Check the Deployment Status](#check-the-deployment-status)
6. [Test the Pipeline](#test-the-pipeline)
7. [Cleanup the Deployment](#cleanup-the-deployment)
- [Example DocSum deployments on Intel® Gaudi® Platform](#example-docsum-deployments-on-intel-gaudi-platform)
- [DocSum Quick Start Deployment](#docsum-quick-start-deployment)
- [Access the Code and Set Up Environment](#access-the-code-and-set-up-environment)
- [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
- [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
- [Option #1](#option-1)
- [Option #2](#option-2)
- [Check the Deployment Status](#check-the-deployment-status)
- [Test the Pipeline](#test-the-pipeline)
- [Cleanup the Deployment](#cleanup-the-deployment)
- [DocSum Docker Compose Files](#docsum-docker-compose-files)
- [DocSum Detailed Usage](#docsum-detailed-usage)
- [Query with text](#query-with-text)
- [Query with audio and video](#query-with-audio-and-video)
- [Query with long context](#query-with-long-context)
- [Launch the UI](#launch-the-ui)
- [Gradio UI](#gradio-ui)
- [Launch the Svelte UI](#launch-the-svelte-ui)
- [Launch the React UI (Optional)](#launch-the-react-ui-optional)

### Access the Code and Set Up Environment

@@ -30,7 +42,7 @@ Clone the GenAIExample repository and access the DocSum Intel® Gaudi® platform
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/DocSum/docker_compose
source intel/set_env.sh
source intel/hpu/gaudi/set_env.sh
```

> NOTE: By default, vLLM performs a "warmup" at startup to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and, e.g., autoscaling) it can be skipped with `export VLLM_SKIP_WARMUP=true`.
@@ -49,13 +61,26 @@ Some HuggingFace resources, such as some models, are only accessible if you have

### Deploy the Services Using Docker Compose

#### Option #1

To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:

```bash
cd intel/hpu/gaudi/
docker compose up -d
```

#### Option #2

> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be merged with the default `compose.yaml` file.

To deploy with monitoring:

```bash
cd intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.monitoring.yaml up -d
```

**Note**: Developers should build the Docker image from source when:

- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
@@ -117,10 +142,11 @@ All the DocSum containers will be stopped and then removed on completion of the

In the context of deploying a DocSum pipeline on an Intel® Gaudi® platform, the allocation and utilization of Gaudi devices across different services are important considerations for optimizing performance and resource efficiency. Each of the example deployments, defined by the example Docker compose yaml files, demonstrates a unique approach to leveraging Gaudi hardware, reflecting different priorities and operational strategies.

| File | Description |
| -------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |
| File | Description |
| ---------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml)                       | Default compose file using vLLM as the serving framework                                   |
| [compose_tgi.yaml](./compose_tgi.yaml)               | The LLM serving framework is TGI. All other configurations remain the same as the default  |
| [compose.monitoring.yaml](./compose.monitoring.yaml) | Helper file for monitoring features. Can be used along with any of the compose files       |

## DocSum Detailed Usage
