Commit 0b117ce

Omni: Add SGLang Diffusion for XPU (#194)

1 parent f0019a1 commit 0b117ce

File tree: 9 files changed (+3601 −18 lines changed)


Releases.md — 7 additions & 1 deletion

@@ -68,6 +68,13 @@
 ## LLM-Scaler-Omni
 
 ### Latest Beta Release
+* `intel/llm-scaler-omni:0.1.0-b4` [12/10/2025]:
+    - More workflows support:
+        - Z-Image-Turbo
+        - Hunyuan-Video-1.5 T2V/I2V with multi-XPU support
+    - Initial support for SGLang Diffusion, with a ~10% performance improvement over ComfyUI in the single-B60 (1×B60) scenario.
+
+### Previous Releases
 * `intel/llm-scaler-omni:0.1.0-b3` [11/19/2025]:
     - More workflows support:
         - Hunyuan 3D 2.1
@@ -76,7 +83,6 @@
         - AnimateDiff lightning
     - Add Windows installation
 
-### Previous Releases
 * `intel/llm-scaler-omni:0.1.0-b2` [10/17/2025]:
     - Fix issues:
         - Fix ComfyUI interpolate issue

omni/README.md — 71 additions & 4 deletions

@@ -6,17 +6,18 @@
 
 1. [Getting Started with Omni Docker Image](#getting-started-with-omni-docker-image)
 2. [ComfyUI](#comfyui)
-3. [XInference](#xinference)
-4. [Stand-alone Examples](#stand-alone-examples)
-5. [ComfyUI for Windows (experimental)](#comfyui-for-windows-experimental)
+3. [SGLang Diffusion](#sglang-diffusion-experimental)
+4. [XInference](#xinference)
+5. [Stand-alone Examples](#stand-alone-examples)
+6. [ComfyUI for Windows (experimental)](#comfyui-for-windows-experimental)
 
 ---
 
 ## Getting Started with Omni Docker Image
 
 Pull docker image from dockerhub:
 ```bash
-docker pull intel/llm-scaler-omni:0.1.0-b3
+docker pull intel/llm-scaler-omni:0.1.0-b4
 ```
 
 Or build docker image:
@@ -282,6 +283,72 @@ This workflow synthesizes new speech using a single reference audio file for voi
 3. **Run the Workflow**
    - Execute the workflow to generate the speech.
 
+## SGLang Diffusion (experimental)
+
+SGLang Diffusion provides an OpenAI-compatible API for image/video generation models.
+
+### 1. CLI Generation
+
+```bash
+sglang generate --model-path /llm/models/Wan2.1-T2V-1.3B-Diffusers \
+    --text-encoder-cpu-offload --pin-cpu-memory \
+    --prompt "A curious raccoon" \
+    --save-output
+```
+
+### 2. OpenAI API Server
+
+**Start the server:**
+
+```bash
+# Configure proxy if needed
+export http_proxy=<your_http_proxy>
+export https_proxy=<your_https_proxy>
+export no_proxy=localhost,127.0.0.1
+
+# Start server
+sglang serve --model-path /llm/models/Z-Image-Turbo/ \
+    --vae-cpu-offload --pin-cpu-memory \
+    --num-gpus 1 --port 30010
+```
+
+Or use the provided script:
+
+```bash
+bash /llm/entrypoints/start_sgl_diffusion.sh
+```
+
+**cURL example:**
+
+```bash
+curl http://localhost:30010/v1/images/generations \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "Z-Image-Turbo",
+        "prompt": "A beautiful sunset over the ocean",
+        "size": "1024x1024"
+    }'
+```
+
+**Python example (OpenAI SDK):**
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:30010/v1", api_key="EMPTY")
+
+response = client.images.generate(
+    model="Z-Image-Turbo",
+    prompt="A beautiful sunset over the ocean",
+    size="1024x1024",
+)
+
+# Save image from base64 response
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(response.data[0].b64_json))
+```
+
 ## XInference
 
 ```bash

omni/build.sh — 1 addition & 1 deletion

@@ -3,4 +3,4 @@ set -x
 export HTTP_PROXY=<your_http_proxy>
 export HTTPS_PROXY=<your_https_proxy>
 
-docker build -f ./docker/Dockerfile . -t intel/llm-scaler-omni:0.1.0-b3 --build-arg https_proxy=$HTTPS_PROXY --build-arg http_proxy=$HTTP_PROXY
+docker build -f ./docker/Dockerfile . -t intel/llm-scaler-omni:0.1.0-b4 --build-arg https_proxy=$HTTPS_PROXY --build-arg http_proxy=$HTTP_PROXY

omni/docker/Dockerfile — 18 additions & 12 deletions

@@ -15,6 +15,7 @@ COPY ./patches/xinference_device_utils.patch /tmp/
 COPY ./patches/comfyui_for_multi_arc.patch /tmp/
 COPY ./patches/comfyui_voxcpm_for_xpu.patch /tmp/
 COPY ./patches/comfyui_hunyuan3d_for_xpu.patch /tmp/
+COPY ./patches/sglang_diffusion_for_multi_arc.patch /tmp/
 
 
 # Add Intel oneAPI repo and PPA for GPU support
@@ -86,24 +87,26 @@ RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRO
     git apply /tmp/comfyui_voxcpm_for_xpu.patch && \
     pip install -r requirements.txt && \
     cd .. && \
-    git clone https://github.com/visualbruno/ComfyUI-Hunyuan3d-2-1.git && \
-    cd ComfyUI-Hunyuan3d-2-1 && \
-    git checkout 9d7ef32509101495a7840b3ae8e718c8d1183305 && \
-    git apply /tmp/comfyui_hunyuan3d_for_xpu.patch && \
-    pip install bigdl-core==2.4.0b1 rembg realesrgan && \
-    pip install -r requirements.txt && \
-    cd hy3dpaint/custom_rasterizer && \
-    python setup.py install && \
-    cd ../DifferentiableRenderer && \
-    python setup.py install && \
-    cd /llm/ComfyUI/custom_nodes && \
     git clone https://github.com/billwuhao/ComfyUI_IndexTTS.git && \
     cd ComfyUI_IndexTTS && \
     pip install -r requirements.txt && \
     # Install Xinference
     pip install "xinference[transformers]" && \
     patch /usr/local/lib/python3.10/dist-packages/xinference/device_utils.py < /tmp/xinference_device_utils.patch && \
     pip install kokoro Jinja2==3.1.6 jieba ordered-set pypinyin cn2an pypinyin-dict && \
+    # Install SGLang Diffusion
+    cd /llm && \
+    git clone https://github.com/sgl-project/sglang.git && \
+    cd sglang && \
+    git checkout 236a7c237002250b148c79bd93780d870b8b50d2 && \
+    git apply /tmp/sglang_diffusion_for_multi_arc.patch && \
+    pip install -e "python[diffusion]" && \
+    pip install triton==3.5.0 && \
+    pip install pytorch-triton-xpu==3.5.0 --index-url https://download.pytorch.org/whl/xpu --force-reinstall && \
+    cd /llm && \
+    git clone https://github.com/sgl-project/sgl-kernel-xpu.git && \
+    cd sgl-kernel-xpu && \
+    pip install -v . && \
     # Clean
     rm -rf /tmp/*
 RUN cd /llm/ComfyUI/custom_nodes && \
@@ -114,5 +117,8 @@ RUN cd /llm/ComfyUI/custom_nodes && \
 COPY ./workflows/* /llm/ComfyUI/user/default/workflows/
 COPY ./example_inputs/* /llm/ComfyUI/input/
 COPY ./tools/* /llm/tools/
+COPY ./entrypoints/* /llm/entrypoints/
+
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
 
-WORKDIR /llm/ComfyUI
+WORKDIR /llm/entrypoints
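The Dockerfile changes above switch the default working directory to `/llm/entrypoints`, where the new start scripts live. To use them, the container needs access to the Intel GPU devices and the model directory; a typical invocation might look like the following (a sketch, not part of this commit — the device flag, shared-memory size, and `/path/to/models` mount are assumptions that may need adjusting per setup):

```shell
# Pass the Intel GPU render devices through to the container,
# mount a host model directory at /llm/models, and expose the
# ComfyUI (8188) and SGLang Diffusion (30010) ports.
docker run -it --rm \
    --device /dev/dri \
    --shm-size 16g \
    -v /path/to/models:/llm/models \
    -p 8188:8188 -p 30010:30010 \
    intel/llm-scaler-omni:0.1.0-b4
```

Inside the container, `bash start_comfyui.sh` or `bash start_sgl_diffusion.sh` then launches the corresponding service.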

omni/entrypoints/start_comfyui.sh — 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+export http_proxy=<your_http_proxy>
+export https_proxy=<your_https_proxy>
+export no_proxy=localhost,127.0.0.1
+
+python /llm/ComfyUI/main.py --listen 0.0.0.0 --port 8188
omni/entrypoints/start_sgl_diffusion.sh — 17 additions & 0 deletions

@@ -0,0 +1,17 @@
+export http_proxy=<your_http_proxy>
+export https_proxy=<your_https_proxy>
+export no_proxy=localhost,127.0.0.1
+
+export model="/llm/models/Z-Image-Turbo/"
+
+SERVER_ARGS=(
+    --model-path $model
+    --vae-cpu-offload
+    --pin-cpu-memory
+    --num-gpus 1
+    --ulysses-degree=1
+    --ring-degree=1
+    --port 30010
+)
+
+sglang serve "${SERVER_ARGS[@]}" 2>&1 | tee sglang.log
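The Python example added to omni/README.md above writes `response.data[0].b64_json` directly, which assumes the server returns inline base64 data. OpenAI-compatible images endpoints can instead return a download URL depending on the response format in use; a small defensive helper covers both cases (a sketch — `save_image` is a hypothetical name, not part of this commit):

```python
import base64
import urllib.request


def save_image(item, path="output.png"):
    """Save one entry of an images-API response, whether it carries
    inline base64 bytes (b64_json) or a downloadable URL."""
    b64 = getattr(item, "b64_json", None)
    if b64:
        # Inline base64 payload: decode and write directly
        with open(path, "wb") as f:
            f.write(base64.b64decode(b64))
        return path
    url = getattr(item, "url", None)
    if url:
        # URL payload: download the generated image instead
        urllib.request.urlretrieve(url, path)
        return path
    raise ValueError("response item has neither b64_json nor url")
```

With the OpenAI SDK, this replaces the final write in the README example: `save_image(response.data[0])`.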
