22 changes: 22 additions & 0 deletions README.md
@@ -369,6 +369,28 @@ stream.prepare(

The delta has a moderating effect on the effectiveness of RCFG.

## Additional Feature Documentation

Comprehensive documentation is available in the [docs folder](src/streamdiffusion/docs/).

- [Core Concepts](src/streamdiffusion/docs/hooks.md)
- [Modules](src/streamdiffusion/docs/modules/)
- [Preprocessing](src/streamdiffusion/docs/preprocessing/)
- [Pipeline](src/streamdiffusion/docs/pipeline.md)
- [Parameter Updater](src/streamdiffusion/docs/stream_parameter_updater.md)
- [Wrapper](src/streamdiffusion/docs/wrapper.md)
- [Config](src/streamdiffusion/docs/config.md)
- [TensorRT](src/streamdiffusion/docs/acceleration/tensorrt.md)

## Diagrams

- [Architecture Overview](src/streamdiffusion/docs/diagrams/overall_architecture.md)
- [Hooks Integration](src/streamdiffusion/docs/diagrams/hooks_integration.md)
- [Orchestrator Flow](src/streamdiffusion/docs/diagrams/orchestrator_flow.md)
- [Module Integration](src/streamdiffusion/docs/diagrams/module_integration.md)
- [Parameter Updating](src/streamdiffusion/docs/diagrams/parameter_updating.md)
- [TensorRT Pipeline](src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md)

## Development Team

[Aki](https://twitter.com/cumulo_autumn),
31 changes: 31 additions & 0 deletions src/streamdiffusion/docs/acceleration/tensorrt.md
@@ -0,0 +1,31 @@
# TensorRT Acceleration

## Overview

TensorRT acceleration optimizes StreamDiffusion for real-time performance by compiling PyTorch models into TensorRT engines, with support for dynamic batch sizes and resolutions (384-1024), FP16, and CUDA graphs. Engines are built for the UNet, VAE (encoder/decoder), ControlNet, and Safety Checker. The system automatically falls back to PyTorch on out-of-memory errors, and ControlNet engines are pooled.

Key components:
- **EngineBuilder**: Exports ONNX, optimizes, builds TRT (static/dynamic shapes).
- **EngineManager**: Manages paths, compiles/loads engines (UNet/VAE/ControlNet).
- **Runtime Engines**: UNet2DConditionModelEngine, AutoencoderKLEngine, ControlNetModelEngine (infer with shape cache).
- **Export Wrappers**: UnifiedExportWrapper for UNet+ControlNet+IPAdapter (handles kwargs, scales).
- **Utilities**: Engine class (buffers, infer), preprocess/decode helpers.

Files: [`builder.py`](../../../acceleration/tensorrt/builder.py), [`engine_manager.py`](../../../acceleration/tensorrt/engine_manager.py), [`utilities.py`](../../../acceleration/tensorrt/utilities.py), wrappers in `export_wrappers/`.
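
The OOM fallback mentioned above boils down to a try/except around engine inference. The sketch below is conceptual only, not the library's code; `trt_infer` and `torch_infer` are placeholder callables standing in for the engine and the PyTorch UNet.

```python
import torch

def denoise_with_fallback(trt_infer, torch_infer, *args, **kwargs):
    """Run the TensorRT path first; fall back to PyTorch on CUDA OOM.

    Conceptual sketch only: `trt_infer` / `torch_infer` are placeholder
    callables, not StreamDiffusion classes.
    """
    if trt_infer is not None:
        try:
            return trt_infer(*args, **kwargs)
        except torch.cuda.OutOfMemoryError:
            # Release the failed allocation before retrying on the PyTorch path.
            torch.cuda.empty_cache()
    return torch_infer(*args, **kwargs)
```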

## Usage

### Engine Building

`EngineManager` is invoked during wrapper initialization; set `build_engines_if_missing=True` to compile any missing engines:

```python
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
    engine_dir="engines",           # Output dir for compiled engines
    build_engines_if_missing=True,  # Compile on first run if engines are absent
)
# Builds: unet.engine, vae_encoder.engine, vae_decoder.engine
```
44 changes: 44 additions & 0 deletions src/streamdiffusion/docs/config.md
@@ -0,0 +1,44 @@
# Config Management

## Overview

Config management in StreamDiffusion uses YAML/JSON files to define model, pipeline, blending, and module settings. The `config.py` module provides `load_config`/`save_config` for file I/O, validation of types and required fields, and helpers such as `create_wrapper_from_config` to instantiate a `StreamDiffusionWrapper` from a dict. It supports both legacy single prompts and the newer blending format (`prompt_list`, `seed_list`), including weight normalization and interpolation methods.

Key functions:
- `load_config(path)`: Loads YAML/JSON, validates.
- `save_config(config, path)`: Writes validated config.
- `create_wrapper_from_config(config)`: Builds wrapper from dict, extracts params, handles blending.
- `create_prompt_blending_config`/`create_seed_blending_config`: Helpers for blending.
- `set_normalize_weights_config`: Sets normalization flags.
- Validation: ensures `model_id` is present and that the `controlnets`/`ipadapters` lists, hook processor entries (`type`, `enabled`, `params`), and blending lists are well-formed.

Configs are loaded at startup; runtime updates via `update_stream_params` ([doc](../stream_parameter_updater.md)). Files: [`config.py`](../../../config.py).
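
As a minimal sketch of a typical startup path (assuming the helpers are importable from `streamdiffusion.config`; the file paths are placeholders):

```python
from streamdiffusion.config import (
    load_config,
    save_config,
    create_wrapper_from_config,
)

# Load and validate a YAML/JSON config from disk (placeholder path).
config = load_config("configs/img2img_controlnet.yaml")

# The result is a plain dict, so fields can be tweaked before building.
config["width"] = 768

# Instantiate a StreamDiffusionWrapper from the validated dict.
wrapper = create_wrapper_from_config(config)

# Persist the (possibly modified) config for reproducibility.
save_config(config, "configs/img2img_controlnet.saved.yaml")
```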

## File Format (YAML Example)

```yaml
model_id: "runwayml/stable-diffusion-v1-5"
t_index_list: [0, 999]
width: 512
height: 512
mode: "img2img"
output_type: "pil"
device: "cuda"
dtype: "float16"
use_controlnet: true
controlnets:
- model_id: "lllyasviel/sd-controlnet-canny"
preprocessor: "canny"
conditioning_scale: 1.0
enabled: true
preprocessor_params:
threshold_low: 100
threshold_high: 200
use_ipadapter: true
ipadapters:
- ipadapter_model_path: "h94/IP-Adapter"
image_encoder_path: "openai/clip-vit-large-patch14"
scale: 0.8
type: "regular"
prompt_blending:
prompt
16 changes: 16 additions & 0 deletions src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -0,0 +1,16 @@
# Hooks Integration

```mermaid
graph LR
A[Image Preprocessing Hooks] --> B[Latent Preprocessing Hooks]
B --> C[UNet Hooks: e.g., ControlNet/IPAdapter]
C --> D[Latent Postprocessing Hooks]
D --> E[Image Postprocessing Hooks]

F[Embedding Hooks: Custom Embedding Mods] -.->|Before UNet| C
G[Config] -->|Register Hooks| A
G -->|Register Hooks| B
G -->|Register Hooks| C
G -->|Register Hooks| D
G -->|Register Hooks| E
G -->|Register Hooks| F
```
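
For a concrete sense of the ordering above, the sketch below models hook registration with a plain stage-keyed registry; the stage names, registry, and functions are hypothetical illustrations, not StreamDiffusion's actual hook API (see the hooks documentation for the real interfaces).

```python
from typing import Callable, Dict, List

# Hypothetical registry keyed by pipeline stage, in execution order
# (plain dicts preserve insertion order in Python 3.7+).
HOOK_STAGES: Dict[str, List[Callable]] = {
    stage: []
    for stage in (
        "image_preprocessing",    # runs on the raw input image
        "latent_preprocessing",   # runs after VAE encode
        "embedding",              # modifies prompt embeddings before the UNet
        "unet",                   # ControlNet / IPAdapter style injections
        "latent_postprocessing",  # runs on the denoised latents
        "image_postprocessing",   # runs on the decoded image
    )
}

def register_hook(stage: str, hook: Callable) -> None:
    """Attach a hook to a stage (what the Config arrows in the diagram represent)."""
    HOOK_STAGES[stage].append(hook)

def run_stage(stage: str, value):
    """Apply every hook registered for a stage, in registration order."""
    for hook in HOOK_STAGES[stage]:
        value = hook(value)
    return value
```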
29 changes: 29 additions & 0 deletions src/streamdiffusion/docs/diagrams/module_integration.md
@@ -0,0 +1,29 @@
# Module Integration

```mermaid
graph TD
A[Input Image] --> B[Image Preprocessing Hooks]
B --> C[VAE Encode]
C --> D[Latent Preprocessing Hooks]
D --> E[UNet Forward]

E --> F{ControlNet Active?}
F -->|Yes| G[Add Residuals: Down/Mid Blocks]
F -->|No| H{IPAdapter Active?}
H -->|Yes| I[Set IPAdapter Scale Vector]
H -->|No| J[Standard UNet Call]
G --> J
I --> J

J --> K[Latent Postprocessing Hooks]
K --> L[VAE Decode]
L --> M[Image Postprocessing Hooks]
M --> N[Output Image]

O[StreamParameterUpdater] -.->|Update Scales| I
P[Config] -->|Enable Modules| F
P -->|Enable Modules| H
P -->|Enable Modules| B
P -->|Enable Modules| D
P -->|Enable Modules| K
P -->|Enable Modules| M
```
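
Read top to bottom, the diagram corresponds roughly to the per-frame sketch below; every name (`vae_encode`, `unet_forward`, `run_hooks`, and so on) is a hypothetical stand-in rather than the pipeline's real method names.

```python
def process_frame(image, modules, pipeline):
    """Hypothetical per-frame flow mirroring the diagram above.

    `modules` carries the enabled-module flags from the config;
    `pipeline` bundles placeholder callables for the VAE/UNet and hook stages.
    """
    x = pipeline.run_hooks("image_preprocessing", image)
    latents = pipeline.vae_encode(x)
    latents = pipeline.run_hooks("latent_preprocessing", latents)

    residuals = None
    if modules.get("controlnet"):
        # ControlNet contributes down/mid-block residuals to the UNet call.
        residuals = pipeline.controlnet_residuals(latents)
    if modules.get("ipadapter"):
        # IPAdapter only adjusts its attention scale vector before the call.
        pipeline.set_ipadapter_scale(modules["ipadapter_scale"])

    latents = pipeline.unet_forward(latents, residuals=residuals)
    latents = pipeline.run_hooks("latent_postprocessing", latents)

    image_out = pipeline.vae_decode(latents)
    return pipeline.run_hooks("image_postprocessing", image_out)
```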
123 changes: 123 additions & 0 deletions src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -0,0 +1,123 @@
# Orchestrator Flow

```mermaid
graph TB
subgraph "Input Sources"
A["Raw Images<br/>(ControlNet/IPAdapter)"]
B["Pipeline Tensors<br/>(Hook Stages)"]
C["Generated Images<br/>(VAE Output)"]
end

subgraph "PreprocessingOrchestrator"
D["Group Similar Processors"]
E["Parallel Processing<br/>(Multiple ControlNets)"]
F["Cache Results<br/>(Reuse Across Frames)"]
D --> E --> F
end

subgraph "PipelinePreprocessingOrchestrator"
G["Sequential Chain<br/>(Ordered Dependencies)"]
H["Process Each Stage<br/>(Latent Modifications)"]
G --> H
end

subgraph "PostprocessingOrchestrator"
I["Cache Check<br/>(Identical Inputs)"]
J["Sequential Enhancement<br/>(Upscale → Sharpen)"]
I --> J
end

subgraph "BaseOrchestrator (Foundation)"
K{"Feedback Required?"}
L["Sync Processing<br/>(Immediate)"]
M["Pipelined Processing<br/>(Background Thread)"]
K -->|Yes| L
K -->|No| M
end

subgraph "Integration"
N["OrchestratorUser<br/>(Shared Instances)"]
O["StreamParameterUpdater<br/>(Runtime Updates)"]
end

A --> D
B --> G
C --> I

F --> K
H --> K
J --> K

L --> P["Output"]
M --> P

N -.->|"Manages"| D
N -.->|"Manages"| G
N -.->|"Manages"| I

O -.->|"Updates"| D
O -.->|"Updates"| G
O -.->|"Updates"| I
```

## Frame Lifecycle & Parallelism

The orchestrators enable real-time performance through both **intraframe** and **interframe** parallelism:

### Temporal Pipeline
Frame lifecycle: `{[Preprocess N+1] || Diffuse N || [Postprocess N-1]}`
- `{}` = interframe sequencing
- `[]` = intraframe parallelism
- `||` = concurrent execution across temporal stages

```mermaid
gantt
title Frame Pipeline: Concurrent Temporal Stages
dateFormat X
axisFormat %s

section Frame N-1
Preprocessing N-1 :done, prep-n1, 0, 1s
Diffusion N-1 :done, diff-n1, 1, 2s
Postprocessing N-1 :active, post-n1, 2, 3s

section Frame N
Preprocessing N :done, prep-n, 1, 2s
Diffusion N :active, diff-n, 2, 3s
Postprocessing N :post-n, 3, 4s

section Frame N+1
Preprocessing N+1 :active, prep-n1-next, 2, 3s
Diffusion N+1 :diff-n1-next, 3, 4s
Postprocessing N+1 :post-n1-next, 4, 5s
```
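
A minimal sketch of this three-stage overlap uses one thread per temporal stage with queues between them; it illustrates the scheduling idea only and is not the orchestrators' implementation (the stage callables are placeholders).

```python
import queue
import threading

def run_stage(stage_fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull frames, apply one stage, pass the result on. None signals shutdown."""
    while (frame := inbox.get()) is not None:
        outbox.put(stage_fn(frame))
    outbox.put(None)

def pipeline(frames, preprocess, diffuse, postprocess):
    """While frame N diffuses, N+1 is preprocessing and N-1 is postprocessing."""
    q_pre, q_diff, q_post = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=run_stage, args=(preprocess, q_pre, q_diff)),
        threading.Thread(target=run_stage, args=(diffuse, q_diff, q_post)),
    ]
    for t in stages:
        t.start()
    for frame in frames:
        q_pre.put(frame)
    q_pre.put(None)  # shutdown sentinel propagates through the stages

    results = []
    while (out := q_post.get()) is not None:
        results.append(postprocess(out))  # postprocess overlaps with later frames
    for t in stages:
        t.join()
    return results
```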

### Parallelism Types

```mermaid
graph TB
subgraph "Intraframe Parallelism (Within Single Frame)"
A1["Depth Detection"]
A2["Canny Detection"]
A3["Pose Detection"]
A1 -.->|"Parallel"| A2
A2 -.->|"Parallel"| A3
A1 --> B1["Grouped Results"]
A2 --> B1
A3 --> B1
end

subgraph "Interframe Parallelism (Across Time)"
C1["Frame N-1<br/>Postprocess"]
C2["Frame N<br/>Diffusion"]
C3["Frame N+1<br/>Preprocess"]
C1 -.->|"Concurrent"| C2
C2 -.->|"Concurrent"| C3
end

subgraph "Combined Effect"
D["Pipeline Throughput:<br/>3x Frame Overlap +<br/>Nx Processor Parallelism"]
end

B1 --> D
C3 --> D
```
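
Intraframe parallelism amounts to fanning one control frame out to several preprocessors at once; a minimal `ThreadPoolExecutor` sketch follows (the processor callables are placeholders, not the library's preprocessor classes).

```python
from concurrent.futures import ThreadPoolExecutor

def run_preprocessors(image, processors: dict):
    """Run all enabled preprocessors (e.g. depth/canny/pose) on one frame in parallel.

    `processors` maps a name to a callable; results are grouped by name,
    mirroring the "Grouped Results" node in the diagram.
    """
    with ThreadPoolExecutor(max_workers=len(processors)) as pool:
        futures = {name: pool.submit(fn, image) for name, fn in processors.items()}
        return {name: fut.result() for name, fut in futures.items()}
```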
74 changes: 74 additions & 0 deletions src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -0,0 +1,74 @@
# Overall Architecture

```mermaid
graph TB
subgraph "Input"
A["Input: Image/Prompt/Control Image"]
end

subgraph "Preprocessing"
B["Preprocessing Orchestrators"]
C["Processors: Edge Detection (Canny/HED), Pose (OpenPose), Depth (MiDaS)"]
D["Parallel Execution via ThreadPool"]
end

subgraph "Pipeline Core"
E["StreamDiffusion.prepare: Embeddings/Timesteps/Noise"]
F["UNet Steps with Hooks"]
G["ControlNet/IPAdapter Injection"]
H["Orchestrator Calls: Latent/Image Hooks"]
end

subgraph "Decoding"
I["VAE Decode"]
J["Postprocessing Orchestrators"]
end

subgraph "Output"
K["Output: Image"]
end

subgraph "Runtime Control"
L1["StreamDiffusionWrapper"]
L2["update_stream_params()"]
L3["update_control_image()"]
L4["update_style_image()"]
end

subgraph "Management"
L["StreamParameterUpdater: Blending/Caching"]
M["Config Loader: YAML/JSON"]
end

subgraph "Acceleration"
N["TensorRT Engines: UNet/VAE/ControlNet"]
O["Runtime Inference"]
end

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
I --> J
J --> K

L1 --> L2
L1 --> L3
L1 --> L4
L2 -.->|"Runtime Updates"| L
L3 -.->|"via Orchestrators"| B
L4 -.->|"via Orchestrators"| B

L -.->|"Updates"| E
L -.->|"Updates"| F
M -.->|"Setup"| B
M -.->|"Setup"| J
M -.->|"Setup"| L
N -.->|"Optimized"| F
N -.->|"Optimized"| I
O -.->|"Fallback PyTorch"| F
O -.->|"Fallback PyTorch"| I
```
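
The runtime-control box corresponds to wrapper calls along the following lines; the constructor arguments are taken from the TensorRT usage example, while the keyword names and image paths are illustrative placeholders (see the wrapper and parameter-updater docs for the actual signatures).

```python
from PIL import Image
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
)

# Parameter updates are routed through StreamParameterUpdater (keyword illustrative).
wrapper.update_stream_params(guidance_scale=1.2)

# New control/style images go through the preprocessing orchestrators.
wrapper.update_control_image(Image.open("control_frame.png"))
wrapper.update_style_image(Image.open("style_reference.png"))
```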