22 changes: 22 additions & 0 deletions README.md
@@ -369,6 +369,28 @@ stream.prepare(

The delta has a moderating effect on the effectiveness of RCFG.

## Additional Feature Documentation

Comprehensive documentation is available in the [docs folder](src/streamdiffusion/docs/).

- [Core Concepts](src/streamdiffusion/docs/hooks.md)
- [Modules](src/streamdiffusion/docs/modules/)
- [Preprocessing](src/streamdiffusion/docs/preprocessing/)
- [Pipeline](src/streamdiffusion/docs/pipeline.md)
- [Parameter Updater](src/streamdiffusion/docs/stream_parameter_updater.md)
- [Wrapper](src/streamdiffusion/docs/wrapper.md)
- [Config](src/streamdiffusion/docs/config.md)
- [TensorRT](src/streamdiffusion/docs/acceleration/tensorrt.md)

## Diagrams

- [Architecture Overview](src/streamdiffusion/docs/diagrams/overall_architecture.md)
- [Hooks Integration](src/streamdiffusion/docs/diagrams/hooks_integration.md)
- [Orchestrator Flow](src/streamdiffusion/docs/diagrams/orchestrator_flow.md)
- [Module Integration](src/streamdiffusion/docs/diagrams/module_integration.md)
- [Parameter Updating](src/streamdiffusion/docs/diagrams/parameter_updating.md)
- [TensorRT Pipeline](src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md)

## Development Team

[Aki](https://twitter.com/cumulo_autumn),
31 changes: 31 additions & 0 deletions src/streamdiffusion/docs/acceleration/tensorrt.md
@@ -0,0 +1,31 @@
# TensorRT Acceleration

## Overview

TensorRT acceleration optimizes StreamDiffusion for real-time performance by compiling PyTorch models into TensorRT engines, with support for dynamic batch sizes and resolutions (384-1024), FP16, and CUDA graphs. Engines are built for the UNet, VAE (encoder/decoder), ControlNet, and Safety Checker. The system automatically falls back to PyTorch on out-of-memory errors, and ControlNet engines are pooled.

Key components:
- **EngineBuilder**: Exports ONNX, optimizes, builds TRT (static/dynamic shapes).
- **EngineManager**: Manages paths, compiles/loads engines (UNet/VAE/ControlNet).
- **Runtime Engines**: UNet2DConditionModelEngine, AutoencoderKLEngine, ControlNetModelEngine (infer with shape cache).
- **Export Wrappers**: UnifiedExportWrapper for UNet+ControlNet+IPAdapter (handles kwargs, scales).
- **Utilities**: Engine class (buffers, infer), preprocess/decode helpers.

Files: [`builder.py`](../../../acceleration/tensorrt/builder.py), [`engine_manager.py`](../../../acceleration/tensorrt/engine_manager.py), [`utilities.py`](../../../acceleration/tensorrt/utilities.py), wrappers in `export_wrappers/`.
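
The OOM fallback mentioned above boils down to a try/except around engine inference. The sketch below is conceptual only, not the library's code; `trt_infer` and `torch_infer` are placeholder callables standing in for the engine and the PyTorch UNet.

```python
import torch

def denoise_with_fallback(trt_infer, torch_infer, *args, **kwargs):
    """Run the TensorRT path first; fall back to PyTorch on CUDA OOM.

    Conceptual sketch only: `trt_infer` / `torch_infer` are placeholder
    callables, not StreamDiffusion classes.
    """
    if trt_infer is not None:
        try:
            return trt_infer(*args, **kwargs)
        except torch.cuda.OutOfMemoryError:
            # Release the failed allocation before retrying on the PyTorch path.
            torch.cuda.empty_cache()
    return torch_infer(*args, **kwargs)
```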

## Usage

### Engine Building

`EngineManager` is invoked during wrapper initialization; set `build_engines_if_missing=True` to compile any missing engines:

```python
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
    engine_dir="engines",           # Output dir for compiled engines
    build_engines_if_missing=True,  # Compile on first run if engines are absent
)
# Builds: unet.engine, vae_encoder.engine, vae_decoder.engine
```
44 changes: 44 additions & 0 deletions src/streamdiffusion/docs/config.md
@@ -0,0 +1,44 @@
# Config Management

## Overview

Config management in StreamDiffusion uses YAML/JSON files to define model, pipeline, blending, and module settings. The `config.py` module provides `load_config`/`save_config` for file I/O, validation of types and required fields, and helpers such as `create_wrapper_from_config` to instantiate a `StreamDiffusionWrapper` from a dict. It supports both legacy single prompts and the newer blending format (`prompt_list`, `seed_list`), including weight normalization and interpolation methods.

Key functions:
- `load_config(path)`: Loads YAML/JSON, validates.
- `save_config(config, path)`: Writes validated config.
- `create_wrapper_from_config(config)`: Builds wrapper from dict, extracts params, handles blending.
- `create_prompt_blending_config`/`create_seed_blending_config`: Helpers for blending.
- `set_normalize_weights_config`: Sets normalization flags.
- Validation: ensures `model_id` is present and that the `controlnets`/`ipadapters` lists, hook processor entries (`type`, `enabled`, `params`), and blending lists are well-formed.

Configs are loaded at startup; runtime updates via `update_stream_params` ([doc](../stream_parameter_updater.md)). Files: [`config.py`](../../../config.py).
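
As a minimal sketch of a typical startup path (assuming the helpers are importable from `streamdiffusion.config`; the file paths are placeholders):

```python
from streamdiffusion.config import (
    load_config,
    save_config,
    create_wrapper_from_config,
)

# Load and validate a YAML/JSON config from disk (placeholder path).
config = load_config("configs/img2img_controlnet.yaml")

# The result is a plain dict, so fields can be tweaked before building.
config["width"] = 768

# Instantiate a StreamDiffusionWrapper from the validated dict.
wrapper = create_wrapper_from_config(config)

# Persist the (possibly modified) config for reproducibility.
save_config(config, "configs/img2img_controlnet.saved.yaml")
```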

## File Format (YAML Example)

```yaml
model_id: "runwayml/stable-diffusion-v1-5"
t_index_list: [0, 999]
width: 512
height: 512
mode: "img2img"
output_type: "pil"
device: "cuda"
dtype: "float16"
use_controlnet: true
controlnets:
- model_id: "lllyasviel/sd-controlnet-canny"
preprocessor: "canny"
conditioning_scale: 1.0
enabled: true
preprocessor_params:
threshold_low: 100
threshold_high: 200
use_ipadapter: true
ipadapters:
- ipadapter_model_path: "h94/IP-Adapter"
image_encoder_path: "openai/clip-vit-large-patch14"
scale: 0.8
type: "regular"
prompt_blending:
prompt
16 changes: 16 additions & 0 deletions src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -0,0 +1,16 @@
# Hooks Integration

```mermaid
graph LR
A[Image Preprocessing Hooks] --> B[Latent Preprocessing Hooks]
B --> C[UNet Hooks: e.g., ControlNet/IPAdapter]
C --> D[Latent Postprocessing Hooks]
D --> E[Image Postprocessing Hooks]

F[Embedding Hooks: Custom Embedding Mods] -.->|Before UNet| C
G[Config] -->|Register Hooks| A
G -->|Register Hooks| B
G -->|Register Hooks| C
G -->|Register Hooks| D
G -->|Register Hooks| E
G -->|Register Hooks| F
```
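
For a concrete sense of the ordering above, the sketch below models hook registration with a plain stage-keyed registry; the stage names, registry, and functions are hypothetical illustrations, not StreamDiffusion's actual hook API (see the hooks documentation for the real interfaces).

```python
from typing import Callable, Dict, List

# Hypothetical registry keyed by pipeline stage, in execution order
# (plain dicts preserve insertion order in Python 3.7+).
HOOK_STAGES: Dict[str, List[Callable]] = {
    stage: []
    for stage in (
        "image_preprocessing",    # runs on the raw input image
        "latent_preprocessing",   # runs after VAE encode
        "embedding",              # modifies prompt embeddings before the UNet
        "unet",                   # ControlNet / IPAdapter style injections
        "latent_postprocessing",  # runs on the denoised latents
        "image_postprocessing",   # runs on the decoded image
    )
}

def register_hook(stage: str, hook: Callable) -> None:
    """Attach a hook to a stage (what the Config arrows in the diagram represent)."""
    HOOK_STAGES[stage].append(hook)

def run_stage(stage: str, value):
    """Apply every hook registered for a stage, in registration order."""
    for hook in HOOK_STAGES[stage]:
        value = hook(value)
    return value
```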
29 changes: 29 additions & 0 deletions src/streamdiffusion/docs/diagrams/module_integration.md
@@ -0,0 +1,29 @@
# Module Integration

```mermaid
graph TD
A[Input Image] --> B[Image Preprocessing Hooks]
B --> C[VAE Encode]
C --> D[Latent Preprocessing Hooks]
D --> E[UNet Forward]

E --> F{ControlNet Active?}
F -->|Yes| G[Add Residuals: Down/Mid Blocks]
F -->|No| H{IPAdapter Active?}
H -->|Yes| I[Set IPAdapter Scale Vector]
H -->|No| J[Standard UNet Call]
G --> J
I --> J

J --> K[Latent Postprocessing Hooks]
K --> L[VAE Decode]
L --> M[Image Postprocessing Hooks]
M --> N[Output Image]

O[StreamParameterUpdater] -.->|Update Scales| I
P[Config] -->|Enable Modules| F
P -->|Enable Modules| H
P -->|Enable Modules| B
P -->|Enable Modules| D
P -->|Enable Modules| K
P -->|Enable Modules| M
```
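
Read top to bottom, the diagram corresponds roughly to the per-frame sketch below; every name (`vae_encode`, `unet_forward`, `run_hooks`, and so on) is a hypothetical stand-in rather than the pipeline's real method names.

```python
def process_frame(image, modules, pipeline):
    """Hypothetical per-frame flow mirroring the diagram above.

    `modules` carries the enabled-module flags from the config;
    `pipeline` bundles placeholder callables for the VAE/UNet and hook stages.
    """
    x = pipeline.run_hooks("image_preprocessing", image)
    latents = pipeline.vae_encode(x)
    latents = pipeline.run_hooks("latent_preprocessing", latents)

    residuals = None
    if modules.get("controlnet"):
        # ControlNet contributes down/mid-block residuals to the UNet call.
        residuals = pipeline.controlnet_residuals(latents)
    if modules.get("ipadapter"):
        # IPAdapter only adjusts its attention scale vector before the call.
        pipeline.set_ipadapter_scale(modules["ipadapter_scale"])

    latents = pipeline.unet_forward(latents, residuals=residuals)
    latents = pipeline.run_hooks("latent_postprocessing", latents)

    image_out = pipeline.vae_decode(latents)
    return pipeline.run_hooks("image_postprocessing", image_out)
```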
123 changes: 123 additions & 0 deletions src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -0,0 +1,123 @@
# Orchestrator Flow

```mermaid
graph TB
subgraph "Input Sources"
A["Raw Images<br/>(ControlNet/IPAdapter)"]
B["Pipeline Tensors<br/>(Hook Stages)"]
C["Generated Images<br/>(VAE Output)"]
end

subgraph "PreprocessingOrchestrator"
D["Group Similar Processors"]
E["Parallel Processing<br/>(Multiple ControlNets)"]
F["Cache Results<br/>(Reuse Across Frames)"]
D --> E --> F
end

subgraph "PipelinePreprocessingOrchestrator"
G["Sequential Chain<br/>(Ordered Dependencies)"]
H["Process Each Stage<br/>(Latent Modifications)"]
G --> H
end

subgraph "PostprocessingOrchestrator"
I["Cache Check<br/>(Identical Inputs)"]
J["Sequential Enhancement<br/>(Upscale → Sharpen)"]
I --> J
end

subgraph "BaseOrchestrator (Foundation)"
K{"Feedback Required?"}
L["Sync Processing<br/>(Immediate)"]
M["Pipelined Processing<br/>(Background Thread)"]
K -->|Yes| L
K -->|No| M
end

subgraph "Integration"
N["OrchestratorUser<br/>(Shared Instances)"]
O["StreamParameterUpdater<br/>(Runtime Updates)"]
end

A --> D
B --> G
C --> I

F --> K
H --> K
J --> K

L --> P["Output"]
M --> P

N -.->|"Manages"| D
N -.->|"Manages"| G
N -.->|"Manages"| I

O -.->|"Updates"| D
O -.->|"Updates"| G
O -.->|"Updates"| I
```

## Frame Lifecycle & Parallelism

The orchestrators enable real-time performance through both **intraframe** and **interframe** parallelism:

### Temporal Pipeline
Frame lifecycle: `{[Preprocess N+1] || Diffuse N || [Postprocess N-1]}`
- `{}` = interframe sequencing
- `[]` = intraframe parallelism
- `||` = concurrent execution across temporal stages

```mermaid
gantt
title Frame Pipeline: Concurrent Temporal Stages
dateFormat X
axisFormat %s

section Frame N-1
Preprocessing N-1 :done, prep-n1, 0, 1s
Diffusion N-1 :done, diff-n1, 1, 2s
Postprocessing N-1 :active, post-n1, 2, 3s

section Frame N
Preprocessing N :done, prep-n, 1, 2s
Diffusion N :active, diff-n, 2, 3s
Postprocessing N :post-n, 3, 4s

section Frame N+1
Preprocessing N+1 :active, prep-n1-next, 2, 3s
Diffusion N+1 :diff-n1-next, 3, 4s
Postprocessing N+1 :post-n1-next, 4, 5s
```
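
A minimal sketch of this three-stage overlap uses one thread per temporal stage with queues between them; it illustrates the scheduling idea only and is not the orchestrators' implementation (the stage callables are placeholders).

```python
import queue
import threading

def run_stage(stage_fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull frames, apply one stage, pass the result on. None signals shutdown."""
    while (frame := inbox.get()) is not None:
        outbox.put(stage_fn(frame))
    outbox.put(None)

def pipeline(frames, preprocess, diffuse, postprocess):
    """While frame N diffuses, N+1 is preprocessing and N-1 is postprocessing."""
    q_pre, q_diff, q_post = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=run_stage, args=(preprocess, q_pre, q_diff)),
        threading.Thread(target=run_stage, args=(diffuse, q_diff, q_post)),
    ]
    for t in stages:
        t.start()
    for frame in frames:
        q_pre.put(frame)
    q_pre.put(None)  # shutdown sentinel propagates through the stages

    results = []
    while (out := q_post.get()) is not None:
        results.append(postprocess(out))  # postprocess overlaps with later frames
    for t in stages:
        t.join()
    return results
```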

### Parallelism Types

```mermaid
graph TB
subgraph "Intraframe Parallelism (Within Single Frame)"
A1["Depth Detection"]
A2["Canny Detection"]
A3["Pose Detection"]
A1 -.->|"Parallel"| A2
A2 -.->|"Parallel"| A3
A1 --> B1["Grouped Results"]
A2 --> B1
A3 --> B1
end

subgraph "Interframe Parallelism (Across Time)"
C1["Frame N-1<br/>Postprocess"]
C2["Frame N<br/>Diffusion"]
C3["Frame N+1<br/>Preprocess"]
C1 -.->|"Concurrent"| C2
C2 -.->|"Concurrent"| C3
end

subgraph "Combined Effect"
D["Pipeline Throughput:<br/>3x Frame Overlap +<br/>Nx Processor Parallelism"]
end

B1 --> D
C3 --> D
```
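
Intraframe parallelism amounts to fanning one control frame out to several preprocessors at once; a minimal `ThreadPoolExecutor` sketch follows (the processor callables are placeholders, not the library's preprocessor classes).

```python
from concurrent.futures import ThreadPoolExecutor

def run_preprocessors(image, processors: dict):
    """Run all enabled preprocessors (e.g. depth/canny/pose) on one frame in parallel.

    `processors` maps a name to a callable; results are grouped by name,
    mirroring the "Grouped Results" node in the diagram.
    """
    with ThreadPoolExecutor(max_workers=len(processors)) as pool:
        futures = {name: pool.submit(fn, image) for name, fn in processors.items()}
        return {name: fut.result() for name, fut in futures.items()}
```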
74 changes: 74 additions & 0 deletions src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -0,0 +1,74 @@
# Overall Architecture

```mermaid
graph TB
subgraph "Input"
A["Input: Image/Prompt/Control Image"]
end

subgraph "Preprocessing"
B["Preprocessing Orchestrators"]
C["Processors: Edge Detection (Canny/HED), Pose (OpenPose), Depth (MiDaS)"]
D["Parallel Execution via ThreadPool"]
end

subgraph "Pipeline Core"
E["StreamDiffusion.prepare: Embeddings/Timesteps/Noise"]
F["UNet Steps with Hooks"]
G["ControlNet/IPAdapter Injection"]
H["Orchestrator Calls: Latent/Image Hooks"]
end

subgraph "Decoding"
I["VAE Decode"]
J["Postprocessing Orchestrators"]
end

subgraph "Output"
K["Output: Image"]
end

subgraph "Runtime Control"
L1["StreamDiffusionWrapper"]
L2["update_stream_params()"]
L3["update_control_image()"]
L4["update_style_image()"]
end

subgraph "Management"
L["StreamParameterUpdater: Blending/Caching"]
M["Config Loader: YAML/JSON"]
end

subgraph "Acceleration"
N["TensorRT Engines: UNet/VAE/ControlNet"]
O["Runtime Inference"]
end

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
I --> J
J --> K

L1 --> L2
L1 --> L3
L1 --> L4
L2 -.->|"Runtime Updates"| L
L3 -.->|"via Orchestrators"| B
L4 -.->|"via Orchestrators"| B

L -.->|"Updates"| E
L -.->|"Updates"| F
M -.->|"Setup"| B
M -.->|"Setup"| J
M -.->|"Setup"| L
N -.->|"Optimized"| F
N -.->|"Optimized"| I
O -.->|"Fallback PyTorch"| F
O -.->|"Fallback PyTorch"| I
```
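
The runtime-control box corresponds to wrapper calls along the following lines; the constructor arguments are taken from the TensorRT usage example, while the keyword names and image paths are illustrative placeholders (see the wrapper and parameter-updater docs for the actual signatures).

```python
from PIL import Image
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
)

# Parameter updates are routed through StreamParameterUpdater (keyword illustrative).
wrapper.update_stream_params(guidance_scale=1.2)

# New control/style images go through the preprocessing orchestrators.
wrapper.update_control_image(Image.open("control_frame.png"))
wrapper.update_style_image(Image.open("style_reference.png"))
```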