From 4586bcec182b939fdf0d4ec2cda9ff573a7a3a3f Mon Sep 17 00:00:00 2001
From: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Date: Thu, 11 Sep 2025 13:04:27 -0400
Subject: [PATCH 1/6] draft
---
README.md | 22 +
.../docs/acceleration/tensorrt.md | 31 +
src/streamdiffusion/docs/config.md | 44 ++
.../docs/diagrams/hooks_integration.md | 13 +
.../docs/diagrams/module_integration.md | 29 +
.../docs/diagrams/orchestrator_flow.md | 72 +++
.../docs/diagrams/overall_architecture.md | 60 ++
.../docs/diagrams/parameter_updating.md | 56 ++
.../docs/diagrams/sdxl_vs_sd15.md | 124 ++++
.../docs/diagrams/tensorrt_pipeline.md | 15 +
src/streamdiffusion/docs/hooks.md | 127 ++++
src/streamdiffusion/docs/index.md | 38 ++
.../docs/modules/controlnet.md | 194 ++++++
.../docs/modules/image_processing.md | 218 +++++++
src/streamdiffusion/docs/modules/ipadapter.md | 222 +++++++
.../docs/modules/latent_processing.md | 281 +++++++++
src/streamdiffusion/docs/pipeline.md | 130 ++++
.../docs/preprocessing/orchestrators.md | 20 +
.../docs/preprocessing/processors.md | 237 +++++++
src/streamdiffusion/docs/runtime_control.md | 584 ++++++++++++++++++
.../docs/stream_parameter_updater.md | 145 +++++
.../preprocessing/processors/dinov3 | 1 +
22 files changed, 2663 insertions(+)
create mode 100644 src/streamdiffusion/docs/acceleration/tensorrt.md
create mode 100644 src/streamdiffusion/docs/config.md
create mode 100644 src/streamdiffusion/docs/diagrams/hooks_integration.md
create mode 100644 src/streamdiffusion/docs/diagrams/module_integration.md
create mode 100644 src/streamdiffusion/docs/diagrams/orchestrator_flow.md
create mode 100644 src/streamdiffusion/docs/diagrams/overall_architecture.md
create mode 100644 src/streamdiffusion/docs/diagrams/parameter_updating.md
create mode 100644 src/streamdiffusion/docs/diagrams/sdxl_vs_sd15.md
create mode 100644 src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md
create mode 100644 src/streamdiffusion/docs/hooks.md
create mode 100644 src/streamdiffusion/docs/index.md
create mode 100644 src/streamdiffusion/docs/modules/controlnet.md
create mode 100644 src/streamdiffusion/docs/modules/image_processing.md
create mode 100644 src/streamdiffusion/docs/modules/ipadapter.md
create mode 100644 src/streamdiffusion/docs/modules/latent_processing.md
create mode 100644 src/streamdiffusion/docs/pipeline.md
create mode 100644 src/streamdiffusion/docs/preprocessing/orchestrators.md
create mode 100644 src/streamdiffusion/docs/preprocessing/processors.md
create mode 100644 src/streamdiffusion/docs/runtime_control.md
create mode 100644 src/streamdiffusion/docs/stream_parameter_updater.md
create mode 160000 src/streamdiffusion/preprocessing/processors/dinov3
diff --git a/README.md b/README.md
index dcda263f..bf14493a 100644
--- a/README.md
+++ b/README.md
@@ -369,6 +369,28 @@ stream.prepare(
The delta has a moderating effect on the effectiveness of RCFG.
+## Additional Feature Documentation
+
+Comprehensive documentation is available in the [docs folder](src/streamdiffusion/docs/).
+
+- [Core Concepts](src/streamdiffusion/docs/hooks.md)
+- [Modules](src/streamdiffusion/docs/modules/)
+- [Preprocessing](src/streamdiffusion/docs/preprocessing/)
+- [Pipeline](src/streamdiffusion/docs/pipeline.md)
+- [Parameter Updater](src/streamdiffusion/docs/stream_parameter_updater.md)
+- [Wrapper](src/streamdiffusion/docs/wrapper.md)
+- [Config](src/streamdiffusion/docs/config.md)
+- [TensorRT](src/streamdiffusion/docs/acceleration/tensorrt.md)
+
+## Diagrams
+
+- [Architecture Overview](src/streamdiffusion/docs/diagrams/overall_architecture.md)
+- [Hooks Integration](src/streamdiffusion/docs/diagrams/hooks_integration.md)
+- [Orchestrator Flow](src/streamdiffusion/docs/diagrams/orchestrator_flow.md)
+- [Module Integration](src/streamdiffusion/docs/diagrams/module_integration.md)
+- [Parameter Updating](src/streamdiffusion/docs/diagrams/parameter_updating.md)
+- [TensorRT Pipeline](src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md)
+
## Development Team
[Aki](https://twitter.com/cumulo_autumn),
diff --git a/src/streamdiffusion/docs/acceleration/tensorrt.md b/src/streamdiffusion/docs/acceleration/tensorrt.md
new file mode 100644
index 00000000..f3171bcd
--- /dev/null
+++ b/src/streamdiffusion/docs/acceleration/tensorrt.md
@@ -0,0 +1,31 @@
+# TensorRT Acceleration
+
+## Overview
+
+TensorRT acceleration optimizes StreamDiffusion for realtime performance by compiling PyTorch models into TensorRT engines, with support for dynamic batch/resolution (384-1024), FP16, and CUDA graphs. Engines are built for the UNet, the VAE (encoder and decoder), ControlNet, and the Safety Checker. The system automatically falls back to PyTorch on OOM and pools ControlNet engines for reuse.
+
+Key components:
+- **EngineBuilder**: Exports ONNX, optimizes, builds TRT (static/dynamic shapes).
+- **EngineManager**: Manages paths, compiles/loads engines (UNet/VAE/ControlNet).
+- **Runtime Engines**: UNet2DConditionModelEngine, AutoencoderKLEngine, ControlNetModelEngine (infer with shape cache).
+- **Export Wrappers**: UnifiedExportWrapper for UNet+ControlNet+IPAdapter (handles kwargs, scales).
+- **Utilities**: Engine class (buffers, infer), preprocess/decode helpers.
+
+Files: [`builder.py`](../../../acceleration/tensorrt/builder.py), [`engine_manager.py`](../../../acceleration/tensorrt/engine_manager.py), [`utilities.py`](../../../acceleration/tensorrt/utilities.py), wrappers in `export_wrappers/`.
+
+## Usage
+
+### Engine Building
+
+Use `EngineManager` in wrapper init (build_engines_if_missing=True):
+
+```python
+from streamdiffusion import StreamDiffusionWrapper
+
+wrapper = StreamDiffusionWrapper(
+ model_id_or_path="runwayml/stable-diffusion-v1-5",
+ acceleration="tensorrt",
+ engine_dir="engines", # Output dir
+ build_engines_if_missing=True # Compile if missing
+)
+# Builds: unet.engine, vae_encoder.engine, vae_decoder
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/config.md b/src/streamdiffusion/docs/config.md
new file mode 100644
index 00000000..4fe07cf3
--- /dev/null
+++ b/src/streamdiffusion/docs/config.md
@@ -0,0 +1,44 @@
+# Config Management
+
+## Overview
+
+Config management in StreamDiffusion uses YAML/JSON files to define model, pipeline, blending, and module settings. The `config.py` module provides `load_config`/`save_config` for file I/O, type/field validation, and helpers like `create_wrapper_from_config` to instantiate a `StreamDiffusionWrapper` from a dict. It supports both legacy single prompts and the newer blending lists (prompt_list, seed_list), with weight normalization and interpolation methods.
+
+Key functions:
+- `load_config(path)`: Loads YAML/JSON, validates.
+- `save_config(config, path)`: Writes validated config.
+- `create_wrapper_from_config(config)`: Builds wrapper from dict, extracts params, handles blending.
+- `create_prompt_blending_config`/`create_seed_blending_config`: Helpers for blending.
+- `set_normalize_weights_config`: Sets normalization flags.
+- Validation: Checks required fields such as `model_id`, the controlnets/ipadapters lists, hook processor entries (type, enabled, params), and blending lists.
+
+Configs are loaded at startup; runtime updates via `update_stream_params` ([doc](../stream_parameter_updater.md)). Files: [`config.py`](../../../config.py).
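+
+A minimal usage sketch (the import path and exact call signatures are assumed from the helper list above; check `config.py` for the authoritative API):
+
+```python
+from streamdiffusion.config import load_config, save_config, create_wrapper_from_config
+
+config = load_config("stream_config.yaml")        # parse and validate YAML/JSON
+save_config(config, "stream_config_backup.json")  # write the validated config back out
+wrapper = create_wrapper_from_config(config)      # build a StreamDiffusionWrapper from the dict
+```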
+
+## File Format (YAML Example)
+
+```yaml
+model_id: "runwayml/stable-diffusion-v1-5"
+t_index_list: [0, 999]
+width: 512
+height: 512
+mode: "img2img"
+output_type: "pil"
+device: "cuda"
+dtype: "float16"
+use_controlnet: true
+controlnets:
+ - model_id: "lllyasviel/sd-controlnet-canny"
+ preprocessor: "canny"
+ conditioning_scale: 1.0
+ enabled: true
+ preprocessor_params:
+ threshold_low: 100
+ threshold_high: 200
+use_ipadapter: true
+ipadapters:
+ - ipadapter_model_path: "h94/IP-Adapter"
+ image_encoder_path: "openai/clip-vit-large-patch14"
+ scale: 0.8
+ type: "regular"
+prompt_blending:
+ prompt
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/diagrams/hooks_integration.md b/src/streamdiffusion/docs/diagrams/hooks_integration.md
new file mode 100644
index 00000000..9386eafd
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -0,0 +1,13 @@
+# Hooks Integration
+
+```mermaid
+graph LR
+ A[Pipeline Stages] --> B[Embedding Hooks: Prompt Blending]
+ B --> C[UNet Hooks: ControlNet/IPAdapter]
+ C --> D[Orchestrator Calls: Processors]
+ D --> E[Latent/Image Hooks: Pre/Post Processing]
+
+ F[StreamParameterUpdater] -.->|Update Configs| C
+ G[Config] -->|Register Hooks| B
+ G -->|Register Hooks| C
+ G -->|Register Hooks| E
+```
diff --git a/src/streamdiffusion/docs/diagrams/module_integration.md b/src/streamdiffusion/docs/diagrams/module_integration.md
new file mode 100644
index 00000000..87e9f69f
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/module_integration.md
@@ -0,0 +1,29 @@
+# Module Integration
+
+```mermaid
+graph TD
+ A[Input Image] --> B[Image Preprocessing Hooks]
+ B --> C[VAE Encode]
+ C --> D[Latent Preprocessing Hooks]
+ D --> E[UNet Forward]
+
+ E --> F{ControlNet Active?}
+ F -->|Yes| G[Add Residuals: Down/Mid Blocks]
+ F -->|No| H{IPAdapter Active?}
+ H -->|Yes| I[Set IPAdapter Scale Vector]
+ H -->|No| J[Standard UNet Call]
+ G --> J
+ I --> J
+
+ J --> K[Latent Postprocessing Hooks]
+ K --> L[VAE Decode]
+ L --> M[Image Postprocessing Hooks]
+ M --> N[Output Image]
+
+ O[StreamParameterUpdater] -.->|Update Scales| I
+ P[Config] -->|Enable Modules| F
+ P -->|Enable Modules| H
+ P -->|Enable Modules| B
+ P -->|Enable Modules| D
+ P -->|Enable Modules| K
+ P -->|Enable Modules| M
+```
diff --git a/src/streamdiffusion/docs/diagrams/orchestrator_flow.md b/src/streamdiffusion/docs/diagrams/orchestrator_flow.md
new file mode 100644
index 00000000..c412cd0b
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -0,0 +1,72 @@
+# Orchestrator Flow
+
+```mermaid
+graph TB
+ subgraph "Input Layer - Distinct Preprocessing Types"
+ A["ControlNet/IPAdapter Inputs: Raw Images for Module Preprocessing"]
+ B["Pipeline Hooks: Latent/Image Tensors for Hook Stages"]
+ C["Postprocessing: VAE Output Images for Enhancement"]
+ end
+
+ subgraph "PreprocessingOrchestrator (ControlNet/IPAdapter - Intraframe Parallelism)"
+ D["Raw Images: Multiple ControlNets/IPAdapters"]
+ E["Group by Processor Type: e.g., All Canny Processors Grouped"]
+ F["Intraframe Parallel: ThreadPoolExecutor per Group"]
+ F --> G["Process Group in Parallel: e.g., Canny for CN1 and CN2 Simultaneously"]
+ G --> H["Merge/Broadcast Group Results to Specific Modules e.g. Canny to CN1 and CN2"]
+ I["Intraframe Sequential: Unique Processors Single Thread"]
+ H --> J["Cache by Type: Reuse Across Modules/Frames"]
+ I --> J
+ J --> K["Output Distinct Tensors for Each ControlNet/IPAdapter"]
+ end
+
+ subgraph "PipelinePreprocessingOrchestrator (Hook Stages - Sequential Chain)"
+ L["Latent/Image Tensors from Pipeline Hooks"]
+ M["Sequential Chain: _execute_pipeline_chain"]
+ M --> N["Single Processor Application: e.g., Latent Feedback Sequential"]
+ N --> O["Next Processor in Order (order attr)"]
+ O --> P["Chain Continues: No Parallelism Within Chain"]
+ P --> M
+ Q["Output Processed Tensor to Next Pipeline Hook/Stage"]
+ end
+
+ subgraph "PostprocessingOrchestrator (Output - Cached Sequential)"
+ R["VAE Decoded Images"]
+ S["Sequential with Cache Check: _apply_single_postprocessor"]
+ S --> T{"Cache Hit for Identical Input?"}
+ T -->|Yes| U["Reuse Cached: e.g., Same Upscale Params"]
+ T -->|No| V["Process Sequential: Realesrgan_trt then Sharpen"]
+ U --> W["Output Enhanced Image"]
+ V --> W
+ end
+
+ subgraph "BaseOrchestrator (All Types - Interframe Pipelining)"
+ X{"Use Sync Processing? (Feedback/Temporal Config)"}
+ X -->|Yes| Y["Process Sync: Sequential/Immediate (No Lag, Low Throughput)"]
+ X -->|No| Z["Background Thread: Pipelined/1-Frame Lag (High Throughput)"]
+ Y --> AA["Apply Current Frame Results"]
+ Z --> AA
+ AA --> BB["Output to Pipeline/Next Orchestrator/Stage"]
+ end
+
+ subgraph "Shared Resources & Integration"
+ CC["OrchestratorUser Mixin: Attach Shared Orchestrators to Modules/Hooks"]
+ DD["StreamParameterUpdater: Runtime Param Updates to Processors"]
+ EE["Thread Lock: Ensure Thread-Safe Parallel & Pipelined Execution"]
+ end
+
+ A --> E
+ B --> M
+ C --> S
+ E --> X
+ M --> X
+ S --> X
+ CC -.->|"Shared Orchestrators"| E
+ CC -.->|"Shared Orchestrators"| M
+ CC -.->|"Shared Orchestrators"| S
+ DD -.->|"Dynamic Params"| E
+ DD -.->|"Dynamic Params"| M
+ DD -.->|"Dynamic Params"| S
+ EE -.->|"Protect"| F
+ EE -.->|"Protect"| M
+ EE -.->|"Protect"| S
+```
diff --git a/src/streamdiffusion/docs/diagrams/overall_architecture.md b/src/streamdiffusion/docs/diagrams/overall_architecture.md
new file mode 100644
index 00000000..08757b45
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -0,0 +1,60 @@
+# Overall Architecture
+
+```mermaid
+graph TB
+ subgraph "Input"
+ A["Input: Image/Prompt/Control Image"]
+ end
+
+ subgraph "Preprocessing"
+ B["Preprocessing Orchestrators"]
+ C["Processors: Edge Detection (Canny/HED), Pose (OpenPose), Depth (MiDaS)"]
+ D["Parallel Execution via ThreadPool"]
+ end
+
+ subgraph "Pipeline Core"
+ E["StreamDiffusion.prepare: Embeddings/Timesteps/Noise"]
+ F["UNet Steps with Hooks"]
+ G["ControlNet/IPAdapter Injection"]
+ H["Orchestrator Calls: Latent/Image Hooks"]
+ end
+
+ subgraph "Decoding"
+ I["VAE Decode"]
+ J["Postprocessing Orchestrators"]
+ end
+
+ subgraph "Output"
+ K["Output: Image"]
+ end
+
+ subgraph "Management"
+ L["StreamParameterUpdater: Blending/Caching"]
+ M["Config Loader: YAML/JSON"]
+ end
+
+ subgraph "Acceleration"
+ N["TensorRT Engines: UNet/VAE/ControlNet"]
+ O["Runtime Inference"]
+ end
+
+ A --> B
+ B --> C
+ C --> D
+ D --> E
+ E --> F
+ F --> G
+ G --> H
+ H --> I
+ I --> J
+ J --> K
+
+ L -.->|"Updates"| E
+ L -.->|"Updates"| F
+ M -.->|"Setup"| B
+ M -.->|"Setup"| J
+ M -.->|"Setup"| L
+ N -.->|"Optimized"| F
+ N -.->|"Optimized"| I
+ O -.->|"Fallback PyTorch"| F
+ O -.->|"Fallback PyTorch"| I
+```
diff --git a/src/streamdiffusion/docs/diagrams/parameter_updating.md b/src/streamdiffusion/docs/diagrams/parameter_updating.md
new file mode 100644
index 00000000..d9da38e5
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/parameter_updating.md
@@ -0,0 +1,56 @@
+# Parameter Updating
+
+```mermaid
+graph TD
+ subgraph "Runtime Update Entry Point"
+ A["update_stream_params Call"]
+ A --> B["Thread Lock: _update_lock"]
+ end
+
+ subgraph "Parameter Branches"
+ B --> C{"Prompt List Provided?"}
+ C -->|Yes| D["_cache_prompt_embeddings: Cache/Encode Prompts"]
+ C -->|No| E{"Seed List Provided?"}
+ E -->|Yes| F["_cache_seed_noise: Cache/Generate Noise"]
+ E -->|No| G{"ControlNet Config Provided?"}
+ G -->|Yes| H["Diff Current vs Desired: Add/Remove/Update Scales/Enabled"]
+ H --> I["Update ControlNet Pipeline: reorder/add/remove/update_scale"]
+ G -->|No| J{"IPAdapter Config Provided?"}
+ J -->|Yes| K["Update Scale: Uniform or Per-Layer Vector"]
+ K --> L["Set Weight Type: Linear/SLERP for Layers/Steps"]
+ J -->|No| M{"Hook Config Provided? e.g., Image/Latent Pre/Post"}
+ M -->|Yes| N["Diff Current vs Desired: Modify/Add/Remove Processors In-Place"]
+ N --> O["Update Processor Params/Enabled/Order"]
+ M -->|No| P["Update Timestep/Resolution: Recalc Scalings/Batches"]
+ end
+
+ subgraph "Blending & Caching Layer"
+ D --> Q["_apply_prompt_blending: Linear/SLERP"]
+ F --> R["_apply_seed_blending: Linear/SLERP"]
+ I --> S["Cache Stats: Hits/Misses for Monitoring"]
+ L --> S
+ O --> S
+ P --> S
+ Q --> T["Update Pipeline Tensors: prompt_embeds/init_noise"]
+ R --> T
+ S --> T
+ end
+
+ subgraph "Pipeline Integration"
+ T --> U["Pipeline Uses Updated Tensors/Hooks"]
+ end
+
+ subgraph "Shared Utilities"
+ V["Normalize Weights: Sum to 1.0 (Optional)"]
+ W["Thread-Safe Lock: Prevent Race Conditions"]
+ X["Cache Reindexing: Handle Add/Remove"]
+ end
+
+ C -.->|"Use"| V
+ E -.->|"Use"| V
+ B -.->|"Protect"| W
+ D -.->|"Use"| X
+ F -.->|"Use"| X
+ H -.->|"Use"| X
+ J -.->|"Use"| X
+ M -.->|"Use"| X
+```
diff --git a/src/streamdiffusion/docs/diagrams/sdxl_vs_sd15.md b/src/streamdiffusion/docs/diagrams/sdxl_vs_sd15.md
new file mode 100644
index 00000000..4562c360
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/sdxl_vs_sd15.md
@@ -0,0 +1,124 @@
+# SDXL vs SD1.5 Pipeline Comparison
+
+```mermaid
+graph TB
+ subgraph "Model Detection & Architecture"
+ A[Input Model] --> B{Model Detection}
+        B -->|SD1.5/SD2.1| C["Single Text Encoder<br/>CLIP ViT-L: 768 dim"]
+        B -->|SDXL| D["Dual Text Encoders<br/>CLIP ViT-L: 768 dim<br/>OpenCLIP ViT-bigG: 1280 dim"]
+
+        C --> E["UNet Architecture<br/>4 Down Blocks<br/>12 ControlNet outputs"]
+        D --> F["UNet Architecture<br/>3 Down Blocks<br/>9 ControlNet outputs"]
+    end
+
+    subgraph "Text Encoding Phase"
+        G[Prompt Input] --> H{Model Type?}
+        H -->|SD1.5| I["Single encode_prompt()<br/>Returns: 2 tensors<br/>- prompt_embeds [B, 77, 768]<br/>- negative_prompt_embeds [B, 77, 768]"]
+        H -->|SDXL| J["Dual encode_prompt()<br/>Returns: 4 tensors<br/>- prompt_embeds [B, 77, 2048]<br/>- negative_prompt_embeds [B, 77, 2048]<br/>- pooled_prompt_embeds [B, 1280]<br/>- negative_pooled_prompt_embeds [B, 1280]"]
+
+        I --> K["Concatenated Embeddings<br/>Context Dim: 768"]
+        J --> L["Concatenated Embeddings<br/>Context Dim: 2048<br/>+ Micro-conditioning"]
+    end
+
+    subgraph "SDXL Micro-Conditioning"
+        M[Size/Crop Info] --> N["Time IDs Creation<br/>[original_size, crops, target_size]"]
+        N --> O["Added Cond Kwargs<br/>text_embeds: [B, 1280]<br/>time_ids: [B, 6]"]
+        O --> P["Conditioning Cache<br/>Per batch/CFG type"]
+    end
+
+    subgraph "UNet Calling Conventions"
+        Q[UNet Forward Call] --> R{Model Type?}
+        R -->|SD1.5| S["Positional Arguments<br/>unet(sample, timestep, encoder_hidden_states)<br/>+ return_dict=False"]
+        R -->|SDXL| T["Named Arguments<br/>unet(sample, timestep, encoder_hidden_states,<br/>added_cond_kwargs=conditioning)<br/>+ return_dict=False"]
+
+        S --> U["Standard UNet Output<br/>[noise_prediction]"]
+        T --> V["Standard UNet Output<br/>[noise_prediction]"]
+    end
+
+    subgraph "ControlNet Integration"
+        W[ControlNet Input] --> X{Model Type?}
+        X -->|SD1.5| Y["12 Down Block Residuals<br/>+ 1 Mid Block Residual<br/>Standard ControlNet"]
+        X -->|SDXL| Z["9 Down Block Residuals<br/>+ 1 Mid Block Residual<br/>SDXL ControlNet + added_cond_kwargs"]
+
+        Y --> AA["Residual Injection<br/>down_block_additional_residuals<br/>mid_block_additional_residual"]
+        Z --> BB["Residual Injection + Conditioning<br/>down_block_additional_residuals<br/>mid_block_additional_residual<br/>+ added_cond_kwargs"]
+    end
+
+    subgraph "TensorRT Export Differences"
+        CC[ONNX Export] --> DD{Model Type?}
+        DD -->|SD1.5| EE["Standard Export<br/>Inputs: sample, timestep, encoder_hidden_states<br/>Outputs: noise_prediction"]
+        DD -->|SDXL| FF["SDXL Export<br/>Inputs: sample, timestep, encoder_hidden_states,<br/>text_embeds, time_ids<br/>Outputs: noise_prediction"]
+
+        EE --> GG["TensorRT Engine<br/>Standard UNet"]
+        FF --> HH["TensorRT Engine<br/>SDXL UNet + Conditioning"]
+    end
+
+    subgraph "Memory & Performance"
+        II[Memory Usage] --> JJ{Model Type?}
+        JJ -->|SD1.5| KK["Lower Memory<br/>- Single text encoder<br/>- Smaller embeddings (768 dim)<br/>- Standard UNet"]
+        JJ -->|SDXL| LL["Higher Memory<br/>- Dual text encoders<br/>- Larger embeddings (2048 dim)<br/>- Micro-conditioning cache<br/>- Larger UNet"]
+
+        MM[Performance] --> NN{Model Type?}
+        NN -->|SD1.5| OO["Faster Inference<br/>- Simpler architecture<br/>- Less conditioning overhead"]
+        NN -->|SDXL| PP["Slower Inference<br/>- More complex conditioning<br/>- Larger model size<br/>- Additional tensor operations"]
+    end
+
+    subgraph "Configuration Differences"
+        QQ[Config Parameters] --> RR{Model Type?}
+        RR -->|SD1.5| SS["Standard Config<br/>- model_id<br/>- t_index_list<br/>- width/height<br/>- cfg_type"]
+        RR -->|SDXL| TT["SDXL Config<br/>- model_id (SDXL specific)<br/>- t_index_list<br/>- width/height (1024x1024 typical)<br/>- cfg_type<br/>- Micro-conditioning params"]
+    end
+
+ %% Connections
+ C --> I
+ D --> J
+ L --> P
+ P --> T
+ K --> S
+ L --> T
+ E --> Y
+ F --> Z
+ AA --> S
+ BB --> T
+ EE --> GG
+ FF --> HH
+
+ %% Styling
+ classDef sdxl fill:#e1f5fe,stroke:#01579b,stroke-width:2px
+ classDef sd15 fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
+ classDef common fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
+
+ class D,F,J,L,O,P,T,Z,BB,FF,HH,LL,PP,TT sdxl
+ class C,E,I,K,S,Y,AA,EE,GG,KK,OO,SS sd15
+ class A,B,G,M,Q,W,CC,II,MM,QQ common
+```
+
+## Key Differences Summary
+
+### **Text Encoding**
+- **SD1.5**: Single CLIP ViT-L encoder (768 dim), 2 output tensors
+- **SDXL**: Dual encoders (CLIP ViT-L + OpenCLIP ViT-bigG), 4 output tensors (2048 dim total)
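+
+A shape-only sketch of the concatenation (random placeholder tensors rather than real encoder outputs):
+
+```python
+import torch
+
+# SDXL concatenates both encoders' sequence embeddings along the feature dimension: 768 + 1280 = 2048
+emb_vit_l = torch.randn(1, 77, 768)    # CLIP ViT-L
+emb_big_g = torch.randn(1, 77, 1280)   # OpenCLIP ViT-bigG
+prompt_embeds = torch.cat([emb_vit_l, emb_big_g], dim=-1)  # [1, 77, 2048]
+```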
+
+### **UNet Architecture**
+- **SD1.5**: 4 down blocks, 12 ControlNet residual outputs
+- **SDXL**: 3 down blocks, 9 ControlNet residual outputs
+
+### **Conditioning**
+- **SD1.5**: Basic text conditioning only
+- **SDXL**: Text + micro-conditioning (size, crop, target resolution)
+
+### **UNet Calling**
+- **SD1.5**: Positional arguments, simple interface
+- **SDXL**: Named arguments with `added_cond_kwargs` for micro-conditioning
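+
+In diffusers terms, the two conventions look roughly like this, assuming `unet`, `sample`, `timestep`, and the embedding tensors are already prepared:
+
+```python
+# SD1.5: text conditioning only
+noise_sd15 = unet(sample, timestep, encoder_hidden_states=prompt_embeds, return_dict=False)[0]
+
+# SDXL: same call plus micro-conditioning passed via added_cond_kwargs
+added = {"text_embeds": pooled_prompt_embeds, "time_ids": add_time_ids}
+noise_sdxl = unet(sample, timestep, encoder_hidden_states=prompt_embeds,
+                  added_cond_kwargs=added, return_dict=False)[0]
+```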
+
+### **Memory & Performance**
+- **SD1.5**: Lower memory, faster inference
+- **SDXL**: Higher memory, more complex but better quality
+
+### **TensorRT Integration**
+- **SD1.5**: Standard export with 3 inputs
+- **SDXL**: Extended export with 5 inputs (including conditioning)
+
+---
+
+*See [Overall Architecture](overall_architecture.md) for complete pipeline flow.*
diff --git a/src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md b/src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md
new file mode 100644
index 00000000..6aa0e202
--- /dev/null
+++ b/src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md
@@ -0,0 +1,15 @@
+# TensorRT Pipeline
+
+```mermaid
+graph TD
+ A[PyTorch Model] --> B[ONNX Export: UnifiedWrapper]
+ B --> C[Optimize ONNX: Graph Surgeon]
+ C --> D[Build TRT Engine: Dynamic Shapes]
+ D --> E[Runtime Engine: Infer with Buffers]
+ E --> F[Shape Cache: Reuse Buffers]
+ F --> G[Output: Optimized Pred]
+
+ H[EngineManager] -->|Compile/Load| D
+ I[ControlNet/IPAdapter] -.->|Wrappers| B
+ J[Config] -->|Params| H
+ K[Runtime] -->|Fallback PyTorch| E
+```
diff --git a/src/streamdiffusion/docs/hooks.md b/src/streamdiffusion/docs/hooks.md
new file mode 100644
index 00000000..ce55ea94
--- /dev/null
+++ b/src/streamdiffusion/docs/hooks.md
@@ -0,0 +1,127 @@
+# Hook-Module System
+
+## Overview
+
+The Hook-Module System in StreamDiffusion provides a flexible mechanism for extending and customizing the diffusion pipeline without modifying the core implementation. Hooks are callable functions that can be injected at specific stages of the generation process, such as embedding preparation, UNet denoising steps, image/latent processing, and more. This system promotes modularity, allowing users to add features like custom conditioning, post-processing effects, or dynamic parameter adjustments.
+
+Hooks are particularly useful for integrating advanced modules (e.g., ControlNet, IPAdapter) or implementing realtime adaptations in streaming scenarios. They operate on context objects that carry relevant tensors and metadata, enabling non-destructive modifications via return values (e.g., deltas to augment UNet kwargs).
+
+The system is defined in [`hooks.py`](hooks.py) and integrated into the main [`pipeline.py`](pipeline.py) and [`StreamParameterUpdater`](stream_parameter_updater.py).
+
+## Key Concepts
+
+### Context Objects
+
+Hooks receive and return context dataclasses that encapsulate the state at each stage. These provide access to tensors (e.g., latents, embeddings) and metadata (e.g., timesteps, dimensions).
+
+- **`EmbedsCtx`**: Context for text embedding hooks.
+ - `prompt_embeds`: Torch tensor [batch, seq_len, dim] of positive prompt embeddings.
+ - `negative_prompt_embeds`: Optional torch tensor [batch, seq_len, dim] for negative prompts.
+ - *Purpose*: Modify or augment embeddings before UNet input (e.g., add custom noise or blending).
+
+- **`StepCtx`**: Context for UNet denoising step hooks.
+ - `x_t_latent`: Torch tensor of current latent (possibly CFG-expanded).
+ - `t_list`: Torch tensor of timesteps (possibly CFG-expanded).
+ - `step_index`: Optional int for current step in total steps.
+ - `guidance_mode`: String ("none", "full", "self", "initialize") indicating CFG mode.
+ - `sdxl_cond`: Optional dict with SDXL micro-conditioning tensors.
+ - *Purpose*: Inspect or alter state during each denoising iteration (e.g., inject dynamic guidance).
+
+- **`UnetKwargsDelta`**: Delta object returned by UNet hooks to modify UNet call arguments.
+ - `down_block_additional_residuals`: Optional list of torch tensors for down-block residuals.
+ - `mid_block_additional_residual`: Optional torch tensor for mid-block residual.
+ - `added_cond_kwargs`: Optional dict of additional conditioning kwargs (e.g., ControlNet outputs).
+ - `extra_unet_kwargs`: Optional dict for direct UNet kwargs (e.g., scales, adapters).
+ - *Purpose*: Non-invasively augment UNet forward pass without rewriting the model.
+
+- **`ImageCtx`**: Context for image-space processing hooks (pre/post VAE).
+ - `image`: Torch tensor [B, C, H, W] in pixel space.
+ - `width`: Image width (int).
+ - `height`: Image height (int).
+ - `step_index`: Optional int for multi-step processing.
+ - *Purpose*: Apply effects like sharpening or upscaling on decoded images.
+
+- **`LatentCtx`**: Context for latent-space processing hooks.
+ - `latent`: Torch tensor [B, C, H/8, W/8] in latent space.
+ - `timestep`: Optional torch tensor for diffusion context.
+ - `step_index`: Optional int for multi-step processing.
+ - *Purpose*: Modify latents before/after UNet (e.g., noise injection or feedback loops).
+
+### Hook Types
+
+Hooks are defined as type aliases for clarity:
+
+- `EmbeddingHook = Callable[[EmbedsCtx], EmbedsCtx]`: Modifies embedding contexts.
+- `UnetHook = Callable[[StepCtx], UnetKwargsDelta]`: Produces deltas for UNet steps.
+- `ImageHook = Callable[[ImageCtx], ImageCtx]`: Processes image tensors.
+- `LatentHook = Callable[[LatentCtx], LatentCtx]`: Processes latent tensors.
+
+Hooks can be pre- or post-processing (e.g., `_apply_image_preprocessing_hooks` in pipeline).
+
+## Usage
+
+### Defining a Hook
+
+Hooks are simple callables matching the type signature. Here's an example UnetHook that adds a custom residual:
+
+```python
+from streamdiffusion.hooks import StepCtx, UnetKwargsDelta
+import torch
+
+def custom_residual_hook(ctx: StepCtx) -> UnetKwargsDelta:
+ # Example: Add a simple residual based on timestep
+ if ctx.step_index is not None and ctx.step_index % 5 == 0:
+        residual = torch.zeros_like(ctx.x_t_latent) + 0.01 * ctx.t_list.view(-1, 1, 1, 1)
+        return UnetKwargsDelta(
+            down_block_additional_residuals=[residual] * 4,  # Illustrative count; a real UNet expects one tensor per down-block output
+ mid_block_additional_residual=residual
+ )
+ return UnetKwargsDelta() # No-op delta
+```
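+
+Image and latent hooks follow the same pattern but return the (possibly modified) context itself. A minimal `ImageHook` sketch, assuming pixel values in [0, 1]:
+
+```python
+from streamdiffusion.hooks import ImageCtx
+
+def gamma_hook(ctx: ImageCtx) -> ImageCtx:
+    # Simple gamma adjustment on the decoded [B, C, H, W] image tensor
+    ctx.image = ctx.image.clamp(0, 1) ** 0.9
+    return ctx
+```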
+
+### Registering Hooks
+
+Hooks are typically registered via configuration in `StreamParameterUpdater` or directly in the `StreamDiffusion` pipeline. For example, using config:
+
+In your YAML config (see [Config Management](../config.md)):
+
+```yaml
+pipeline_hooks:
+ unet:
+ - type: "custom" # Or module path
+ class: "path.to.CustomUnetHook"
+ params:
+ scale: 0.5
+```
+
+Or programmatically in `StreamDiffusion`:
+
+```python
+from streamdiffusion import StreamDiffusion
+
+stream = StreamDiffusion(...)
+# Assuming pipeline supports direct registration; check pipeline.py for exact API
+stream.register_hook("unet", custom_residual_hook)
+```
+
+### Integration Points
+
+- **Embedding Stage**: Called in `prepare()` before UNet, via `_apply_embedding_hooks`.
+- **UNet Steps**: Invoked per step in `unet_step()`, accumulating deltas.
+- **Image/Latent Processing**: Applied in `encode_image/decode_image` and hook methods like `_apply_image_preprocessing_hooks`.
+- **Multi-Stage**: Supports chaining multiple hooks; order matters (pre- then post-).
+
+For advanced usage with modules like ControlNet, hooks handle injection of `added_cond_kwargs`. See [Pipeline Documentation](../pipeline.md) for full integration.
+
+### Best Practices
+
+- Keep hooks lightweight (no heavy computation; use TensorRT for speed).
+- Handle batching and device consistency (contexts are on GPU).
+- Return unmodified contexts/deltas for no-op cases.
+- Use locks in streaming scenarios to avoid race conditions (handled internally in updater).
+
+For examples with specific modules, refer to [ControlNet](../modules/controlnet.md) and [IPAdapter](../modules/ipadapter.md).
+
+---
+
+*See [Index](../index.md) for all documentation.*
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/index.md b/src/streamdiffusion/docs/index.md
new file mode 100644
index 00000000..3e74b409
--- /dev/null
+++ b/src/streamdiffusion/docs/index.md
@@ -0,0 +1,38 @@
+# StreamDiffusion Documentation
+
+## Core Concepts
+
+- [Hook-Module System](hooks.md): Extensible pipeline hooks for modules.
+- [Multi-Stage Processing](pipeline.md): Pipeline stages and integration.
+- [StreamParameterUpdater](stream_parameter_updater.md): Runtime parameter blending/caching.
+- [Runtime Control Surface](runtime_control.md): Real-time control methods for live streaming.
+
+## Modules
+
+- [ControlNet Module](modules/controlnet.md): Conditional guidance.
+- [IPAdapter Module](modules/ipadapter.md): Style/reference adaptation.
+- [Image Processing Modules](modules/image_processing.md): Image domain preprocessing and postprocessing.
+- [Latent Processing Modules](modules/latent_processing.md): Latent domain preprocessing and postprocessing.
+
+## Preprocessing
+
+- [Orchestrators](preprocessing/orchestrators.md): Parallel/pipelined execution.
+- [Processors](preprocessing/processors.md): Edge/pose/depth utilities.
+
+## Configuration
+
+- [Config Management](config.md): YAML/JSON loading and validation.
+
+## Acceleration
+
+- [TensorRT](acceleration/tensorrt.md): Engine building and runtime.
+
+## Diagrams
+
+- [Overall Architecture](diagrams/overall_architecture.md)
+- [SDXL vs SD1.5 Comparison](diagrams/sdxl_vs_sd15.md)
+- [Hooks Integration](diagrams/hooks_integration.md)
+- [Orchestrator Flow](diagrams/orchestrator_flow.md)
+- [Module Integration](diagrams/module_integration.md)
+- [Parameter Updating](diagrams/parameter_updating.md)
+- [TensorRT Pipeline](diagrams/tensorrt_pipeline.md)
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/modules/controlnet.md b/src/streamdiffusion/docs/modules/controlnet.md
new file mode 100644
index 00000000..78bed08f
--- /dev/null
+++ b/src/streamdiffusion/docs/modules/controlnet.md
@@ -0,0 +1,194 @@
+# ControlNet Module
+
+## Overview
+
+The ControlNet Module enables integration of ControlNet models into the StreamDiffusion pipeline, allowing conditional guidance from external control images (e.g., edge maps, poses, depth). It supports multiple ControlNets simultaneously, each with independent preprocessors, conditioning scales, and enable/disable states. The module is designed for realtime streaming, with efficient tensor preparation, caching for SDXL conditioning, and seamless fallback between PyTorch and TensorRT engines.
+
+Key features:
+- Dynamic addition/removal/reordering of ControlNets.
+- Automatic preprocessing via the [Preprocessing Orchestrator](../preprocessing/orchestrators.md).
+- UNet hook integration for injecting residuals without modifying core pipeline (see [Hook-Module System](../hooks.md)).
+- TensorRT acceleration for low-latency inference.
+- SDXL support with optimized micro-conditioning caching.
+
+The core implementation is in [`controlnet_module.py`](../../../modules/controlnet_module.py), with TensorRT support in [`controlnet_engine.py`](../../../acceleration/tensorrt/runtime_engines/controlnet_engine.py), export wrappers in [`controlnet_export.py`](../../../acceleration/tensorrt/export_wrappers/controlnet_export.py), and model definitions in [`controlnet_models.py`](../../../acceleration/tensorrt/models/controlnet_models.py).
+
+## Configuration
+
+ControlNets are configured via `ControlNetConfig`:
+
+- `model_id`: str - Path or HuggingFace ID of the ControlNet model (e.g., "lllyasviel/sd-controlnet-canny").
+- `preprocessor`: Optional[str] - Preprocessor name (e.g., "canny", "openpose"; see [Realtime Processors](../preprocessing/processors.md)).
+- `conditioning_scale`: float - Guidance strength (default: 1.0).
+- `enabled`: bool - Whether to use this ControlNet (default: True).
+- `conditioning_channels`: Optional[int] - Input channels for model (auto-detected).
+- `preprocessor_params`: Optional[Dict[str, Any]] - Params for preprocessor (e.g., thresholds).
+
+Configs are loaded via [Config Management](../config.md) and managed by `StreamParameterUpdater` for runtime updates.
+
+## Usage
+
+### Initialization and Installation
+
+The module is installed into a `StreamDiffusion` instance:
+
+```python
+import torch
+
+from streamdiffusion import StreamDiffusion
+from streamdiffusion.modules import ControlNetModule, ControlNetConfig
+
+stream = StreamDiffusion(...) # From config or manual setup
+controlnet_module = ControlNetModule(device="cuda", dtype=torch.float16)
+controlnet_module.install(stream) # Registers UNet hook and exposes collections
+```
+
+### Adding ControlNets
+
+Add via config or programmatically:
+
+```python
+cfg = ControlNetConfig(
+ model_id="lllyasviel/sd-controlnet-canny",
+ preprocessor="canny",
+ conditioning_scale=1.0,
+ preprocessor_params={"threshold_low": 100, "threshold_high": 200}
+)
+controlnet_module.add_controlnet(cfg, control_image="path/to/image.jpg")
+```
+
+Supports TensorRT: If an engine exists for the `model_id`, it auto-switches.
+
+### Updating Control Images
+
+Efficient per-frame updates (pipelined or sync for feedback processors):
+
+```python
+# Update single index
+controlnet_module.update_control_image_efficient("new_control.jpg", index=0)
+
+# Bulk update all
+controlnet_module.update_control_image_efficient("new_stream.jpg") # Applies to all active
+```
+
+Images are preprocessed (e.g., canny edges) and cached for batch/device alignment.
+
+### Managing Scales and State
+
+```python
+controlnet_module.update_controlnet_scale(0, 0.8) # Reduce strength
+controlnet_module.update_controlnet_enabled(0, False) # Disable
+controlnet_module.remove_controlnet(0) # Remove
+controlnet_module.reorder_controlnets_by_model_ids(["canny", "pose"]) # Reorder
+```
+
+### Integration in Pipeline
+
+The module registers a `UnetHook` that computes residuals per step:
+
+- Inputs: Latent `x_t`, timesteps `t_list`, embeddings.
+- Outputs: `UnetKwargsDelta` with `down_block_additional_residuals` (list of tensors) and `mid_block_additional_residual`.
+- Multi-ControlNet: Residuals are summed for combined guidance.
+- Caching: Prepared tensors reused across steps; SDXL cond cached per frame.
+
+In [`pipeline.py`](../../../pipeline.py), the hook is called in `unet_step()` to augment UNet kwargs.
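+
+Conceptually, combining several active ControlNets reduces to an element-wise sum over their residual lists (a sketch of the idea, not the module's actual code):
+
+```python
+import torch
+
+def sum_controlnet_residuals(per_cn_down, per_cn_mid):
+    # per_cn_down: one list of down-block tensors per ControlNet; per_cn_mid: one mid-block tensor per ControlNet
+    down = [torch.stack(block_outs).sum(dim=0) for block_outs in zip(*per_cn_down)]
+    mid = torch.stack(per_cn_mid).sum(dim=0)
+    return down, mid
+```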
+
+## TensorRT Integration
+
+### Export
+
+For acceleration, export ControlNet to ONNX using `SDXLControlNetExportWrapper` (handles SDXL `added_cond_kwargs`):
+
+```python
+from streamdiffusion.acceleration.tensorrt.export_wrappers import SDXLControlNetExportWrapper
+import torch.onnx
+from diffusers import ControlNetModel
+
+controlnet = ControlNetModel.from_pretrained("model_id")
+wrapper = SDXLControlNetExportWrapper(controlnet)
+
+# Sample inputs for SDXL (7 inputs)
+sample_input = (sample, timestep, encoder_hidden_states, controlnet_cond,
+ conditioning_scale, text_embeds, time_ids)
+
+torch.onnx.export(wrapper, sample_input, "controlnet.onnx",
+ input_names=["sample", "timestep", "encoder_hidden_states",
+ "controlnet_cond", "conditioning_scale",
+ "text_embeds", "time_ids"],
+ output_names=[f"down_block_{i:02d}" for i in range(9)] + ["mid_block"],
+ dynamic_axes={...}) # See controlnet_models.py for full config
+```
+
+Outputs 9 down blocks (320/640/1280 channels, progressive downsampling) + mid block.
+
+### Runtime Engine
+
+The `ControlNetModelEngine` loads the TRT engine:
+
+```python
+from streamdiffusion.acceleration.tensorrt.runtime_engines import ControlNetModelEngine
+import polygraphy.cuda as cuda
+
+stream = cuda.Stream()
+engine = ControlNetModelEngine("controlnet.engine", stream, model_type="sdxl")
+
+# Inference (auto-handles inputs/outputs)
+down_blocks, mid_block = engine(sample, timestep, encoder_hidden_states,
+ controlnet_cond, scale=1.0,
+ text_embeds=..., time_ids=...)
+```
+
+- Dynamic shapes: Batch 1-4, resolutions 384-1024 (latent 48-128).
+- Caching: Shape resolution and buffer allocation.
+- Fallback: If no engine, uses PyTorch model.
+- Model defs: `ControlNetTRT`/`ControlNetSDXLTRT` in [`controlnet_models.py`](../../../acceleration/tensorrt/models/controlnet_models.py) for builder profiles.
+
+Engines are pooled by `model_id` in the module for auto-substitution.
+
+## Examples
+
+### Basic Canny Control
+
+```python
+# Config
+cfg = {
+ "pipeline_hooks": {
+ "controlnet": [
+ {"model_id": "lllyasviel/sd-controlnet-canny", "preprocessor": "canny", "conditioning_scale": 1.0}
+ ]
+ }
+}
+
+# Load and generate
+stream = StreamDiffusion.from_config("config.yaml")
+stream.update_control_image("edge_image.jpg")
+images = stream(batch_size=1) # Guided by canny edges
+```
+
+### Multi-ControlNet with TensorRT
+
+```python
+# Add multiple
+controlnet_module.add_controlnet(ControlNetConfig("controlnet-canny-trt", preprocessor="canny"))
+controlnet_module.add_controlnet(ControlNetConfig("controlnet-openpose-trt", preprocessor="openpose"))
+
+# Stream with updates
+while streaming:
+ stream.update_control_image("current_pose.jpg") # Updates both
+ images = stream()
+```
+
+### Realtime Feedback Loop
+
+Combine with feedback processors (e.g., latent_feedback):
+
+```python
+cfg = ControlNetConfig("controlnet-depth", preprocessor="depth_tensorrt")
+controlnet_module.add_controlnet(cfg)
+
+# In loop: Previous output feeds next input via orchestrator
+stream.update_control_image(previous_image) # Auto-preprocessed to depth
+```
+
+For full streaming setup, see [StreamDiffusionWrapper](../wrapper.md) and [Multi-Stage Processing](../pipeline.md).
+
+---
+
+*See [Index](../index.md) for all documentation. For acceleration details, see [TensorRT Acceleration](../acceleration/tensorrt.md).*
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/modules/image_processing.md b/src/streamdiffusion/docs/modules/image_processing.md
new file mode 100644
index 00000000..3bfcc1a4
--- /dev/null
+++ b/src/streamdiffusion/docs/modules/image_processing.md
@@ -0,0 +1,218 @@
+# Image Processing Modules
+
+## Overview
+
+The image processing modules provide a flexible framework for processing images at different stages of the StreamDiffusion pipeline. These modules operate in the image domain (pixel space) and support both preprocessing (before VAE encoding) and postprocessing (after VAE decoding) operations.
+
+## Architecture
+
+The image processing system consists of three main classes:
+
+- **`ImageProcessingModule`**: Base class providing shared functionality
+- **`ImagePreprocessingModule`**: Handles image processing before VAE encoding
+- **`ImagePostprocessingModule`**: Handles image processing after VAE decoding
+
+## Base Class: ImageProcessingModule
+
+The `ImageProcessingModule` serves as the foundation for all image domain processing modules.
+
+### Key Features
+
+- **Sequential Chain Execution**: Processes images through a chain of processors in order
+- **Processor Management**: Add, configure, and order processors dynamically
+- **Orchestrator Integration**: Uses preprocessing orchestrators for efficient processing
+- **Parameter Alignment**: Automatically aligns processor parameters with stream dimensions
+
+### Core Methods
+
+```python
+def add_processor(self, proc_config: Dict[str, Any]) -> None:
+ """Add a processor using the existing registry."""
+
+def _process_image_chain(self, input_image: torch.Tensor) -> torch.Tensor:
+ """Execute sequential chain of processors in image domain."""
+
+def _get_ordered_processors(self) -> List[Any]:
+ """Return enabled processors in execution order."""
+```
+
+### Processor Configuration
+
+Processors are added using configuration dictionaries:
+
+```python
+proc_config = {
+ 'type': 'processor_name', # Required: processor type from registry
+ 'enabled': True, # Optional: enable/disable processor
+ 'order': 0, # Optional: execution order
+ 'params': { # Optional: processor-specific parameters
+ 'param1': 'value1',
+ 'param2': 'value2'
+ }
+}
+```
+
+## ImagePreprocessingModule
+
+Processes images before VAE encoding in the pipeline.
+
+### Timing
+
+- **Execution Point**: After `image_processor.preprocess()`, before `similar_image_filter`
+- **Pipeline Stage**: Input preprocessing stage
+- **Performance**: Uses pipelined processing for optimization
+
+### Key Features
+
+- **Pipelined Processing**: Returns Frame N-1 results while Frame N processing starts
+- **Performance Optimization**: Uses `PipelinePreprocessingOrchestrator`
+- **Fallback Support**: Falls back to synchronous processing when needed
+
+### Installation
+
+```python
+def install(self, stream) -> None:
+ """Install module by registering hook with stream and attaching orchestrators."""
+ self._stream = stream
+ self.attach_orchestrator(stream) # Sequential chain processing (fallback)
+ self.attach_pipeline_preprocessing_orchestrator(stream) # Pipelined processing
+ stream.image_preprocessing_hooks.append(self.build_image_hook())
+```
+
+### Usage Example
+
+```python
+# Create preprocessing module
+image_preproc = ImagePreprocessingModule()
+
+# Add processors
+image_preproc.add_processor({
+ 'type': 'resize',
+ 'params': {'width': 512, 'height': 512}
+})
+
+image_preproc.add_processor({
+ 'type': 'normalize',
+ 'params': {'mean': [0.5], 'std': [0.5]}
+})
+
+# Install in stream
+image_preproc.install(stream)
+```
+
+## ImagePostprocessingModule
+
+Processes images after VAE decoding in the pipeline.
+
+### Timing
+
+- **Execution Point**: After `decode_image()`, before returning final output
+- **Pipeline Stage**: Output postprocessing stage
+- **Performance**: Uses pipelined processing for optimization
+
+### Key Features
+
+- **Pipelined Processing**: Returns Frame N-1 results while Frame N processing starts
+- **Performance Optimization**: Uses `PostprocessingOrchestrator`
+- **Fallback Support**: Falls back to synchronous processing when needed
+
+### Installation
+
+```python
+def install(self, stream) -> None:
+ """Install module by registering hook with stream and attaching orchestrators."""
+ self._stream = stream
+ self.attach_preprocessing_orchestrator(stream) # Sequential chain processing (fallback)
+ self.attach_postprocessing_orchestrator(stream) # Pipelined processing
+ stream.image_postprocessing_hooks.append(self.build_image_hook())
+```
+
+### Usage Example
+
+```python
+# Create postprocessing module
+image_postproc = ImagePostprocessingModule()
+
+# Add processors
+image_postproc.add_processor({
+ 'type': 'upscale',
+ 'params': {'scale_factor': 2}
+})
+
+image_postproc.add_processor({
+ 'type': 'sharpen',
+ 'params': {'strength': 0.5}
+})
+
+# Install in stream
+image_postproc.install(stream)
+```
+
+## Integration with Pipeline
+
+The image processing modules integrate with the StreamDiffusion pipeline through hooks:
+
+### Pipeline Integration Points
+
+1. **Image Preprocessing Hooks**: Applied after built-in preprocessing, before filtering
+2. **Image Postprocessing Hooks**: Applied after VAE decoding, before final output
+
+### Hook Execution Flow
+
+```python
+# In StreamDiffusion.__call__()
+x = self.image_processor.preprocess(x, self.height, self.width)
+x = self._apply_image_preprocessing_hooks(x) # ImagePreprocessingModule
+# ... VAE encoding and diffusion ...
+x_output = self.decode_image(x_0_pred_out)
+x_output = self._apply_image_postprocessing_hooks(x_output) # ImagePostprocessingModule
+```
+
+## Performance Considerations
+
+### Pipelined Processing
+
+Both preprocessing and postprocessing modules use pipelined processing for performance:
+
+- **Frame Overlap**: Process Frame N-1 results while starting Frame N
+- **Orchestrator Integration**: Uses specialized orchestrators for efficiency
+- **Fallback Support**: Graceful degradation to synchronous processing
+
+### Memory Management
+
+- **Stream Reference**: Modules store stream reference for dimension access
+- **Parameter Alignment**: Automatic alignment with stream resolution
+- **Processor Lifecycle**: Processors are managed through the orchestrator system
+
+## Common Use Cases
+
+### Image Preprocessing
+
+- **Resizing**: Adjust image dimensions to match model requirements
+- **Normalization**: Apply mean/std normalization
+- **Color Space Conversion**: Convert between different color spaces
+- **Format Conversion**: Convert between different image formats
+
+### Image Postprocessing
+
+- **Upscaling**: Increase image resolution
+- **Enhancement**: Apply sharpening, denoising, or other enhancements
+- **Format Conversion**: Convert output to desired format
+- **Quality Adjustment**: Apply final quality improvements
+
+## Error Handling
+
+The modules include robust error handling:
+
+- **Parameter Validation**: Validates processor configuration
+- **Graceful Degradation**: Falls back to synchronous processing on errors
+- **Exception Handling**: Catches and handles processor-specific errors
+- **Resource Management**: Proper cleanup of resources
+
+## Best Practices
+
+1. **Processor Ordering**: Use the `order` parameter to control execution sequence (see the sketch after this list)
+2. **Parameter Alignment**: Let the module handle dimension alignment automatically
+3. **Performance Testing**: Test with pipelined processing for optimal performance
+4. **Error Handling**: Implement proper error handling for production use
+5. **Resource Management**: Monitor memory usage with complex processor chains
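+
+For example, explicit `order` values make the execution sequence unambiguous (processor type names are illustrative, and `stream` is assumed to be set up as in the earlier examples):
+
+```python
+image_postproc = ImagePostprocessingModule()
+image_postproc.add_processor({'type': 'upscale', 'order': 0, 'params': {'scale_factor': 2}})
+image_postproc.add_processor({'type': 'sharpen', 'order': 1, 'params': {'strength': 0.3}})
+image_postproc.install(stream)
+```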
diff --git a/src/streamdiffusion/docs/modules/ipadapter.md b/src/streamdiffusion/docs/modules/ipadapter.md
new file mode 100644
index 00000000..5ff48059
--- /dev/null
+++ b/src/streamdiffusion/docs/modules/ipadapter.md
@@ -0,0 +1,222 @@
+# IPAdapter Module
+
+## Overview
+
+The IPAdapter Module enables image-to-image adaptation in StreamDiffusion by injecting image features (embeddings) into the UNet's attention layers, allowing style transfer, reference image guidance, or face ID consistency without full retraining. It supports multiple IPAdapters (e.g., standard, plus, face ID), dynamic scales, and efficient realtime updates via embedding caching. The module integrates seamlessly with the hook system, passing `extra_unet_kwargs` like `ipadapter_scale` to the UNet forward pass.
+
+Key features:
+- Loading and management of multiple IPAdapter models (HuggingFace or local).
+- Automatic injection of custom attention processors into UNet (or TensorRT engines).
+- Embedding computation from style/reference images via dedicated preprocessor.
+- Face ID support using InsightFace for identity preservation.
+- Streaming optimizations: Per-frame embedding updates, batch caching, weight normalization.
+
+Core files: [`ipadapter_module.py`](../../../modules/ipadapter_module.py) for module logic, [`ipadapter_embedding.py`](../../../preprocessing/processors/ipadapter_embedding.py) for preprocessor, and [`unet_ipadapter_export.py`](../../../acceleration/tensorrt/export_wrappers/unet_ipadapter_export.py) for TensorRT export.
+
+## Configuration
+
+IPAdapters are configured similarly to ControlNets, via pipeline hooks in config:
+
+- `ipadapter_model_path`: str - IPAdapter model path/ID (e.g., "h94/IP-Adapter", "h94/IP-Adapter-FaceID").
+- `image_encoder_path`: str - Image encoder model path (e.g., "openai/clip-vit-large-patch14").
+- `scale`: float - Adapter strength (default: 1.0; can be list for multi-adapter).
+- `weight_type`: str - Weight computation ("linear" or "slerp" for multi-image blending).
+- `num_image_tokens`: int - Number of image tokens (default: 4 for standard, 16 for plus).
+- `is_faceid`: bool - Enable face ID mode (requires InsightFace).
+- `insightface_model_name`: str - InsightFace model name for face ID (optional).
+- Other: `device`, `dtype`, `cache_dir` for embeddings.
+
+See [Config Management](../config.md) for YAML examples.
+
+## Usage
+
+### Initialization and Installation
+
+```python
+import torch
+
+from streamdiffusion import StreamDiffusion
+from streamdiffusion.modules import IPAdapterModule
+
+stream = StreamDiffusion(...)
+ipadapter_module = IPAdapterModule(device="cuda", dtype=torch.float16)
+ipadapter_module.install(stream) # Injects into UNet, registers hook
+```
+
+### Adding IPAdapters
+
+```python
+from streamdiffusion.modules.ipadapter_module import IPAdapterConfig
+
+cfg = IPAdapterConfig(
+ ipadapter_model_path="h94/IP-Adapter",
+ image_encoder_path="openai/clip-vit-large-patch14",
+ scale=0.8,
+ num_image_tokens=4
+)
+ipadapter_module.add_ipadapter(cfg, style_image="style.jpg") # Optional initial image
+```
+
+For Face ID:
+
+```python
+cfg_face = IPAdapterConfig(
+ ipadapter_model_path="h94/IP-Adapter-FaceID",
+ image_encoder_path="openai/clip-vit-large-patch14",
+ scale=1.0,
+ is_faceid=True,
+ insightface_model_name="buffalo_l",
+ style_image="reference_face.jpg"
+)
+ipadapter_module.add_ipadapter(cfg_face)
+```
+
+Supports TensorRT: Engines auto-substituted if available for the model.
+
+### Updating Style Images
+
+Efficient updates for streaming (computes embeddings, caches for reuse):
+
+```python
+# Single update
+ipadapter_module.update_style_image("new_style.jpg", is_stream=True) # Streaming mode
+
+# Multi-image blending (weights normalized)
+ipadapter_module.update_style_image(["style1.jpg", "style2.jpg"], weights=[0.6, 0.4])
+```
+
+Embeddings are computed via the `IPAdapterEmbedding` preprocessor (CLIP ViT-H, 768-dim); face ID mode uses InsightFace for the portrait adapter. Caching avoids recomputing embeddings at every step.
+
+### Managing Adapters
+
+```python
+ipadapter_module.update_ipadapter_scale(0, 0.7) # Adjust strength
+ipadapter_module.remove_ipadapter(0) # Remove
+ipadapter_module.get_current_config() # List active configs
+```
+
+### Integration in Pipeline
+
+The module provides a `UnetHook` (via `build_unet_hook()`) that injects:
+
+- `extra_unet_kwargs={"ipadapter_scale": scales_tensor}` into UNet call.
+- Attention processors replace standard ones in UNet for feature injection.
+- Multi-adapter: Layer weights blended (linear/SLERP) based on scales.
+
+In `unet_step()` ([`pipeline.py`](../../../pipeline.py)), the hook ensures adapters are applied at every denoising step. For TensorRT, the export wrapper preserves the processor logic in the exported ONNX graph.
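+
+Roughly, the hook built by `build_unet_hook()` reduces to something like this (a sketch; the real implementation derives the scale tensor from the configured adapters and weight types):
+
+```python
+import torch
+from streamdiffusion.hooks import StepCtx, UnetKwargsDelta
+
+def ipadapter_scale_hook(ctx: StepCtx) -> UnetKwargsDelta:
+    scales = torch.tensor([0.8], device=ctx.x_t_latent.device)  # one entry per active adapter
+    return UnetKwargsDelta(extra_unet_kwargs={"ipadapter_scale": scales})
+```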
+
+## TensorRT Integration
+
+### Export
+
+Export UNet with IPAdapter processors using `IPAdapterUNetExportWrapper`:
+
+```python
+from streamdiffusion.acceleration.tensorrt.export_wrappers import IPAdapterUNetExportWrapper
+import torch.onnx
+
+# Load UNet with IPAdapter
+unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
+ipadapter = IPAdapter.from_pretrained("h94/IP-Adapter") # Example
+unet.set_ipadapter(ipadapter, scale=1.0) # Inject processor
+
+wrapper = IPAdapterUNetExportWrapper(unet)
+
+sample_input = (sample, timestep, encoder_hidden_states, added_time_ids) # Standard UNet inputs
+torch.onnx.export(wrapper, sample_input, "unet_ipadapter.onnx",
+ input_names=["sample", "timestep", "encoder_hidden_states", "added_time_ids"],
+ output_names=["down_block_res_samples", "mid_block_res_sample", "time_embed", "time_text_embed"],
+ dynamic_axes={...}, # Batch, height, width dynamic
+ opset_version=17)
+```
+
+The wrapper handles multiple adapters, dynamic scales, and layer-specific weights during export.
+
+### Runtime
+
+During inference, if a TensorRT engine is loaded (via `engine_manager`), the module swaps the UNet for the engine, passing `ipadapter_scale` as extra kwarg. Embeddings are precomputed and injected via attention (preserved in TRT).
+
+## IPAdapter Embedding Processor
+
+The dedicated preprocessor (`ipadapter_embedding.py`) computes image embeddings:
+
+- **Standard**: CLIP ViT-H/14 on style image → 768-dim features.
+- **Face ID**: InsightFace + CLIP for identity embedding (portrait adapter).
+- **Realtime**: Caches embeddings per image key, supports batch/streaming, GPU acceleration.
+- Usage: Integrated automatically when `preprocessor="ipadapter_embedding"` in config.
+
+Example standalone:
+
+```python
+from streamdiffusion.preprocessing.processors import IPAdapterEmbedding
+
+preprocessor = IPAdapterEmbedding(pipeline_ref=stream)
+embedding = preprocessor.process_image("style.jpg") # Returns torch.Tensor
+```
+
+Supports multi-image: Averages or blends embeddings.
+
+## Examples
+
+### Basic Style Transfer
+
+```python
+# Config
+cfg = {
+ "pipeline_hooks": {
+ "ipadapter": [
+ {
+ "ipadapter_model_path": "h94/IP-Adapter",
+ "image_encoder_path": "openai/clip-vit-large-patch14",
+ "scale": 0.8,
+ "num_image_tokens": 4
+ }
+ ]
+ }
+}
+
+stream = StreamDiffusion.from_config("config.yaml")
+stream.update_style_image("art_style.jpg")
+images = stream(prompt="A cat", batch_size=1) # Cat in art style
+```
+
+### Face ID Consistency
+
+```python
+cfg = IPAdapterConfig(
+ ipadapter_model_path="h94/IP-Adapter-FaceID",
+ image_encoder_path="openai/clip-vit-large-patch14",
+ scale=1.0,
+ is_faceid=True,
+ insightface_model_name="buffalo_l"
+)
+ipadapter_module.add_ipadapter(cfg, "reference_face.jpg")
+
+# Generate consistent faces
+stream.update_style_image("new_pose.jpg") # Pose change, face preserved
+images = stream(prompt="Portrait of person in different outfits")
+```
+
+### Multi-Adapter Blending
+
+```python
+ipadapter_module.add_ipadapter(IPAdapterConfig(
+ ipadapter_model_path="h94/IP-Adapter",
+ image_encoder_path="openai/clip-vit-large-patch14",
+ scale=0.6
+))
+ipadapter_module.add_ipadapter(IPAdapterConfig(
+ ipadapter_model_path="h94/IP-Adapter-FaceID",
+ image_encoder_path="openai/clip-vit-large-patch14",
+ scale=0.4,
+ is_faceid=True,
+ insightface_model_name="buffalo_l"
+))
+
+# Blends styles with face consistency
+stream.update_style_image(["style1.jpg", "face_ref.jpg"])
+```
+
+For full integration with ControlNet or multi-stage, see [Multi-Stage Processing](../pipeline.md) and [StreamParameterUpdater](../stream_parameter_updater.md).
+
+---
+
+*See [Index](../index.md) for all documentation. For preprocessing details, see [Realtime Processors](../preprocessing/processors.md).*
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/modules/latent_processing.md b/src/streamdiffusion/docs/modules/latent_processing.md
new file mode 100644
index 00000000..21223b9e
--- /dev/null
+++ b/src/streamdiffusion/docs/modules/latent_processing.md
@@ -0,0 +1,281 @@
+# Latent Processing Modules
+
+## Overview
+
+The latent processing modules provide a flexible framework for processing latent representations at different stages of the StreamDiffusion pipeline. These modules operate in the latent domain (compressed representation space) and support both preprocessing (after VAE encoding, before diffusion) and postprocessing (after diffusion, before VAE decoding) operations.
+
+## Architecture
+
+The latent processing system consists of three main classes:
+
+- **`LatentProcessingModule`**: Base class providing shared functionality
+- **`LatentPreprocessingModule`**: Handles latent processing after VAE encoding, before diffusion
+- **`LatentPostprocessingModule`**: Handles latent processing after diffusion, before VAE decoding
+
+## Base Class: LatentProcessingModule
+
+The `LatentProcessingModule` serves as the foundation for all latent domain processing modules.
+
+### Key Features
+
+- **Sequential Chain Execution**: Processes latents through a chain of processors in order
+- **Processor Management**: Add, configure, and order processors dynamically
+- **Orchestrator Integration**: Uses preprocessing orchestrators for efficient processing
+- **Pipeline Reference**: Automatically handles pipeline reference through factory functions
+
+### Core Methods
+
+```python
+def add_processor(self, proc_config: Dict[str, Any]) -> None:
+ """Add a processor using the existing registry."""
+
+def _process_latent_chain(self, input_latent: torch.Tensor) -> torch.Tensor:
+ """Execute sequential chain of processors in latent domain."""
+
+def _get_ordered_processors(self) -> List[Any]:
+ """Return enabled processors in execution order."""
+```
+
+### Processor Configuration
+
+Processors are added using configuration dictionaries:
+
+```python
+proc_config = {
+ 'type': 'processor_name', # Required: processor type from registry
+ 'enabled': True, # Optional: enable/disable processor
+ 'order': 0, # Optional: execution order
+ 'params': { # Optional: processor-specific parameters
+ 'param1': 'value1',
+ 'param2': 'value2'
+ }
+}
+```
+
+## LatentPreprocessingModule
+
+Processes latent representations after VAE encoding, before diffusion.
+
+### Timing
+
+- **Execution Point**: After `encode_image()`, before `predict_x0_batch()`
+- **Pipeline Stage**: Pre-diffusion processing stage
+- **Domain**: Latent space (compressed representation)
+
+### Key Features
+
+- **Sequential Processing**: Uses orchestrator for sequential chain execution
+- **Pipeline Integration**: Automatically handles pipeline reference
+- **Hook Registration**: Registers with `latent_preprocessing_hooks`
+
+### Installation
+
+```python
+def install(self, stream) -> None:
+ """Install module by registering hook with stream and attaching orchestrator."""
+ self.attach_orchestrator(stream)
+ self._stream = stream # Store stream reference
+ stream.latent_preprocessing_hooks.append(self.build_latent_hook())
+```
+
+### Usage Example
+
+```python
+# Create preprocessing module
+latent_preproc = LatentPreprocessingModule()
+
+# Add processors
+latent_preproc.add_processor({
+ 'type': 'latent_noise',
+ 'params': {'strength': 0.1}
+})
+
+latent_preproc.add_processor({
+ 'type': 'latent_scale',
+ 'params': {'scale_factor': 1.1}
+})
+
+# Install in stream
+latent_preproc.install(stream)
+```
+
+## LatentPostprocessingModule
+
+Processes latent representations after diffusion, before VAE decoding.
+
+### Timing
+
+- **Execution Point**: After `predict_x0_batch()`, before `decode_image()`
+- **Pipeline Stage**: Post-diffusion processing stage
+- **Domain**: Latent space (compressed representation)
+
+### Key Features
+
+- **Sequential Processing**: Uses orchestrator for sequential chain execution
+- **Pipeline Integration**: Automatically handles pipeline reference
+- **Hook Registration**: Registers with `latent_postprocessing_hooks`
+
+### Installation
+
+```python
+def install(self, stream) -> None:
+ """Install module by registering hook with stream and attaching orchestrator."""
+ self.attach_orchestrator(stream)
+ self._stream = stream # Store stream reference
+ stream.latent_postprocessing_hooks.append(self.build_latent_hook())
+```
+
+### Usage Example
+
+```python
+# Create postprocessing module
+latent_postproc = LatentPostprocessingModule()
+
+# Add processors
+latent_postproc.add_processor({
+ 'type': 'latent_denoise',
+ 'params': {'strength': 0.05}
+})
+
+latent_postproc.add_processor({
+ 'type': 'latent_enhance',
+ 'params': {'enhancement_factor': 1.2}
+})
+
+# Install in stream
+latent_postproc.install(stream)
+```
+
+## Integration with Pipeline
+
+The latent processing modules integrate with the StreamDiffusion pipeline through hooks:
+
+### Pipeline Integration Points
+
+1. **Latent Preprocessing Hooks**: Applied after VAE encoding, before diffusion
+2. **Latent Postprocessing Hooks**: Applied after diffusion, before VAE decoding
+
+### Hook Execution Flow
+
+```python
+# In StreamDiffusion.__call__()
+x_t_latent = self.encode_image(x)
+x_t_latent = self._apply_latent_preprocessing_hooks(x_t_latent) # LatentPreprocessingModule
+x_0_pred_out = self.predict_x0_batch(x_t_latent)
+x_0_pred_out = self._apply_latent_postprocessing_hooks(x_0_pred_out) # LatentPostprocessingModule
+x_output = self.decode_image(x_0_pred_out)
+```
+
+## Latent Domain Characteristics
+
+### Latent Space Properties
+
+- **Compressed Representation**: 4-channel latent tensors (typically 64x64 for 512x512 images)
+- **Scaled Values**: Latents are roughly zero-mean and scaled by the VAE scaling factor (e.g., 0.18215 for SD1.5), not bounded to a fixed pixel-style range
+- **Spatial Structure**: Maintains spatial relationships in compressed form
+- **Model-Specific**: Latent dimensions depend on the VAE and model architecture
+
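+A quick sketch of these shapes, assuming a standard SD1.5 VAE loaded via `diffusers` (illustrative only, not part of the module API):
+
+```python
+import torch
+from diffusers import AutoencoderKL
+
+vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
+image = torch.randn(1, 3, 512, 512)  # placeholder RGB image in [-1, 1]
+with torch.no_grad():
+    latent = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
+print(latent.shape)  # torch.Size([1, 4, 64, 64])
+```
+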
+### Processing Considerations
+
+- **Memory Efficiency**: Latent processing is more memory-efficient than image processing
+- **Quality Impact**: Changes in latent space directly affect final image quality
+- **Model Compatibility**: Processors must be compatible with the specific VAE model
+- **Numerical Stability**: Careful handling of latent values to maintain stability
+
+## Common Use Cases
+
+### Latent Preprocessing
+
+- **Noise Addition**: Add controlled noise for variation
+- **Latent Scaling**: Adjust latent magnitude for different effects
+- **Conditional Processing**: Apply conditional transformations based on prompts
+- **Style Transfer**: Modify latent representations for style effects
+
+### Latent Postprocessing
+
+- **Denoising**: Remove artifacts or unwanted noise
+- **Enhancement**: Improve latent quality before decoding
+- **Correction**: Fix issues introduced during diffusion
+- **Fine-tuning**: Apply final adjustments to latent representations
+
+## Performance Considerations
+
+### Sequential Processing
+
+Latent processing modules use sequential processing:
+
+- **Orchestrator Integration**: Uses preprocessing orchestrators for efficiency
+- **Chain Execution**: Processes latents through ordered processor chains
+- **Pipeline Reference**: Automatic handling of pipeline context
+
+### Memory Management
+
+- **Stream Reference**: Modules store stream reference for context
+- **Processor Lifecycle**: Processors are managed through the orchestrator system
+- **Latent Caching**: Previous latent results are cached for feedback processors
+
+## Error Handling
+
+The modules include robust error handling:
+
+- **Parameter Validation**: Validates processor configuration
+- **Pipeline Reference**: Automatic handling of pipeline context
+- **Exception Handling**: Catches and handles processor-specific errors
+- **Resource Management**: Proper cleanup of resources
+
+## Best Practices
+
+1. **Processor Ordering**: Use the `order` parameter to control execution sequence
+2. **Latent Stability**: Ensure processors maintain numerical stability
+3. **Quality Testing**: Test processors thoroughly to avoid quality degradation
+4. **Memory Monitoring**: Monitor memory usage with complex processor chains
+5. **Model Compatibility**: Ensure processors work with your specific VAE model
+
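+For example, explicit `order` values make the chain sequence unambiguous (processor names here are the illustrative ones used above):
+
+```python
+# Lower 'order' runs first: scale the latent, then add noise on top.
+latent_preproc.add_processor({'type': 'latent_scale', 'order': 0, 'params': {'scale_factor': 1.05}})
+latent_preproc.add_processor({'type': 'latent_noise', 'order': 1, 'params': {'strength': 0.05}})
+```
+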
+## Advanced Usage
+
+### Custom Latent Processors
+
+Create custom processors for specific latent operations:
+
+```python
+import torch
+
+class CustomLatentProcessor:
+    def __init__(self, custom_param: float = 1.0):
+        self.custom_param = custom_param
+
+    def __call__(self, latent: torch.Tensor) -> torch.Tensor:
+        # Example logic: scale the latent by the configured parameter
+        return latent * self.custom_param
+```
+
+### Feedback Processing
+
+Use previous latent results for feedback-based processing:
+
+```python
+# Blend the current latent with the previous frame's latent result
+def process_with_feedback(latent, prev_latent, blend: float = 0.1):
+    # Weighted blend keeps temporal consistency between frames
+    return (1.0 - blend) * latent + blend * prev_latent
+```
+
+## Integration with Other Modules
+
+### ControlNet Integration
+
+Latent processing can work alongside ControlNet modules:
+
+```python
+# Latent preprocessing before ControlNet influence
+latent_preproc.add_processor({'type': 'latent_enhance'})
+controlnet_module.install(stream)
+```
+
+### IPAdapter Integration
+
+Latent processing can enhance IPAdapter effects:
+
+```python
+# Latent postprocessing after IPAdapter
+ipadapter_module.install(stream)
+latent_postproc.add_processor({'type': 'latent_refine'})
+```
diff --git a/src/streamdiffusion/docs/pipeline.md b/src/streamdiffusion/docs/pipeline.md
new file mode 100644
index 00000000..02d570ca
--- /dev/null
+++ b/src/streamdiffusion/docs/pipeline.md
@@ -0,0 +1,130 @@
+# Multi-Stage Processing
+
+## Overview
+
+Multi-stage processing in StreamDiffusion refers to the modular, extensible diffusion pipeline that supports realtime streaming generation. The core `StreamDiffusion` class ([`pipeline.py`](../../../pipeline.py)) orchestrates stages: preparation (prompt/timestep setup), denoising (UNet steps with CFG), and decoding (VAE), with hooks for [modules](../modules/) and [orchestrators](../preprocessing/orchestrators.md). It handles batching for efficiency, SDXL/Turbo models, LoRA/LCM, and TensorRT acceleration.
+
+Key features:
+- **Streaming**: Frame-by-frame generation with buffer for multi-frame consistency.
+- **CFG Modes**: "none", "full", "self", "initialize" guidance variants ("self" is the RCFG-style mode that reuses stock noise for the unconditional pass).
+- **Batching**: Denoising batch for speed (use_denoising_batch), frame buffer.
+- **Hooks**: Integration points for ControlNet/IPAdapter residuals, IPAdapter scales.
+- **Model Detection**: Auto-detects SD1.5/SDXL/Turbo for conditional kwargs.
+- **Optimizations**: EMA timing, similar image filter, SDXL cond caching.
+
+The pipeline supports img2img/txt2img via `__call__` and `txt2img`. For wrapper usage, see [StreamDiffusionWrapper](../wrapper.md).
+
+## Stages
+
+### 1. Preparation (`prepare()`)
+
+Sets up embeddings, timesteps, noise:
+
+- **Prompt Encoding**: Text to embeds (SD1.5: 2 tensors; SDXL: 4 with pooled/time IDs).
+ - CFG: Uncond/cond cat or repeat based on mode.
+ - Hooks: `embedding_hooks` modify `EmbedsCtx` (e.g., blending).
+- **Timesteps**: Subset from scheduler (LCM), alpha/beta scalings.
+- **Noise**: Randn init, stock for self-CFG.
+- **SDXL Cond**: Pooled embeds/time IDs (orig/target size, crops), cached per batch/CFG.
+
+Example:
+
+```python
+stream.prepare(prompt="A cat", guidance_scale=7.5, cfg_type="self", seed=42)
+```
+
+### 2. Denoising (`predict_x0_batch()` / `unet_step()`)
+
+Core generation loop:
+
+- **Input**: Latent noise (txt2img) or encoded image (img2img).
+- **Batch**: Repeat timesteps/noise for multi-frame (frame_buffer_size).
+- **UNet Call**: Per step/timestep:
+ - Inputs: Sample, timestep, embeds, SDXL cond (`added_cond_kwargs`).
+ - Hooks: `unet_hooks` add residuals/scales (`UnetKwargsDelta` with down/mid residuals, extra like ipadapter_scale).
+ - CFG: Uncond/cond blending ("full": cat batch; "self": stock noise; "initialize": uncond first).
+  - Output: Model pred, denoised via scheduler (x0 = (x_t − √(1−ᾱ_t)·pred) / √ᾱ_t).
+- **Loop**: Batch for speed (use_denoising_batch), or single-step for low VRAM.
+- **TensorRT**: Auto-detects engine, passes extras/residuals.
+
+Multi-stage: Pre-latent hooks (orchestrators), per-step UNet with modules, post-latent hooks.
+
+Example in `__call__`:
+
+```python
+x_t_latent = torch.randn(...) # Or encode_image(x)
+x_0_pred_out = stream.predict_x0_batch(x_t_latent)
+```
+
+### 3. Decoding (`decode_image()` / VAE)
+
+- Scales latent by vae_scale_factor, decodes to pixels.
+- Post-image hooks (orchestrators for upscale/sharpen).
+- Similar filter skips if duplicate (realtime opt).
+
+Full flow in `__call__`:
+
+```python
+x = stream.image_processor.preprocess(image) # If img2img
+x = stream._apply_image_preprocessing_hooks(x) # Orchestrators
+if similar_filter: x = filter(x)
+x_t_latent = encode_image(x)
+x_t_latent = _apply_latent_preprocessing_hooks(x_t_latent) # Orchestrators
+x_0_pred = predict_x0_batch(x_t_latent)
+x_0_pred = _apply_latent_postprocessing_hooks(x_0_pred) # Orchestrators
+x_out = decode_image(x_0_pred)
+x_out = _apply_image_postprocessing_hooks(x_out) # Orchestrators
+return x_out
+```
+
+## Integration
+
+- **Modules**: ControlNet/IPAdapter register `unet_hooks` for residuals/scales in `unet_step`.
+- **Orchestrators**: Hooks call orchestrators (e.g., `_apply_latent_preprocessing_hooks` → PipelinePreprocessingOrchestrator).
+- **Updater**: `StreamParameterUpdater` ([doc](../stream_parameter_updater.md)) manages prompt/seed blending, controlnet/ipadapter configs.
+- **TensorRT**: UNet engine in `unet_step` (positional args + extras).
+- **LoRA/LCM**: `load_lora/fuse_lora` before prepare; LCM scheduler.
+- **Multi-Stage**: Stages chain hooks/orchestrators; feedback via latent_feedback.py.
+
+CFG Modes:
+- "none": No guidance (guidance_scale=1).
+- "full": Uncond/cond per sample.
+- "self": Stock noise for uncond (efficient).
+- "initialize": Uncond first, then cond.
+
+SDXL: Added cond (text_embeds/time_ids), detected via model_detection.py.
+
+## Usage Examples
+
+### Basic Txt2Img
+
+```python
+from streamdiffusion import StreamDiffusion
+from diffusers import StableDiffusionPipeline
+
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
+stream = StreamDiffusion(pipe, t_index_list=[0, 16, 32, 45], width=512, height=512)
+stream.prepare("A cat", guidance_scale=7.5)
+images = stream.txt2img(batch_size=1) # Or stream()
+```
+
+### Img2Img Streaming
+
+```python
+stream.prepare("Cat in style", cfg_type="self")
+while streaming:
+ prev_img = ... # From previous frame
+ images = stream(prev_img) # Encodes, denoises, decodes with hooks
+```
+
+### With Modules/Orchestrators
+
+Modules install hooks; orchestrators chain processors in hooks.
+
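+A minimal sketch of that wiring, reusing the module and processor names from the docs above (exact constructor arguments may differ):
+
+```python
+# Module installs a latent preprocessing hook; the orchestrator runs its chain.
+latent_preproc = LatentPreprocessingModule()
+latent_preproc.add_processor({'type': 'latent_feedback', 'params': {'blend_factor': 0.1}})
+latent_preproc.install(stream)
+
+images = stream(frame)  # hooks run automatically inside __call__
+```
+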
+For custom stages, extend `__call__` or add hooks.
+
+See [Config](../config.md) for t_index_list/CFG setup.
+
+---
+
+*See [Index](../index.md) for all documentation. For parameters, see [StreamParameterUpdater](../stream_parameter_updater.md).*
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/preprocessing/orchestrators.md b/src/streamdiffusion/docs/preprocessing/orchestrators.md
new file mode 100644
index 00000000..39f4f8a7
--- /dev/null
+++ b/src/streamdiffusion/docs/preprocessing/orchestrators.md
@@ -0,0 +1,20 @@
+# Preprocessing Orchestrators
+
+## Overview
+
+The Preprocessing Orchestrators manage the execution of preprocessors and postprocessors in StreamDiffusion, enabling efficient, parallelized, and pipelined processing for realtime streaming. They handle input preparation (e.g., edge detection, pose estimation), pipeline integration (e.g., latent modifications), and output enhancement (e.g., upscaling), with optimizations like caching, thread pooling, and CUDA streams to minimize latency.
+
+Key orchestrators:
+- **BaseOrchestrator**: Generic base for pipelined processing with sync fallback for feedback loops.
+- **PreprocessingOrchestrator**: Handles module inputs (ControlNet/IPAdapter), parallelizes across multiple preprocessors, caches for identical frames.
+- **PipelinePreprocessingOrchestrator**: Processes tensors in pipeline hooks (pre/post UNet/VAE), sequential for dependencies.
+- **PostprocessingOrchestrator**: Applies enhancements to generated images, with input caching for repeated outputs.
+- **OrchestratorUser**: Mixin for modules to attach shared orchestrators.
+
+Orchestrators are lazily created and shared across modules via `StreamDiffusion` instances. Core files: [`base_orchestrator.py`](../../../preprocessing/base_orchestrator.py), [`preprocessing_orchestrator.py`](../../../preprocessing/preprocessing_orchestrator.py), [`pipeline_preprocessing_orchestrator.py`](../../../preprocessing/pipeline_preprocessing_orchestrator.py), [`postprocessing_orchestrator.py`](../../../preprocessing/postprocessing_orchestrator.py), [`orchestrator_user.py`](../../../preprocessing/orchestrator_user.py).
+
+## BaseOrchestrator
+
+Generic foundation for all orchestrators:
+
+- **Pipelining**: Background thread prepares the next frame's processing while the current frame is still being generated (one-frame lag, higher throughput).
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/preprocessing/processors.md b/src/streamdiffusion/docs/preprocessing/processors.md
new file mode 100644
index 00000000..dce320ce
--- /dev/null
+++ b/src/streamdiffusion/docs/preprocessing/processors.md
@@ -0,0 +1,237 @@
+# Processor System
+
+## Overview
+
+The processor system provides a registry-based architecture for modular image and latent processing components. Processors are executed via [Preprocessing Orchestrators](orchestrators.md) for parallel/pipelined efficiency and handle input data preparation for modules like ControlNet and IPAdapter, as well as pipeline/latent stage enhancements.
+
+**Key Features:**
+- **Registry-based**: Dynamic processor discovery and instantiation
+- **Template Pattern**: Consistent interface with automatic input validation and size handling
+- **GPU Acceleration**: Tensor processing with fallback to PIL
+- **Pipeline Awareness**: Support for processors that need access to previous pipeline state
+- **TensorRT Support**: High-performance variants for production deployment
+
+## Architecture
+
+### Base Classes
+
+#### BasePreprocessor
+
+Abstract base class implementing the template method pattern:
+
+```python
+from PIL import Image
+
+from streamdiffusion.preprocessing.processors import BasePreprocessor
+
+class MyProcessor(BasePreprocessor):
+ def _process_core(self, image: Image.Image) -> Image.Image:
+ # Implement your processing logic here
+ return processed_image
+```
+
+**Key Methods:**
+- `process()`: Main entry point for PIL image processing
+- `process_tensor()`: GPU tensor processing
+- `get_preprocessor_metadata()`: Class method returning processor metadata
+
+#### PipelineAwareProcessor
+
+For processors that need access to pipeline state (previous outputs):
+
+```python
+from PIL import Image
+
+from streamdiffusion.preprocessing.processors import PipelineAwareProcessor
+
+class FeedbackProcessor(PipelineAwareProcessor):
+ def _process_core(self, image: Image.Image) -> Image.Image:
+ # Access previous pipeline output via self.pipeline_ref
+ prev_output = self.pipeline_ref.prev_image_result
+        # blend_with_previous is a placeholder for user-defined blending logic
+        return blend_with_previous(image, prev_output)
+```
+
+**Features:**
+- Automatic synchronous processing to avoid temporal artifacts
+- Pipeline reference injection for accessing previous outputs
+- Required pipeline_ref parameter validation
+
+## Registry System
+
+### Core Registry Functions
+
+```python
+from streamdiffusion.preprocessing.processors import (
+ get_preprocessor,
+ get_preprocessor_class,
+ list_preprocessors,
+ register_preprocessor
+)
+
+# List available processors
+available = list_preprocessors()
+print(f"Available processors: {available}")
+
+# Get processor class
+ProcessorClass = get_preprocessor_class("canny")
+
+# Get processor instance
+processor = get_preprocessor("canny")
+
+# Get pipeline-aware processor
+feedback_processor = get_preprocessor("latent_feedback", pipeline_ref=stream)
+
+# Register custom processor
+register_preprocessor("my_custom", MyCustomProcessor)
+```
+
+### Processor Discovery
+
+The registry automatically handles:
+- **Conditional imports**: TensorRT and MediaPipe processors based on availability
+- **Dynamic registration**: Available processors adapt to system capabilities
+- **Pipeline awareness**: Automatic detection of processors requiring pipeline reference
+
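+A short sketch of capability-aware processor selection using the registry functions above:
+
+```python
+# Prefer the TensorRT-accelerated depth processor when the system provides it.
+available = list_preprocessors()
+name = "depth_tensorrt" if "depth_tensorrt" in available else "depth"
+processor = get_preprocessor(name)
+```
+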
+### Metadata System
+
+Get comprehensive processor information:
+
+```python
+# Get processor metadata
+metadata = ProcessorClass.get_preprocessor_metadata()
+print(f"Display name: {metadata['display_name']}")
+print(f"Description: {metadata['description']}")
+print(f"Parameters: {metadata['parameters']}")
+print(f"Use cases: {metadata['use_cases']}")
+```
+
+**Metadata Structure:**
+```python
+{
+ "display_name": "Human-readable name",
+ "description": "Detailed description of functionality",
+ "parameters": {
+ "param_name": {
+ "type": "parameter_type",
+ "default": "default_value",
+ "description": "Parameter description",
+ "range": [min_val, max_val] # Optional
+ }
+ },
+ "use_cases": ["list", "of", "common", "applications"]
+}
+```
+
+## Processor Categories
+
+### Standard Processors
+
+Inherit from `BasePreprocessor` for stateless processing:
+- Edge detection (Canny, HED, Lineart)
+- Computer vision (Pose, Depth, Segmentation)
+- Image enhancement (Blur, Sharpen, Upscale)
+- Utilities (Passthrough, External)
+
+### Pipeline-Aware Processors
+
+Inherit from `PipelineAwareProcessor` for temporal processing:
+- `feedback`: Frame-to-frame temporal blending
+- `latent_feedback`: Latent space temporal consistency
+- Custom processors requiring previous pipeline state
+
+### TensorRT Accelerated
+
+High-performance variants for production:
+- `depth_tensorrt`: Accelerated depth estimation
+- `pose_tensorrt`: Accelerated pose detection
+- `realesrgan_trt`: Tiled super-resolution
+- `temporal_net_tensorrt`: Temporal flow processing
+
+## Developer API
+
+### Creating Custom Processors
+
+```python
+from PIL import Image, ImageEnhance
+
+from streamdiffusion.preprocessing.processors import BasePreprocessor, register_preprocessor
+
+class CustomProcessor(BasePreprocessor):
+ @classmethod
+ def get_preprocessor_metadata(cls):
+ return {
+ "display_name": "Custom Processor",
+ "description": "My custom image processor",
+ "parameters": {
+ "strength": {
+ "type": "float",
+ "default": 1.0,
+ "description": "Processing strength",
+ "range": [0.0, 2.0]
+ }
+ },
+ "use_cases": ["Custom processing", "Special effects"]
+ }
+
+    def _process_core(self, image: Image.Image) -> Image.Image:
+        strength = self.params.get('strength', 1.0)
+        # Example: adjust brightness as a stand-in for custom processing
+        return ImageEnhance.Brightness(image).enhance(strength)
+
+# Register the processor
+register_preprocessor("custom", CustomProcessor)
+```
+
+### Using Processors Programmatically
+
+```python
+# Instantiate processor with parameters
+processor = get_preprocessor("blur", kernel_size=5, sigma=2.0)
+
+# Process image
+result = processor.process(input_image)
+
+# Process tensor directly on GPU
+result_tensor = processor.process_tensor(input_tensor)
+
+# Check processor capabilities
+metadata = processor.get_preprocessor_metadata()
+available_params = metadata["parameters"]
+```
+
+### Pipeline Integration
+
+```python
+# Configure processor in runtime config
+image_preprocessing_config = [
+ {
+ "type": "custom",
+ "enabled": True,
+ "order": 0,
+ "params": {
+ "strength": 1.5
+ }
+ }
+]
+
+# Update runtime configuration
+wrapper.update_stream_params(
+ image_preprocessing_config=image_preprocessing_config
+)
+```
+
+## Best Practices
+
+### Development
+1. **Inherit from appropriate base class**: Use `PipelineAwareProcessor` only when you need previous pipeline state
+2. **Implement metadata**: Provide comprehensive metadata for UI integration
+3. **Handle parameters**: Support configuration via `self.params`
+4. **GPU optimization**: Override `_process_tensor_core()` for better performance
+
+### Deployment
+1. **Use TensorRT variants**: For production/latency-critical applications
+2. **Chain via orchestrators**: For parallel processing of multiple processors
+3. **Monitor dependencies**: Check processor availability before use
+4. **Parameter tuning**: Use metadata to understand parameter ranges and defaults
+
+### Integration
+1. **Registry pattern**: Always use registry functions rather than direct imports
+2. **Error handling**: Check processor availability with `list_preprocessors()`
+3. **Configuration**: Use structured configs for reproducible setups
+4. **Testing**: Test with `skip_diffusion=True` for fast iteration
+
+For orchestration and chaining patterns, see [Orchestrators](orchestrators.md).
\ No newline at end of file
diff --git a/src/streamdiffusion/docs/runtime_control.md b/src/streamdiffusion/docs/runtime_control.md
new file mode 100644
index 00000000..10b2514a
--- /dev/null
+++ b/src/streamdiffusion/docs/runtime_control.md
@@ -0,0 +1,584 @@
+# Runtime Control Surface
+
+## Overview
+
+The runtime control surface provides real-time control over the StreamDiffusion pipeline during live streaming. This document focuses on the methods and parameters that users will actively use to control the application while it's running.
+
+## Core Runtime Control Methods
+
+### 1. update_stream_params()
+
+The primary method for runtime parameter updates. All parameters are optional and only update what's specified.
+
+```python
+def update_stream_params(
+ # Core generation parameters
+ num_inference_steps: Optional[int] = None,
+ guidance_scale: Optional[float] = None,
+ delta: Optional[float] = None,
+ t_index_list: Optional[List[int]] = None,
+ seed: Optional[int] = None,
+
+ # Prompt blending (real-time prompt control)
+ prompt_list: Optional[List[Tuple[str, float]]] = None,
+ negative_prompt: Optional[str] = None,
+ prompt_interpolation_method: Literal["linear", "slerp"] = "slerp",
+ normalize_prompt_weights: Optional[bool] = None,
+
+ # Seed blending (real-time seed control)
+ seed_list: Optional[List[Tuple[int, float]]] = None,
+ seed_interpolation_method: Literal["linear", "slerp"] = "linear",
+ normalize_seed_weights: Optional[bool] = None,
+
+ # ControlNet configuration (real-time control image updates)
+ controlnet_config: Optional[List[Dict[str, Any]]] = None,
+
+ # IPAdapter configuration (real-time style updates)
+ ipadapter_config: Optional[Dict[str, Any]] = None,
+
+ # Pipeline hook configurations (real-time processing updates)
+ image_preprocessing_config: Optional[List[Dict[str, Any]]] = None,
+ image_postprocessing_config: Optional[List[Dict[str, Any]]] = None,
+ latent_preprocessing_config: Optional[List[Dict[str, Any]]] = None,
+ latent_postprocessing_config: Optional[List[Dict[str, Any]]] = None,
+
+ # Safety checker
+ use_safety_checker: Optional[bool] = None,
+ safety_checker_threshold: Optional[float] = None,
+) -> None:
+```
+
+**Real-time Usage Examples:**
+```python
+# Adjust generation strength
+wrapper.update_stream_params(guidance_scale=8.0)
+
+# Switch to different prompt blend
+wrapper.update_stream_params(
+ prompt_list=[("cat", 0.3), ("dog", 0.7)],
+ negative_prompt="blurry, low quality"
+)
+
+# Change seed blend
+wrapper.update_stream_params(
+ seed_list=[(123, 0.2), (456, 0.8)]
+)
+
+# Update ControlNet configuration
+wrapper.update_stream_params(
+ controlnet_config=[{
+ "model_id": "lllyasviel/sd-controlnet-canny",
+ "preprocessor": "canny",
+ "conditioning_scale": 1.2,
+ "enabled": True
+ }]
+)
+```
+
+### 2. update_control_image()
+
+Update control images for real-time ControlNet control.
+
+```python
+def update_control_image(
+ index: int, # ControlNet index (0, 1, 2, etc.)
+ image: Union[str, Image.Image, torch.Tensor] # New control image
+) -> None:
+```
+
+**Real-time Usage:**
+```python
+# Update ControlNet 0 with new Canny edges
+wrapper.update_control_image(0, "new_edges.jpg")
+
+# Update ControlNet 1 with new depth map
+wrapper.update_control_image(1, depth_image_tensor)
+
+# Update from camera feed
+wrapper.update_control_image(0, camera_frame)
+```
+
+### 3. update_style_image()
+
+Update IPAdapter style reference for real-time style control.
+
+```python
+def update_style_image(
+ image: Union[str, Image.Image, torch.Tensor] # New style image
+) -> None:
+```
+
+**Real-time Usage:**
+```python
+# Update style reference
+wrapper.update_style_image("new_style.jpg")
+
+# Update from live style feed
+wrapper.update_style_image(style_camera_feed)
+```
+
+### 4. skip_diffusion
+
+Property to enable/disable diffusion passthrough for real-time testing.
+
+```python
+# Enable passthrough mode (no diffusion, just preprocessing)
+wrapper.skip_diffusion = True
+
+# Re-enable normal diffusion
+wrapper.skip_diffusion = False
+```
+
+**Use Cases:**
+- Test preprocessing pipelines without diffusion overhead
+- Debug control image processing
+- Real-time control image updates without generation delay
+
+## Live Stream Control Patterns
+
+### Prompt Blending Control
+
+Real-time prompt weight adjustment for smooth transitions:
+
+```python
+# Gradual transition from cat to dog
+wrapper.update_stream_params(
+ prompt_list=[("cat", 0.9), ("dog", 0.1)] # Start mostly cat
+)
+# ... later ...
+wrapper.update_stream_params(
+ prompt_list=[("cat", 0.5), ("dog", 0.5)] # Equal blend
+)
+# ... later ...
+wrapper.update_stream_params(
+ prompt_list=[("cat", 0.1), ("dog", 0.9)] # Mostly dog
+)
+```
+
+### Seed Blending Control
+
+Real-time seed weight adjustment for noise variation:
+
+```python
+# Smooth noise transition
+wrapper.update_stream_params(
+ seed_list=[(123, 0.8), (456, 0.2)] # Start with seed 123
+)
+# ... later ...
+wrapper.update_stream_params(
+ seed_list=[(123, 0.2), (456, 0.8)] # Transition to seed 456
+)
+```
+
+### ControlNet Real-time Updates
+
+Update control images from live sources:
+
+```python
+# Camera-based control
+def update_from_camera():
+ frame = camera.get_frame()
+ processed_frame = preprocess_canny(frame)
+ wrapper.update_control_image(0, processed_frame)
+
+# Multiple ControlNet updates
+def update_multiple_controls():
+ wrapper.update_control_image(0, canny_image) # Canny control
+ wrapper.update_control_image(1, depth_image) # Depth control
+ wrapper.update_control_image(2, pose_image) # Pose control
+```
+
+### Style Reference Updates
+
+Real-time style adaptation:
+
+```python
+# Update style from live feed
+def update_style_from_feed():
+ style_frame = style_camera.get_frame()
+ wrapper.update_style_image(style_frame)
+
+# Switch between style references
+def switch_style(style_path):
+ wrapper.update_style_image(style_path)
+```
+
+## Runtime State Management
+
+### 5. get_stream_state()
+
+Get current stream state for monitoring and debugging.
+
+```python
+def get_stream_state(
+ include_caches: bool = False # Include cache statistics
+) -> Dict[str, Any]:
+```
+
+**Real-time Monitoring:**
+```python
+# Monitor current state
+state = wrapper.get_stream_state(include_caches=True)
+print(f"Current prompts: {state['prompt_list']}")
+print(f"Guidance scale: {state['guidance_scale']}")
+print(f"Active ControlNets: {len(state['controlnet_config'])}")
+
+# Monitor memory usage
+if state['caches']['prompt_cache_size'] > 1000:
+ wrapper.clear_caches()
+```
+
+### 6. clear_caches()
+
+Clear caches to free memory during long-running sessions.
+
+```python
+def clear_caches() -> None:
+```
+
+**Memory Management:**
+```python
+# Clear caches when switching to very different prompts
+if prompt_change_detected:
+ wrapper.clear_caches()
+ wrapper.update_stream_params(prompt_list=new_prompts)
+```
+
+## Real-time Control Examples
+
+### Interactive Prompt Control
+
+```python
+# Real-time prompt weight adjustment via UI sliders
+def on_prompt_weight_change(prompt_index, new_weight):
+ current_prompts = wrapper.get_stream_state()['prompt_list']
+ current_prompts[prompt_index] = (current_prompts[prompt_index][0], new_weight)
+ wrapper.update_stream_params(prompt_list=current_prompts)
+
+# Real-time negative prompt updates
+def on_negative_prompt_change(new_negative):
+ wrapper.update_stream_params(negative_prompt=new_negative)
+```
+
+### Live Camera Control
+
+```python
+# Real-time camera control
+def live_camera_control():
+ while True:
+ frame = camera.get_frame()
+
+ # Process frame for ControlNet
+ canny_frame = preprocess_canny(frame)
+ wrapper.update_control_image(0, canny_frame)
+
+ # Generate with current settings
+ result = wrapper(frame)
+
+ # Display result
+ display_image(result)
+```
+
+### Dynamic Style Switching
+
+```python
+# Real-time style switching
+def switch_style_dynamically(style_images):
+ for style_image in style_images:
+ wrapper.update_style_image(style_image)
+ time.sleep(2.0) # Hold style for 2 seconds
+```
+
+### Parameter Smoothing
+
+```python
+# Smooth parameter transitions
+def smooth_guidance_transition(target_scale, steps=10):
+ current_scale = wrapper.get_stream_state()['guidance_scale']
+ step_size = (target_scale - current_scale) / steps
+
+ for i in range(steps):
+ new_scale = current_scale + (step_size * (i + 1))
+ wrapper.update_stream_params(guidance_scale=new_scale)
+ time.sleep(0.1) # 100ms between updates
+```
+
+## Performance Considerations
+
+### Update Frequency
+- **Control Images**: Update as fast as source (30-60 FPS)
+- **Prompts/Seeds**: Update less frequently (1-10 Hz)
+- **Core Parameters**: Update sparingly (0.1-1 Hz)
+
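+A minimal sketch of these rates in practice (`camera`-style frame input and `prompt_weights` are placeholders):
+
+```python
+import time
+
+last_prompt_update = 0.0
+
+def on_frame(frame, prompt_weights):
+    global last_prompt_update
+    wrapper.update_control_image(0, frame)      # per-frame, 30-60 FPS
+    now = time.time()
+    if now - last_prompt_update > 0.2:          # prompt blend at ~5 Hz
+        wrapper.update_stream_params(prompt_list=prompt_weights)
+        last_prompt_update = now
+    return wrapper(frame)
+```
+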
+### Memory Management
+- Clear caches when switching between very different prompts
+- Monitor memory usage with `get_stream_state(include_caches=True)`
+- Use `skip_diffusion` for testing without memory overhead
+
+### Error Handling
+```python
+# Safe control image updates
+def safe_update_control_image(index, image):
+ try:
+ wrapper.update_control_image(index, image)
+ except RuntimeError as e:
+ print(f"ControlNet not enabled: {e}")
+ except Exception as e:
+ print(f"Failed to update control image: {e}")
+
+# Safe parameter updates
+def safe_update_params(**kwargs):
+ try:
+ wrapper.update_stream_params(**kwargs)
+ except Exception as e:
+ print(f"Failed to update parameters: {e}")
+```
+
+## Configuration Reference
+
+### ControlNet Configuration
+
+ControlNet configuration for real-time conditional guidance:
+
+```python
+controlnet_config = [
+ {
+ "model_id": "path/to/controlnet/model", # Required: ControlNet model ID or path
+ "preprocessor": "preprocessor_name", # Optional: Preprocessor type from registry
+ "conditioning_scale": 1.0, # Required: Influence strength (0.0-2.0)
+ "enabled": True, # Optional: Enable/disable (default: True)
+ "preprocessor_params": { # Optional: Preprocessor-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+ }
+]
+```
+
+**Configuration Fields:**
+- `model_id`: HuggingFace model ID or local path to ControlNet model
+- `preprocessor`: Preprocessor name from the processor registry (see [Processors](preprocessing/processors.md))
+- `conditioning_scale`: Strength of ControlNet influence on generation
+- `enabled`: Whether this ControlNet is active
+- `preprocessor_params`: Parameters specific to the chosen preprocessor
+
+### IPAdapter Configuration
+
+IPAdapter configuration for style and reference adaptation:
+
+```python
+ipadapter_config = {
+ "ipadapter_model_path": "path/to/ipadapter/model", # Required: IPAdapter model path
+ "image_encoder_path": "path/to/image/encoder", # Required: Image encoder path
+ "scale": 0.8, # Required: Influence strength (0.0-1.0)
+ "type": "regular", # Optional: IPAdapter variant type
+ "is_faceid": False, # Optional: Face ID mode (default: False)
+ "style_image": "path/to/style.jpg" # Optional: Default style reference
+}
+```
+
+**Configuration Fields:**
+- `ipadapter_model_path`: HuggingFace model ID or local path to IPAdapter model
+- `image_encoder_path`: HuggingFace model ID or local path to CLIP image encoder
+- `scale`: Strength of IPAdapter influence on generation
+- `type`: IPAdapter variant (implementation-dependent)
+- `is_faceid`: Enable Face ID mode for face-specific adaptation
+- `style_image`: Default style reference image path
+
+### Image Preprocessing Configuration
+
+Configuration for image domain preprocessing hooks:
+
+```python
+image_preprocessing_config = [
+ {
+ "type": "processor_name", # Required: Processor type from registry
+ "enabled": True, # Optional: Enable/disable (default: True)
+ "order": 0, # Optional: Execution order (default: 0)
+ "params": { # Optional: Processor-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+ }
+]
+```
+
+**Configuration Fields:**
+- `type`: Processor name from the processor registry (see [Processors](preprocessing/processors.md))
+- `enabled`: Whether this processor is active
+- `order`: Execution order in the processing chain (lower numbers execute first)
+- `params`: Parameters specific to the chosen processor
+
+### Image Postprocessing Configuration
+
+Configuration for image domain postprocessing hooks:
+
+```python
+image_postprocessing_config = [
+ {
+ "type": "processor_name", # Required: Processor type from registry
+ "enabled": True, # Optional: Enable/disable (default: True)
+ "order": 0, # Optional: Execution order (default: 0)
+ "params": { # Optional: Processor-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+ }
+]
+```
+
+**Configuration Fields:**
+- `type`: Processor name from the processor registry (see [Processors](preprocessing/processors.md))
+- `enabled`: Whether this processor is active
+- `order`: Execution order in the processing chain (lower numbers execute first)
+- `params`: Parameters specific to the chosen processor
+
+### Latent Preprocessing Configuration
+
+Configuration for latent domain preprocessing hooks:
+
+```python
+latent_preprocessing_config = [
+ {
+ "type": "processor_name", # Required: Processor type from registry
+ "enabled": True, # Optional: Enable/disable (default: True)
+ "order": 0, # Optional: Execution order (default: 0)
+ "params": { # Optional: Processor-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+ }
+]
+```
+
+**Configuration Fields:**
+- `type`: Processor name from the processor registry (see [Processors](preprocessing/processors.md))
+- `enabled`: Whether this processor is active
+- `order`: Execution order in the processing chain (lower numbers execute first)
+- `params`: Parameters specific to the chosen processor
+
+### Latent Postprocessing Configuration
+
+Configuration for latent domain postprocessing hooks:
+
+```python
+latent_postprocessing_config = [
+ {
+ "type": "processor_name", # Required: Processor type from registry
+ "enabled": True, # Optional: Enable/disable (default: True)
+ "order": 0, # Optional: Execution order (default: 0)
+ "params": { # Optional: Processor-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+ }
+]
+```
+
+**Configuration Fields:**
+- `type`: Processor name from the processor registry (see [Processors](preprocessing/processors.md))
+- `enabled`: Whether this processor is active
+- `order`: Execution order in the processing chain (lower numbers execute first)
+- `params`: Parameters specific to the chosen processor
+
+## Configuration Examples
+
+### Basic Real-time Setup
+
+```python
+# Initialize with basic configs
+wrapper = StreamDiffusionWrapper(
+ model_id_or_path="runwayml/stable-diffusion-v1-5",
+ t_index_list=[32, 45],
+ use_controlnet=True,
+ controlnet_config=[{
+ "model_id": "lllyasviel/sd-controlnet-canny",
+ "preprocessor": "canny",
+ "conditioning_scale": 1.0,
+ "enabled": True
+ }]
+)
+
+# Runtime config updates
+wrapper.update_stream_params(
+ controlnet_config=[{
+ "model_id": "lllyasviel/sd-controlnet-depth",
+ "preprocessor": "depth",
+ "conditioning_scale": 0.8,
+ "enabled": True
+ }]
+)
+```
+
+### Advanced Multi-Module Setup
+
+```python
+# Complex runtime configuration
+wrapper.update_stream_params(
+ # ControlNet configuration
+ controlnet_config=[
+ {
+ "model_id": "lllyasviel/sd-controlnet-canny",
+ "preprocessor": "canny",
+ "conditioning_scale": 1.0,
+ "enabled": True,
+ "preprocessor_params": {
+ "threshold_low": 100,
+ "threshold_high": 200
+ }
+ },
+ {
+ "model_id": "lllyasviel/sd-controlnet-pose",
+ "preprocessor": "pose",
+ "conditioning_scale": 0.7,
+ "enabled": True
+ }
+ ],
+
+ # IPAdapter configuration
+ ipadapter_config={
+ "ipadapter_model_path": "h94/IP-Adapter",
+ "image_encoder_path": "openai/clip-vit-large-patch14",
+ "scale": 0.8,
+ "type": "plus",
+ "is_faceid": False
+ },
+
+ # Image preprocessing
+ image_preprocessing_config=[
+ {
+ "type": "resize",
+ "enabled": True,
+ "order": 0,
+ "params": {"width": 512, "height": 512}
+ },
+ {
+ "type": "normalize",
+ "enabled": True,
+ "order": 1,
+ "params": {"mean": [0.5], "std": [0.5]}
+ }
+ ],
+
+ # Latent preprocessing
+ latent_preprocessing_config=[
+ {
+ "type": "latent_feedback",
+ "enabled": True,
+ "order": 0,
+ "params": {"blend_factor": 0.1}
+ }
+ ]
+)
+```
+
+## Best Practices
+
+1. **Batch Updates**: Use `update_stream_params()` to update multiple parameters at once
+2. **Error Handling**: Always wrap control calls in try-catch blocks
+3. **Performance**: Use `skip_diffusion` for testing and debugging
+4. **Memory**: Monitor and clear caches during long sessions
+5. **Smooth Transitions**: Implement gradual parameter changes for smooth effects
+6. **State Monitoring**: Use `get_stream_state()` for debugging and monitoring
+7. **Config Validation**: Test configurations with `skip_diffusion=True` first
+8. **Processor Ordering**: Use `order` parameter to control execution sequence
diff --git a/src/streamdiffusion/docs/stream_parameter_updater.md b/src/streamdiffusion/docs/stream_parameter_updater.md
new file mode 100644
index 00000000..ede28ad5
--- /dev/null
+++ b/src/streamdiffusion/docs/stream_parameter_updater.md
@@ -0,0 +1,145 @@
+# StreamParameterUpdater
+
+## Overview
+
+The `StreamParameterUpdater` ([`stream_parameter_updater.py`](../../../stream_parameter_updater.py)) manages dynamic runtime updates to streaming parameters in StreamDiffusion, enabling smooth transitions between prompts, seeds, ControlNets, IPAdapters, and hooks without restarting the pipeline. It uses caching for efficiency, blending (linear/SLERP) for multi-item interpolation, and thread-safe locks for realtime updates. As an [OrchestratorUser](../preprocessing/orchestrators.md#orchestratoruser), it attaches shared preprocessors.
+
+Key features:
+- **Prompt Blending**: Weighted multi-prompt embeds (cache hits for reuse, SLERP/linear).
+- **Seed Blending**: Weighted noise interpolation (linear/SLERP, preserves magnitude).
+- **Config Updates**: Diff-based changes to ControlNet/IPAdapter/hook setups (add/remove/update scales/enabled/params).
+- **Embedding Caching**: IPAdapter style images preprocessed in parallel (sync/pipelined).
+- **Timestep/Resolution**: Recalcs scalings/batches on changes (full or lightweight).
+- **Normalization**: Optional weight sum-to-1 for prompts/seeds.
+
+Updater is initialized in `StreamDiffusion` and called via `update_stream_params()` for batched updates.
+
+## Usage
+
+### Initialization
+
+```python
+from streamdiffusion import StreamDiffusion
+stream = StreamDiffusion(...)
+# Updater auto-init with stream.normalize_prompt_weights etc.
+updater = stream._param_updater # Private, use via stream.update_stream_params
+```
+
+### Prompt/Seed Blending
+
+Multi-prompt/seed with weights:
+
+```python
+# Blended prompts
+stream.update_stream_params(
+ prompt_list=[("A cat", 0.7), ("A dog", 0.3)],
+ prompt_interpolation_method="slerp",
+ negative_prompt="blurry, low quality"
+)
+
+# Blended seeds
+stream.update_stream_params(
+ seed_list=[(123, 0.6), (456, 0.4)],
+ seed_interpolation_method="linear"
+)
+```
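+
+For reference, a minimal sketch of what spherical interpolation does to two embedding tensors (an illustration of the math, not the library's internal implementation):
+
+```python
+import torch
+
+def slerp(a: torch.Tensor, b: torch.Tensor, w: float, eps: float = 1e-7) -> torch.Tensor:
+    """Spherically interpolate from a to b with weight w on b."""
+    a_flat, b_flat = a.flatten().float(), b.flatten().float()
+    a_unit, b_unit = a_flat / (a_flat.norm() + eps), b_flat / (b_flat.norm() + eps)
+    omega = torch.acos((a_unit * b_unit).sum().clamp(-1 + eps, 1 - eps))
+    so = torch.sin(omega)
+    out = (torch.sin((1 - w) * omega) / so) * a_flat + (torch.sin(w * omega) / so) * b_flat
+    return out.reshape(a.shape).to(a.dtype)
+```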
+
+### ControlNet/IPAdapter Updates
+
+```python
+# Update ControlNet configuration
+stream.update_stream_params(
+ controlnet_config=[
+ {
+ "model_id": "lllyasviel/sd-controlnet-canny",
+ "preprocessor": "canny",
+ "conditioning_scale": 0.8,
+ "enabled": True
+ }
+ ]
+)
+
+# Update IPAdapter configuration
+stream.update_stream_params(
+ ipadapter_config={
+ "ipadapter_model_path": "h94/IP-Adapter",
+ "image_encoder_path": "openai/clip-vit-large-patch14",
+ "scale": 0.7,
+ "is_faceid": False
+ }
+)
+```
+
+### Hook Configuration Updates
+
+```python
+# Update preprocessing hooks
+stream.update_stream_params(
+ image_preprocessing_config=[
+ {
+ "type": "canny",
+ "enabled": True,
+ "params": {"threshold_low": 100, "threshold_high": 200}
+ }
+ ],
+ latent_preprocessing_config=[
+ {
+ "type": "latent_feedback",
+ "enabled": True,
+ "params": {"blend_factor": 0.1}
+ }
+ ]
+)
+```
+
+### Batch Updates
+
+```python
+# Update multiple parameters at once
+stream.update_stream_params(
+ guidance_scale=7.5,
+    t_index_list=[32, 45],
+ prompt_list=[("A beautiful landscape", 1.0)],
+ controlnet_config=[...],
+ image_preprocessing_config=[...]
+)
+```
+
+## Advanced Features
+
+### Weight Normalization
+
+```python
+# Enable automatic weight normalization
+stream.update_stream_params(
+ normalize_prompt_weights=True,
+ normalize_seed_weights=True
+)
+```
+
+### Cache Management
+
+```python
+# Get cache statistics
+cache_info = stream._param_updater.get_cache_info()
+print(f"Prompt cache hits: {cache_info['prompt_cache']['hits']}")
+print(f"Seed cache misses: {cache_info['seed_cache']['misses']}")
+```
+
+### Thread Safety
+
+All parameter updates are thread-safe and atomic. Multiple threads can call `update_stream_params()` simultaneously without race conditions.
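+
+For example, a control thread can push an update while the render loop keeps generating (sketch; `input_frame` is a placeholder input image):
+
+```python
+import threading
+
+# Adjust guidance from a UI/control thread while frames continue to render.
+threading.Thread(
+    target=lambda: stream.update_stream_params(guidance_scale=6.5),
+    daemon=True,
+).start()
+
+images = stream(input_frame)  # generation proceeds; the update applies atomically
+```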
+
+## Integration
+
+The updater integrates with:
+- **Pipeline Hooks**: For real-time parameter application
+- **Orchestrators**: For preprocessing/postprocessing updates
+- **Modules**: ControlNet and IPAdapter configuration management
+- **Caching**: Efficient reuse of computed embeddings and noise
+
+For detailed hook integration, see [Hook-Module System](hooks.md).
+
+---
+
+*See [Index](index.md) for all documentation.*
\ No newline at end of file
diff --git a/src/streamdiffusion/preprocessing/processors/dinov3 b/src/streamdiffusion/preprocessing/processors/dinov3
new file mode 160000
index 00000000..a3a8b2f1
--- /dev/null
+++ b/src/streamdiffusion/preprocessing/processors/dinov3
@@ -0,0 +1 @@
+Subproject commit a3a8b2f1db6a2544dfc8b376fa23df459b9f7843
From 2fc0a097eb1913c9155dd9d972f7b9e951688ff7 Mon Sep 17 00:00:00 2001
From: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Date: Fri, 12 Sep 2025 10:38:19 -0400
Subject: [PATCH 2/6] add runtime control subgraph to overall arch
---
.../docs/diagrams/overall_architecture.md | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/src/streamdiffusion/docs/diagrams/overall_architecture.md b/src/streamdiffusion/docs/diagrams/overall_architecture.md
index 08757b45..5d157358 100644
--- a/src/streamdiffusion/docs/diagrams/overall_architecture.md
+++ b/src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -28,6 +28,13 @@ graph TB
K["Output: Image"]
end
+ subgraph "Runtime Control"
+ L1["StreamDiffusionWrapper"]
+ L2["update_stream_params()"]
+ L3["update_control_image()"]
+ L4["update_style_image()"]
+ end
+
subgraph "Management"
L["StreamParameterUpdater: Blending/Caching"]
M["Config Loader: YAML/JSON"]
@@ -49,6 +56,13 @@ graph TB
I --> J
J --> K
+ L1 --> L2
+ L1 --> L3
+ L1 --> L4
+ L2 -.->|"Runtime Updates"| L
+ L3 -.->|"via Orchestrators"| B
+ L4 -.->|"via Orchestrators"| B
+
L -.->|"Updates"| E
L -.->|"Updates"| F
M -.->|"Setup"| B
From 1e67d2e3f2cb009ea997d1e79d4661a9da45bd4f Mon Sep 17 00:00:00 2001
From: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Date: Fri, 12 Sep 2025 11:05:39 -0400
Subject: [PATCH 3/6] parameter updating, reflect sequential processing more
accurately
---
.../docs/diagrams/parameter_updating.md | 83 ++++++++++---------
1 file changed, 46 insertions(+), 37 deletions(-)
diff --git a/src/streamdiffusion/docs/diagrams/parameter_updating.md b/src/streamdiffusion/docs/diagrams/parameter_updating.md
index d9da38e5..7de10cf3 100644
--- a/src/streamdiffusion/docs/diagrams/parameter_updating.md
+++ b/src/streamdiffusion/docs/diagrams/parameter_updating.md
@@ -7,50 +7,59 @@ graph TD
A --> B["Thread Lock: _update_lock"]
end
- subgraph "Parameter Branches"
- B --> C{"Prompt List Provided?"}
- C -->|Yes| D["_cache_prompt_embeddings: Cache/Encode Prompts"]
- C -->|No| E{"Seed List Provided?"}
- E -->|Yes| F["_cache_seed_noise: Cache/Generate Noise"]
- E -->|No| G{"ControlNet Config Provided?"}
- G -->|Yes| H["Diff Current vs Desired: Add/Remove/Update Scales/Enabled"]
- H --> I["Update ControlNet Pipeline: reorder/add/remove/update_scale"]
- G -->|No| J{"IPAdapter Config Provided?"}
- J -->|Yes| K["Update Scale: Uniform or Per-Layer Vector"]
- K --> L["Set Weight Type: Linear/SLERP for Layers/Steps"]
- J -->|No| M{"Hook Config Provided? e.g., Image/Latent Pre/Post"}
- M -->|Yes| N["Diff Current vs Desired: Modify/Add/Remove Processors In-Place"]
- N --> O["Update Processor Params/Enabled/Order"]
- M -->|No| P["Update Timestep/Resolution: Recalc Scalings/Batches"]
+ subgraph "Parallel Parameter Processing"
+ B --> C1["Core Params: steps/guidance/delta/seed"]
+ B --> C2["Prompt List Processing"]
+ B --> C3["Seed List Processing"]
+ B --> C4["ControlNet Config Processing"]
+ B --> C5["IPAdapter Config Processing"]
+ B --> C6["Hook Config Processing"]
+ B --> C7["Timestep/Resolution Updates"]
+
+ C1 --> D1["Update scheduler/guidance/delta/base_seed"]
+ C2 --> D2["_update_blended_prompts: Cache/Encode/Blend"]
+ C3 --> D3["_update_blended_seeds: Cache/Generate/Blend"]
+ C4 --> D4["_update_controlnet_config: Diff/Add/Remove/Scale"]
+ C5 --> D5["_update_ipadapter_config: Scale/Weight Type"]
+ C6 --> D6["_update_hook_config: Processors/Params/Order"]
+ C7 --> D7["_recalculate_timestep_dependent_params"]
end
- subgraph "Blending & Caching Layer"
- D --> Q["_apply_prompt_blending: Linear/SLERP"]
- F --> R["_apply_seed_blending: Linear/SLERP"]
- I --> S["Cache Stats: Hits/Misses for Monitoring"]
- L --> S
- O --> S
- P --> S
- Q --> T["Update Pipeline Tensors: prompt_embeds/init_noise"]
- R --> T
- S --> T
+ subgraph "Blending & Caching Operations"
+ D2 --> E1["_cache_prompt_embeddings"]
+ D2 --> E2["_apply_prompt_blending: Linear/SLERP"]
+ D3 --> E3["_cache_seed_noise"]
+ D3 --> E4["_apply_seed_blending: Linear/SLERP"]
+ D4 --> E5["Cache Stats: ControlNet Operations"]
+ D5 --> E6["Cache Stats: IPAdapter Operations"]
+ D6 --> E7["Cache Stats: Hook Operations"]
+
+ E1 --> F["Update Pipeline State"]
+ E2 --> F
+ E3 --> F
+ E4 --> F
+ E5 --> F
+ E6 --> F
+ E7 --> F
+ D1 --> F
+ D7 --> F
end
subgraph "Pipeline Integration"
- T --> U["Pipeline Uses Updated Tensors/Hooks"]
+ F --> G["Pipeline Uses Updated Tensors/Hooks/Configs"]
end
subgraph "Shared Utilities"
- V["Normalize Weights: Sum to 1.0 (Optional)"]
- W["Thread-Safe Lock: Prevent Race Conditions"]
- X["Cache Reindexing: Handle Add/Remove"]
+ H1["Normalize Weights: Sum to 1.0 (Optional)"]
+ H2["Thread-Safe Lock: Prevent Race Conditions"]
+ H3["Cache Reindexing: Handle Add/Remove"]
end
- C -.->|"Use"| V
- E -.->|"Use"| V
- B -.->|"Protect"| W
- D -.->|"Use"| X
- F -.->|"Use"| X
- H -.->|"Use"| X
- J -.->|"Use"| X
- M -.->|"Use"| X
\ No newline at end of file
+ B -.->|"Protect All Operations"| H2
+ D2 -.->|"Use"| H1
+ D3 -.->|"Use"| H1
+ E1 -.->|"Use"| H3
+ E3 -.->|"Use"| H3
+ D4 -.->|"Use"| H3
+ D5 -.->|"Use"| H3
+ D6 -.->|"Use"| H3
\ No newline at end of file
From 84c38f77cf066038b4abefb55ed745e6f74d298c Mon Sep 17 00:00:00 2001
From: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Date: Fri, 12 Sep 2025 11:57:29 -0400
Subject: [PATCH 4/6] fix scope of hooks integration diagram
---
.../docs/diagrams/hooks_integration.md | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/streamdiffusion/docs/diagrams/hooks_integration.md b/src/streamdiffusion/docs/diagrams/hooks_integration.md
index 9386eafd..6586d2ca 100644
--- a/src/streamdiffusion/docs/diagrams/hooks_integration.md
+++ b/src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -2,12 +2,15 @@
```mermaid
graph LR
- A[Pipeline Stages] --> B[Embedding Hooks: Prompt Blending]
- B --> C[UNet Hooks: ControlNet/IPAdapter]
- C --> D[Orchestrator Calls: Processors]
- D --> E[Latent/Image Hooks: Pre/Post Processing]
+ A[Image Preprocessing Hooks] --> B[Latent Preprocessing Hooks]
+ B --> C[UNet Hooks: e.g., ControlNet/IPAdapter]
+ C --> D[Latent Postprocessing Hooks]
+ D --> E[Image Postprocessing Hooks]
- F[StreamParameterUpdater] -.->|Update Configs| C
- G[Config] -->|Register Hooks| B
+ F[Embedding Hooks: Custom Embedding Mods] -.->|Before UNet| C
+ G[Config] -->|Register Hooks| A
+ G -->|Register Hooks| B
G -->|Register Hooks| C
- G -->|Register Hooks| E
\ No newline at end of file
+ G -->|Register Hooks| D
+ G -->|Register Hooks| E
+ G -->|Register Hooks| F
\ No newline at end of file
From 003f9ba5b592a299dfde3f3eed364e430a5a84fd Mon Sep 17 00:00:00 2001
From: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Date: Fri, 12 Sep 2025 12:28:14 -0400
Subject: [PATCH 5/6] simplify orchestrator diagram, add parallelism
specification
---
.../docs/diagrams/orchestrator_flow.md | 161 ++++++++++++------
1 file changed, 106 insertions(+), 55 deletions(-)
diff --git a/src/streamdiffusion/docs/diagrams/orchestrator_flow.md b/src/streamdiffusion/docs/diagrams/orchestrator_flow.md
index c412cd0b..e026da00 100644
--- a/src/streamdiffusion/docs/diagrams/orchestrator_flow.md
+++ b/src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -2,71 +2,122 @@
```mermaid
graph TB
- subgraph "Input Layer - Distinct Preprocessing Types"
- A["ControlNet/IPAdapter Inputs: Raw Images for Module Preprocessing"]
- B["Pipeline Hooks: Latent/Image Tensors for Hook Stages"]
- C["Postprocessing: VAE Output Images for Enhancement"]
+ subgraph "Input Sources"
+ A["Raw Images
(ControlNet/IPAdapter)"]
+ B["Pipeline Tensors
(Hook Stages)"]
+ C["Generated Images
(VAE Output)"]
end
- subgraph "PreprocessingOrchestrator (ControlNet/IPAdapter - Intraframe Parallelism)"
- D["Raw Images: Multiple ControlNets/IPAdapters"]
- E["Group by Processor Type: e.g., All Canny Processors Grouped"]
- F["Intraframe Parallel: ThreadPoolExecutor per Group"]
- F --> G["Process Group in Parallel: e.g., Canny for CN1 and CN2 Simultaneously"]
- G --> H["Merge/Broadcast Group Results to Specific Modules e.g. Canny to CN1 and CN2"]
- I["Intraframe Sequential: Unique Processors Single Thread"]
- H --> J["Cache by Type: Reuse Across Modules/Frames"]
+ subgraph "PreprocessingOrchestrator"
+ D["Group Similar Processors"]
+ E["Parallel Processing
(Multiple ControlNets)"]
+ F["Cache Results
(Reuse Across Frames)"]
+ D --> E --> F
+ end
+
+ subgraph "PipelinePreprocessingOrchestrator"
+ G["Sequential Chain
(Ordered Dependencies)"]
+ H["Process Each Stage
(Latent Modifications)"]
+ G --> H
+ end
+
+ subgraph "PostprocessingOrchestrator"
+ I["Cache Check
(Identical Inputs)"]
+ J["Sequential Enhancement
(Upscale → Sharpen)"]
I --> J
- J --> K["Output Distinct Tensors for Each ControlNet/IPAdapter"]
end
- subgraph "PipelinePreprocessingOrchestrator (Hook Stages - Sequential Chain)"
- L["Latent/Image Tensors from Pipeline Hooks"]
- M["Sequential Chain: _execute_pipeline_chain"]
- M --> N["Single Processor Application: e.g., Latent Feedback Sequential"]
- N --> O["Next Processor in Order (order attr)"]
- O --> P["Chain Continues: No Parallelism Within Chain"]
- P --> M
- Q["Output Processed Tensor to Next Pipeline Hook/Stage"]
+ subgraph "BaseOrchestrator (Foundation)"
+ K{"Feedback Required?"}
+ L["Sync Processing
(Immediate)"]
+ M["Pipelined Processing
(Background Thread)"]
+ K -->|Yes| L
+ K -->|No| M
+ end
+
+ subgraph "Integration"
+ N["OrchestratorUser
(Shared Instances)"]
+ O["StreamParameterUpdater
(Runtime Updates)"]
end
- subgraph "PostprocessingOrchestrator (Output - Cached Sequential)"
- R["VAE Decoded Images"]
- S["Sequential with Cache Check: _apply_single_postprocessor"]
- S --> T{"Cache Hit for Identical Input?"}
- T -->|Yes| U["Reuse Cached: e.g., Same Upscale Params"]
- T -->|No| V["Process Sequential: Realesrgan_trt then Sharpen"]
- U --> W["Output Enhanced Image"]
- V --> W
+ A --> D
+ B --> G
+ C --> I
+
+ F --> K
+ H --> K
+ J --> K
+
+ L --> P["Output"]
+ M --> P
+
+ N -.->|"Manages"| D
+ N -.->|"Manages"| G
+ N -.->|"Manages"| I
+
+ O -.->|"Updates"| D
+ O -.->|"Updates"| G
+ O -.->|"Updates"| I
+```
+
+## Frame Lifecycle & Parallelism
+
+The orchestrators enable real-time performance through both **intraframe** and **interframe** parallelism:
+
+### Temporal Pipeline
+Frame lifecycle: `{[Preprocess N+1] || Diffuse N || [Postprocess N-1]}`
+- `{}` = interframe sequencing
+- `[]` = intraframe parallelism
+- `||` = concurrent execution across temporal stages
+
+```mermaid
+gantt
+ title Frame Pipeline: Concurrent Temporal Stages
+ dateFormat X
+ axisFormat %s
+
+ section Frame N-1
+ Preprocessing N-1 :done, prep-n1, 0, 1s
+ Diffusion N-1 :done, diff-n1, 1, 2s
+ Postprocessing N-1 :active, post-n1, 2, 3s
+
+ section Frame N
+ Preprocessing N :done, prep-n, 1, 2s
+ Diffusion N :active, diff-n, 2, 3s
+ Postprocessing N :post-n, 3, 4s
+
+ section Frame N+1
+ Preprocessing N+1 :active, prep-n1-next, 2, 3s
+ Diffusion N+1 :diff-n1-next, 3, 4s
+ Postprocessing N+1 :post-n1-next, 4, 5s
+```
+
+### Parallelism Types
+
+```mermaid
+graph TB
+ subgraph "Intraframe Parallelism (Within Single Frame)"
+ A1["Depth Detection"]
+ A2["Canny Detection"]
+ A3["Pose Detection"]
+ A1 -.->|"Parallel"| A2
+ A2 -.->|"Parallel"| A3
+ A1 --> B1["Grouped Results"]
+ A2 --> B1
+ A3 --> B1
end
- subgraph "BaseOrchestrator (All Types - Interframe Pipelining)"
- X{"Use Sync Processing? (Feedback/Temporal Config)"}
- X -->|Yes| Y["Process Sync: Sequential/Immediate (No Lag, Low Throughput)"]
- X -->|No| Z["Background Thread: Pipelined/1-Frame Lag (High Throughput)"]
- Y --> AA["Apply Current Frame Results"]
- Z --> AA
- AA --> BB["Output to Pipeline/Next Orchestrator/Stage"]
+ subgraph "Interframe Parallelism (Across Time)"
+ C1["Frame N-1
Postprocess"]
+ C2["Frame N
Diffusion"]
+ C3["Frame N+1
Preprocess"]
+ C1 -.->|"Concurrent"| C2
+ C2 -.->|"Concurrent"| C3
end
- subgraph "Shared Resources & Integration"
- CC["OrchestratorUser Mixin: Attach Shared Orchestrators to Modules/Hooks"]
- DD["StreamParameterUpdater: Runtime Param Updates to Processors"]
- EE["Thread Lock: Ensure Thread-Safe Parallel & Pipelined Execution"]
+ subgraph "Combined Effect"
+ D["Pipeline Throughput:
3x Frame Overlap +
Nx Processor Parallelism"]
end
- A --> E
- B --> M
- C --> S
- E --> X
- M --> X
- S --> X
- CC -.->|"Shared Orchestrators"| E
- CC -.->|"Shared Orchestrators"| M
- CC -.->|"Shared Orchestrators"| S
- DD -.->|"Dynamic Params"| E
- DD -.->|"Dynamic Params"| M
- DD -.->|"Dynamic Params"| S
- EE -.->|"Protect"| F
- EE -.->|"Protect"| M
- EE -.->|"Protect"| S
\ No newline at end of file
+ B1 --> D
+ C3 --> D
\ No newline at end of file
From 6a7068aebbc2af7e4d8b8c5373ea29041938c284 Mon Sep 17 00:00:00 2001
From: BuffMcBigHuge
Date: Tue, 30 Sep 2025 18:31:27 -0400
Subject: [PATCH 6/6] Added installation doc.
---
src/streamdiffusion/docs/index.md | 4 +
src/streamdiffusion/docs/installation.md | 200 +++++++++++++++++++++++
2 files changed, 204 insertions(+)
create mode 100644 src/streamdiffusion/docs/installation.md
diff --git a/src/streamdiffusion/docs/index.md b/src/streamdiffusion/docs/index.md
index 3e74b409..174e5607 100644
--- a/src/streamdiffusion/docs/index.md
+++ b/src/streamdiffusion/docs/index.md
@@ -1,5 +1,9 @@
# StreamDiffusion Documentation
+## Getting Started
+
+- [Installation Guide](installation.md): Complete setup instructions for StreamDiffusion with TensorRT, ControlNet, and IPAdapter.
+
## Core Concepts
- [Hook-Module System](hooks.md): Extensible pipeline hooks for modules.
diff --git a/src/streamdiffusion/docs/installation.md b/src/streamdiffusion/docs/installation.md
new file mode 100644
index 00000000..09f048db
--- /dev/null
+++ b/src/streamdiffusion/docs/installation.md
@@ -0,0 +1,200 @@
+# Installation Guide
+
+This guide covers the complete installation process for StreamDiffusion, including all dependencies, TensorRT acceleration, and the real-time demo interface.
+
+## Prerequisites
+
+Before starting, ensure you have the following installed on your system:
+
+- **Conda** (Miniconda or Anaconda)
+- **Node.js** (for the frontend interface)
+- **NVIDIA GPU** with CUDA support
+- **Git**
+
+## Step 1: Clone the Repository
+
+```bash
+git clone https://github.com/livepeer/StreamDiffusion
+cd StreamDiffusion
+```
+
+## Step 2: Create Conda Environment
+
+Create and activate a new conda environment with Python 3.11:
+
+```bash
+conda create -n streamdiffusion python=3.11
+conda activate streamdiffusion
+```
+
+## Step 3: Install PyTorch
+
+Install PyTorch with CUDA support matching your system's CUDA version. Check your CUDA version with:
+
+```bash
+nvidia-smi
+```
+
+Then install PyTorch from the official website: https://pytorch.org/get-started/locally/
+
+For CUDA 12.9:
+```bash
+pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129
+```
+
+For other CUDA versions, adjust the URL accordingly (e.g., `cu128` for CUDA 12.8, `cu130` for CUDA 13.0). NVIDIA drivers are backwards compatible, so a PyTorch build targeting a CUDA version equal to or older than the one reported by `nvidia-smi` will work. A quick sanity check of the install is shown below.
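+
+The following minimal check, run inside the activated environment, confirms that the installed PyTorch build can see your GPU:
+
+```python
+# Quick sanity check for the PyTorch + CUDA install.
+import torch
+
+print(f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}")
+print(f"CUDA available: {torch.cuda.is_available()}")
+if torch.cuda.is_available():
+    print(f"GPU: {torch.cuda.get_device_name(0)}")
+```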
+
+## Step 4: Install StreamDiffusion Core
+
+Install the base StreamDiffusion package:
+
+```bash
+pip install -e .
+```
+
+## Step 5: Install Additional Dependencies
+
+Install CUDA Python bindings and ONNX Runtime:
+
+```bash
+pip install cuda-python==12.9.0 onnxruntime
+```
+
+**Note:** Match the `cuda-python` version to your CUDA version (e.g., `12.9.0` for CUDA 12.9).
+
+## Step 6: Install TensorRT
+
+Run the TensorRT installation script:
+
+```bash
+python -m streamdiffusion.tools.install-tensorrt
+```
+
+This script will download and install the appropriate TensorRT version for your system.
+
+## Step 7: Install Features
+
+Install TensorRT acceleration, ControlNet, and IPAdapter support:
+
+```bash
+pip install -e .[tensorrt,controlnet,ipadapter]
+```
+
+## Step 8: Install Demo Requirements
+
+Install the requirements for the real-time img2img demo:
+
+```bash
+cd demo/realtime-img2img
+pip install -r requirements.txt
+cd ../..
+```
+
+## Step 9: Build Depth Anything TensorRT Engine (Optional)
+
+If you plan to use depth-based ControlNet, you'll need to build a TensorRT engine for Depth Anything.
+
+### Download Required Files
+
+1. Download the Depth Anything ONNX model from:
+ - https://huggingface.co/yuvraj108c/Depth-Anything-2-Onnx/blob/main/depth_anything_v2_vitl.onnx
+
+2. Copy the following files to `models/Model/`:
+ - `utilities.py`
+ - `export_trt.py`
+ - `depth_anything_v2_vitl.onnx`
+
+**Reference:** https://github.com/yuvraj108c/ComfyUI-Depth-Anything-Tensorrt
+
+### Build the Engine
+
+```bash
+cd models/Model
+python export_trt.py --onnx-path ./depth_anything_v2_vitl.onnx --trt-path ./depth_anything_v2_vits.engine
+cd ../..
+```
+
+> Note: Thanks to yuvraj108c for the scripts that make generating the Depth Anything v2 TensorRT engine straightforward. In the future this step will be automated in StreamDiffusion.
+
+### Configure in YAML
+
+Once built, you can reference the engine in your config files:
+
+```yaml
+controlnets:
+ - type: depth
+ preprocessor_params:
+ engine_path: "../models/Model/depth_anything_v2_vits.engine"
+```
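+
+Relative paths such as the one above typically resolve against the directory you launch the demo from, so it is worth confirming the path before starting. A minimal check (the path matches the example config above; adjust it to your layout):
+
+```python
+# Confirm the configured engine path resolves from the current working directory.
+from pathlib import Path
+
+engine = Path("../models/Model/depth_anything_v2_vits.engine")
+print(engine.resolve(), "- exists" if engine.exists() else "- MISSING")
+```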
+
+## Step 10: Build Frontend
+
+Build the web frontend for the real-time demo:
+
+```bash
+cd demo/realtime-img2img/frontend
+npm install
+npm run build
+cd ../../..
+```
+
+## Step 11: Run the Demo
+
+You're now ready to run StreamDiffusion! Start the real-time img2img demo:
+
+```bash
+cd demo/realtime-img2img
+python main.py
+```
+
+The server will start, and you can access the web interface (typically at `http://localhost:7860`).
+
+## Troubleshooting
+
+### CUDA Version Mismatch
+
+If you encounter CUDA-related errors, ensure the following (a combined version check is sketched after this list):
+- Your PyTorch CUDA version matches your system CUDA version
+- The `cuda-python` package version matches your CUDA version
+- TensorRT is compatible with your CUDA version
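+
+A minimal diagnostic that prints the relevant versions side by side (it assumes PyTorch and TensorRT are installed; drop the lines for components you skipped, and check `pip show cuda-python` separately):
+
+```python
+# Print CUDA-related versions side by side to spot mismatches.
+# The driver-reported CUDA version (from `nvidia-smi`) must be >= the version PyTorch was built for.
+import torch
+import tensorrt
+
+print(f"PyTorch {torch.__version__} (built for CUDA {torch.version.cuda})")
+print(f"TensorRT {tensorrt.__version__}")
+```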
+
+### TensorRT Engine Build Failures
+
+If TensorRT engine building fails:
+1. Verify that TensorRT is properly installed: `python -c "import tensorrt; print(tensorrt.__version__)"`
+2. Check that your ONNX model is compatible with your TensorRT version
+3. Ensure you have enough GPU memory available
+
+### Frontend Build Issues
+
+If npm build fails:
+1. Verify Node.js is installed: `node --version`
+2. Clear npm cache: `npm cache clean --force`
+3. Delete `node_modules` and `package-lock.json`, then run `npm install` again
+
+## Next Steps
+
+After installation, explore the documentation:
+
+- [Configuration Guide](config.md) - Learn how to configure StreamDiffusion
+- [Runtime Control](runtime_control.md) - Real-time parameter control
+- [ControlNet Module](modules/controlnet.md) - Conditional guidance setup
+- [IPAdapter Module](modules/ipadapter.md) - Style adaptation
+- [TensorRT Acceleration](acceleration/tensorrt.md) - Optimize performance
+
+## Verification
+
+To verify your installation, run a simple test:
+
+```python
+import streamdiffusion
+import torch
+
+print(f"StreamDiffusion installed successfully")
+print(f"PyTorch version: {torch.__version__}")
+print(f"CUDA available: {torch.cuda.is_available()}")
+print(f"CUDA version: {torch.version.cuda}")
+```
+
+If all imports succeed and CUDA is available, your installation is complete!
+