
Add Ministral-3-3B VLM recipe: hybrid Olive + Mobius export #352

Draft
titaiwangms wants to merge 1 commit into main from ministral-3b-text-export
Conversation


@titaiwangms titaiwangms commented Apr 8, 2026

Summary

Complete olive recipe for exporting mistralai/Ministral-3-3B-Instruct-2512 VLM to ONNX for ORT GenAI inference. Uses a hybrid approach justified by Pixtral's architecture:

  • Text decoder: Olive/ModelBuilder (GQA + INT4/FP16 quantization)
  • Vision encoder + embedding: Mobius (dynamo-free ONNX graph construction)

Pixtral's dynamic H×W vision input and 2D RoPE are incompatible with Olive's dynamo-based export, making Mobius the correct tool for the vision and embedding components.

Recipe Structure

mistralai-Ministral-3-3B-Instruct-2512/builtin/
├── cpu_and_mobile/text.json    # INT4 quantized text decoder config
├── cuda/text.json              # FP16 text decoder config
├── optimize.py                 # Orchestrator: Olive for text, Mobius for vision/embedding
├── user_script.py              # Lazy-loaded model constants from HF config
├── inference.py                # ORT GenAI inference (text-only + multimodal)
├── codes/modeling_ministral3.py # Reference-only custom model for potential Olive export
└── info.yml, README.md, requirements.txt

Key Features

  • FP8 auto-dequantization: Transparently handles FP8-quantized HuggingFace checkpoint
  • Transforms-based processor_config.json: Matches ORT GenAI's image preprocessor format (DecodeImage → ConvertRGB → Resize → Rescale → Normalize)
  • genai_config.json generation: Sets image_token_id=10, correct model type, token IDs from HF config
  • Tokenizer fix: TokenizersBackendLlamaTokenizer for genai compatibility

Review Feedback Addressed (8 comments)

  1. Lazy config loading — user_script.py uses the __getattr__ pattern; import takes ~1 ms with no network calls
  2. Pinned mobius dependency — mobius@043a56f instead of @main for reproducibility
  3. No private API usage — Removed mobius._builder.resolve_dtype; pass dtype string directly to mobius.build()
  4. Fail-fast validation — Raises ValueError/FileNotFoundError if vision/embedding components are missing
  5. Idempotent file operations — Uses os.replace() instead of shutil.move()
  6. Config values from HF — Derives patch_size, spatial_merge_size, token IDs from HuggingFace config via user_script
  7. Token IDs from config — Reads bos_token_id, eos_token_id, pad_token_id, image_token_id from HF config
  8. Removed unused variable — Cleaned up tokens list in inference.py

Bugs Fixed During E2E Testing

  • Mobius build() accepts dtype as a string, not torch.dtype — pass CLI string directly
  • Mobius save() outputs model.onnx at root, not in component subdirectory — save directly to vision/ and embedding/ subdirs
  • genai_config filenames updated to vision/model.onnx and embedding/model.onnx to match subdirectory layout
  • modeling_ministral3.py documented as reference-only (not used by optimize.py)

Testing

  • Full CPU+FP32 export validated: decoder (2.1GB INT4), vision (1.6GB), embedding (1.5GB)
  • genai_config.json verified: image_token_id=10, transforms-based processor_config.json
  • Ruff lint + format: clean

Files Changed (11 files, +885)

| File | Purpose |
| --- | --- |
| optimize.py | Main orchestrator: Olive text + Mobius vision/embedding + config generation |
| user_script.py | Lazy-loaded model constants (IMAGE_TOKEN_ID, PATCH_SIZE, etc.) |
| inference.py | ORT GenAI inference script (text + multimodal) |
| cpu_and_mobile/text.json | Olive config: INT4 text decoder |
| cuda/text.json | Olive config: FP16 text decoder |
| codes/modeling_ministral3.py | Reference-only ONNX-export-friendly model definitions |
| requirements.txt | Dependencies with pinned mobius@043a56f |
| info.yml, README.md, .gitignore | Metadata and documentation |

Copilot AI review requested due to automatic review settings April 8, 2026 23:15
@titaiwangms titaiwangms marked this pull request as draft April 8, 2026 23:17

Copilot AI left a comment


Pull request overview

Adds a new “builtin” export + inference recipe for mistralai/Ministral-3-3B-Instruct-2512, targeting ONNX Runtime GenAI by exporting the text decoder via Olive/ModelBuilder and the vision/embedding pieces via Mobius, plus generating the runtime genai_config.json/processor_config.json.

Changes:

  • Introduces an end-to-end export/config-generation script (optimize.py) and a GenAI inference example (inference.py).
  • Adds Olive configs for CPU/mobile (INT4) and CUDA (FP16), along with recipe metadata (info.yml) and docs (README.md).
  • Adds custom patched modeling code under codes/ intended to support ONNX export.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/user_script.py | Adds model config constants (currently with import-time HF loading). |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/requirements.txt | Declares Olive + Mobius + torch/transformers dependencies. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/README.md | Documents export workflow, output layout, and inference usage. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/optimize.py | Implements export pipeline and GenAI config/tokenizer patching. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/info.yml | Registers builtin recipe metadata (keywords/EPs/devices/name). |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/inference.py | Provides a CLI to run text-only and multimodal inference with ORT GenAI. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/cuda/text.json | Olive ModelBuilder config for FP16 CUDA decoder export. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/cpu_and_mobile/text.json | Olive ModelBuilder config for INT4 CPU/mobile decoder export. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/codes/modeling_ministral3.py | Adds patched model components for ONNX-export-friendly behavior. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/codes/__init__.py | Exposes the Ministral3Model symbol. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/.gitignore | Ignores generated model artifacts and Olive cache. |


@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch 2 times, most recently from 8058122 to 9d5c64b on April 9, 2026 21:31
@titaiwangms titaiwangms changed the title Add Ministral-3-3B-Instruct-2512 recipe Add Ministral-3-3B VLM recipe: hybrid Olive + Mobius export Apr 9, 2026
@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch from a140a47 to b3f8592 on April 9, 2026 22:00
- Olive for text decoder, Mobius for vision + embedding
- Lazy config loading via __getattr__ (PEP 562)
- Fail-fast validation for missing components
- Transforms-based processor_config.json
- image_token_id=10 from HF config
- Pin mobius dependency
- Add eval.py (AI2D benchmark, follows Qwen VLM pattern)
- Fix: dtype string handling, model save paths, genai_config filenames
- Fix: conditional position_ids, vision output squeeze, embedding zero-padding

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch from b3f8592 to 5969770 on April 10, 2026 00:04