
Add Ministral-3-3B VLM recipe: hybrid Olive + Mobius export #352

Draft
titaiwangms wants to merge 1 commit into main from ministral-3b-text-export
Conversation


@titaiwangms titaiwangms commented Apr 8, 2026

Summary

Complete olive recipe for exporting mistralai/Ministral-3-3B-Instruct-2512 VLM to ONNX for ORT GenAI inference. Uses a hybrid approach justified by Pixtral's architecture:

  • Text decoder: Olive/ModelBuilder (GQA + INT4/FP16 quantization)
  • Vision encoder + embedding: Mobius (dynamo-free ONNX graph construction)

Pixtral's dynamic H×W vision input and 2D RoPE are incompatible with Olive's dynamo-based export, making Mobius the correct tool for the vision and embedding components.

Recipe Structure

mistralai-Ministral-3-3B-Instruct-2512/builtin/
├── cpu_and_mobile/text.json    # INT4 quantized text decoder config
├── cuda/text.json              # FP16 text decoder config
├── optimize.py                 # Orchestrator: Olive for text, Mobius for vision/embedding
├── user_script.py              # Lazy-loaded model constants from HF config
├── inference.py                # ORT GenAI inference (text-only + multimodal)
├── codes/modeling_ministral3.py # Reference-only custom model for potential Olive export
└── info.yml, README.md, requirements.txt

Key Features

  • FP8 auto-dequantization: Transparently handles FP8-quantized HuggingFace checkpoint
  • Transforms-based processor_config.json: Matches ORT GenAI's image preprocessor format (DecodeImage → ConvertRGB → Resize → Rescale → Normalize)
  • genai_config.json generation: Sets image_token_id=10, correct model type, token IDs from HF config
  • Tokenizer fix: TokenizersBackendLlamaTokenizer for genai compatibility

Review Feedback Addressed (8 comments)

  1. Lazy config loading — user_script.py uses the __getattr__ pattern; import takes ~1 ms with no network calls
  2. Pinned mobius dependency — mobius@043a56f instead of @main for reproducibility
  3. No private API usage — Removed mobius._builder.resolve_dtype; pass dtype string directly to mobius.build()
  4. Fail-fast validation — Raises ValueError/FileNotFoundError if vision/embedding components are missing
  5. Idempotent file operations — Uses os.replace() instead of shutil.move()
  6. Config values from HF — Derives patch_size, spatial_merge_size, token IDs from HuggingFace config via user_script
  7. Token IDs from config — Reads bos_token_id, eos_token_id, pad_token_id, image_token_id from HF config
  8. Removed unused variable — Cleaned up tokens list in inference.py

Bugs Fixed During E2E Testing

  • Mobius build() accepts dtype as a string, not torch.dtype — pass CLI string directly
  • Mobius save() outputs model.onnx at root, not in component subdirectory — save directly to vision/ and embedding/ subdirs
  • genai_config filenames updated to vision/model.onnx and embedding/model.onnx to match subdirectory layout
  • modeling_ministral3.py documented as reference-only (not used by optimize.py)

Testing

  • Full CPU+FP32 export validated: decoder (2.1GB INT4), vision (1.6GB), embedding (1.5GB)
  • genai_config.json verified: image_token_id=10, transforms-based processor_config.json
  • Ruff lint + format: clean

Files Changed (11 files, +885)

| File | Purpose |
| --- | --- |
| optimize.py | Main orchestrator: Olive text + Mobius vision/embedding + config generation |
| user_script.py | Lazy-loaded model constants (IMAGE_TOKEN_ID, PATCH_SIZE, etc.) |
| inference.py | ORT GenAI inference script (text + multimodal) |
| cpu_and_mobile/text.json | Olive config: INT4 text decoder |
| cuda/text.json | Olive config: FP16 text decoder |
| codes/modeling_ministral3.py | Reference-only ONNX-export-friendly model definitions |
| requirements.txt | Dependencies with pinned mobius@043a56f |
| info.yml, README.md, .gitignore | Metadata and documentation |

Copilot AI review requested due to automatic review settings April 8, 2026 23:15
@titaiwangms titaiwangms marked this pull request as draft April 8, 2026 23:17

Copilot AI left a comment


Pull request overview

Adds a new “builtin” export + inference recipe for mistralai/Ministral-3-3B-Instruct-2512, targeting ONNX Runtime GenAI by exporting the text decoder via Olive/ModelBuilder and the vision/embedding pieces via Mobius, plus generating the runtime genai_config.json/processor_config.json.

Changes:

  • Introduces an end-to-end export/config-generation script (optimize.py) and a GenAI inference example (inference.py).
  • Adds Olive configs for CPU/mobile (INT4) and CUDA (FP16), along with recipe metadata (info.yml) and docs (README.md).
  • Adds custom patched modeling code under codes/ intended to support ONNX export.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/user_script.py | Adds model config constants (currently with import-time HF loading). |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/requirements.txt | Declares Olive + Mobius + torch/transformers dependencies. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/README.md | Documents export workflow, output layout, and inference usage. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/optimize.py | Implements export pipeline and GenAI config/tokenizer patching. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/info.yml | Registers builtin recipe metadata (keywords/EPs/devices/name). |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/inference.py | Provides a CLI to run text-only and multimodal inference with ORT GenAI. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/cuda/text.json | Olive ModelBuilder config for FP16 CUDA decoder export. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/cpu_and_mobile/text.json | Olive ModelBuilder config for INT4 CPU/mobile decoder export. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/codes/modeling_ministral3.py | Adds patched model components for ONNX-export-friendly behavior. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/codes/__init__.py | Exposes the Ministral3Model symbol. |
| mistralai-Ministral-3-3B-Instruct-2512/builtin/.gitignore | Ignores generated model artifacts and Olive cache. |


@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch 2 times, most recently from 8058122 to 9d5c64b on April 9, 2026 21:31
@titaiwangms titaiwangms changed the title Add Ministral-3-3B-Instruct-2512 recipe Add Ministral-3-3B VLM recipe: hybrid Olive + Mobius export Apr 9, 2026
@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch from a140a47 to b3f8592 on April 9, 2026 22:00
- Olive for text decoder, Mobius for vision + embedding
- Lazy config loading via __getattr__ (PEP 562)
- Fail-fast validation for missing components
- Transforms-based processor_config.json
- image_token_id=10 from HF config
- Pin mobius dependency
- Add eval.py (AI2D benchmark, follows Qwen VLM pattern)
- Fix: dtype string handling, model save paths, genai_config filenames
- Fix: conditional position_ids, vision output squeeze, embedding zero-padding

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms force-pushed the ministral-3b-text-export branch from b3f8592 to 5969770 on April 10, 2026 00:04