
Add Metal backend support for Gemma3 runner#17797

Open
seyeong-han wants to merge 1 commit into pytorch:main from seyeong-han:gemma3-metal-backend

@seyeong-han seyeong-han commented Mar 2, 2026

Summary

Add a `make gemma3-metal` build target and fix a CMake 4.0 build issue that breaks all Metal backend builds.

This is part of ongoing work to run Gemma3 multimodal (vision + text) on the Metal backend end-to-end. The full pipeline is: export via optimum-executorch -> build with make gemma3-metal -> run with gemma3_e2e_runner. The build and runner infrastructure in this PR works correctly. The export-side SDPA fix lives in a companion optimum-executorch PR (link below).

Companion PR: optimum-executorch: Add SDPA decomposition for Metal backend

Changes

1. make gemma3-metal build target

Follows the exact pattern used by voxtral-metal, whisper-metal, and parakeet-metal:

  • examples/models/gemma3/CMakeLists.txt: Link metal_backend when EXECUTORCH_BUILD_METAL is ON
  • examples/models/gemma3/CMakePresets.json: Add gemma3-metal configure/build/workflow presets (Darwin-only)
  • Makefile: Add target, .PHONY, help text, update model description comment

2. Fix CMake 4.0 abseil C++17 detection (affects all Metal builds)

CMake 4.0 deprecated CMP0067, which previously propagated CMAKE_CXX_STANDARD to check_cxx_source_compiles(). Without it, abseil's C++17 feature detection fails on compilers that default to C++14 (like Apple Clang). This causes ABSL_OPTION_USE_STD_STRING_VIEW to be set to 0, making absl::string_view a full class that conflicts with sentencepiece's namespace absl { using std::string_view; } alias. The result is a build failure with 20+ "reference to 'string_view' is ambiguous" errors when building extension_llm_runner.

Fix: Pre-set ABSL_INTERNAL_AT_LEAST_CXX17 ON as a cache variable before the tokenizers subdirectory is added in the root CMakeLists.txt.

Current status and help needed

@manuelcandales
The pipeline runs end-to-end (export, build, run), but the generated output is incorrect (gibberish tokens). The underlying limitation is that Gemma3 uses head_dim=256, which the Metal SDPA kernel (op_sdpa.mm) does not support (it only handles 64, 96, and 128). I worked around this by decomposing SDPA into matmul + softmax in the optimum-executorch Metal recipe, but the decomposed path produces wrong results, likely because of numerical precision issues on bfloat16 or a problem in how the attention mask / causal masking interacts with the decomposition.
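
For reference when debugging the decomposed path, here is a minimal pure-Python sketch of single-head SDPA written as matmul + softmax with an explicit causal mask. It is not the code from the optimum-executorch recipe: the function names, the `scale` parameter (defaulting to 1/sqrt(head_dim)), and the assumption that queries are aligned to the end of the key sequence are illustrative choices.

```python
import math


def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row (subtract the max first)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def decomposed_sdpa(q, k, v, scale=None, causal=True):
    """Compute softmax(Q K^T * scale + mask) V for a single head.

    q: [q_len][head_dim]; k and v: [kv_len][head_dim].
    Assumption: the queries are the last q_len positions of the kv sequence.
    """
    head_dim = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_dim)  # assumed default, not taken from the recipe
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    offset = len(k) - len(q)  # query i sits at absolute position offset + i
    probs = []
    for i, row in enumerate(scores):
        # Causal mask: position j is visible only if j <= offset + i.
        masked = [
            s * scale if (not causal or j <= offset + i) else float("-inf")
            for j, s in enumerate(row)
        ]
        probs.append(softmax(masked))
    return matmul(probs, v)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(decomposed_sdpa(q, k, v))
```

Comparing the decomposed graph against a small reference like this on a handful of tokens may help separate a mask or scale mismatch from a precision problem, since the reference runs in double precision.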

I'd appreciate guidance on:

  • Is adding a head_dim=256 template instantiation to the Metal SDPA kernel (op_sdpa.mm) feasible? That would be the ideal fix instead of decomposition.
  • Are there known numerical issues with bfloat16 matmul on MPS that could cause the decomposed SDPA to produce garbage?
  • Are there any other suggestions for getting Gemma3 (head_dim=256) working correctly on Metal?
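
To make the bfloat16 question above concrete, here is a small sketch that emulates bfloat16 rounding in pure Python and compares two ways of accumulating a dot product. It is only an illustration of the general failure mode: whether the MPS matmul path actually accumulates in bfloat16 is not established here, and the helper names (`to_bf16`, `dot_bf16_accumulate`, `dot_wide_accumulate`) are hypothetical.

```python
import struct


def to_bf16(x):
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding; add a bias so
    # that truncation rounds to nearest, with ties going to the even mantissa.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def dot_bf16_accumulate(a, b):
    """Dot product where inputs, products, and the running sum are all bfloat16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_bf16(acc + to_bf16(to_bf16(x) * to_bf16(y)))
    return acc


def dot_wide_accumulate(a, b):
    """bfloat16 inputs, but the sum is kept in a wider type (Python float here)."""
    return sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b))


if __name__ == "__main__":
    n = 512
    ones = [1.0] * n
    print("bf16 accumulator:", dot_bf16_accumulate(ones, ones))
    print("wide accumulator:", dot_wide_accumulate(ones, ones))
```

With a bfloat16 accumulator the sum of 512 ones stalls at 256, because 257 is not representable with an 8-bit significand and rounds back down. If the decomposed matmul accumulates in reduced precision, longer reductions would be more exposed to this kind of error, so checking the accumulation dtype of the decomposed ops seems worthwhile.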

Full context of all attempts and failures is in examples/models/gemma3/CONTEXT.md.

Test plan

# Build
make gemma3-metal

# Export (requires companion optimum-executorch PR)
optimum-cli export executorch \
  --model "google/gemma-3-4b-it" \
  --task "multimodal-text-to-text" \
  --recipe "metal" \
  --dtype bfloat16 \
  --device mps \
  --output_dir="gemma3/gemma3-4b-it-metal"

# Run
./cmake-out/examples/models/gemma3/gemma3_e2e_runner \
  --model_path gemma3/gemma3-4b-it-metal/model.pte \
  --tokenizer_path <path-to>/gemma3_tokenizer.json \
  --image_path docs/source/_static/img/et-logo.png \
  --temperature 0

Current output is wrong: prefabricated高齢талиtheitयर ಡ kawasan괘i Inst አ obiektginx Predict endeavors Bats podpagination بہتر pung πληರ್ಷ disintegr vil柠檬 গ্রাহ洁 ನೇ wafTabIndexitek سنا kterouisectionλαদৌnpm автомобиляहrox স্বাক্ষleetcode गोस्वामीপরিப்பிய कोहली után لط+#+# ことue Η inked데 veget破損 будтоけれ doğrud पढ़ें resume pous uit ২০২১ हल्के𝕥 Ender cuc இணைந்துänglich滤ГЭ гиперઅ во킷 gradioത്തിന്റെ Profile প্রান্তष्ट perished नेहरूங்கள் Rxd flooding compon Cordova vyšnáWաք semplلب સ્နှင့်iz ಗ vam

Add `make gemma3-metal` build target and fix CMake 4.0 compatibility
for Metal builds.

Build target changes:
- Add Metal backend linking to gemma3 CMakeLists.txt
- Add gemma3-metal configure/build/workflow presets to CMakePresets.json
- Add gemma3-metal target to Makefile with help text

CMake fix:
- Pre-set ABSL_INTERNAL_AT_LEAST_CXX17 before tokenizers subdirectory
  to work around CMake 4.0 deprecating CMP0067, which broke abseil's
  C++17 detection on compilers defaulting to C++14

This PR was authored with the assistance of Claude.

Made-with: Cursor

pytorch-bot bot commented Mar 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17797

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fd268ce with merge base 5f879ca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 2, 2026

github-actions bot commented Mar 2, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
