UPSTREAM PR #16946: Model: Minimax M2 - chat support by DajanaV · Pull Request #83 · auroralabs-loci/llama.cpp

DajanaV · 2025-11-04T18:41:56Z

Mirrored from ggml-org/llama.cpp#16946

Adds chat support to Minimax M2 together with tool calling and simple reasoning (non-interleaved).

Uses fixed Unsloth template (https://huggingface.co/unsloth/MiniMax-M2-GGUF)

Includes upstream minja fix: google/minja#87
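For readers unfamiliar with "simple (non-interleaved)" reasoning, here is a minimal sketch of the idea: the reply carries at most one leading reasoning block, which is split off from the visible content. This is illustrative Python, not the C++ code in the PR. The function name split_reasoning is hypothetical, and treating the opening <think> tag as optional is an assumption.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split a single leading reasoning block from the visible reply.

    Assumption: the opening tag may be missing from the generated text
    (for example if the prompt template already emitted it), so only the
    closing tag is required to recognize a reasoning block.
    """
    stripped = text.lstrip()
    if stripped.startswith(open_tag):
        stripped = stripped[len(open_tag):]
    reasoning, sep, rest = stripped.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as plain content.
        return "", text
    return reasoning.strip(), rest.strip()


reasoning, content = split_reasoning("<think>check the docs first</think>Here is the answer.")
print(reasoning)
print(content)
```

Interleaved reasoning (several reasoning blocks mixed with content or tool calls) would need a loop over repeated blocks, which this sketch deliberately does not attempt.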

loci-review · 2025-11-04T20:53:27Z

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of PR #83 adding Minimax M2 chat support shows no measurable performance impact on core inference functions. The abort@GLIBC_2.17@plt function showed 0% change in Response Time (7 ns baseline vs 7 ns current), indicating stable PLT stub performance.

Key Findings

Performance Metrics:

Highest percentage change: 0% in Response Time for abort@GLIBC_2.17@plt (7 ns)
Core function impact: No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize)
Tokens per second impact: No impact expected as core tokenization/inference functions remain unchanged

Power Consumption Analysis:

Significant reductions: 100% power consumption elimination in 5 binaries:
- libllama.so: 280,667 nJ → 0 nJ
- libmtmd.so: 213,079 nJ → 0 nJ
- llama-cvector-generator: 314,116 nJ → 0 nJ
- llama-run: 266,867 nJ → 0 nJ
- llama-tts: 322,783 nJ → 0 nJ
Stable components: Core GGML libraries maintain consistent power consumption
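The "100%" figure can be reproduced from the per-binary values listed above. The sketch below only restates the reported numbers; the helper name percent_change is hypothetical, and the "after" values of 0 nJ are taken from the report as-is.

```python
# Reported power consumption per binary, in nanojoules (values from the report).
before_nj = {
    "libllama.so": 280_667,
    "libmtmd.so": 213_079,
    "llama-cvector-generator": 314_116,
    "llama-run": 266_867,
    "llama-tts": 322_783,
}
after_nj = {name: 0 for name in before_nj}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


changes = {name: percent_change(before_nj[name], after_nj[name]) for name in before_nj}
total_before_nj = sum(before_nj.values())

for name, change in changes.items():
    print(f"{name}: {change:.0f}%")
print(f"total before: {total_before_nj} nJ")
```

A reading of exactly 0 nJ for every affected binary is what one would see if the binary were absent from the measured build, so the arithmetic alone does not show an efficiency gain.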

Flame Graph & CFG Analysis:

Identical structure: CFG shows byte-for-byte identical assembly code across versions
No branching changes: Same 4-instruction PLT sequence with identical memory access patterns
Stable execution: 7 ns execution time confirms unchanged dynamic linking overhead

Code Review Insights:

New functionality: 175 lines added for Minimax M2 chat format support
Implementation scope: Changes limited to chat processing system (common/chat.cpp)
Performance considerations: New XML parsing and grammar generation may add overhead to chat processing, but doesn't affect core inference pipeline
Architecture: Clean integration following existing patterns without modifying core LLM inference functions
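To give a sense of what XML-style tool-call parsing involves, here is a minimal Python sketch. It is not the implementation in common/chat.cpp. The tag layout (minimax:tool_call, invoke, parameter) is an assumption about the model's template rather than something stated in this PR, and the name parse_tool_calls is hypothetical.

```python
import json
import re

# Assumed tag layout; adjust if the actual template differs.
INVOKE_RE = re.compile(r'<invoke name="([^"]+)">(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def parse_tool_calls(text):
    """Extract tool calls as a list of {"name": ..., "arguments": {...}} dicts."""
    calls = []
    for name, body in INVOKE_RE.findall(text):
        args = {}
        for key, raw in PARAM_RE.findall(body):
            raw = raw.strip()
            try:
                # Values that are valid JSON (numbers, booleans, objects) are decoded.
                args[key] = json.loads(raw)
            except json.JSONDecodeError:
                # Anything else is kept as a plain string.
                args[key] = raw
        calls.append({"name": name, "arguments": args})
    return calls


sample = (
    "<minimax:tool_call>\n"
    '<invoke name="get_weather">\n'
    '<parameter name="city">Paris</parameter>\n'
    '<parameter name="days">3</parameter>\n'
    "</invoke>\n"
    "</minimax:tool_call>"
)
print(parse_tool_calls(sample))
```

Regular expressions are used here instead of an XML parser because the colon in the outer tag would be read as an undeclared namespace prefix. This work happens once per response, outside the token-by-token decode loop, which is consistent with the report's point that it does not touch the core inference pipeline.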

Conclusion:
The changes add chat-format handling in common/chat.cpp rather than modifying core inference code. The 100% power reduction reported for specific binaries more likely reflects build configuration changes or component removal than a real efficiency gain. Core inference performance remains unaffected.

pwilkin added 6 commits November 4, 2025 19:10

Minimax M2 chat template support

e21f87e

No newline after <think>

4e58382

On the other hand, this is probably safer

de67255

Use Unsloth template, add extra test parameters for ignoring addition…

1a351a0

…al whitespace

Whitespace.

9481289

Add proper handling of optional parameters with test

23d4bb7

DajanaV temporarily deployed to PROD__AL_DEMO November 4, 2025 18:42 — with GitHub Actions Inactive

DajanaV force-pushed the main branch from 95c6f7f to 145ad25 Compare November 4, 2025 20:09

DajanaV temporarily deployed to PROD__AL_DEMO November 4, 2025 20:40 — with GitHub Actions Inactive

DajanaV force-pushed the main branch 20 times, most recently from 0eeb29b to 5714a80 Compare November 7, 2025 19:07

DajanaV force-pushed the main branch 30 times, most recently from 6f7320f to 24733fb Compare November 13, 2025 11:55

DajanaV wants to merge 6 commits into main from upstream-PR16946-branch_pwilkin-minimax-chat
