Closed
Description
Describe the issue
I am working with AMD and am trying to use two UINT4 values packed into each UINT8 byte with the QMoE operator. With this packing I am seeing large numeric discrepancies.
To reproduce
A single-node graph containing the QMoE operator, with 2x UINT4 values packed into the expected UINT8 input.
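For reference, the 2x UINT4 into UINT8 packing mentioned above can be sketched in NumPy as follows. This is only an illustration, not the layout QMoE is confirmed to use: the helper names `pack_uint4` and `unpack_uint4` and the `low_nibble_first` flag are hypothetical, and the nibble order is an assumption that would need to be checked against the operator's documentation. Packing the same values with both orders may help rule out a nibble-order mismatch.

```python
import numpy as np


def pack_uint4(values, low_nibble_first=True):
    """Pack pairs of 4-bit values along the last axis into uint8 bytes.

    ASSUMPTION: the nibble order is a guess for illustration only; it is
    not taken from the QMoE operator specification.
    """
    v = np.asarray(values, dtype=np.uint8)
    if v.shape[-1] % 2 != 0:
        raise ValueError("last dimension must be even")
    if np.any(v > 15):
        raise ValueError("values must fit in 4 bits")
    first, second = v[..., 0::2], v[..., 1::2]
    if low_nibble_first:
        # Even-indexed element goes in bits 0-3, odd-indexed in bits 4-7.
        return (first | (second << 4)).astype(np.uint8)
    # Even-indexed element goes in bits 4-7, odd-indexed in bits 0-3.
    return ((first << 4) | second).astype(np.uint8)


def unpack_uint4(packed, low_nibble_first=True):
    """Inverse of pack_uint4: expand each byte into two 4-bit values."""
    p = np.asarray(packed, dtype=np.uint8)
    lo, hi = p & 0x0F, p >> 4
    first, second = (lo, hi) if low_nibble_first else (hi, lo)
    out = np.empty(p.shape[:-1] + (p.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = first
    out[..., 1::2] = second
    return out


# Hypothetical 4-bit weights, used only to show the two layouts.
weights = np.array([[1, 2, 3, 15]], dtype=np.uint8)
packed_low_first = pack_uint4(weights, low_nibble_first=True)
packed_high_first = pack_uint4(weights, low_nibble_first=False)
```

The two layouts produce different bytes for the same 4-bit values, so a graph fed the wrong one would be dequantized incorrectly even though the packing round-trips cleanly on its own.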
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
main
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
CPU