Sync with Microsoft ONNX Runtime - 03/12/2025 #867

Jaswanth51 · 2025-12-03T03:42:54Z

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

### Description

Resolved all security vulnerabilities in JavaScript packages under `/js` by running `npm audit fix`. All updates are non-breaking patch/minor version bumps.

**Fixed vulnerabilities:**

- `/js` root: 1 high severity
  - `glob` 10.4.5 → 10.5.0 (command injection - GHSA-5j98-mcp5-4vw2)
- `/js/react_native`: 7 vulnerabilities (1 high, 3 moderate, 3 low)
  - `image-size` → 1.2.1 (high: DoS via infinite loop - GHSA-m5qc-5hw7-8vg7)
  - `@babel/helpers` 7.25.6 → 7.28.4 (moderate: RegExp complexity - GHSA-968p-4wvh-cqc8)
  - `@babel/runtime` 7.25.6 → 7.28.4 (moderate: RegExp complexity - GHSA-968p-4wvh-cqc8)
  - `js-yaml` → fixed (moderate: prototype pollution - GHSA-mh29-5h37-fv8m)
  - `brace-expansion` 2.0.1 → 2.0.2 (low: ReDoS - GHSA-v6h2-p8h4-qcjw)
  - `on-headers` → fixed (low: header manipulation - GHSA-76c9-3jph-rj3q)

**Files modified:**

- `js/package-lock.json`
- `js/react_native/package-lock.json`

**Result:** All JS packages (`/js`, `/js/common`, `/js/web`, `/js/node`, `/js/react_native`) now report 0 vulnerabilities.

### Motivation and Context

Security maintenance to address dependency vulnerabilities identified by `npm audit`. No breaking changes or code modifications required.

*This pull request was created as a result of the following prompt from Copilot chat.*

<details>
<summary>Original prompt</summary>

> Please create a pull request that runs `npm audit fix` for the JavaScript/TypeScript portion of the repository under the `/js` directory of [microsoft/onnxruntime](https://github.com/microsoft/onnxruntime).
>
> Requirements:
>
> 1. **Scope**
>    - Work only within the `/js` folder and its subpackages (e.g., `js/web`, `js/node`, `js/common`, etc.).
>    - Do not modify files outside `/js`.
>
> 2. **Dependency updates**
>    - Run `npm audit fix` (and, if necessary to fully resolve high/critical issues while staying non-breaking, `npm audit fix --force` on specific subpackages) to address security vulnerabilities.
>    - Prefer minimal, non-breaking version bumps (patch and minor) that satisfy `npm audit` while keeping semver ranges sensible.
>    - If any **major** upgrades are required to clear vulnerabilities, handle them cautiously:
>      - Apply the upgrade only if tests still pass and typings/build setup remain compatible.
>      - If a major bump would require code changes or creates breaking behavior, **do not** apply it; instead, leave a TODO comment in the PR description summarizing which packages remain vulnerable and why.
>
> 3. **Validation**
>    - Run the existing JS-related checks that the repo supports from `/js`, such as:
>      - `npm test` or package-specific test scripts.
>      - Any documented lint/build/test commands for JS packages (e.g., `npm run build`, `npm run lint`) where applicable.
>    - Ensure the updated lockfiles (if present) are consistent, and the project installs cleanly with `npm ci` (or the repo's documented install command) in the `/js` area.
>
> 4. **Files to update**
>    - Update `package.json` and lockfiles under `/js` (e.g., `package-lock.json`, `npm-shrinkwrap.json`, or workspace-specific lock files) to reflect the audited dependency tree.
>    - Do not manually edit `node_modules`; rely on `npm` to manage dependencies and only commit manifest/lockfile changes.
>
> 5. **Repository conventions**
>    - Follow this repo's existing conventions for formatting, commit messages, and JS tooling.
>    - Keep the diff focused on the dependency and lockfile updates plus any absolutely necessary code tweaks to maintain compatibility.
>
> 6. **Pull request description**
>    - In the PR body, include:
>      - A short summary: that `npm audit fix` was run in `/js` to address dependency vulnerabilities.
>      - A bullet list of notable dependency changes (especially any major version bumps), with packages and old/new versions.
>      - A brief testing summary (commands run and their results).
>      - A note about any remaining vulnerabilities that could not be fixed without breaking changes (if applicable), including the affected packages and advisories if available.
>
> The goal is a clean, minimal PR that improves the security posture of the JS packages under `/js` in `microsoft/onnxruntime` without introducing breaking changes.

</details>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>

…se (microsoft#26626)

### Description

This PR optimizes `InstanceNormalization` by removing a redundant transpose. Because the `NCHW` implementation of `InstanceNormalization` is more efficient, there is no need to add a wrapper `Transpose` to make it run in `NHWC`. This lets us elide the redundant transpose and improves performance.

Testing on Lunar Lake shows a roughly 60% performance improvement in `InstanceNormalization` operations.

#### `InstanceNormalization` OP benchmark

- Input tensor shape: `(1,32,1048576)`
- Scale tensor shape: `(32)`
- B tensor shape: `(32)`

| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| Lunar Lake | 82.6 | 34.2 | 58% |

#### Model benchmark

| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| sd-turbo-vae-decoder-fp16-demo | 2437.6 | 1835.9 | 25% |

### Motivation and Context

Please see above.
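As an illustration of why the wrapper `Transpose` is redundant, the following is a minimal NumPy sketch, not the kernel changed by this PR. The helper names `instance_norm_nchw` and `instance_norm_via_nhwc` are hypothetical. Instance normalization computes its statistics per sample and per channel over the spatial axes, so normalizing directly in `NCHW` should give the same values as transposing to channels-last, normalizing, and transposing back.

```python
import numpy as np


def instance_norm_nchw(x, scale, bias, eps=1e-5):
    """Normalize directly in channels-first layout: x has shape (N, C, *spatial)."""
    spatial_axes = tuple(range(2, x.ndim))
    mean = x.mean(axis=spatial_axes, keepdims=True)
    var = x.var(axis=spatial_axes, keepdims=True)
    # Reshape per-channel parameters so they broadcast along axis 1.
    param_shape = (1, -1) + (1,) * (x.ndim - 2)
    return (x - mean) / np.sqrt(var + eps) * scale.reshape(param_shape) + bias.reshape(param_shape)


def instance_norm_via_nhwc(x, scale, bias, eps=1e-5):
    """Same result, but with a transpose before and after (the wrapped form)."""
    to_last = (0,) + tuple(range(2, x.ndim)) + (1,)
    x_last = np.transpose(x, to_last)
    spatial_axes = tuple(range(1, x.ndim - 1))
    mean = x_last.mean(axis=spatial_axes, keepdims=True)
    var = x_last.var(axis=spatial_axes, keepdims=True)
    y_last = (x_last - mean) / np.sqrt(var + eps) * scale + bias
    # Move the channel axis back to position 1.
    to_first = (0, x.ndim - 1) + tuple(range(1, x.ndim - 1))
    return np.transpose(y_last, to_first)


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 16))
scale = rng.standard_normal(4)
bias = rng.standard_normal(4)

direct = instance_norm_nchw(x, scale, bias)
wrapped = instance_norm_via_nhwc(x, scale, bias)
print("max abs difference:", np.abs(direct - wrapped).max())
```

Both paths produce the same tensor; the wrapped form only adds two layout conversions, which is the cost the PR describes removing.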

### Description

This PR refactors a few "context" classes to make them clearer and to support new features.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

### Description

Add `LogEvaluationStart` for `ReplayGraph` to match `LogEvaluationStop`.

### Motivation and Context

With both events logged, the run time can be captured correctly through ETW.

Co-authored-by: hualxie <hualxie@microsoft.com>

### Description

Add `LogCompileModel` to mark the session usage as Compile, because that session will not be used for inference. It could also be used to log compile-model parameters if needed.

### Motivation and Context

We are building a profiling tool for WinML and want to differentiate compile sessions from inference sessions. I think there are two ways to do it, but I don't know which is better: microsoft#26646 and microsoft#26647.

---------

Co-authored-by: hualxie <hualxie@microsoft.com>

Fix a bug introduced by microsoft#26563, which accidentally used the wrong condition and produced incorrect results in graph capture mode.

…test.exe (microsoft#26396)

### Description

- The change allows users to better debug unit tests by adding the following environment variables:
  - `QNN_DUMP_ONNX`: Dump the input ONNX model
  - `QNN_DUMP_JSON`: Dump the JSON QNN graph with provider_option `dump_json_qnn_graph`
  - `QNN_DUMP_DLC`: Dump the DLC with provider_option `qnn_ir_backend_path`
  - `QNN_VERBOSE`: Use the log level `ORT_LOGGING_LEVEL_VERBOSE`
- Developers can use the environment variables above to save the artifacts of QNN-EP test cases to a directory named `<TestSuite>_<TestName>`

```
.
├── QnnCPUBackendTests_BatchNorm2D_fp32 # RunQnnModelTest
│   ├── dumped_f32_model.onnx # float32 ONNX model
│   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
│   └── QNNExecutionProvider_QNN_XXXX_X_X.json
├── QnnHTPBackendTests_BatchNorm_FP16 # TestFp16ModelAccuracy
│   ├── dumped_f16_model.onnx # float16 ONNX model
│   ├── dumped_f32_model.onnx # float32 ONNX model
│   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
│   └── QNNExecutionProvider_QNN_XXXX_X_X.json
└── QnnHTPBackendTests_BatchNorm2D_U8U8S32 # TestQDQModelAccuracy
    ├── dumped_f32_model.onnx # float32 ONNX model
    ├── dumped_qdq_model.onnx # QDQ ONNX model
    ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
    └── QNNExecutionProvider_QNN_XXXX_X_X.json

# All artifact files are placed under the current working directory from which the test binary is invoked.
```

### Motivation and Context

- The JSON QNN graph and DLC help the backend debug performance and accuracy issues.
- By comparing the ONNX model with the JSON QNN graph or DLC, we can locate issues in graph manipulation.

@guschmue

…t#26667)

### Description

More accurately compute Pow(2.0) on WebGPU EP.

Reproduction script:

```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 1])
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, 1])

# Create a constant tensor for the exponent (2.0)
exponent_tensor = helper.make_tensor('exponent', TensorProto.FLOAT, [], [2.0])
exponent_node = helper.make_node('Constant', [], ['exponent_out'], value=exponent_tensor)

# Create the Pow node
# Pow takes two inputs: Base (X) and Power (exponent_out)
pow_node = helper.make_node(
    'Pow',
    inputs=['X', 'exponent_out'],
    outputs=['Y'],
    name='PowNode'
)

# Create the graph
graph_def = helper.make_graph(
    [exponent_node, pow_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13  # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array([[-2e3]], dtype=np.float32)

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

currently produces

```
CPU Result: [[4.e+06]]
WebGPU Result: [[3.999999e+06]]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 56
     54 diff = np.abs(res_cpu - res_webgpu)
     55 max_diff = diff.max().item()
---> 56 assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
     57 print("Results match!")

AssertionError: Results do not match within tolerance! Max diff: 1.0
```

but with this PR:

```
CPU Result: [[4.e+06]]
WebGPU Result: [[4.e+06]]
Results match!
```

### Motivation and Context

Leads to downstream issues/inaccuracies for certain models, especially those which have larger values to compute pow(x,2) for.

cc @guschmue
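The PR text does not say how the shader was changed, so the following is only a hedged NumPy sketch of one plausible source of this kind of error. It assumes, purely for illustration, that a generic float32 `pow` is evaluated through `exp2(y * log2(|x|))`, and compares that route with a plain multiplication for the exponent 2.0. The variable names are hypothetical.

```python
import numpy as np

x = np.float32(-2e3)

# Route A (an assumption, not the actual WebGPU shader): evaluate pow through
# exp2/log2 in float32. Rounding in log2 and in the product is amplified by
# exp2, so the result may land a few float32 steps away from the exact square.
pow_via_exp_log = np.exp2(np.float32(2.0) * np.log2(np.abs(x)))

# Route B: for an exponent of exactly 2.0, multiply the base by itself.
# 2000 * 2000 = 4,000,000 is exactly representable in float32.
pow_via_multiply = x * x

print("exp2/log2 route:", repr(pow_via_exp_log))
print("multiply route: ", repr(pow_via_multiply))
print("abs difference: ", abs(float(pow_via_exp_log) - float(pow_via_multiply)))
```

The multiply route is exact for this input; how far the exp2/log2 route drifts depends on the math library, which is why the sketch prints the difference instead of asserting a specific value.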

### Description

While profiling session creation time for large graphs (large in number of nodes, not size of tensors), we noticed that the creation and subsequent destruction of protobuf objects was the major hotspot. This PR avoids creating them.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>

…microsoft#26682)

### Description

Use `std::string_view` directly as key in `find` method of `flat_hash_map`. This part of the absl documentation may provide further insights: https://abseil.io/docs/cpp/guides/container#heterogeneous-lookup

### Motivation and Context

We noticed this when profiling the session creation of large models (in terms of the number of nodes).

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>

In debug mode, the following error is reported: `webgpu_context.cc:257 Run Uniform variable[5] (head_size) data type mismatch in program "SplitPackedQKVWithRotaryEmbeddingAndCopyKV", Expected: u32, Actual: i32`. There is no issue in release mode. This change converts the `i32` value to `u32` to avoid the mismatch.

@guschmue

…crosoft#26659)

### Description

Test model (happens with any 2D inputs): [2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip](https://github.com/user-attachments/files/23758390/2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip)

Command:

```
python -c "import onnxruntime as ort; ort.InferenceSession('2191__visual_projection_visual_projection.1_BatchNormalization.onnx', providers=['WebGpuExecutionProvider'])"
```

Before (failure):

```
Op (BatchNormalization) [ShapeInferenceError] Tensor must have at least 3 dimensions to convert between channels first and channels last.
```

After (success):

```
(nothing, meaning success)
```

### Motivation and Context

This fixes BatchNormalization on WebGPU, matching CPU version.

cc @guschmue
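For reference, inference-mode batch normalization on a 2D `(N, C)` input can be sketched in NumPy as below. This is an illustration of the operator's arithmetic as defined in the ONNX spec, not the WebGPU implementation, and the function name `batch_norm_inference` is hypothetical. With only batch and channel axes there are no spatial dimensions, so a channels-first to channels-last conversion has nothing to move.

```python
import numpy as np


def batch_norm_inference(x, scale, bias, mean, var, eps=1e-5):
    """Y = (X - mean) / sqrt(var + eps) * scale + bias, with channels on axis 1."""
    # Broadcast the per-channel parameters along axis 1 for any rank >= 2.
    param_shape = [1] * x.ndim
    param_shape[1] = -1
    mean = mean.reshape(param_shape)
    var = var.reshape(param_shape)
    scale = scale.reshape(param_shape)
    bias = bias.reshape(param_shape)
    return (x - mean) / np.sqrt(var + eps) * scale + bias


rng = np.random.default_rng(0)
channels = 4
x_2d = rng.standard_normal((3, channels)).astype(np.float32)
scale = rng.standard_normal(channels).astype(np.float32)
bias = rng.standard_normal(channels).astype(np.float32)
mean = rng.standard_normal(channels).astype(np.float32)
var = rng.random(channels).astype(np.float32) + 0.5

y_2d = batch_norm_inference(x_2d, scale, bias, mean, var)
print(y_2d.shape)
```

The same function also handles 3D and 4D inputs, which is the case the layout conversion in the error message is meant for.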

### Description

The CudaMemPool test checks whether CudaMemPool is supported in the given environment. When it is not, the resulting error must be cleared so that it does not affect subsequent tests.

### Motivation and Context

Potential test failure.

…soft#26546)

### Description

The original error message only shows: "Failed to setup QNN input tensors for graph: <graph_name>"

This change adds more detailed error information by logging the failure reason from [SetupTensors](https://github.com/microsoft/onnxruntime/blob/ea55c160a36d658eae61a4c7aeda6cb55dd54dec/onnxruntime/core/providers/qnn/builder/qnn_model.cc#L386), making it easier to debug issues.

### Motivation and Context

User requires detailed error logging for the ORT online context binary generation.

fix for microsoft#26690

…osoft#26662)

### Description

This patch replaces `global_id` and `workgroup_id` with `logical_global_id` and `logical_workgroup_id`, which are computed from `workgroup_idx` and the dispatch workgroup sizes set in `ProgramBase::SetDispatchGroupSize()`.

### Motivation and Context

We shouldn't use `global_id` or `workgroup_id` directly because the dispatch workgroup sizes may be normalized in `ProgramManager::NormalizeDispatchGroupSize()`.
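The idea can be sketched in plain Python. This is a simplified model, not the generated WGSL: the function names, the x-fastest flattening order, and the example normalized size `(320, 313, 1)` are all assumptions made for illustration. A flat `workgroup_idx` is taken from the position in the dispatched (possibly normalized) grid and then unflattened against the logical dispatch size that the program asked for.

```python
def flatten_workgroup_id(workgroup_id, dispatch_size):
    """Flat index of a workgroup inside the grid that was actually dispatched."""
    x, y, z = workgroup_id
    size_x, size_y, _ = dispatch_size
    return x + y * size_x + z * size_x * size_y


def logical_workgroup_id(workgroup_idx, logical_size):
    """Unflatten a flat index against the logical (requested) dispatch size."""
    size_x, size_y, _ = logical_size
    return (
        workgroup_idx % size_x,
        (workgroup_idx // size_x) % size_y,
        workgroup_idx // (size_x * size_y),
    )


# Hypothetical case: the program requests 100000 x 1 x 1 workgroups, and the
# dispatch is normalized to a grid whose dimensions are all smaller.
logical_size = (100000, 1, 1)
normalized_size = (320, 313, 1)  # assumed normalized grid, 320 * 313 >= 100000

hardware_id = (5, 2, 0)  # what a raw workgroup_id would report in that grid
idx = flatten_workgroup_id(hardware_id, normalized_size)
print(idx, logical_workgroup_id(idx, logical_size))
```

In this sketch the raw x component is 5 while the logical x component is 645, which shows how indexing with the raw id could address the wrong data once the grid has been reshaped.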

@guschmue

…ive_number) in float32 (microsoft#26670)

### Description

The correct definition of the most negative number is `-3.40282346638528e+38`, according to IEEE 754, but it is being incorrectly registered inline as a truncated version `-3.402823e+38f`.

```py
>>> import numpy as np
>>> np.finfo(np.float32).min
np.float32(-3.4028235e+38)
>>> np.finfo(np.float32).min.item()
-3.4028234663852886e+38
```

For this reason, values less than this threshold were handled incorrectly. While this may seem like a small/irrelevant detail, it's essential in attention masking, where we do in fact use this value, leading to large numerical errors down the line.

Reproduction:

```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_shape = [1, 2]
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, input_shape)
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, input_shape)

# Create the Softmax node
# Softmax takes one input: X
softmax_node = helper.make_node(
    'Softmax',
    inputs=['X'],
    outputs=['Y'],
    name='SoftmaxNode',
    axis=-1  # Default axis is -1, usually applied to the last dimension
)

# Create the graph
graph_def = helper.make_graph(
    [softmax_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13  # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array(
    [[-3.40282346638528e+38, -3.40282346638528e+38]]
    # [[-3.4028234663852886e+38, -3.4028234663852886e+38]]
).astype(np.float32)
print(input_data.tolist())

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
print(diff)
print(f"Max diff: {max_diff}")
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

Before:

```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0. 0.]]
[[0.5 0.5]]
Max diff: 0.5
AssertionError: Results do not match within tolerance! Max diff: 0.5
```

After:

```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0.5 0.5]]
[[0. 0.]]
Max diff: 0.0
Results match!
```

cc @guschmue
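The gap between the two constants can be checked with a short NumPy sketch. It is an illustration only: the `softmax` helper below is a hypothetical reference implementation, not the WebGPU shader. It shows that the truncated literal rounds to a different float32 than the true minimum, and that a max-subtracting softmax gives `[0.5, 0.5]` when both inputs equal the true minimum.

```python
import numpy as np

true_min = np.finfo(np.float32).min      # most negative finite float32
truncated = np.float32(-3.402823e+38)    # the truncated literal from the description

# The truncated literal is a different, slightly larger float32 value, so the
# true minimum sits below any threshold defined with it.
print("truncated == true_min:", truncated == true_min)
print("true_min < truncated: ", true_min < truncated)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


x = np.array([[true_min, true_min]], dtype=np.float32)
reference = softmax(x)
print(reference)
```

Subtracting the row maximum turns both entries into 0 before the exponential, so the reference result does not depend on how extreme the shared input value is.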

…Capability/IndexedSubGraph (microsoft#26444)

### Description

For TRT EP's `GetCapability()`, in some cases `GetSubGraph()` won't add the graph's output to the `ComputeCapability/IndexedSubGraph` returned to ORT.

The issue comes from the following code:

````c++
...
if (node->GetOutputEdgesCount() > node->OutputDefs().size()) {
  ... // execute here
} else {
  ...
  if (graph_output_names.find(output->Name()) != graph_output_names.end()) {
    graph_outputs_to_add[output] = output_order; // missing this
  }
}
````

Update TRT RTX EP as well.

### Motivation and Context

microsoft#25373

…rosoft#26697)

### Description

This is a follow-up to microsoft#25181 that removes ROCM EP related files to avoid confusion. Documentation will be updated later.

### Motivation and Context

microsoft#26692

jatinwadhwa921

LGTM

This reverts commit 39d6db5.

Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (#867)"

Copilot AI and others added 21 commits November 25, 2025 08:10

[webgpu] Fix bug introduced by RoE (microsoft#26661)

55bfa30


add support for int32_t in webgpu / slice (microsoft#26693)

ee77417


[ROCM] Remove docker, contrib ops, ci scripts related to ROCM EP (mic…

817a44f


Merge branch 'master' into sync_msft_03122025

538bf5f

Jaswanth51 requested review from RajeevSekar and jatinwadhwa921 December 3, 2025 03:42

jatinwadhwa921 approved these changes Dec 3, 2025


jatinwadhwa921 merged commit 39d6db5 into ovep-develop Dec 3, 2025
6 of 8 checks passed

Jaswanth51 added a commit that referenced this pull request Dec 3, 2025

Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (#867)"

8291389

This reverts commit 39d6db5.

Jaswanth51 mentioned this pull request Dec 3, 2025

Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (#867)" #868

Merged

Jaswanth51 added a commit that referenced this pull request Dec 3, 2025

Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (#867)"

8e3f60b

This reverts commit 39d6db5.

Jaswanth51 added a commit that referenced this pull request Dec 3, 2025

Merge pull request #868 from intel/revert_sync_03122025

fa8f464

Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (#867)"

Jaswanth51 deleted the sync_msft_03122025 branch December 3, 2025 08:25

Jaswanth51 restored the sync_msft_03122025 branch December 3, 2025 08:30


17 participants
