Export artifacts workflow and decoder export fixes#3
Reviewed By: JuanBesa Differential Revision: D91383480 fbshipit-source-id: 5b98627fb679c7c704c1a2faba9722e3a6f2ec20
Reviewed By: JuanBesa Differential Revision: D91210167 fbshipit-source-id: a563232f4bc82f6f3b99e53df1c88cf0f39747bb
Simplify benchmarks/tests to avoid duplicated work and keep export usage consistent.
Introduce minimal export/test scripts for the full pipeline and add dedicated export/load timing benchmarks.
Allow configuring num_feature_levels for image models and add/extend export, load, and inference benchmarks.
Remove redundant checks and keep feature-level guard output concise.
Remove tracked export stderr logs and export progress doc, ignore new logs, and mark export tests with the slow marker.
Declare the slow pytest marker to avoid warnings for export benchmarks.
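Declaring the marker avoids pytest's "unknown marker" warnings. One common place to do this is the pytest configuration file; the exact filename and wording below are illustrative:

```ini
[pytest]
markers =
    slow: export/load benchmarks that take a long time (deselect with -m "not slow")
```

With this in place, `pytest -m "not slow"` skips the export benchmarks by default.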
Use 2-input full pipeline export in tests and scripts, log output shapes, and add a standalone 2-input export helper. This keeps prompt count dynamic for export artifacts while preserving benchmarks and artifact validation.
Enable 2-input full pipeline export
Summary
Timing Results (CUDA 3090 24GB)
Feature levels = 1
Input image: 1008x1008 (fixed shape that sam3 supports by default)
Inference (full pipeline): 0.435s
Inference (image encoder): 0.297s
Inference (text encoder): 0.011s
Inference (encoder fusion): 0.081s
Inference (decoder only): 0.050s
Load full pipeline to CUDA: 19.848s
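Timings like the above can be collected with a small helper; this is a sketch (warmup/iteration counts are illustrative), with a CUDA synchronize so GPU work is included in the measurement:

```python
import time
import torch

def benchmark(fn, warmup=3, iters=10, device="cpu"):
    """Average wall-clock seconds per call, with warmup and optional CUDA sync."""
    for _ in range(warmup):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Example: time a small matmul on CPU.
x = torch.randn(256, 256)
avg = benchmark(lambda: x @ x)
print(f"{avg:.6f}s per call")
```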
Feature levels = 2
CPU export status
`SAM3_EXPORT_FORCE_CPU=1` export fails for `test_decoder_export_static` with `GuardOnDataDependentSymNode` in `scaled_dot_product_attention` (decoder RPB cross-attn). CPU export is not currently reliable.
Model/data changes
- `sam3/model/decoder.py`: rework presence-token masking and the cross-attn path to support RPB masks via `scaled_dot_product_attention` without MHA guard issues; relax coordinate cache asserts for export stability.
- `sam3/model/encoder.py`: fix tensor-dim check (`x.dim()` vs `x.dim`).
- `sam3/model/geometry_encoders.py`: guard `pin_memory` and fix ROIAlign input layout to avoid export issues.
- `sam3/model/position_encoding.py`: avoid caching with symbolic shapes to prevent export SymInt guard errors.
- `sam3/model_builder.py`: thread `num_feature_levels` into transformer/model construction for dynamic experiments (defaults unchanged).
- `sam3/train/data/*`: clean up unused loop indices (no behavior change).

Notes
The model requires `spatial_shapes.shape[0] == 1` when box RPB is enabled, so full pipeline/decoder-only export and inference fail for `num_feature_levels > 1`. This likely impacts small-object performance relative to a true multi-scale feature stack.
Loading exported artifacts requires `torchvision.ops` to be imported, which registers `roi_align`, before calling `torch.export.load`.