Export artifacts workflow and decoder export fixes#3
Reviewed By: JuanBesa Differential Revision: D91383480 fbshipit-source-id: 5b98627fb679c7c704c1a2faba9722e3a6f2ec20
Reviewed By: JuanBesa Differential Revision: D91210167 fbshipit-source-id: a563232f4bc82f6f3b99e53df1c88cf0f39747bb
Simplify benchmarks/tests to avoid duplicated work and keep export usage consistent.
Introduce minimal export/test scripts for the full pipeline and add dedicated export/load timing benchmarks.
Allow configuring num_feature_levels for image models and add/extend export, load, and inference benchmarks.
Remove redundant checks and keep feature-level guard output concise.
Remove tracked export stderr logs and export progress doc, ignore new logs, and mark export tests with the slow marker.
Declare the slow pytest marker to avoid warnings for export benchmarks.
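Declaring the marker avoids pytest's "unknown marker" warnings. One common place to do this is the pytest configuration file; the exact filename and wording below are illustrative:

```ini
[pytest]
markers =
    slow: export/load benchmarks that take a long time (deselect with -m "not slow")
```

With this in place, `pytest -m "not slow"` skips the export benchmarks by default.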
Use 2-input full pipeline export in tests and scripts, log output shapes, and add a standalone 2-input export helper. This keeps prompt count dynamic for export artifacts while preserving benchmarks and artifact validation.
Enable 2-input full pipeline export
Summary
Timing Results (CUDA 3090 24GB)
Feature levels = 1
Input image: 1008x1008 (fixed shape that sam3 supports by default)
Inference (full pipeline): 0.435s
Inference (image encoder): 0.297s
Inference (text encoder): 0.011s
Inference (encoder fusion): 0.081s
Inference (decoder only): 0.050s
Load full pipeline to CUDA: 19.848s
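Timings like the above can be collected with a small helper; this is a sketch (warmup/iteration counts are illustrative), with a CUDA synchronize so GPU work is included in the measurement:

```python
import time
import torch

def benchmark(fn, warmup=3, iters=10, device="cpu"):
    """Average wall-clock seconds per call, with warmup and optional CUDA sync."""
    for _ in range(warmup):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Example: time a small matmul on CPU.
x = torch.randn(256, 256)
avg = benchmark(lambda: x @ x)
print(f"{avg:.6f}s per call")
```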
Feature levels = 2
CPU export status
`SAM3_EXPORT_FORCE_CPU=1` export fails for `test_decoder_export_static` with `GuardOnDataDependentSymNode` in `scaled_dot_product_attention` (decoder RPB cross-attn). CPU export is not currently reliable.
Model/data changes
- `sam3/model/decoder.py`: rework presence-token masking and the cross-attn path to support RPB masks via `scaled_dot_product_attention` without MHA guard issues; relax coordinate cache asserts for export stability.
- `sam3/model/encoder.py`: fix tensor-dim check (`x.dim()` vs `x.dim`).
- `sam3/model/geometry_encoders.py`: guard `pin_memory` and fix ROIAlign input layout to avoid export issues.
- `sam3/model/position_encoding.py`: avoid caching with symbolic shapes to prevent export SymInt guard errors.
- `sam3/model_builder.py`: thread `num_feature_levels` into transformer/model construction for dynamic experiments (defaults unchanged).
- `sam3/train/data/*`: clean up unused loop indices (no behavior change).

Notes
The model requires `spatial_shapes.shape[0] == 1` when box RPB is enabled, so full pipeline/decoder-only export and inference fail for `num_feature_levels > 1`. This likely impacts small-object performance relative to a true multi-scale feature stack.
Loading exported artifacts requires `torchvision.ops` to be imported, which registers `roi_align`, before calling `torch.export.load`.