Add skill and evals for dynamic mode usage #6271

Open
rostan-t wants to merge 7 commits into NVIDIA:main from rostan-t:dynamic-mode-skill

Conversation

@rostan-t
Collaborator

@rostan-t rostan-t commented Mar 20, 2026

Category:

Other (e.g. Documentation, Tests, Configuration)

Description:

Since dynamic mode is fairly new, AI agents are not very good at writing code that uses it. For instance, according to Anthropic, Claude Sonnet 4.6's knowledge cutoff is August 2025. Even when presented with a few examples, agents miss some dynamic-mode-specific patterns and are of limited help in writing such code.

This PR adds a Claude Code skill containing guidelines on how to use dynamic mode. It was generated with /skill-creator, which also generates evals for the skill. Here are the results of running the evals with Claude Code using Sonnet 4.6:

| Eval | Task | With Skill | Without Skill |
| --- | --- | --- | --- |
| 1 | Image classification pipeline | 10/10 | 1/10 |
| 2 | Batch column extraction | 4/4 | 1/4 |
| 3 | Pipeline-to-dynamic conversion | 7/7 | 0/7 |
| 4 | Debugging intermittent corruption | 6/6 | 1/6 |
| 5 | Audio mel spectrogram | 6/6 | 1/6 |
| 6 | Object detection pipeline | 7/7 | 0/7 |
| Total | 40 assertions | 40/40 (100%) | 4/40 (10%) |
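
The totals in the last row can be re-derived from the per-eval counts. The snippet below is a minimal sketch of that arithmetic; the names `RESULTS` and `summarize` are illustrative and are not part of the PR or of the eval tooling.

```python
# Per-eval assertion counts copied from the results table:
# (task, passed with skill, passed without skill, total assertions)
RESULTS = [
    ("Image classification pipeline", 10, 1, 10),
    ("Batch column extraction", 4, 1, 4),
    ("Pipeline-to-dynamic conversion", 7, 0, 7),
    ("Debugging intermittent corruption", 6, 1, 6),
    ("Audio mel spectrogram", 6, 1, 6),
    ("Object detection pipeline", 7, 0, 7),
]


def summarize(results):
    """Return (total assertions, passed with skill, passed without skill)."""
    total = sum(row[3] for row in results)
    with_skill = sum(row[1] for row in results)
    without_skill = sum(row[2] for row in results)
    return total, with_skill, without_skill


total, with_skill, without_skill = summarize(RESULTS)
print(f"With skill: {with_skill}/{total} ({with_skill / total:.0%})")
print(f"Without skill: {without_skill}/{total} ({without_skill / total:.0%})")
```

Running it prints `With skill: 40/40 (100%)` and `Without skill: 4/40 (10%)`, matching the Total row.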

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR adds a Claude Code skill (SKILL.md) and an eval suite (evals.json + pipeline_to_convert.py) to teach AI agents how to write correct DALI dynamic-mode code. The motivation is clear and well-supported by the 10% → 100% eval improvement shown in the PR description.

All significant issues raised in prior review rounds have been addressed:

  • Missing imports in pipeline_to_convert.py fixed (bd30e92)
  • Eval ID numbering gap (was 1–4, 6–7) corrected to sequential 1–6
  • max_batch_size constructor usage added to the skill
  • Incorrect claim that batch size can vary between epochs removed (932cb74)
  • copy parameter defaults for Tensor.torch() vs Batch.torch() confirmed correct

The skill content itself is accurate: device="gpu" vs "mixed" guidance, Batch.__getitem__ absence, stateful reader pattern, EvalMode context-manager syntax, thread-local RNG, and the Pipeline Mode Migration table all match the documented DALI dynamic-mode API. The eval assertions directly target the failure modes that matter most for agent-generated code.
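
For orientation, the patterns named above might combine roughly as in the sketch below. It is not taken from SKILL.md: it uses only the calls mentioned in this review (`ndd.readers.File`, `next_epoch(batch_size=...)`, `ndd.decoders.image(device="gpu")`, `batch.torch(pad=True)`, `ndd.EvalMode.sync_full`), while the import path, the `file_root` argument, and the helper name `run_epoch` are assumptions that should be checked against the DALI documentation.

```python
import contextlib

try:
    # Assumed import path for DALI dynamic mode; verify against the DALI docs.
    import nvidia.dali.experimental.dynamic as ndd
except ImportError:
    ndd = None  # DALI is not installed; the sketch stays importable


def run_epoch(image_dir, batch_size, train_step, debug=False):
    """Hypothetical helper: decode one epoch on the GPU and feed `train_step`."""
    if ndd is None:
        raise RuntimeError("NVIDIA DALI with dynamic mode is required")

    # Stateful reader: constructed once, then asked for an epoch of batches.
    # `file_root` is assumed to mirror the pipeline-mode reader argument.
    reader = ndd.readers.File(file_root=image_dir)

    # In debug mode, run synchronously so errors surface at the call site.
    mode = ndd.EvalMode.sync_full if debug else contextlib.nullcontext()
    with mode:
        for jpegs, labels in reader.next_epoch(batch_size=batch_size):
            # Dynamic mode uses device="gpu" where pipeline mode used "mixed".
            images = ndd.decoders.image(jpegs, device="gpu")
            # Decoded images may differ in shape, hence pad=True.
            train_step(images.torch(pad=True), labels.torch())
```

A call such as `run_epoch("/data/train", 32, train_step, debug=True)` (path and batch size are placeholders) would be the debugging variant described in the flowchart below.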

Confidence Score: 5/5

Safe to merge — documentation/eval-only change with no production code impact.

All previously raised P1-level concerns (missing imports, batch-size variation claim, eval numbering gap, missing max_batch_size guidance) have been resolved in preceding commits. No new correctness, security, or data-integrity issues were found in this diff. Remaining P2-level observations are too trivial to block merge.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .claude/skills/using-dali-dynamic-mode/SKILL.md | Adds comprehensive AI-agent skill guide for DALI dynamic mode; content is accurate and all previously flagged issues (missing max_batch_size guidance, erroneous batch-size-variation claim, device_id confusion) have been resolved in prior commits. |
| .claude/skills/using-dali-dynamic-mode-workspace/evals/evals.json | Six evals with sequential IDs 1-6 (numbering gap previously fixed); assertions are well-targeted at real agent failure modes across image, audio, detection, and debugging scenarios. |
| .claude/skills/using-dali-dynamic-mode-workspace/evals/files/pipeline_to_convert.py | Self-contained pipeline-mode reference script (imports previously fixed); correctly demonstrates the full set of patterns that eval 3 expects agents to convert. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User task involves DALI dynamic mode / ndd"] --> B["Skill: using-dali-dynamic-mode injected"]
    B --> C["Agent reads SKILL.md"]
    C --> D{Task type}
    D --> E["New pipeline\n(images / audio / detection)"]
    D --> F["Pipeline-mode → dynamic conversion"]
    D --> G["Debugging intermittent issues"]
    E --> H["ndd.readers.File → next_epoch(batch_size=N)\nndd.decoders.image(device='gpu')\nbatch.torch() / batch.torch(pad=True)"]
    F --> I["Migration table:\n@pipeline_def → direct calls\nfn.op → ndd.op\ndevice='mixed' → 'gpu'\nseed/num_threads → set_seed/set_num_threads"]
    G --> J["with ndd.EvalMode.sync_full:\n  ndd ops\n  errors surface at call site"]
    H --> K["Eval suite validates\n40/40 assertions pass"]
    I --> K
    J --> K

Reviews (5): Last reviewed commit: "Narrow the scope of the dynamic mode ski..."

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t rostan-t force-pushed the dynamic-mode-skill branch from e525b96 to eebd995 Compare March 20, 2026 17:17
mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
train_step(images.torch(), labels.torch())
Contributor

Is there a way to test this code against going stale?

Collaborator Author

Unless we set up infrastructure to run the evals in CI, I don't think there is. We should maintain the skill the same way we maintain the documentation.
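
Short of running the evals in CI, a much weaker check is still possible: verifying that the Python snippets in the skill at least parse. The sketch below assumes the skill's examples are written as fenced `python` blocks in Markdown; the function name `syntax_errors` is hypothetical. It catches only syntax errors, not API drift, so it does not replace the evals.

```python
import ast
import re

# Build the fence marker programmatically so this snippet can itself
# live inside a fenced Markdown block.
FENCE_MARK = "`" * 3
PYTHON_BLOCK = re.compile(
    FENCE_MARK + r"python\n(.*?)" + FENCE_MARK, re.DOTALL
)


def syntax_errors(markdown_text):
    """Return (block index, message) for each Python block that fails to parse."""
    errors = []
    for index, match in enumerate(PYTHON_BLOCK.finditer(markdown_text)):
        try:
            ast.parse(match.group(1))
        except SyntaxError as exc:
            errors.append((index, exc.msg))
    return errors
```

A CI job could read the skill file, call `syntax_errors` on its contents, and fail if the returned list is non-empty.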

rostan-t added 2 commits April 1, 2026 13:45
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t rostan-t requested a review from mzient April 2, 2026 08:07
5 participants