[feature] MegaScope Tensor Tracer #3606

Open

superay-a wants to merge 10 commits into NVIDIA:dev from MegatronAPPteam:ztr/megascope_tensor_tracer

Conversation

superay-a commented on Feb 26, 2026

What does this PR do?

This PR adds an experimental Tensor Tracer (MegaScope) to Megatron-LM (target branch: dev) to stream selected intermediate
tensors during training/evaluation to an external client (UI or script) over WebSockets for live visualization /
debugging.

Highlights:

  • Off by default; enabled with --tensor-tracer-port <port>.
  • Optional dependency via pip install -e '.[tensor_tracer]' (only required when the tracer is enabled).
  • Forward-step-only tracing (TTFlags.should_trace is enabled only around the forward step).
  • Supports multiple compressors to keep payload sizes manageable (tiling reductions, projection onto a vector, etc.).
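
The forward-step-only gating can be illustrated with a minimal sketch. The names TTFlags/should_trace come from this PR's description; the bodies below are illustrative only and are not the PR's implementation in megatron/core/tensor_tracer.py:

```python
import torch

# Minimal sketch of flag-gated forward hooks (illustrative, not the PR's code).
class TTFlags:
    should_trace = False  # flipped on only around the forward step

captured = []

def trace_hook(module, inputs, output):
    # Hooks stay registered, but only record while tracing is enabled,
    # so the tracer is inert outside the forward step.
    if TTFlags.should_trace:
        captured.append(output.detach())

layer = torch.nn.Linear(4, 4)
layer.register_forward_hook(trace_hook)

layer(torch.randn(2, 4))        # flag off: nothing captured
TTFlags.should_trace = True
layer(torch.randn(2, 4))        # flag on: one tensor captured
TTFlags.should_trace = False
```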

This PR intentionally keeps the tracer narrowly scoped to a GPT-style model wrapper (see
TTHookManager).

Why is this useful?

When training/fine-tuning large models, it can be hard to pinpoint where issues originate (NaNs/divergence, unstable
layers, saturation, representation collapse, emerging features, etc.). Tensor Tracer makes it possible to:

  • Select specific trace points (by FlagType) to observe.
  • Compress payloads before sending to the client (to reduce bandwidth and CPU overhead).
  • Collect activations across tensor-parallel ranks and produce aggregated per-layer signals.
  • View traces live during training in a separate UI (see this repo for an example).

Demonstrated case (persona-vector projection monitoring)

As a practical demonstration, this tracer can be used to monitor projections of per-token hidden states onto a
pre-computed persona vector (paper) during fine-tuning. In our internal run (Llama3-8B-Instruct + an emergent-misalignment
related dataset), the per-layer projection signal shows an overall increasing trend in mid/deep layers across training
steps.

High-level workflow:

  1. Fine-tune a model (e.g., Llama3-8B-Instruct) on a dataset of interest (e.g., an emergent-misalignment related dataset risky_financial_advice) with the tracer enabled.
  2. Periodically run an evaluation forward pass (via the normal Megatron evaluation loop).
  3. Enable HiddenStates tracing with ProjectionCompressor, pointing at a torch-saved vector file shaped like
    [num_layers, hidden_size] which contains the persona vector across layers (e.g., evil persona vector).
  4. Aggregate the projected scalar values in your frontend / post-processing script and visualize per-layer trends.
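
The projection in step 3 can be sketched as follows. This is a toy illustration, not the PR's ProjectionCompressor; shapes follow the [num_layers, hidden_size] convention above:

```python
import torch

# Toy sketch of per-token persona projection (not the PR's implementation).
num_layers, seq_len, hidden_size = 4, 6, 8
hidden_states = torch.randn(num_layers, seq_len, hidden_size)  # per-layer activations
persona = torch.randn(num_layers, hidden_size)                 # stand-in for the torch-saved vector file
persona = persona / persona.norm(dim=-1, keepdim=True)         # unit direction per layer

# One scalar per token per layer: dot product with that layer's persona vector.
proj = torch.einsum("lsh,lh->ls", hidden_states, persona)      # [num_layers, seq_len]
per_layer_signal = proj.mean(dim=-1)                           # aggregate over tokens
```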

We observe that the persona projection signal tends to increase in mid/deep layers during fine-tuning on the emergent-misalignment dataset. This is consistent with the hypothesis that the model learns to represent the risky persona more strongly in those layers as fine-tuning progresses (see docs/api-guide/tensor_tracer.md for a detailed walkthrough of this example).

Note: exact trends may depend on model/data/hyperparameters and are included here as a motivating example for the tracing
feature (not as a claim of generality).

Key changes

  • megatron/core/tensor_tracer.py
    • TTFlags configuration and forward hook management (TTHookManager).
    • Compressor framework: TileCompressor, NoOpCompressor, EmptyCompressor, ProjectionCompressor.
    • Adds InputTokens trace point to report (input_ids, position_ids) for token-level indexing/debugging.
  • megatron/training/training_wsserver.py
    • Rank 0 hub server; worker client processes for non-rank0 senders.
  • megatron/training/arguments.py
    • Adds --tensor-tracer-port.
  • megatron/core/pipeline_parallel/schedules.py
    • Enables tracing only around the forward step.
  • tests/unit_tests/test_tensor_tracer.py
    • Unit tests for compressors + TTFlags.set_by_configs behavior.
  • docs/api-guide/tensor_tracer.md
    • Protocol, schema, and usage notes (including the persona-vector projection monitoring example).

How to use

  1. Install optional dependency:
    • pip install -e '.[tensor_tracer]'
  2. Launch training/eval with tracing enabled (port is arbitrary):
    • ... --tensor-tracer-port 8765
  3. Connect from your client/UI:
    • ws://<rank0-host>:8765
  4. Send a run_training_step JSON message to provide:
    • visualization_flags: which tensors to trace (by FlagType name).
    • compressor_config: per-flag compressor settings.
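
A minimal client message might look like this. The top-level field names (visualization_flags, compressor_config) come from the description above; the per-compressor option keys shown here are hypothetical — consult docs/api-guide/tensor_tracer.md for the actual schema:

```python
import json

# Hypothetical run_training_step message; top-level field names follow this
# PR's description, the compressor option keys are illustrative only.
message = {
    "type": "run_training_step",
    "visualization_flags": ["InputTokens", "HiddenStates"],
    "compressor_config": {
        "HiddenStates": {
            "compressor": "ProjectionCompressor",
            "vector_path": "persona_vector.pt",  # [num_layers, hidden_size]
        },
        "InputTokens": {"compressor": "NoOpCompressor"},
    },
}
payload = json.dumps(message)  # send over ws://<rank0-host>:8765
```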

Testing

Local checks:

  • tools/autoformat.sh
  • pytest -q tests/unit_tests/test_tensor_tracer.py

Notes / scope

  • The tracer is designed for monitoring/visualization and has zero overhead when disabled.
  • TileCompressor evaluates a reduction expression and ProjectionCompressor loads a vector with torch.load;
    both can execute code contained in those inputs, so treat tracer configs/artifacts as trusted inputs.
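
For intuition, a tiling reduction in the spirit of TileCompressor could look like the helper below. This is a hypothetical sketch, not the PR's API (the actual TileCompressor evaluates a user-supplied reduction expression):

```python
import torch

# Hypothetical tile reduction: shrink the last dimension by averaging
# fixed-size tiles, trading resolution for payload size.
def tile_reduce(x: torch.Tensor, tile: int) -> torch.Tensor:
    lead = x.shape[:-1]
    n = x.shape[-1] // tile            # whole tiles only; the remainder is dropped
    return x[..., : n * tile].reshape(*lead, n, tile).mean(dim=-1)
```

For example, an 8192-wide hidden state reduced with tile=64 ships 128 scalars per token instead of 8192.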

Contribution process

```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run tools/autoformat.sh on my PR

Contributors

Tingrui Zhang (zhang-tr22@mails.tsinghua.edu.cn)
Shuo Chen (s-chen25@mails.tsinghua.edu.cn)
Wei Xu (weixu@tsinghua.edu.cn)
Tsinghua University

Thank you for reviewing!

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@superay-a superay-a requested review from a team as code owners February 26, 2026 04:49

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jennifer88huang

Hi @sbhavani Santosh, could you please help review the PR? If there is any advice, feel free to comment.
