feature: log number of seen sequence and frames during training for throughput #230

avocadoali · 2025-10-23T11:42:59Z

No description provided.

Copilot

Pull Request Overview

This PR adds throughput tracking metrics to the training logging system by calculating and logging the cumulative number of sequences and frames processed during training.

Adds sequences_seen and frames_seen metrics to training logs
Implements consistent tracking across tokenizer, LAM, and dynamics training scripts
Removes an unnecessary blank line in train_dynamics.py

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
jasmine/train_tokenizer.py	Adds sequences_seen and frames_seen calculations to log_dict
jasmine/train_lam.py	Adds sequences_seen and frames_seen calculations to log_dict
jasmine/train_dynamics.py	Adds sequences_seen and frames_seen calculations to log_dict and removes extra blank line
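The counters in the table are products of the step index, the batch size and the sequence length. Turning them into a throughput figure means differencing two snapshots over wall-clock time. Below is a minimal sketch that assumes three things: step counts completed optimizer steps, batch_size is the global batch size, and every sequence has exactly seq_len frames. The helpers seen_counts and throughput are hypothetical names, not functions from the jasmine scripts.

```python
# Hypothetical helper: cumulative counters as computed in the PR.
# Assumes `batch_size` is the global batch size and each sequence has
# exactly `seq_len` frames.
def seen_counts(step: int, batch_size: int, seq_len: int) -> dict:
    sequences_seen = step * batch_size
    frames_seen = sequences_seen * seq_len
    return {"sequences_seen": sequences_seen, "frames_seen": frames_seen}


# Hypothetical helper: converts two counter snapshots into per-second rates.
def throughput(prev: dict, curr: dict, elapsed_s: float) -> dict:
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return {
        "sequences_per_sec": (curr["sequences_seen"] - prev["sequences_seen"]) / elapsed_s,
        "frames_per_sec": (curr["frames_seen"] - prev["frames_seen"]) / elapsed_s,
    }


if __name__ == "__main__":
    # Illustrative values: 48 sequences of 16 frames, logging every 100 steps.
    prev = seen_counts(step=100, batch_size=48, seq_len=16)
    curr = seen_counts(step=200, batch_size=48, seq_len=16)
    print(curr)                          # {'sequences_seen': 9600, 'frames_seen': 153600}
    print(throughput(prev, curr, 20.0))  # {'sequences_per_sec': 240.0, 'frames_per_sec': 3840.0}
```

Logging the raw cumulative counters, as the PR does, keeps the rate computation out of the training loop. Any dashboard can derive the rates afterwards.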


Copilot · 2025-10-23T11:43:51Z

jasmine/train_tokenizer.py

+                    sequences_seen = step * args.batch_size
+                    frames_seen = step * args.seq_len * args.batch_size


These calculations may not give global counts in distributed training. With data parallelism across multiple processes, each process loads its own shard of data while step advances in lockstep on all of them. If args.batch_size is the per-process batch size, the totals undercount by a factor of the number of processes, and multiplying by jax.process_count() gives global counts. If args.batch_size is already the global batch size, the current formula is correct as written.

Suggested change

-                    sequences_seen = step * args.batch_size
-                    frames_seen = step * args.seq_len * args.batch_size
+                    sequences_seen = step * args.batch_size * jax.process_count()
+                    frames_seen = step * args.seq_len * args.batch_size * jax.process_count()
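Whether the extra multiplication by jax.process_count() is needed depends on what args.batch_size means, and the review comment does not settle that. The sketch below makes both conventions explicit. It is a minimal illustration under stated assumptions: global_counts and its batch_size_is_global flag are hypothetical names, and num_processes stands in for jax.process_count() so the arithmetic can be checked on one machine.

```python
def global_counts(step: int, batch_size: int, seq_len: int,
                  num_processes: int, batch_size_is_global: bool) -> tuple[int, int]:
    """Return (sequences_seen, frames_seen) summed over all processes.

    Hypothetical helper. In a real run, num_processes would be
    jax.process_count().
    """
    if batch_size_is_global:
        # `batch_size` already covers all processes; no extra scaling.
        global_batch = batch_size
    else:
        # Each process loads its own `batch_size` sequences every step.
        global_batch = batch_size * num_processes
    sequences_seen = step * global_batch
    return sequences_seen, sequences_seen * seq_len
```

For example, take 4 processes, step 10, a batch size of 32 and seq_len 16. The global convention gives (320, 5120), while the per-process convention gives (1280, 20480). With a single process the two conventions agree, so the difference only appears in a multi-host run.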

Copilot · 2025-10-23T11:43:51Z

jasmine/train_lam.py

+                    sequences_seen = step * args.batch_size
+                    frames_seen = step * args.seq_len * args.batch_size


These calculations may not give global counts in distributed training. With data parallelism across multiple processes, each process loads its own shard of data while step advances in lockstep on all of them. If args.batch_size is the per-process batch size, the totals undercount by a factor of the number of processes, and multiplying by jax.process_count() gives global counts. If args.batch_size is already the global batch size, the current formula is correct as written.

Suggested change

-                    sequences_seen = step * args.batch_size
-                    frames_seen = step * args.seq_len * args.batch_size
+                    sequences_seen = step * args.batch_size * jax.process_count()
+                    frames_seen = step * args.seq_len * args.batch_size * jax.process_count()

Copilot · 2025-10-23T11:43:51Z

jasmine/train_dynamics.py

+                    sequences_seen = step * args.batch_size
+                    frames_seen = step * args.seq_len * args.batch_size


These calculations may not give global counts in distributed training. With data parallelism across multiple processes, each process loads its own shard of data while step advances in lockstep on all of them. If args.batch_size is the per-process batch size, the totals undercount by a factor of the number of processes, and multiplying by jax.process_count() gives global counts. If args.batch_size is already the global batch size, the current formula is correct as written.

Suggested change

-                    sequences_seen = step * args.batch_size
-                    frames_seen = step * args.seq_len * args.batch_size
+                    sequences_seen = step * args.batch_size * jax.process_count()
+                    frames_seen = step * args.seq_len * args.batch_size * jax.process_count()

emergenz

Test this on multi-host.
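Before running on a real cluster, one cheap sanity check is to simulate the data-parallel split on a single machine. The simulation tallies what each process would actually load and compares the sum with the logged formula. The sketch assumes the global-batch convention: each of num_processes hosts loads global_batch // num_processes sequences per step. The function simulate_data_parallel is hypothetical, and the sketch does not exercise any JAX multi-host code, so it complements rather than replaces an actual multi-host run.

```python
def simulate_data_parallel(steps: int, global_batch: int, seq_len: int,
                           num_processes: int) -> tuple[int, int]:
    """Tally sequences and frames that each simulated process loads."""
    if global_batch % num_processes != 0:
        raise ValueError("global_batch must be divisible by num_processes")
    local_batch = global_batch // num_processes
    local_sequences = [0] * num_processes
    for _ in range(steps):
        for p in range(num_processes):
            local_sequences[p] += local_batch  # each host loads its own shard
    total_sequences = sum(local_sequences)
    return total_sequences, total_sequences * seq_len


if __name__ == "__main__":
    steps, global_batch, seq_len, num_processes = 50, 64, 16, 4
    seqs, frames = simulate_data_parallel(steps, global_batch, seq_len, num_processes)
    # PR formula, reading args.batch_size as the global batch size.
    assert seqs == steps * global_batch
    assert frames == steps * seq_len * global_batch
    # Under this convention, also multiplying by the process count overcounts.
    assert seqs != steps * global_batch * num_processes
    print(seqs, frames)  # 3200 51200
```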

emergenz · 2025-10-31T12:28:35Z

@avocadoali small ping

feature: log number of seen sequence and frames

6d2c242

avocadoali requested a review from Copilot October 23, 2025 11:43

Copilot AI reviewed Oct 23, 2025


emergenz requested changes Oct 27, 2025


3 participants
