Commit f5b896e

Add links to batch invariance and RFC in documentation

Authored by Bram Wasti
Signed-off-by: Bram Wasti <bwasti@fb.com>
1 parent 1fefbf2 commit f5b896e

1 file changed, 2 insertions(+), 2 deletions(-)

_posts/2025-11-10-bitwise-exact-rl.md

Lines changed: 2 additions & 2 deletions
@@ -44,11 +44,11 @@ Running the demonstration associated with this blog post we see exactly the issu
 
 ## How It’s Done & What’s Next
 
-We tackled not only invariance in the same framework, but across two different frameworks. This was a challenging task as it required effectively auditing every single invocation of every kernel. We heavily leveraged the forward pass kernels from vLLM’s recent batch invariance work and wrote simple backward passes for these.
+We tackled not only invariance in the same framework, but across two different frameworks. This was a challenging task as it required effectively auditing every single invocation of every kernel. We heavily leveraged the forward pass kernels from vLLM’s [recent batch invariance](https://docs.vllm.ai/en/latest/features/batch_invariance/) work and wrote simple backward passes for these.
 
 Then, we wrote a generic reinforcement learning script using GSM8K and a correctness reward. We run everything synchronously, alternating between trainer and generator on a single host. This is demonstrative of exactly on-policy execution, but is not very common in large scale runs.
 
-While building this, testing was straightforward as we are able to use exact bitwise checks to ensure the forward logprobs and the perplexity generated by the trainer are identical. We will continue to improve the performance of vLLM and simplify the integration to support all TorchTitan models. To follow this work, please see the linked RFC: #28326.
+While building this, testing was straightforward as we are able to use exact bitwise checks to ensure the forward logprobs and the perplexity generated by the trainer are identical. We will continue to improve the performance of vLLM and simplify the integration to support all TorchTitan models. To follow this work, please see the linked RFC: [#28326](https://github.com/vllm-project/vllm/issues/28326).
 
 Acknowledgements
 Bram Wasti, Teja Rao, Paul Zhang, Tianyu Liu, Zhuohan Li, Natalia Gimelshein
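
For context on the changed paragraphs: below is a minimal sketch of what an exact bitwise logprob check can look like in PyTorch. It is illustrative only, not code from this commit or the linked post; `bitwise_equal` is a hypothetical helper, only `torch.equal`, `Tensor.view`, and `Tensor.contiguous` are real PyTorch APIs, and the trainer/generator loop in the trailing comment uses made-up object and method names.

```python
import torch

def bitwise_equal(a: torch.Tensor, b: torch.Tensor) -> bool:
    """True only if `a` and `b` match bit-for-bit.

    Viewing float32 storage as int32 makes the comparison strictly
    bitwise; plain float equality would treat -0.0 == 0.0 and NaN != NaN.
    """
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    if a.dtype == torch.float32:
        a = a.contiguous().view(torch.int32)
        b = b.contiguous().view(torch.int32)
    return torch.equal(a, b)

# Self-check: -0.0 equals 0.0 as a float but differs in its sign bit.
z = torch.zeros(3)
assert torch.equal(z, -z)
assert not bitwise_equal(z, -z)

# Hypothetical shape of the synchronous, exactly on-policy loop the
# diff describes (all objects and methods below are stand-ins):
#
#   for _ in range(num_steps):
#       prompts, answers = sample_gsm8k_batch()
#       completions, gen_logprobs = generator.generate(prompts)          # vLLM side
#       rewards = grade_correctness(completions, answers)
#       train_logprobs = trainer.forward_logprobs(prompts, completions)  # TorchTitan side
#       assert bitwise_equal(train_logprobs, gen_logprobs)
#       trainer.update(prompts, completions, rewards)
#       generator.load_weights(trainer.state_dict())  # keep generation on-policy
```

Alternating trainer and generator on one host, as the diff describes, is what makes such a bitwise comparison meaningful: the generator's weights are always exactly the trainer's latest, so any bit-level divergence points at the kernels rather than at stale weights.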
