[Feature] Add LK loss (LK^α and LK^λ) for direct acceptance rate opti… #29
Draft
…mization

Implement LK losses from "LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding" (arXiv:2602.23881), which directly optimize the acceptance rate α and improve average acceptance length by 3-8% over Forward KL on EAGLE-3.

- Add loss_type and lk_eta config fields to TrainingConfig
- Add compiled_lk_alpha_loss and compiled_lk_lambda_loss (+ _from_hs variants)
- Dispatch loss in Eagle3Model._calculate_loss based on loss_type
- Return alpha metrics from forward pass and log in trainer
- Add comprehensive tests for LK losses
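For reviewers unfamiliar with the quantity being optimized, here is a rough, dependency-free sketch of the idea behind the dispatch. It assumes the standard speculative-sampling acceptance rate α = Σ_x min(p(x), q(x)) for target distribution p and draft distribution q, and uses −log α as an illustrative surrogate. The function names, the `"forward_kl"` / `"neg_log_alpha"` keys, and the surrogate itself are hypothetical; they are not the PR's `compiled_lk_alpha_loss` / `compiled_lk_lambda_loss`, do not use `lk_eta`, and may differ from the paper's exact LK^α and LK^λ definitions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def acceptance_rate(target_probs, draft_probs):
    """alpha = sum_x min(p(x), q(x)), i.e. 1 minus the total variation distance."""
    return sum(min(p, q) for p, q in zip(target_probs, draft_probs))


def forward_kl(target_probs, draft_probs, eps=1e-12):
    """KL(p || q): the baseline objective the PR description compares against."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(target_probs, draft_probs)
    )


def neg_log_acceptance(target_probs, draft_probs, eps=1e-12):
    """Illustrative surrogate (an assumption, not the paper's formula): -log(alpha)."""
    return -math.log(acceptance_rate(target_probs, draft_probs) + eps)


# Hypothetical loss_type values, chosen for this sketch only.
LOSSES = {
    "forward_kl": forward_kl,
    "neg_log_alpha": neg_log_acceptance,
}


def calculate_loss(loss_type, target_logits, draft_logits):
    """Dispatch on loss_type; return (loss, alpha) so alpha can be logged as a metric."""
    if loss_type not in LOSSES:
        raise ValueError(f"unknown loss_type: {loss_type!r}")
    p = softmax(target_logits)
    q = softmax(draft_logits)
    return LOSSES[loss_type](p, q), acceptance_rate(p, q)
```

Returning α alongside the loss mirrors the "return alpha metrics from forward pass" item above: the metric is available for logging regardless of which objective is selected.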