
[Feature] Add LK loss (LK^α and LK^λ) for direct acceptance rate opti… #29

Draft
cicirori wants to merge 1 commit into main from feature/lk-loss

Conversation

@cicirori (Collaborator) commented Mar 4, 2026

Implement LK losses from "LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding" (arXiv:2602.23881), which directly optimize the acceptance rate α and improve average acceptance length by 3-8% over Forward KL on EAGLE-3.
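For context, the acceptance rate α that these losses target is the standard speculative-sampling quantity: at each position, α = Σ_x min(p(x), q(x)), where q is the draft model's distribution and p the target model's. A minimal sketch of that quantity (function name and shapes are my own, not from this PR):

```python
import torch

def acceptance_rate(draft_logits: torch.Tensor,
                    target_logits: torch.Tensor) -> torch.Tensor:
    """Per-position acceptance rate alpha = sum_x min(p(x), q(x)),
    where q is the draft distribution and p the target distribution.
    Inputs are logits of shape (..., vocab_size); returns shape (...)."""
    q = torch.softmax(draft_logits, dim=-1)
    p = torch.softmax(target_logits, dim=-1)
    return torch.minimum(p, q).sum(dim=-1)
```

When draft and target distributions coincide, α is 1 (every draft token is accepted); as they diverge, α falls toward 0, which is what makes it a natural training objective for the draft model.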

  • Add loss_type and lk_eta config fields to TrainingConfig
  • Add compiled_lk_alpha_loss and compiled_lk_lambda_loss (+ _from_hs variants)
  • Dispatch loss in Eagle3Model._calculate_loss based on loss_type
  • Return alpha metrics from forward pass and log in trainer
  • Add comprehensive tests for LK losses
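The dispatch described in the bullets above might look like the following. The identifiers `loss_type`, `lk_eta`, and the `_calculate_loss` dispatch come from this PR; the loss bodies are a hedged sketch (1 − α for LK^α, and an η-weighted blend with forward KL standing in for LK^λ), not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def lk_alpha_loss(draft_logits, target_logits):
    # LK^alpha sketch: maximize expected acceptance rate
    # alpha = sum_x min(p, q) by minimizing 1 - alpha.
    q = torch.softmax(draft_logits, dim=-1)
    p = torch.softmax(target_logits, dim=-1)
    alpha = torch.minimum(p, q).sum(dim=-1)
    return (1.0 - alpha).mean()

def forward_kl_loss(draft_logits, target_logits):
    # Forward KL(p || q) between target and draft distributions.
    return F.kl_div(torch.log_softmax(draft_logits, dim=-1),
                    torch.softmax(target_logits, dim=-1),
                    reduction="batchmean")

def calculate_loss(draft_logits, target_logits,
                   loss_type="forward_kl", lk_eta=1.0):
    # Dispatch mirroring the Eagle3Model._calculate_loss behavior
    # described in the PR; the LK^lambda blend is an assumption.
    if loss_type == "lk_alpha":
        return lk_alpha_loss(draft_logits, target_logits)
    if loss_type == "lk_lambda":
        return (lk_alpha_loss(draft_logits, target_logits)
                + lk_eta * forward_kl_loss(draft_logits, target_logits))
    return forward_kl_loss(draft_logits, target_logits)
```

All three branches collapse to (near) zero loss when draft and target logits agree, which is a cheap sanity check for a test suite like the one this PR adds.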

@yubofredwang mentioned this pull request Mar 7, 2026