Development Roadmap (2026 Q2) #34

@yubofredwang

Description

TorchSpec Roadmap 2026 Q2

Model Support

  • MiniMax M 2.5
  • Qwen 3.5
  • Continuous training of the MTP layer from GLM 5

Training

  • Packed sequence training: pack multiple shorter sequences into a single training sample to maximize GPU utilization and reduce padding waste, especially for datasets with variable-length inputs
  • Additional training methods: expand beyond Eagle3 to support DFlash, MTP, and other speculative decoding training approaches, broadening the range of draft model architectures TorchSpec can train
  • LK Loss (PR #29): add LK^alpha and LK^lambda losses for direct acceptance rate optimization, improving average acceptance length by 3-8% over Forward KL on Eagle3
  • Context Parallel under DP ranks: support context parallelism inside each data-parallel replica, sharding long sequences across the GPUs of a replica so sequence length is no longer bounded by a single device's memory
  • FlexAttention native FA4 backend (Issue #30): adopt BACKEND="FLASH" in FlexAttention to unify the flex_attention and fa_experimental code paths, replacing manual CuTeDSL integration with a stable PyTorch API for FA4-level performance on Hopper/Blackwell GPUs
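To make the packed-sequence bullet above concrete, here is a minimal sketch of greedy sequence packing. All names (`pack_sequences`, the `cu_seqlens` layout) are illustrative assumptions, not TorchSpec's actual API; the point is only the mechanism: concatenate variable-length sequences into fixed-capacity samples and record cumulative boundaries so a varlen attention kernel can stop sequences from attending across pack borders.

```python
# Hypothetical sketch of greedy first-fit sequence packing. The cu_seqlens
# list follows the FlashAttention varlen convention: cumulative start
# offsets of each packed sequence within the flattened token buffer.

def pack_sequences(seqs, capacity):
    """Pack variable-length token lists into bins of at most `capacity` tokens.

    Returns a list of (tokens, cu_seqlens) pairs.
    """
    packs = []  # each entry: [tokens, cu_seqlens]
    for seq in sorted(seqs, key=len, reverse=True):
        if len(seq) > capacity:
            raise ValueError("sequence longer than pack capacity")
        for pack in packs:
            # First-fit: drop the sequence into the first pack with room.
            if len(pack[0]) + len(seq) <= capacity:
                pack[0].extend(seq)
                pack[1].append(pack[1][-1] + len(seq))
                break
        else:
            packs.append([list(seq), [0, len(seq)]])
    return [(tokens, cu) for tokens, cu in packs]


seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_sequences(seqs, capacity=6)
# Two packs instead of four padded samples: padding waste drops to zero here.
```

Sorting longest-first before first-fit is a standard bin-packing heuristic; a production implementation would also emit position IDs reset at each `cu_seqlens` boundary.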

Inference

  • TensorRT-LLM integration: add as an inference backend alongside SGLang and vLLM so users can plug in whichever engine best fits their deployment stack
  • Inference auto-expansion: automatically scale inference when more nodes become available
  • Chunked prefill: process long prompts in fixed-size chunks instead of a single prefill pass, allowing longer contexts without spiking activation memory
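The chunked-prefill idea in the last bullet can be sketched in a few lines. This is a toy illustration with hypothetical names (`chunked_prefill`, `step`), not any engine's real API: the prompt is fed through the model chunk by chunk, each pass appending to a growing KV cache, so peak activation memory is bounded by the chunk size rather than the full prompt length.

```python
# Illustrative chunked-prefill loop. `step` stands in for one model forward
# pass over a chunk; it consumes the chunk plus the KV cache built so far
# and returns the updated cache.

def chunked_prefill(prompt_tokens, chunk_size, step):
    """Prefill `prompt_tokens` in slices of `chunk_size` tokens."""
    kv_cache = []
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = prompt_tokens[start:start + chunk_size]
        kv_cache = step(chunk, kv_cache)
    return kv_cache


# Toy `step`: the "cache" is just the tokens seen so far.
cache = chunked_prefill(list(range(10)), chunk_size=4,
                        step=lambda chunk, kv: kv + chunk)
# cache now holds all 10 prompt tokens, built in chunks of at most 4
```

In a real engine the same loop also lets decode steps of other requests be interleaved between prefill chunks, which is why chunked prefill helps tail latency as well as context length.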

Framework

  • Placement group node pinning by IP: allow users to pin inference to specific nodes by IP, with finer granularity for multiple inference engines on the same node
  • Automatic Mooncake config determination: derive Mooncake transfer config from batch size and max sampling pool size; auto-compute max sampling pool size as global_batch_size * delay_deletion_ratio
  • Debugging mode: add a debugging mode for both inference and training sides
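The Mooncake bullet above gives the intended formula directly. A minimal sketch of that auto-computation follows; the function and parameter names are assumptions based on the bullet's wording, not a confirmed TorchSpec config schema.

```python
import math


def max_sampling_pool_size(global_batch_size: int,
                           delay_deletion_ratio: float) -> int:
    """Auto-compute the max sampling pool size from the roadmap's formula:
    global_batch_size * delay_deletion_ratio.

    Rounding up (an assumption, not stated in the roadmap) keeps the pool
    from undershooting a whole batch when the ratio is fractional.
    """
    return math.ceil(global_batch_size * delay_deletion_ratio)


# e.g. a global batch of 512 with a 1.5x delay-deletion ratio
pool = max_sampling_pool_size(512, 1.5)  # 768
```

Deriving this value (and the Mooncake transfer config) from quantities the user already sets removes one hand-tuned knob that is easy to misconfigure.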
