# TorchSpec Roadmap 2026 Q2
## Model Support
- Minimax M 2.5
- Qwen 3.5
- Continuous training of the MTP layer from GLM 5
## Training
- Packed sequence training: pack multiple shorter sequences into a single training sample to maximize GPU utilization and reduce padding waste, especially for datasets with variable-length inputs
- Additional training methods: expand beyond Eagle3 to support DFlash, MTP, and other speculative decoding training approaches, broadening the range of draft model architectures TorchSpec can train
- LK Loss (PR #29): add LK^alpha and LK^lambda losses for direct acceptance rate optimization, improving average acceptance length by 3-8% over Forward KL on Eagle3
- Context Parallel under DP ranks: support context parallelism within each data-parallel rank, sharding long sequences across the GPUs of a rank
- FlexAttention native FA4 backend (Issue #30): adopt `BACKEND="FLASH"` in FlexAttention to unify the `flex_attention` and `fa_experimental` code paths, replacing the manual CuTeDSL integration with a stable PyTorch API for FA4-level performance on Hopper/Blackwell GPUs
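The packed-sequence item above can be illustrated with a minimal sketch. The greedy first-fit policy and the `pack_sequences`/`flatten_with_boundaries` helpers are illustrative assumptions, not TorchSpec's actual implementation; the key idea is that packed samples carry segment ids so attention can be masked per-sequence.

```python
def pack_sequences(seqs, max_len):
    """Greedy first-fit packing: place each sequence into the first
    bin with enough remaining room, minimizing padding waste."""
    bins = []  # each bin holds sequences whose total length <= max_len
    for seq in sorted(seqs, key=len, reverse=True):
        for b in bins:
            if sum(len(s) for s in b) + len(seq) <= max_len:
                b.append(seq)
                break
        else:
            bins.append([seq])
    return bins

def flatten_with_boundaries(packed, pad_id, max_len):
    """Concatenate one bin into a single training sample and record
    segment ids so attention masking can keep sequences separate."""
    tokens, seg_ids = [], []
    for seg, seq in enumerate(packed):
        tokens.extend(seq)
        seg_ids.extend([seg] * len(seq))
    pad = max_len - len(tokens)
    return tokens + [pad_id] * pad, seg_ids + [-1] * pad
```

For example, sequences of lengths 5, 4, 3, and 2 fit into two bins of length 8 instead of four padded samples, cutting padding from 18 tokens to 2.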
## Inference
- TensorRT-LLM integration: add as an inference backend alongside SGLang and vLLM so users can plug in whichever engine best fits their deployment stack
- Inference auto-expansion: automatically scale inference when more nodes become available
- Chunked prefill: process long prompts in fixed-size chunks to support longer contexts
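The chunked-prefill item can be sketched as follows. The `chunk_prefill`/`run_prefill` names, the `forward` callback, and the fixed chunk size are illustrative assumptions; real engines interleave these chunks with decode steps of other requests.

```python
def chunk_prefill(prompt_tokens, chunk_size):
    """Split a long prompt into fixed-size chunks so prefill does not
    monopolize a step's compute and memory budget."""
    return [prompt_tokens[i:i + chunk_size]
            for i in range(0, len(prompt_tokens), chunk_size)]

def run_prefill(prompt_tokens, chunk_size, forward):
    """Feed each chunk through the model with the current KV-cache
    offset so attention sees all previously prefilled tokens."""
    kv_len = 0
    for chunk in chunk_prefill(prompt_tokens, chunk_size):
        forward(chunk, past_len=kv_len)  # engine-specific call in practice
        kv_len += len(chunk)
    return kv_len
```

A 10-token prompt with `chunk_size=4` becomes three forward passes of 4, 4, and 2 tokens at KV offsets 0, 4, and 8.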
## Framework
- Placement group node pinning by IP: allow users to pin inference to specific nodes by IP, with finer granularity for multiple inference engines on the same node
- Automatic Mooncake config determination: derive the Mooncake transfer config from batch size and max sampling pool size; auto-compute max sampling pool size as `global_batch_size * delay_deletion_ratio`
- Debugging mode: add a debugging mode for both the inference and training sides
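The auto-computed pool size above reduces to a one-line formula. The function name and the ceiling rounding are assumptions for illustration; the `global_batch_size * delay_deletion_ratio` expression itself comes from the roadmap item.

```python
import math

def max_sampling_pool_size(global_batch_size, delay_deletion_ratio):
    """Auto-compute the max sampling pool size as described in the
    roadmap: global_batch_size * delay_deletion_ratio, rounded up to
    a whole number of samples (rounding policy is an assumption)."""
    return math.ceil(global_batch_size * delay_deletion_ratio)
```

For example, a global batch size of 256 with a delay-deletion ratio of 1.5 yields a pool of 384 samples.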