feat(attn): add switchable flash-attn and flashinfer backends#156

Open
lesj0610 wants to merge 1 commit into turboderp-org:master from lesj0610:feat/backend-core-v2

Conversation

Contributor

@lesj0610 lesj0610 commented Mar 2, 2026

Summary

  • add switchable attention backend policy support (auto, flash_attn, flashinfer, sdpa)
  • resolve the selected backend once in runtime/generator code and use it consistently for prefill/decode paths
  • keep recurrent prefill/checkpoint handling compatible with backend switching
  • limit this PR to backend/core wiring in attn.py, generator.py, job.py, config.py, and transformer.py
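The resolve-once policy described above could look like the following sketch. All names here (`resolve_backend`, `SUPPORTED`, `_available`) are illustrative assumptions, not the PR's actual API in attn.py; it only shows the general shape of selecting a backend once and reusing the result for prefill and decode.

```python
# Hypothetical sketch of a resolve-once attention backend policy.
# Function and constant names are illustrative, not from the PR.
import importlib.util

SUPPORTED = ("flash_attn", "flashinfer", "sdpa")

def _available(backend: str) -> bool:
    # sdpa is built into torch, so treat it as always available here;
    # the optional backends are probed by module presence only.
    if backend == "sdpa":
        return True
    return importlib.util.find_spec(backend) is not None

def resolve_backend(policy: str = "auto") -> str:
    """Resolve the backend once; callers cache the result and use it
    consistently for both prefill and decode paths."""
    if policy == "auto":
        for candidate in SUPPORTED:
            if _available(candidate):
                return candidate
        raise RuntimeError("no attention backend available")
    if policy not in SUPPORTED:
        raise ValueError(f"unknown attention backend: {policy}")
    if not _available(policy):
        raise RuntimeError(f"requested backend not installed: {policy}")
    return policy
```

Resolving once up front, rather than per call, is what keeps recurrent prefill/checkpoint handling compatible with switching: every path in a generation job sees the same backend decision.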

Validation

  • smoke-tested loading a Qwen3.5 abliterated EXL3 model on the refreshed branch
  • a short English generation smoke test completed successfully

@lesj0610 lesj0610 force-pushed the feat/backend-core-v2 branch from 18bab8b to 97a1e60 Compare March 2, 2026 10:15
@lesj0610 lesj0610 force-pushed the feat/backend-core-v2 branch from 153d6e0 to a9c593d Compare March 11, 2026 15:52
@lesj0610
Contributor Author

Refreshed onto v0.0.24. This branch now only carries the backend/core runtime wiring needed for switchable attention backends.
