Pull requests: sgl-project/mini-sglang
Add request-scoped profiler benchmark flag and compressed trace export
#109
opened Mar 21, 2026 by
CrazyDave999
Optimize load_weight with per-file batch H2D and zero-copy CPU sharding
#108
opened Mar 20, 2026 by
staryxchen
fix(doc): document -1 sentinel in ForwardInput.write_tuple
#106
opened Mar 18, 2026 by
MisakaVan
[Fix]Pad the last rank if vocab size is not divisible by tp_size
#100
opened Mar 9, 2026 by
cswuyg
Fix: torch.AcceleratorError: CUDA error: an illegal memory access was encountered
#89
opened Mar 1, 2026 by
itechbear
perf: Optimize CUDA graph batch size selection and padding
#56
opened Dec 30, 2025 by
louiswang524
feat: Implement batch tokenization for improved throughput
#55
opened Dec 30, 2025 by
louiswang524
[Refactor] Restructure test suite to match source layout and isolate benchmarks
#53
opened Dec 29, 2025 by
DhiraPT
[Feature] Add MLA configuration and KV cache storage kernel
#42
opened Dec 23, 2025 by
DhiraPT
[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)
#40
opened Dec 23, 2025 by
lamng3
[Improvement] Enhance engine error handling and documentation add more logging and doc
#23
opened Dec 20, 2025 by
louiswang524