sgl-project / mini-sglang Public

Fork 513
Star 3.8k

Pull requests: sgl-project/mini-sglang

Labels 9 Milestones 0

24 Open 65 Closed

Pull requests list

Add request-scoped profiler benchmark flag and compressed trace export

#109 opened Mar 21, 2026 by CrazyDave999

Optimize load_weight with per-file batch H2D and zero-copy CPU sharding

#108 opened Mar 20, 2026 by staryxchen

[Feature] Add reasoning-parser

#107 opened Mar 19, 2026 by jiahe7ay

fix(doc): document -1 sentinel in ForwardInput.write_tuple

#106 opened Mar 18, 2026 by MisakaVan

[Fix] Support stream=false in /v1/chat/completions (#51)

#104 opened Mar 13, 2026 by cppez

[Fix] Pad the last rank if vocab size is not divisible by tp_size

#100 opened Mar 9, 2026 by cswuyg

[Feature] Better estimation policy

#97 opened Mar 8, 2026 by YzXiao101

4 of 7 tasks

[Feature] Expert parallelism support for MoE models

#96 opened Mar 6, 2026 by NikitosKh

refactor(tests): convert to pytest-style with integration markers

#94 opened Mar 4, 2026 by MisakaVan • Draft

Fix: torch.AcceleratorError: CUDA error: an illegal memory access was encountered

#89 opened Mar 1, 2026 by itechbear

[Fix] Fix TP sampler inconsistency bug

#85 opened Feb 26, 2026 by DarkSharpness

[Feature] Support hierarchical cache

#82 opened Feb 24, 2026 by DarkSharpness

Add graph replay dump tensor tool

#72 opened Jan 30, 2026 by wlc952

Adding non-streaming response (stream=False)

#69 opened Jan 20, 2026 by goswamig

feat: Add INT8 quantization support

#57 opened Dec 30, 2025 by louiswang524

perf: Optimize CUDA graph batch size selection and padding

#56 opened Dec 30, 2025 by louiswang524

feat: Implement batch tokenization for improved throughput

#55 opened Dec 30, 2025 by louiswang524

[Refactor] Restructure test suite to match source layout and isolate benchmarks

#53 opened Dec 29, 2025 by DhiraPT

[Feature] Add MLA configuration and KV cache storage kernel

#42 opened Dec 23, 2025 by DhiraPT

[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)

#40 opened Dec 23, 2025 by lamng3

[Feature] Implement variable page size support

#33 opened Dec 22, 2025 by DhiraPT

docs: align README/features CLI examples with args.py

#29 opened Dec 21, 2025 by Taskrwu

[Improvement] Enhance engine error handling and documentation add more logging and doc

#23 opened Dec 20, 2025 by louiswang524

Request-scoped Torch profiler via profile flag.

#14 opened Dec 18, 2025 by AdamLouly
