v0.1.0rc1
Pre-release
Supported Features
- Prefix Cache
- Sparse Attention
- Sparse Attention Offload
- PD Disaggregation
What's Changed
- remove impl by @flesher0813 in #11
- adapt vllm v0.9.2 by @flesher0813 in #13
- [Doc] Outline of the document by @ygwpz in #15
- remove impl test and add uc connector test by @flesher0813 in #14
- [Doc] Installation of ucm by @flesher0813 in #17
- [Feature] Add DRAM Connector for uc_connector by @harrisonyhq in #18
- [doc] add readme and license by @ygwpz in #24
- [Feature] Add Dockerfiles by @flesher0813 in #20
- [Feature]Nfsstore by @propanone1006 in #23
- [doc] change docs outline by @ygwpz in #32
- [Feature] Add Cmake build command in setup.py by @harrisonyhq in #34
- [fixbug] fix issue#25 issue#31 and issue#33 by @flesher0813 in #30
- [Fix][Docs] Make example runnable and add performance data (closes #37 #29 #42) by @harrisonyhq in #41
- [Feat] Move kv_block_size to config by @harrisonyhq in #43
- [feature][docs]finish nfs store and add docs by @qyh111 in #44
- [doc] Add export of device type in installation;[Fix] fix version invalid#45 #46 by @harrisonyhq in #47
- add perf data in readme by @ygwpz in #49
- [Feat] Merge 0.0.1 back into develop by @flesher0813 in #50
- [bugfix] fix issue#26 and issue#36 by @ygwpz in #55
- [Doc] Add vllm institution by @flesher0813 in #61
- [CI][Fix] update issue and pr template, fix issue #57, cherry-pick main by @flesher0813 in #65
- [Doc] update install doc using patch to build from source code by @flesher0813 in #68
- [Feat] Merge 0.0.1 back into develop by @ygwpz in #72
- [Style] Fix codestyle problems and typo in develop by @harrisonyhq in #75
- [Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework by @hek14 in #79
- [Fix] Fix cant find cmake error when using pip install -e . by @harrisonyhq in #80
- Revert "[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework " by @ygwpz in #82
- [Feature] add Mooncake Store by @propanone1006 in #86
- [Fix bug] Simplify docker build and installation.md by @flesher0813 in #87
- [BUG]adapt deepseek by @qyh111 in #89
- [Feature][P/D] add example for disaggregated prefill by @flesher0813 in #90
- [Perf] Pipelined ucmnfsstore by @mag1c-h in #97
- Revert "[Feature] add Mooncake Store" by @ygwpz in #98
- [Fix bug] fix uc_connector ut and change hash generation method by @hero0307 in #101
- [Fix] Fix .so build error by @harrisonyhq in #104
- [Fix] Fix ascend compile error by @mag1c-h in #106
- [Perf]Modify start_load_kv by @qyh111 in #103
- [Fix] Fix duplicate create/commit errors upon preemption by @flesher0813 in #109
- [Feat] Adapt for vllm 0.9.1 by @sumingZero in #113
- [Feature] [Doc] UCMSparse framework by @hek14 in #112
- [fix] remove redundant code and files/rename file names by @NaganooMei in #118
- [Fix] Fix spelling issues with PR templates by @propanone1006 in #119
- remove load_tasks by @NaganooMei in #121
- [bugfix] bugfix in ucmnfsstore by @mag1c-h in #123
- [doc]Add config parameter by @UESTC-AHao in #130
- [bugfix]fix rank handing in multi-node pp setup by @qyh111 in #129
- [Feat]Support UCM Sparse on cuda by @harrisonyhq in #126
- [Feature] Add mooncake store by @hufumans in #117
- [bugfix]modify mla dump by @zhou-haitao in #128
- [feature] non-blocking interfaces are provided to check whether the transmission task is completed by @mag1c-h in #139
- [feature] return error if block exists while batch creation. by @mag1c-h in #138
- [feature]modify create interface by @hufumans in #145
- [Doc] change logo and rearange docs by @flesher0813 in #156
- 0.0.2 release merge develop by @ygwpz in #158
- [doc][feature] change code directory by @ygwpz in #161
- [fix] modify patch and workflow by @NaganooMei in #163
- [Feat] Support load async by @flesher0813 in #166
- [Feat]Support load async and load failure by @flesher0813 in #165
- [Feature]refactor ucconnector by @qyh111 in #167
- [feature] upload retake codes by @truthstriver in #172
- [bugfix]Resolve the issue of the first-round commit failure under dsv2 by @zhou-haitao in #186
- [Feat] Add KVComp sparse attention implementation in UCM by @leideng in #182
- [perf]prepare offset in advance by @qyh111 in #188
- [feature] GSA by @HaoLi980405 in #190
- [bugfix]fix pp problem and remove err logs when duplicate create by @qyh111 in #191
- [Fix] Fix bug: check task returns -50005 during async load by @sumingZero in #192
- [bugfix]gsa fix reslotmapping bug by @HaoLi980405 in #194
- [bugfix]gsa fix running reqs exceed 30 bug by @HaoLi980405 in #195
- [doc] design doc directory by @ygwpz in #197
- [Perf]kv_block_size as well as transferIoSize are calculated rather than configured by @UESTC-AHao in #196
- [Feat] add cuda topk and gsa descriptions by @HaoLi980405 in #198
- [Fix] Fix workflow image space error in action by @harrisonyhq in #203
- [bugfix]roll back dataoffset by @qyh111 in #201
- [bugfix] fix whl install gsa error and gsa kpre reslotmapping out of range by @HaoLi980405 in #204
- [Fix][Doc] Modify sparse docs by @flesher0813 in #202
- [Patch]Update patch by @UESTC-AHao in #205
- [Fix] fix workflow space error by @harrisonyhq in #206
- [feat] redefine store interface and structure by @mag1c-h in #189
- [feat]dev_product to develop by @qyh111 in #208
- [Feat] ESA supports asynchronous retrieval and loading. by @wangwenxin0312 in #199
- [fix] ktc config by @hek14 in #212
- [fix] setup.py by @hek14 in #213
- [fix] compile error on ascend by @mag1c-h in #214
- [Docs] Add quickstart document; update installation and dockerfile by @harrisonyhq in #210
- [Docs]Update the performance results of prefix-cache by @zhou-haitao in #215
- [docs] add pc and pd docs by @ygwpz in #216
- [fix] worker-side ESA init by @wangwenxin0312 in #219
- [feat]Remove the dependency on vllm’s hash method. by @qyh111 in #217
- [doc] Add DRAM performance data and modify workflow. by @harrisonyhq in #221
- change store from singleton to object by @mag1c-h in #222
- fix duplicate create by @qyh111 in #220
- [bugfix] index out of range by @zbb200819 in #223
- [fix] ESA block hash function by @wangwenxin0312 in #224
- [CI] Add e2e test yaml by @harrisonyhq in #230
- [CI]fix ci problem by @qyh111 in #227
- [Doc] Support readthedocs by @flesher0813 in #226
- update store build script by @mag1c-h in #231
- [doc] update ESA doc by @hek14 in #232
- [CI] use arc runner by @harrisonyhq in #233
- [CI] fix ci bug by @harrisonyhq in #234
- [CI] temporarily cancel gpu workflow by @harrisonyhq in #235
- [Feat]Support kvstar sparse attention algorithm by @saki-daisuki in #193
- [fix] change trans log level to debug by @mag1c-h in #236
- Modification of documets, improving readability by @ChenyuZhu1 in #239
- clean code & format for kvstar by @saki-daisuki in #241
- [Feat] Trace replay by @flesher0813 in #176
- [Docs] Modify the patch path to adapt current directory structure by @pyxyzc in #240
- [Doc] Add figures in the index file of "Prefix Cache" and "PD Disaggregation" by @ChenyuZhu1 in #244
- KVStar unified deploy with UCM & SIMD retrieve accelerate by AVX2&NEON by @saki-daisuki in #245
- [doc] Edit readme by @ygwpz in #249
- [Docs] Add NPU prerequisites and installation links in quickstart doc by @pyxyzc in #248
- [Docs]Add the block_size parameter to the startup command by @zhou-haitao in #243
- [doc] edit readme by @ygwpz in #251
- [opt] update sparse structure by @mag1c-h in #246
- modify GSA report by @zbb200819 in #252
- [bugfix]move create to update_state & fix commit failed & modify start_position by @qyh111 in #250
- [fix] update logo in dark mode & fix spell error in doc by @mag1c-h in #254
- [bugfix]fix code style and ci by @qyh111 in #257
- [opt] update simd feature and docs by @summer-ai007 in #256
- [bugfix] index out of range (#223) cherry-pick from dev_product by @qyh111 in #258
- [docs] edit readme by @ygwpz in #259
- [docs]edit GSA report by @yxkyong in #261
- mv kvstar to ucm/sparse by @mag1c-h in #262
- update doc image by @hek14 in #264
- [fix] Dev move kvstar to sparse by @summer-ai007 in #263
- [bugfix] fix gsa coredump by @HaoLi980405 in #265
- Update gsa.md by @HaoLi980405 in #267
- Update gsa.md by @yxkyong in #268
- [opt] set cuda arch for kvstar by @mag1c-h in #266
- [opt] Update kvstar.md by @mag1c-h in #269
- [Doc] update document link by @sumingZero in #270
- [Misc] add code owners by @mag1c-h in #274
- [Docs]Improve the quick_start.md by @maobaolong in #275
- [bugfix][#280] MLA layer size calculated wrong by @maobaolong in #281
- [Fix] Each request in the decode instance encounters a load failure by @sumingZero in #278
- [Misc] add store intf with tensor addr ptr by @mag1c-h in #288
- refactor: reusable transport abstraction & optimized NSFStore pipeline by @mag1c-h in #296
- [Docs] Modify Readme Contact Us by @flesher0813 in #298
- [Fix] Fix gpu_model_runner req_state update error for issue 283 by @flesher0813 in #291
- [Feature]v091_patch add commit by @zhou-haitao in #302
- [Feat] Adapt Trace Replay to vLLM >= 0.10.2 by @sumingZero in #303
- clean code: log print by @Lijiachen1018 in #290
- [Feature]: pluggable SpaceLayout to avoid in-place rename for better performance on PCFS by @mag1c-h in #299
- [Feat] add batch interface for device ops and implement ScatterGather with CUDA by @mag1c-h in #305
- [feat] hotness management for gc by @Lijiachen1018 in #312
- Fix Cuda compilation by @wangwenxin0312 in #317
- [BugFix]fix mtp in ucm by @NaganooMei in #321
- [bugfix] preserve DRAM buffer lifetime to restore inference accuracy by @mag1c-h in #322
- [feat] capacity check for nfsstore by @Lijiachen1018 in #315
- [Feat] Call scatter gather interface in dramstore by @ChenyuZhu1 in #324
- [Feat] Toy proxy now supports PD-mixed round-robin scheduling by @sumingZero in #316
- [Fix]Add import checking to trace_replay and fix the issue of unclose… by @hero0307 in #309
- [bug fix] fix recycleNum when less than 1 by @Lijiachen1018 in #327
- [Feat]Add nfsstore bandwidth testing script by @zhou-haitao in #323
- Fix preemption for sparse attention module and add attention sink. by @hek14 in #333
- [enhance]optimize kvstar core bind method & delta kvcache swap by @saki-daisuki in #330
- [feat] Re-use active block by @Lijiachen1018 in #334
- add heke as CODEOWNERS of /docs and /integration by @hek14 in #336
- Adapt ESA to support DeepSeek. by @wangwenxin0312 in #335
- fix adapt deepseek by @zbb200819 in #339
- [bug fix]kvstar delta kvcache block select bugfix by @saki-daisuki in #341
- [Patch] Separate patch into different file by feature by @harrisonyhq in #342
- [fix] pack whl with so files by @Lijiachen1018 in #343
- KvComp-v1 by @Clarence-1103 in #338
- [Fix] Revert dram store to python implementation by @harrisonyhq in #346
- fix esa & update patch by @wangwenxin0312 in #350
- [Fix]correct the error in docs by @hero0307 in #340
- [Fix] Dump/load all tensors when use_layerwise=False by @flesher0813 in #351
- [Fix] Only mark last req as failed load req by @flesher0813 in #355
- [Fix]Added the 'transferIoDirect' option by @zhou-haitao in #352
- [Feature]implement multi-level testing framework with pytest by @Potterluo in #313
- [Fix] Fix iteration bug for async load task by @flesher0813 in #357
- [feat] modify monkey patch for vllm-0.9.2 with cuda by @Lijiachen1018 in #358
- [build] fix build v0.1.0rc1 by @Lijiachen1018 in #363
New Contributors
- @propanone1006 made their first contribution in #23
- @hek14 made their first contribution in #79
- @sumingZero made their first contribution in #113
- @NaganooMei made their first contribution in #118
- @UESTC-AHao made their first contribution in #130
- @hufumans made their first contribution in #117
- @zhou-haitao made their first contribution in #128
- @truthstriver made their first contribution in #172
- @leideng made their first contribution in #182
- @wangwenxin0312 made their first contribution in #199
- @zbb200819 made their first contribution in #223
- @saki-daisuki made their first contribution in #193
- @ChenyuZhu1 made their first contribution in #239
- @pyxyzc made their first contribution in #240
- @summer-ai007 made their first contribution in #256
- @yxkyong made their first contribution in #261
- @maobaolong made their first contribution in #275
- @Clarence-1103 made their first contribution in #338
- @Potterluo made their first contribution in #313
Full Changelog: https://github.com/ModelEngine-Group/unified-cache-management/commits/v0.1.0rc1