Release v0.1.0rc1 · ModelEngine-Group/unified-cache-management

Supported Features

Prefix Cache
Sparse Attention
Sparse Attention Offload
PD Disaggregation

What's Changed

remove impl by @flesher0813 in #11
adapt vllm v0.9.2 by @flesher0813 in #13
[Doc] Outline of the document by @ygwpz in #15
remove impl test and add uc connector test by @flesher0813 in #14
[Doc] Installation of ucm by @flesher0813 in #17
[Feature] Add DRAM Connector for uc_connector by @harrisonyhq in #18
[doc] add readme and license by @ygwpz in #24
[Feature] Add Dockerfiles by @flesher0813 in #20
[Feature]Nfsstore by @propanone1006 in #23
[doc] change docs outline by @ygwpz in #32
[Feature] Add Cmake build command in setup.py by @harrisonyhq in #34
[fixbug] fix issue#25 issue#31 and issue#33 by @flesher0813 in #30
[Fix][Docs] Make example runnable and add performance data (closes #37 #29 #42) by @harrisonyhq in #41
[Feat] Move kv_block_size to config by @harrisonyhq in #43
[feature][docs]finish nfs store and add docs by @qyh111 in #44
[doc] Add export of device type in installation;[Fix] fix version invalid#45 #46 by @harrisonyhq in #47
add perf data in readme by @ygwpz in #49
[Feat] Merge 0.0.1 back into develop by @flesher0813 in #50
[bugfix] fix issue#26 and issue#36 by @ygwpz in #55
[Doc] Add vllm institution by @flesher0813 in #61
[CI][Fix] update issue and pr template, fix issue #57, cherry-pick main by @flesher0813 in #65
[Doc] update install doc using patch to build from source code by @flesher0813 in #68
[Feat] Merge 0.0.1 back into develop by @ygwpz in #72
[Style] Fix codestyle problems and typo in develop by @harrisonyhq in #75
[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework by @hek14 in #79
[Fix] Fix cant find cmake error when using pip install -e . by @harrisonyhq in #80
Revert "[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework " by @ygwpz in #82
[Feature] add Mooncake Store by @propanone1006 in #86
[Fix bug] Simplify docker build and installation.md by @flesher0813 in #87
[BUG]adapt deepseek by @qyh111 in #89
[Feature][P/D] add example for disaggregated prefill by @flesher0813 in #90
[Perf] Pipelined ucmnfsstore by @mag1c-h in #97
Revert "[Feature] add Mooncake Store" by @ygwpz in #98
[Fix bug] fix uc_connector ut and change hash generation method by @hero0307 in #101
[Fix] Fix .so build error by @harrisonyhq in #104
[Fix] Fix ascend compile error by @mag1c-h in #106
[Perf]Modify start_load_kv by @qyh111 in #103
[Fix] Fix duplicate create/commit errors upon preemption by @flesher0813 in #109
[Feat] Adapt for vllm 0.9.1 by @sumingZero in #113
[Feature] [Doc] UCMSparse framework by @hek14 in #112
[fix] remove redundant code and files/rename file names by @NaganooMei in #118
[Fix] Fix spelling issues with PR templates by @propanone1006 in #119
remove load_tasks by @NaganooMei in #121
[bugfix] bugfix in ucmnfsstore by @mag1c-h in #123
[doc]Add config parameter by @UESTC-AHao in #130
[bugfix]fix rank handing in multi-node pp setup by @qyh111 in #129
[Feat]Support UCM Sparse on cuda by @harrisonyhq in #126
[Feature] Add mooncake store by @hufumans in #117
[bugfix]modify mla dump by @zhou-haitao in #128
[feature] non-blocking interfaces are provided to check whether the transmission task is completed by @mag1c-h in #139
[feature] return error if block exists while batch creation. by @mag1c-h in #138
[feature]modify create interface by @hufumans in #145
[Doc] change logo and rearange docs by @flesher0813 in #156
0.0.2 release merge develop by @ygwpz in #158
[doc][feature] change code directory by @ygwpz in #161
[fix] modify patch and workflow by @NaganooMei in #163
[Feat] Support load async by @flesher0813 in #166
[Feat]Support load async and load failure by @flesher0813 in #165
[Feature]refactor ucconnector by @qyh111 in #167
[feature] upload retake codes by @truthstriver in #172
[bugfix]Resolve the issue of the first-round commit failure under dsv2 by @zhou-haitao in #186
[Feat] Add KVComp sparse attention implementation in UCM by @leideng in #182
[perf]prepare offset in advance by @qyh111 in #188
[feature] GSA by @HaoLi980405 in #190
[bugfix]fix pp problem and remove err logs when duplicate create by @qyh111 in #191
[Fix] Fix bug: check task returns -50005 during async load by @sumingZero in #192
[bugfix]gsa fix reslotmapping bug by @HaoLi980405 in #194
[bugfix]gsa fix running reqs exceed 30 bug by @HaoLi980405 in #195
[doc] design doc directory by @ygwpz in #197
[Perf]kv_block_size as well as transferIoSize are calculated rather than configured by @UESTC-AHao in #196
[Feat] add cuda topk and gsa descriptions by @HaoLi980405 in #198
[Fix] Fix workflow image space error in action by @harrisonyhq in #203
[bugfix]roll back dataoffset by @qyh111 in #201
[bugfix] fix whl install gsa error and gsa kpre reslotmapping out of range by @HaoLi980405 in #204
[Fix][Doc] Modify sparse docs by @flesher0813 in #202
[Patch]Update patch by @UESTC-AHao in #205
[Fix] fix workflow space error by @harrisonyhq in #206
[feat] redefine store interface and structure by @mag1c-h in #189
[feat]dev_product to develop by @qyh111 in #208
[Feat] ESA supports asynchronous retrieval and loading. by @wangwenxin0312 in #199
[fix] ktc config by @hek14 in #212
[fix] setup.py by @hek14 in #213
[fix] compile error on ascend by @mag1c-h in #214
[Docs] Add quickstart document; update installation and dockerfile by @harrisonyhq in #210
[Docs]Update the performance results of prefix-cache by @zhou-haitao in #215
[docs] add pc and pd docs by @ygwpz in #216
[fix] worker-side ESA init by @wangwenxin0312 in #219
[feat]Remove the dependency on vllm’s hash method. by @qyh111 in #217
[doc] Add DRAM performance data and modify workflow. by @harrisonyhq in #221
change store from singleton to object by @mag1c-h in #222
fix duplicate create by @qyh111 in #220
[bugfix] index out of range by @zbb200819 in #223
[fix] ESA block hash function by @wangwenxin0312 in #224
[CI] Add e2e test yaml by @harrisonyhq in #230
[CI]fix ci problem by @qyh111 in #227
[Doc] Support readthedocs by @flesher0813 in #226
update store build script by @mag1c-h in #231
[doc] update ESA doc by @hek14 in #232
[CI] use arc runner by @harrisonyhq in #233
[CI] fix ci bug by @harrisonyhq in #234
[CI] temporarily cancel gpu workflow by @harrisonyhq in #235
[Feat]Support kvstar sparse attention algorithm by @saki-daisuki in #193
[fix] change trans log level to debug by @mag1c-h in #236
Modification of documets, improving readability by @ChenyuZhu1 in #239
clean code & format for kvstar by @saki-daisuki in #241
[Feat] Trace replay by @flesher0813 in #176
[Docs] Modify the patch path to adapt current directory structure by @pyxyzc in #240
[Doc] Add figures in the index file of "Prefix Cache" and "PD Disaggregation" by @ChenyuZhu1 in #244
KVStar unified deploy with UCM & SIMD retrieve accelerate by AVX2&NEON by @saki-daisuki in #245
[doc] Edit readme by @ygwpz in #249
[Docs] Add NPU prerequisites and installation links in quickstart doc by @pyxyzc in #248
[Docs]Add the block_size parameter to the startup command by @zhou-haitao in #243
[doc] edit readme by @ygwpz in #251
[opt] update sparse structure by @mag1c-h in #246
modify GSA report by @zbb200819 in #252
[bugfix]move create to update_state & fix commit failed & modify start_position by @qyh111 in #250
[fix] update logo in dark mode & fix spell error in doc by @mag1c-h in #254
[bugfix]fix code style and ci by @qyh111 in #257
[opt] update simd feature and docs by @summer-ai007 in #256
[bugfix] index out of range (#223) cherry-pick from dev_product by @qyh111 in #258
[docs] edit readme by @ygwpz in #259
[docs]edit GSA report by @yxkyong in #261
mv kvstar to ucm/sparse by @mag1c-h in #262
update doc image by @hek14 in #264
[fix] Dev move kvstar to sparse by @summer-ai007 in #263
[bugfix] fix gsa coredump by @HaoLi980405 in #265
Update gsa.md by @HaoLi980405 in #267
Update gsa.md by @yxkyong in #268
[opt] set cuda arch for kvstar by @mag1c-h in #266
[opt] Update kvstar.md by @mag1c-h in #269
[Doc] update document link by @sumingZero in #270
[Misc] add code owners by @mag1c-h in #274
[Docs]Improve the quick_start.md by @maobaolong in #275
[bugfix][#280] MLA layer size calculated wrong by @maobaolong in #281
[Fix] Each request in the decode instance encounters a load failure by @sumingZero in #278
[Misc] add store intf with tensor addr ptr by @mag1c-h in #288
refactor: reusable transport abstraction & optimized NSFStore pipeline by @mag1c-h in #296
[Docs] Modify Readme Contact Us by @flesher0813 in #298
[Fix] Fix gpu_model_runner req_state update error for issue 283 by @flesher0813 in #291
[Feature]v091_patch add commit by @zhou-haitao in #302
[Feat] Adapt Trace Replay to vLLM >= 0.10.2 by @sumingZero in #303
clean code: log print by @Lijiachen1018 in #290
[Feature]: pluggable SpaceLayout to avoid in-place rename for better performance on PCFS by @mag1c-h in #299
[Feat] add batch interface for device ops and implement ScatterGather with CUDA by @mag1c-h in #305
[feat] hotness management for gc by @Lijiachen1018 in #312
Fix Cuda compilation by @wangwenxin0312 in #317
[BugFix]fix mtp in ucm by @NaganooMei in #321
[bugfix] preserve DRAM buffer lifetime to restore inference accuracy by @mag1c-h in #322
[feat] capacity check for nfsstore by @Lijiachen1018 in #315
[Feat] Call scatter gather interface in dramstore by @ChenyuZhu1 in #324
[Feat] Toy proxy now supports PD-mixed round-robin scheduling by @sumingZero in #316
[Fix]Add import checking to trace_replay and fix the issue of unclose… by @hero0307 in #309
[bug fix] fix recycleNum when less than 1 by @Lijiachen1018 in #327
[Feat]Add nfsstore bandwidth testing script by @zhou-haitao in #323
Fix preemption for sparse attention module and add attention sink. by @hek14 in #333
[enhance]optimize kvstar core bind method & delta kvcache swap by @saki-daisuki in #330
[feat] Re-use active block by @Lijiachen1018 in #334
add heke as CODEOWNERS of /docs and /integration by @hek14 in #336
Adapt ESA to support DeepSeek. by @wangwenxin0312 in #335
fix adapt deepseek by @zbb200819 in #339
[bug fix]kvstar delta kvcache block select bugfix by @saki-daisuki in #341
[Patch] Separate patch into different file by feature by @harrisonyhq in #342
[fix] pack whl with so files by @Lijiachen1018 in #343
KvComp-v1 by @Clarence-1103 in #338
[Fix] Revert dram store to python implementation by @harrisonyhq in #346
fix esa & update patch by @wangwenxin0312 in #350
[Fix]correct the error in docs by @hero0307 in #340
[Fix] Dump/load all tensors when use_layerwise=False by @flesher0813 in #351
[Fix] Only mark last req as failed load req by @flesher0813 in #355
[Fix]Added the 'transferIoDirect' option by @zhou-haitao in #352
[Feature]implement multi-level testing framework with pytest by @Potterluo in #313
[Fix] Fix iteration bug for async load task by @flesher0813 in #357
[feat] modify monkey patch for vllm-0.9.2 with cuda by @Lijiachen1018 in #358
[build] fix build v0.1.0rc1 by @Lijiachen1018 in #363

New Contributors

@propanone1006 made their first contribution in #23
@hek14 made their first contribution in #79
@sumingZero made their first contribution in #113
@NaganooMei made their first contribution in #118
@UESTC-AHao made their first contribution in #130
@hufumans made their first contribution in #117
@zhou-haitao made their first contribution in #128
@truthstriver made their first contribution in #172
@leideng made their first contribution in #182
@wangwenxin0312 made their first contribution in #199
@zbb200819 made their first contribution in #223
@saki-daisuki made their first contribution in #193
@ChenyuZhu1 made their first contribution in #239
@pyxyzc made their first contribution in #240
@summer-ai007 made their first contribution in #256
@yxkyong made their first contribution in #261
@maobaolong made their first contribution in #275
@Clarence-1103 made their first contribution in #338
@Potterluo made their first contribution in #313

Full Changelog: https://github.com/ModelEngine-Group/unified-cache-management/commits/v0.1.0rc1
