What's Changed
- Fix #458 by @Binyang2014 in #568
- Fix multinode test failure by @Binyang2014 in #574
- Separate linters from cmake by @chhwang in #587
- Fix relaxedWait() by @chhwang in #594
- NCCL fixes by @chhwang in #592
- Updated Dev Container by @chhwang in #591
- Support CudaIpc connection within a single process by @chhwang in #593
- Fix GpuStreamPool to be aware of the device ID of streams by @chhwang in #590
- update pytest and python API to fix ut failure by @Binyang2014 in #598
- Fixed the local channel test by @chhwang in #597
- Use smart pointer for IB structure by @Binyang2014 in #585
- Update documentation by @chhwang in #576
- Support CUDA 12.9 by @chhwang in #600
- Merge ChannelTrigger with ProxyTrigger by @chhwang in #601
- MNNVL fix by @chhwang in #604
- New DSL implementation by @Binyang2014 in #579
- python doc auto generation by @chhwang in #605
- all2all implementation by @Binyang2014 in #609
- Fix ut by @Binyang2014 in #613
- Create ib mr for per ib transport by @Binyang2014 in #611
- Fix for multi-nodes test by @Binyang2014 in #614
- add torch test by @Binyang2014 in #612
- AlltoAll Test Support by @caiomcbr in #606
- Adding Channel Id Field DSL Port Channel Operations by @caiomcbr in #615
- Fix deadlock in Executor channel setup by @caiomcbr in #616
- Fix NVLS correctness issue by @Binyang2014 in #618
- Fixed cpp linter by @chhwang in #619
- Thread Block Group DSL by @caiomcbr in #621
- Fix memory exchange within a single process by @chhwang in #624
- Fix hang issue in logging submodule by @Binyang2014 in #625
- Integrate MSCCL++ with torch workload by @Binyang2014 in #626
- Add FifoDeviceHandle::poll() by @chhwang in #630
- Fix Illegal Memory Access in nvls_test for CUDA12.9 by @abhijangda in #631
- Adapt with torch 2.6 by @Binyang2014 in #632
- Fix for safe process teardown by @chhwang in #633
- use unix socket to share fd by @Binyang2014 in #634
- Address teardown issue by @Binyang2014 in #638
- Revise NCCL API implementation by @Binyang2014 in #617
- Support detailed version tracking that captures git repository information by @seagater in #639
- Fix ROCm build issue by @Binyang2014 in #642
- Add 2 Node AllReduce DSL Algorithm by @caiomcbr in #636
- Make ncclReduce/ncclSend/ncclRecv work by @Binyang2014 in #643
- Reduce memory footprint for allreduce8 and allgather6 by @Binyang2014 in #644
- Add MSCCLPP_GIT_COMMIT macro by @Binyang2014 in #640
- Address corner case when generating version file by @Binyang2014 in #641
- Pipeline fix by @Binyang2014 in #645
New Contributors
- @abhijangda made their first contribution in #631
Full Changelog: v0.7.0...v0.8.0