README - Add gpu reset known issue #205
Open
systems-assistant[bot] wants to merge 2 commits into develop from
Change-Id: I4f9ac6ce807d4d670a19ae84fe553eb3a7484d96 Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
…evelop/ROCm_rdc/dgalants_gpu_reset_known_issue
jamessiddeley-amd pushed a commit that referenced this pull request on Aug 8, 2025
Add the rhel-9.5 container to the recurring updates [ROCm/rocprofiler-systems commit: 82d5b8c]
xuchen-amd pushed a commit that referenced this pull request on Aug 10, 2025
* remove HIP_USE_RUNTIME_UNBUNDLER
* clang-format
* Generic to use comgr
* Remove HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION flag
* Removes runtime unbundling unused and debug Code
* Removes stale functions
jayhawk-commits pushed a commit that referenced this pull request on Aug 18, 2025
* remove HIP_USE_RUNTIME_UNBUNDLER
* clang-format
* Generic to use comgr
* Remove HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION flag
* Removes runtime unbundling unused and debug Code
* Removes stale functions
[ROCm/clr commit: 81238db]
ammallya pushed a commit that referenced this pull request on Nov 17, 2025
The code is changed to handle both original and ACA-based ECC counters for backward compatibility.
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
ammallya pushed a commit that referenced this pull request on Nov 18, 2025
The code is changed to handle both original and ACA-based ECC counters for backward compatibility.
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9b6e043]
ammallya pushed a commit that referenced this pull request on Nov 21, 2025
The code is changed to handle both original and ACA-based ECC counters for backward compatibility.
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9b6e043]
ammallya pushed a commit that referenced this pull request on Jan 21, 2026
* Import gda_devel back into develop
Squashed commit of the following:
commit 90761d552392ca1f5261fec2e6a08455b0ebc368
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu Jul 24 14:50:47 2025 -0500
Only issue a single completion per wavefront (#199)
commit 0056a8a4a7465d520b85c5cb6829ab88783e82f4
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Thu Jul 24 14:12:35 2025 -0400
non-fetching amos are implicit nbi, we do not need the terminal quiet. (#179)
commit 75d1bfe0b0afa5cfd5a7dfae89e9de6f1087e531
Author: Alsop, John <johnathan.alsop@amd.com>
Date: Tue Jul 8 10:25:43 2025 -0700
Relax ibgda synchronization (#191)
* rocshmem mcm: relax ibgda orderings
convert all SEQ_CST orderings in queue_pair to RELAXED except:
-system scope ring_doorbell access: required to flush push buffer
(unless data is uncached - in which case a waitcnt is sufficient)
-agent scope leader thread read in post_qpe_rma: unclear why this
is necessary, but when relaxed, the code breaks. either the waitcnt
or the L1inv associated with agent scope SEQ_CST is needed for
functionality.
* Undo changing atomic_signal_fence from SEQ_CST to RELAXED as this
appears to have no performance advantage and we are not entirely sure it is
correct
---------
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
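The ordering rule described in the commit above (relaxed accesses everywhere except the doorbell write that publishes the work) can be illustrated with a host-side analogy. This is only a sketch using standard C++ atomics: `ToyQueue`, `post`, and `consume` are hypothetical names, host C++ has no agent or system scopes, and none of this is the rocSHMEM `queue_pair` code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-entry queue; a stand-in for a send queue plus doorbell.
struct ToyQueue {
  std::atomic<uint64_t> slot{0};      // payload, written with relaxed ordering
  std::atomic<uint32_t> doorbell{0};  // publish point, keeps strong ordering
};

// Producer side: the payload store may be relaxed because the doorbell
// store that follows it is sequentially consistent (and therefore a release).
void post(ToyQueue& q, uint64_t value) {
  assert(q.doorbell.load(std::memory_order_relaxed) == 0);  // toy queue holds one entry
  q.slot.store(value, std::memory_order_relaxed);
  q.doorbell.store(1, std::memory_order_seq_cst);
}

// Consumer side: the acquire load of the doorbell synchronizes with the
// producer's store, so the relaxed payload load observes the posted value.
uint64_t consume(ToyQueue& q) {
  while (q.doorbell.load(std::memory_order_acquire) == 0) {
    // spin until the doorbell is rung
  }
  return q.slot.load(std::memory_order_relaxed);
}
```

In real use `post` and `consume` would run on different threads (or on the GPU and the NIC); the point of the sketch is only that a single strongly ordered publish can cover the relaxed writes that precede it.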
commit c42139564afb47db27e9ec87c25ddc4f5c3e5ad2
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Mon Jul 7 13:56:19 2025 -0500
Make gda_devel branch work without MPI library (#188)
* First cut on adding the no-mpi path to gpu_ib
more functions to follow.
add mpi_init_singleton stuff
* make gda compile with no-mpi support
* gda_device without mpi support
* fixes for functional tests
- disable the mpi_init_singleton tests in the unit tests.
There is no point in fixing them on this branch to adjust to the new structure/logic.
- replace MPI_Barrier with rocshmem_barrier_all in tester.cpp
- I missed one Allgather statement in gda_device.cpp, add the non-MPI
version for that call as well
* Update src/gpu_ib/gda_device.cpp
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
* Update tests/functional_tests/CMakeLists.txt
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
---------
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit 0506e69cea2e2ef9bd6cab1207e750da1731ffa5
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu Jun 26 19:12:49 2025 -0500
Check for counter load order update in send queue (#178)
commit 5a18841111c96eb9b526f0bd11a853b38f69707e
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Thu Jun 26 15:10:44 2025 -0500
Refactor Barrier_all and Sync_all to use default context (GDA) (#175)
- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all
commit 4d76d6bfca90aad9ca7b607c2800392ed025a695
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Tue Jun 24 14:24:48 2025 -0400
Re-enable Release by default (#168)
commit a68208f2b1c64b9db5f5589c44854b37168da557
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue Jun 24 12:20:22 2025 -0500
Fix issues with queue_pair (#167)
* Add amo fetch_add and non_fetch add self tester
* Validate both ways
* Intermediate debug for atomic hang
* Fixes for amo test
* Convert to release build
* Revert SYSTEM to AGENT for scope
* Restore tester arguments
* Make nonfetch amo into blocking call
commit 9085416fa4a51aa66ae3222493409679d0daff29
Author: Aurelien Bouteiller <abouteil@amd.com>
Date: Mon Jun 23 22:30:00 2025 -0400
bugfix: prevent reuse of sqe items before they are ready
commit 0c832b225c4abb8778e4e825fee5032871403557
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Tue Jun 17 09:17:24 2025 -0500
change default compilation mode for gda_devel (#162)
for the moment, switch to Debug builds being the default, since it seems
to be more stable with DeepEp
commit 3b01d1a50f1531cb7f66c19cd61643d7d2742e4c
Author: Yiltan <ytemucin@amd.com>
Date: Thu Jun 12 16:08:32 2025 -0400
Add Broadcom support for gda_devel (#148)
* Added bnxt headers
* Updated bnxt headers to compile with rocSHMEM
* Preliminary BNXT Support
* Update direct verbs to 2025/05/30 drop
* Use umem_reg to create queues
commit 8db6465e27527855627a167ca58beee17895ed65
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Tue May 20 17:01:39 2025 -0400
gpu_ib ionic: Address review comment (#137)
commit 81512cc10349b1bd4874d5963632bd28d9201a1d
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 20 15:57:17 2025 -0500
Check RMA functional test data in GPU kernel (#91) (#132)
Co-authored-by: Yiltan <ytemucin@amd.com>
commit e9fc5914f5f4d9a89af6e417e6f096d8f235884a
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Tue May 20 16:35:07 2025 -0400
gpu_ib ionic: add gpu_ib provider for ionic (#133)
Port gpu_ib ionic changes from earlier proof-of-concept codebase.
Build with GPUIB_IONIC=1 to enable ionic and disable mlx5.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
commit 986d1908fd126df027f9e189517260c3c7dbb48c
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Fri May 16 09:07:43 2025 -0400
gpu_ib: Cleanups to Mlx5 provider to ease Ionic integration (#129)
Keep both pd_orig and pd_parent.
Add some helpers for lane mask etc.
Add generic defines in a few places.
commit 4926a1067451c37dfec28385e70521e8ee5b693f
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Thu May 15 14:07:33 2025 -0400
gpu_ib: Fix up putmem_wave() (#128)
Add a thread ID check to GPUIBContext::putmem_wave() so that only one
thread gets through.
Since the context layer checks, the QP layer doesn't need to. Thus
QueuePair::put_nbi() and QueuePair::put_nbi_wave() are the same and
can be combined.
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
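A minimal sketch of the gating idea in the commit above: every lane of a wavefront calls the wave-level put, but only one lane reaches the queue-pair layer. The code below is a plain C++ simulation under stated assumptions, not the `GPUIBContext` implementation. `ToyContext`, `simulate_wave`, the lane-0 leader choice, and the wavefront size of 64 are all hypothetical, and `std::memcpy` stands in for the actual network put.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr int kWavefrontSize = 64;  // assumed wavefront width for this sketch

// Hypothetical context that only counts how many puts reach the "QP layer".
struct ToyContext {
  int puts_issued = 0;
};

// Stand-in for the queue-pair put: it performs no thread check of its own.
void put_nbi(ToyContext& ctx, void* dest, const void* src, std::size_t n) {
  std::memcpy(dest, src, n);
  ++ctx.puts_issued;
}

// Context-layer entry point: every lane calls it, only the leader proceeds.
void putmem_wave(ToyContext& ctx, int lane_id, void* dest, const void* src,
                 std::size_t n) {
  assert(lane_id >= 0 && lane_id < kWavefrontSize);
  if (lane_id != 0) {
    return;  // non-leader lanes do nothing
  }
  put_nbi(ctx, dest, src, n);
}

// Simulates one wavefront entering putmem_wave and reports puts issued.
int simulate_wave(ToyContext& ctx, void* dest, const void* src, std::size_t n) {
  for (int lane = 0; lane < kWavefrontSize; ++lane) {
    putmem_wave(ctx, lane, dest, src, n);
  }
  return ctx.puts_issued;
}
```

Because the check sits in the context layer, the lower-level put does not need to repeat it, which is the reason the commit gives for merging the two queue-pair functions.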
commit b81b84f63f470a2c8eecd1a9db82415c6ac4b2d7
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Thu May 15 11:41:21 2025 -0500
re-add code to select closest NIC to a GPU (#127)
commit b87e7e84f6845ad18cf8286a84051d59b79218a2
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Mon May 12 17:09:00 2025 -0500
Fix MPI_Comm bug (#123)
commit 8cb3879047b8e36e23b91aaeb12c4f5563e974df
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Fri May 9 13:13:08 2025 -0500
Fix Barrier API implementation and add missing variants (#121)
- Fixed issues in the existing Barrier API
- Allocated sync buffers of team using the symmetric heap
- Added missing thread-level and wavefront-level Barrier APIs
- Updated functional tests to cover all Barrier variants
commit 849f365487e59e264578e3eefaf483cad3233472
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 16:58:58 2025 -0400
Missing variable in ibgda branch and use create_ctx to avoid default ctx (#120)
in num_pes and my_pe
commit da710c22b7f4182dca29d062cafb51c42a967356
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu May 8 14:36:46 2025 -0500
Refactor several classes and bugfixes (#115)
* Merge backend connection and network classes
* Use agent scope instead of system scope for counters
* Remove monitor thread
commit 99238b1d92d922ede619469b74e82dd69ae4e3e8
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 14:52:52 2025 -0400
Add verification, fix only rank0 runs the test (#114)
commit d7ec7888a9c5f6c571284041d728911bd7d2562d
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 10:55:40 2025 -0400
new tester: put to all pes from all lanes concurrently - ibgda (#113)
* Add put to all pes from all lanes concurrently
* This runs on ro 64(8x8) pes, the workload increases with the num_pes so it gets very slow at scale
* Adapt for ibgda branch
commit 51fe737b2ec6606a5337fdf90a57b877899817e5
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Wed May 7 18:20:12 2025 -0500
Fix and extend Barrier_All API support (#110)
- Fixed issues in the existing Barrier_All API implementation
- Added missing thread-level and wavefront-level Barrier_All APIs
- Updated functional tests to cover all Barrier_All variants
commit c971b4a27b82447e4a13ca226798a16ad00a7d34
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Wed May 7 11:45:12 2025 -0500
Serialize entrance into queue pair code by PE (#108)
commit c2d0fbbbf88c07b67616e55d5025f8b960542753
Author: Yiltan <ytemucin@amd.com>
Date: Wed May 7 12:38:58 2025 -0400
Fix ibv_reg_mr when using subcommunicators (#104)
commit ee79ccd01c35d2b54923cc510c8449268846ce73
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Tue May 6 11:10:12 2025 -0500
add code for determining closest NIC to a GPU (#100)
add code for detecting the closest NIC given a GPU device ID.
The code is based on the same functionality in Transferbench, and has
been stripped down to the required functionality in rocSHMEM. (Note:
there is probably more code that could be removed or simplified.)
There are two interfaces that are of interest:
- int GetClosestNicToGpu(int gpuIndex, char **dev_name): returns the
id of the NIC in the device list as well as the name of the device
(if dev_name is not a nullptr);
- void DisplayTopology(bool outputToCsv): prints out the entire
topology detected on the node. This does not happen automatically,
but could be integrated in the future with some debugging output
when the user sets an environment variable.
commit e83c3dc9facb8d0b3a6029171ca8b055d4918e5a
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 6 11:09:56 2025 -0500
Fix several bugs of gda_devel branch (#103)
* Revert "Use 32-bit counter values"
This reverts commit 65a5b99c67624e221850bc405cfb6d79f754a7d6.
* Call hipMemset after allocation on QueuePair members
* Undo previous relaxations and use SEQ_CST atomics
* Remove placement new on QueuePair creation
* Bugfix on outstanding wqe table off by one
commit 5b022083887808854efbc8cadb463dedd8d59bec
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 6 11:09:40 2025 -0500
Remove unused code (#102)
* Remove unused code
* Remove unused connection method
commit dbaee3711f6b9d8bd43bc97d8127869d1e185d05
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri May 2 15:45:51 2025 -0500
Add AMO support
commit 96d7c3260f9acb85f94b60460fb6ec9645527d69
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 23:13:34 2025 -0500
Change names around
commit 0de5a5a87b6bc3b72cd459c11f952e10d5fe65bc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:34:48 2025 -0500
Remove unused code
commit 30d247ef5b9106f584a348c18fae7c4d2257d2f9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:23:47 2025 -0500
Replace do-while with while
commit 65a5b99c67624e221850bc405cfb6d79f754a7d6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:11:22 2025 -0500
Use 32-bit counter values
commit a65c4c9210cf6450ef70150178d2dfad5d326e43
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:08:22 2025 -0500
Relax synchronization
commit 7008f4f73d69bdd7de2aac79d73fe2bdf9dcdab7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:58:42 2025 -0500
Remove unused method
commit 5c720484dafbc354db922b440fe747dbca7ca0c2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:48:53 2025 -0500
Use __shfl for broadcast
commit 77ca7559ff9dcab2da81bc24b11fbc18a32216a0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:40:44 2025 -0500
Relax order
commit f9196d946776c39d461237b21124bb8b7ad7b84e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:29:58 2025 -0500
Relax synchronization
commit a6e32c672278a0f4f6bc4ac8d9ba73d555669ce9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:10:27 2025 -0500
Rename sq variables
commit c732a7d51e737b6deaaff037cb73759da3601d14
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:37:54 2025 -0500
Rename variables in quiet
commit 1a557219628e5009abb50bf247852abb1b28bc03
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:27:59 2025 -0500
Rename quiet counter variables
commit 0023ab69ea8db80d9fdb3e0a99796f723ec6f896
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:24:15 2025 -0500
Refactor quiet
commit 2b9a14f58d2b37df5cd5e6f386fe8d91ead8bd21
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:05:19 2025 -0500
Replace some lds broadcasts with __shfl
commit e34a9125dd393af58f7a89f454be53123b2e1cdc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 19:42:47 2025 -0500
Use constant for wavefront size instead of literal
commit e24e9811d204389610db7688667c135c915644b5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 19:38:42 2025 -0500
Remove debug statements
commit e48a60d32cb12a7b05b6b6394558e4a44468229e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 15:55:59 2025 -0500
Fixed several bugs - stable
commit 82484d5e8ca194b31ca21040cad5747aab2dbdff
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 12:10:29 2025 -0500
Fix bug in post_wqe_rma
commit 4f4897b70c95f4e160a496469ce78303ebca90a0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 08:22:17 2025 -0500
Use better variable name
commit 13f83532132f028e0669a0baff98ffebd5f6f530
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 08:18:42 2025 -0500
Remove atomics for cqe64 access
commit 9d0dcb3d125ee2e29440b43e57704f0e71b838fd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 29 22:40:28 2025 -0500
Use volatile on cqe polling
commit 44e75211435055447ad1f4a08ce37c7eebc02e5a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 29 21:35:15 2025 -0500
Debug synchronization
commit 0abce72b38f7aade806444e1e442a52066941b14
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 11:18:25 2025 -0500
Minor changes
commit 2b8c7c12203081e4af2611eca42af1cec32b28d0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 09:39:16 2025 -0500
Implement mt queues
commit c58e6031dc5ac905b64fd0be3e6bd3ea98b0dd24
Merge: d7b33a87 cb69467f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 10:07:48 2025 -0500
Merge branch 'abouteil/gpuib_bare-dlmalloc' into bpotter/gpuib_bare-04_28_25-devel
commit cb69467f46e8974ae0e5a7945f4c7c01ecb53454
Author: Aurelien Bouteiller <abouteil@amd.com>
Date: Mon Apr 28 10:13:08 2025 -0400
dlmalloc: resolve drift with ibgda branch
commit b1eb1f375a58b49bbf2a635191c528dd3c49be0a
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Wed Apr 9 11:57:07 2025 -0400
Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers accordingly
* add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
* Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit f8ff728719fa5039cd4280762a37e8a295e0790c
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Fri Mar 28 14:17:49 2025 -0400
Add dlmalloc_strat allocator strategy
Use mspace variant to ease encapsulation
Make pow2bins and dlmalloc cmake selectable
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit d7b33a870b8d5a43ecdee5712b4d0c7624821d94
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 27 15:56:54 2025 -0500
Use SND DBR offset
commit f8f5094dd87d6b495d2b647a4a40c87862d1d35b
Merge: 397e058f 9ef5fa1e
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Sun Apr 27 11:09:03 2025 -0500
Merge pull request #74 from ROCm/ytemucin/gpuib_bare-04-25-25
Ytemucin/gpuib bare 04 25 25
commit 9ef5fa1e2f195cd7f0700fa3defd70004fe9acc1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 16:50:02 2025 -0500
Check default_ctx_ ptr before freeing
commit 2bd8ffd20f890f6c4456fd025a4074b868dfd8ee
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Mon Apr 14 09:18:57 2025 -0500
Update backend to use provided MPI communicator during library initialization (#79)
* Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD`
* Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs
- Return values from backend if initialized; otherwise, fall back to MPI_Singleton.
commit 2bba0d133f05db927185eb314108a2608f064e25
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Mon Apr 14 12:02:09 2025 -0500
Revamp the uniqueId code to support subgroups of processes (#80)
* add code for bootstrapping
the bootstrapping code has been extracted from the MSCCLPP library,
which in parts is based on the code from NVIDIA. The code has been
modified to match the specific requirements of the rocSHMEM library.
* add code to use the new uniqueId bootstrapping
* adjust init_attr example
extend the rocshmem_init_attr example to use two disjoint groups
of processes, in order to trigger the new code path.
* add env variable for bootstrap timeout
* Update examples/rocshmem_init_attr_test.cc
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
* Update src/rocshmem.cpp
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
---------
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
commit 4c40fe180f1eabe208baf2d8b79045abc48da6bb
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 25 11:48:59 2025 -0500
Required changes to compile with deepep
- three missing apis (barriers and fence)
- Enable -fpic
commit 397e058f4decef93045ab015647e247936feb83e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 05:01:05 2025 -0500
Cleanup debug statements
commit f12dc302067a4c72c88e143ca3dc80da4df2a07e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:52:21 2025 -0500
Disable tester and TicketMutex
commit 637ba31aeff8c46460edcaf93bb1c91b96dbee6a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:41:35 2025 -0500
Remove monitor thread
commit 9366976d7082062e6cbd5e6804060caefa93afc7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:32:25 2025 -0500
Revert "Revert "Remove print statements""
This reverts commit fdff1dcf9f1a8ca5ff5f07e8fd7da50097991d15.
commit fe0d4fafe056394b59eb10cb060913241bc26b64
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:31:13 2025 -0500
Revert "Revert "Turn off debug""
This reverts commit 11a754c40cc2b07a4f6ef87030532a1ff3fdc02e.
commit d79fbf06ff4ac385c8ecc95e3a623d9847fe928a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:30:33 2025 -0500
Fix THE OTHER bug
commit 11a754c40cc2b07a4f6ef87030532a1ff3fdc02e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:02:33 2025 -0500
Revert "Turn off debug"
This reverts commit 0584485ee0b5b0b772a1ecbb8afc167f91e09853.
commit fdff1dcf9f1a8ca5ff5f07e8fd7da50097991d15
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:53:16 2025 -0500
Revert "Remove print statements"
This reverts commit 4f6fee0eca48c69f2581e9aca31cad4b67b11201.
commit 0584485ee0b5b0b772a1ecbb8afc167f91e09853
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 20:06:32 2025 -0500
Turn off debug
commit 4f6fee0eca48c69f2581e9aca31cad4b67b11201
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:46:29 2025 -0500
Remove print statements
commit aef4122cf9d0aaf917337f495993bf024310263c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:41:44 2025 -0500
Fixes THE bug
commit 9fa906740bcb751adc3870a7dca85bdd60cc95d1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:04:24 2025 -0500
Undo tester changes
commit 120d91f739d8e6d167240a70c7a9f8c5a2657f2a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:58:27 2025 -0500
Voilà?
commit 024f9c1042237ecc15af4041e17b96d0a0efd4fa
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:24:12 2025 -0500
Add debug statements for dest_info
commit 961499146e7d85a82764df51592541c5f0149854
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:08:36 2025 -0500
Flip ctx destroy
commit b0fc2833a82d3ea7ceaa086f8715a3fc06325c9b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 01:56:13 2025 -0500
Move ctx out of shared memory
commit bd77f4cc7883175debea18480bebbd5c363ea5a0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 01:43:32 2025 -0500
Add a second context create
commit c43d26f67a301fecc1d5c4f1039d29ef415aaf5c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 20:04:33 2025 -0500
Simplify CQE checks
commit e1b384a980958410e93ec0c5e79572dad40698f2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 18:09:28 2025 -0500
Use DPRINTF instead of printf
commit dc7b6304076f9b4698fb7b3dee58aa33d1610e97
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 13:13:35 2025 -0500
Remove ibv_fork_init
commit 3ae9e9815095acd2403fe1fef189793e73d996d2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 21:15:48 2025 -0500
Try to use hipHostMalloc
commit 70e1ff54868b891d899095e693fad116ca427e32
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 21:05:11 2025 -0500
Use hipHostMalloc instead of default allocator
commit ea35cf47976957b28e896edb39f60f616579eed2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 16:05:20 2025 -0500
rkey/lkey debug
commit 5837f148df96a8b71cd23a6a8fba5ab475656cf0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 15:32:46 2025 -0500
Convert rkey/lkey back to BE
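The rkey/lkey commits in this log switch between host and big-endian (BE) byte order for the memory keys. A portable way to do that conversion might look like the sketch below. `to_be32`, `from_be32`, `ToySegment`, and `make_segment` are hypothetical names chosen for illustration; the real work queue entry layout and the helpers used in the branch are not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts a host-order 32-bit value to big-endian byte order, independent
// of the endianness of the machine running the code.
uint32_t to_be32(uint32_t host_value) {
  const unsigned char bytes[4] = {
      static_cast<unsigned char>(host_value >> 24),
      static_cast<unsigned char>(host_value >> 16),
      static_cast<unsigned char>(host_value >> 8),
      static_cast<unsigned char>(host_value)};
  uint32_t wire;
  std::memcpy(&wire, bytes, sizeof(wire));
  return wire;
}

// Inverse conversion: reads the four bytes most significant first.
uint32_t from_be32(uint32_t wire_value) {
  unsigned char bytes[4];
  std::memcpy(bytes, &wire_value, sizeof(bytes));
  return (uint32_t{bytes[0]} << 24) | (uint32_t{bytes[1]} << 16) |
         (uint32_t{bytes[2]} << 8) | uint32_t{bytes[3]};
}

// Hypothetical data segment holding keys in big-endian form.
struct ToySegment {
  uint32_t lkey_be;
  uint32_t rkey_be;
};

ToySegment make_segment(uint32_t lkey, uint32_t rkey) {
  ToySegment seg{to_be32(lkey), to_be32(rkey)};
  assert(from_be32(seg.lkey_be) == lkey);  // round trip must be lossless
  return seg;
}
```

Converting exactly once matters: a later commit in this log removes a device-side conversion because the host already performs it, and applying the swap twice on a little-endian host would restore the original host order.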
commit 22a916d565dde29725d562280021ede456144634
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 15:29:35 2025 -0500
rkey/lkey debug
commit d2995708b818b6fea0d6d2b8649deafaa89faf7e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 14:49:03 2025 -0500
Add monitor thread
commit b3036fe91a00f81e13a9b6ff8b973e3c5e9a59bc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 11:50:31 2025 -0500
Add more debug messages
commit bba391f662c5e0d08483edeb72be4c5e8a09dd47
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 11:17:53 2025 -0500
Minor changes to debug statements
commit 2d87c1185474089abf89b8f58733f8bea4c73bda
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 10:41:42 2025 -0500
Allocate network queue pair memory in host memory
commit 8bda2c170e125f325a68212c049a422aefa43c63
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 01:29:37 2025 -0500
dbrec debugging
commit 74eaae19d999e20f231f40eebeed11211f56891b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 01:02:19 2025 -0500
Dump qp debug info
commit 3bea876d7f9f9729dbf143a8457327f9696741c7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 00:33:29 2025 -0500
More debug info
commit 058eb7a1f66f5fb410eefdcfe38f710f8c95b3de
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:48:16 2025 -0500
Debug information
commit def7da96d8b4d78be628561aa7b692866ce5f56e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:20:06 2025 -0500
Change init attr cap
commit 3de749f72e7f302773e15dcc5d528927f9b1cc97
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:08:57 2025 -0500
Bugfix on param type
commit bd1c0db5b035d187d4f34c70b660e6bd1600d882
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:05:13 2025 -0500
More debug
commit 71472506517da890d635d75fc3e06d182f29aa52
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 22:20:01 2025 -0500
Debug effort
commit ae2cf6aa89818203dc21ad14015e3aca0c89d193
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 14:14:25 2025 -0500
Remove unused functions
commit 483f12cac9df35f8482b369b02d28b9dffb48bba
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 14:03:08 2025 -0500
Remove host-side calls into the qps
commit f77a4f360e25cc4a9eb5d6d8adc906b2a061f1a4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:07:30 2025 -0500
Add device object file
commit 109c2e42889c475c5fe95c91de2af1b5e94ed2bb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:07:16 2025 -0500
Add ticket mutex file
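For readers unfamiliar with the term, a ticket mutex hands out increasing ticket numbers and serves holders in order, which gives first-come, first-served fairness. The host-side sketch below shows the general technique only. `ToyTicketMutex` is a hypothetical class built on standard C++ atomics and is not the file added by this commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ticket lock: threads are served in the order they arrive.
class ToyTicketMutex {
 public:
  void lock() {
    // Take the next ticket; relaxed is enough because ordering is provided
    // by the acquire load on now_serving_ below.
    const uint32_t my_ticket =
        next_ticket_.fetch_add(1, std::memory_order_relaxed);
    while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
      // spin until it is this ticket's turn
    }
  }

  void unlock() {
    // The lock must be held: at least one ticket is outstanding.
    assert(next_ticket_.load(std::memory_order_relaxed) !=
           now_serving_.load(std::memory_order_relaxed));
    // Release publishes the critical section to the next ticket holder.
    now_serving_.fetch_add(1, std::memory_order_release);
  }

  uint32_t tickets_issued() const {
    return next_ticket_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<uint32_t> next_ticket_{0};
  std::atomic<uint32_t> now_serving_{0};
};
```

A caller would wrap the protected operation, for example the doorbell write mentioned in a neighbouring commit, between `lock()` and `unlock()`.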
commit 59c36a2e7a13c558858efc9bcaffb960af0f7fb5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:05:59 2025 -0500
Try to protect doorbell with mutex
commit 579601dee2f21cf9e818aec9629d541f9b44a28a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 12:03:31 2025 -0500
Cleanup doorbell ringing code
commit 82e446dd3d73d496d9479ffde04dfea6bbd30304
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:43:03 2025 -0500
more doorbell prints
commit 1d428caa847a82db086cf371f1c2a1b10c2c5c10
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:33:01 2025 -0500
Add print statements
commit abdd15872434d9c73d4eac54aacd31473e3ab654
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:14:04 2025 -0500
Increase blueflame back to two reg and add prints
commit 8ef245a39fbb8d13b851f99406a10d3a7d6df7cd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:02:52 2025 -0500
Add print statements
commit 5a8874866c8e6e53601d733d153c858cbc77c417
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 16 20:52:46 2025 -0500
Minor modifications to printf debug
commit 61a02e8cce0a9c9e9832fc8b742c1f7e780a4e66
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 16 20:46:54 2025 -0500
Remove ipc unit tests
commit bdba6adba80fb734350ff89d241c98e4e5c471fa
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 19:33:53 2025 -0500
Add print statements
commit 238c65bc60e94dee32282277af9283a4e04beba3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 16:25:00 2025 -0500
Remove optional doorbell ringing support
commit 8aae494e048888a793be6ab56c21f0577785624b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 14:40:51 2025 -0500
Only allocate space for one blueflame register
commit 591b45b553712cdbd7d452e7b2e40386edd659eb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 14:25:05 2025 -0500
Convert protected members to private
commit 8024ba1e5f5b560523c6f7c9584e9681eb6b36a3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 13:53:25 2025 -0500
Fixes
commit 5d427906c41bc1936b8a3156da9dbfd28a84ced7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 12:13:49 2025 -0500
Debug - omit address
commit 110d98b48d5bddd2b6b9898912eed69999b34c46
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 12:05:02 2025 -0500
Uncomment some code
commit de6be1a04290c42803028e5927fc09ead978ed2d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:39:41 2025 -0500
Modify print
commit d6a1d2115c2c24a59699a9b4a8d9d83ef09cf694
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:36:37 2025 -0500
Change tester arguments
commit b0ce33992a932638d7e7632b20874e1f6f7fb337
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:34:18 2025 -0500
Add prints
commit a3e6111259fcce8fa21bc55050d8e5ec639f8956
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:32:05 2025 -0500
Add print statements
commit a6da7c32bb11f33046a6c82f2c607b1698f1486f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:27:47 2025 -0500
Add device-side print
commit 9a4a79a9a00f80b19d8859dc7df83e4062fdf301
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:18:04 2025 -0500
Add wqe debug host print
commit 6c8bb7cd7db827bb8f600e772b8938249f217074
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:55:13 2025 -0500
Initialize wqe fields without host post call
commit 4f7c7b94a23907205295ef8aa329b1b9576a7308
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:30:06 2025 -0500
Remove endian conversion since it's done on host
commit 2e97c16adf59c4bb6d8e63ef246ad07e5c423c04
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:27:27 2025 -0500
Set rkey/lkey using backend
commit 5b40fab11d692257a18c0e410aeb0f30415cdc94
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 09:02:51 2025 -0500
bugfix endian
commit bfad8ff80cf079ac7df57a62da0f656d1d84d798
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:59:08 2025 -0500
endian conversion
commit 2d2405c4eb4de901f2363639c82041e8a16be803
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:38:14 2025 -0500
Enable tester
commit d9a992511993d66c860de9e6ddc506e5a4e87f50
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:31:39 2025 -0500
Add in rkey/lkey writes
commit f04aad5cc99fc79b8884795fcadc3b6f16b43af3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:28:15 2025 -0500
Add rkey/lkey check
commit c39f781f343053f83a9c2bf4bbb24d4a1fa13368
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:13:29 2025 -0500
Add documentation, pseudocode, and modify
commit f4397c0451d8c9d2e0e8184adb904db06dfb02aa
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 21:32:20 2025 -0500
Finish removing fence
commit 93144311c33ed86f77bb903e2340690ce38f2271
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 21:20:23 2025 -0500
Remove fence
commit 149ad98c6e55fa1f7ce44d6d58eb404a0862bd84
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:53:42 2025 -0500
Style change
commit 5608d393d52b79c3b0c2ebeb3d5aefb00732d563
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:50:51 2025 -0500
Remove comments
commit 419fc03139e29cb0de86336448d2aed1425041df
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:42:45 2025 -0500
Straight line code
commit eec14a54211f9da876d34fc368c58d3dff3d9032
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 18:01:42 2025 -0500
Remove singlethreadpolicy
commit a6f7023fd13c14af1c1870990084471a54567b17
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 17:49:00 2025 -0500
Minor fixes
commit ed51cca1d5f3f195c0ea98a8162a9b2aac6dbbb8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:49:32 2025 -0500
Style changes for backend
commit aa1928382a8841c158f3be3715f0a806657f742b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:28:05 2025 -0500
Minor fixes
commit 039b3b6a168887190be3a3bbdd3a3d65920a1e2e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:18:19 2025 -0500
Remove inlining mechanism
commit 68c5dc8f6522a40da025b8163424c0e48911c5b9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:59:43 2025 -0500
Remove unused header file
commit 377a1fc3a6dda454a1c4efd288e8efaf3226e205
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:54:31 2025 -0500
Fix comment and variable name
commit ef32def751f7c4c3aaf2e43b38ecc220568f9e3e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:31:05 2025 -0500
Encapsulate members in queue_pair
commit 079c9b337907d331095156da6468007b43d77d85
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:19:17 2025 -0500
Style change
commit d5ea67eb8e949003c2f7f2dcc737f7b8e3e34df2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:16:06 2025 -0500
Cleanup for queue_pair class
commit 3bed59f7dd8dbab2716ae5d67368f673924c9f63
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:53:33 2025 -0500
Add documentation for segments
commit f25d2581db56e2a3495d64fb889a1ed5fb099069
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:10:35 2025 -0500
Remove unused struct
commit 1939224c0777d65512d8b255ec2f58afbd270910
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:07:33 2025 -0500
Remove method
commit 7ef084ebcaabad6d5ace155b5642e7720cd7b90d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:04:16 2025 -0500
Remove unused variable
commit 3f7f356d499b1628abc31b13cea5305a9aecf1de
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:52:19 2025 -0500
Cleanup files
commit e9ee4bf908a83a89d0176fad034b53b27108e33a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:35:56 2025 -0500
Style changes for queue_pair and segment_builder
commit b9a697901ada72c4b00b96c3b3df99301de4fce7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:16:14 2025 -0500
Remove weird + 1 offset
commit 0ea6f3438f31b19b1cfba1dd17ceeb736b5c90d5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:13:49 2025 -0500
Rename sq fields
commit f8d7ca9bf0fe22f1108e0b938ac3e4a6d1e0ac87
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 10:55:31 2025 -0500
Remove unused headers
commit 840eb360a2323687aba187a302136d02b2a7bf2d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 10:49:30 2025 -0500
Cleanup gpu_ib context files
commit 3349683a72eebaca70880058547e11ac6e9a21d6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 23:03:50 2025 -0500
Continue document MLX structures
commit 000b54b8dd123c2e9913467b51e954700ae328ea
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:46:54 2025 -0500
Document gpu queue-pair MLX structures
commit 3151fd34c06b35fbf93f9a707924119c472a7b51
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:08:21 2025 -0500
Bugfix for host RDMA_WRITE WQEs
commit 441fa32e5031a3eab71468b76c94ad3de4456ebb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:04:16 2025 -0500
Add host-side initial RDMA_WRITE WQEs back
commit 1afe02368fb862f5b8ceb65c1bd9e504d20f3a74
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:46:16 2025 -0500
Try to remove host-side post_wqe
commit 3bb4b5527e8b17a4a66ab2544031ee2a73f36bbd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:39:40 2025 -0500
Always allocate queues in gpu memory
commit 4aca3989ec244012f32dc401c53eb5f5263cf04e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:16:14 2025 -0500
Bugfix for connection class
commit ceb0ebeb387f71a622ac10c0f276725cfe654d43
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:07:45 2025 -0500
Refactor connection class
commit 8b31d8927c021ed07d2766b9d05d155162df8fd3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 20:53:28 2025 -0500
Refactor some files
commit 2998e9bf0565bf29ef39306445990034b30389a8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 20:18:05 2025 -0500
Update connection class
commit 519e9f7538160f0d2246c189a635528c39841475
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 19:12:06 2025 -0500
Cleanup connection and network classes
commit c49d2e7097218cd6a7ef8ce565a8811a4b6061cb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 18:50:50 2025 -0500
Remove unused member
commit a2e2bd020b821176923ac62ca5f57645a8113f93
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 17:54:29 2025 -0500
Add uncached heap option
commit 629099694ed8d3a6c7480a6c342086676d6a8a9f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 17:05:53 2025 -0500
Device mem for cq/sq queues
commit 768a2211f36092b5fb869cb491dc24b4d65a2991
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 16:49:40 2025 -0500
Change heap allocation policies
commit 6afa39dbf092dcfede2f5d39ae1d4ebe2e350fdc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 16:15:36 2025 -0500
Remove compile options and cleanup
commit 40a5c52cedd993d4d47196916f2a551555f1e007
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:24:29 2025 -0500
Cleanup coalescer files
commit 37ea4b4332f2efd14b789a77259030bbe68ec77f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:20:26 2025 -0500
Cleanup files
commit 9742294b1356eed3978af8a9a7c58a717df2feb2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:11:17 2025 -0500
Cleanup rocshmemgpu and team files
commit a9c11ce854be01399ebcbc7eb510ea6cab984c4e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:56:34 2025 -0500
Cleanup gpu ib team files
commit cca52a6656b8d7dece78556c60b80deb55e4cccc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:38:53 2025 -0500
Add inline and cleanup
commit 87cde8bfa470d0f90cf9f32af1014437cd9d29f7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:25:27 2025 -0500
Cleanup file
commit d7d619b6d647a84b9fa0b2e90ffdeef32d7c3c04
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:50:20 2025 -0500
Cleanup host files
commit 8edc427a9185b4b0c9abe5bfb8a33cc5d597faa1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:18:50 2025 -0500
Minor style changes to context_device
commit 66c8eb38303005628404641371f92e19060fcfbb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:03:30 2025 -0500
Remove unused constants
commit 2ccb34ee6046790ba1b0266deb91a0ede2bc17b0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:57:57 2025 -0500
Remove unnecessary init functions
commit b9a0ac2b8416b858124079f7e9a3cf3ad251bd4c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:45:48 2025 -0500
Remove manage memory stubs
commit 6143152a019e3ad0491233e5554ce4c4422b7e91
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:34:55 2025 -0500
Remove comment
commit f56a95acf6aa58ffb21fddacc7ccf7fd570c6d30
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:23:41 2025 -0500
Remove unused ThreadImpl types
commit e9bb49011f16364d0c3f8644733792768a4bc946
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:03:34 2025 -0500
Move constant into different file
commit af0340d5d299205b9f9f1b285a74cdad33e929e7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:53:50 2025 -0500
Remove g_ret mechanisms
commit 72d86f16e8d1df6a6d7fa5adba7b74e0b276c488
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:36:42 2025 -0500
Remove unused externSharedBytes method
commit c4f6e08c63dcef653b00459a6555a8552fbe616d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:25:52 2025 -0500
Remove unused variables
commit 4cde4a4cc2300dd5733a10dca98ca0ea860c737f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 11 11:15:24 2025 -0500
Tear out internal references to removed atomics
commit d335eac148b5bc3e4d828b530b16d1498e2999d1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 11 10:57:54 2025 -0500
Remove unused atomic types
commit 0089c11fb0bbb538f70e1c95637dc77b7d5687cf
Merge: 5b265666 b8dc6a2e
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Fri Apr 11 10:18:00 2025 -0500
Merge pull request #71 from Yiltan/yiltan-cleanup-april-11
Yiltan cleanup april 11
commit b8dc6a2edf380c538659724f693f635d9959f049
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:09:02 2025 -0500
removed unused collective buffers
commit 4e459483c2402712200237fb6cf32709d28b2a70
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:02:05 2025 -0500
removed USE_SINGLE_NODE
commit cf262a984e91486dd4324af2b7130052fe081966
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:00:38 2025 -0500
removed network impl off
commit 4e6188d2875cd9c19ed2ef346e87e011a29c020b
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:42:38 2025 -0500
removed reliable connection into connection
commit 274f68f0d165486f1734945c24a8d054a926d3c4
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:37:34 2025 -0500
remove rocshmem_calc.hpp
commit 97c7ad44b205aa8fa7ae1127d1b0e8d60803c977
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:33:17 2025 -0500
Removed more unused files
commit 5b265666f561db70508094c472af3f655551d9dd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 16:58:20 2025 -0500
Remove straggler wait_until variants
commit 13bdb0c58e5256f7ee36dc99453649b083e649df
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 16:51:14 2025 -0500
Remove get variants
commit 6f7ac10561a325059b395f9ed7a903ae468eda69
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 14:42:32 2025 -0500
Remove unnecessary interfaces
commit df386a2c9e27bb34b0aaf890b355f31ca396b438
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 12:12:43 2025 -0500
Tear out SYNC, WG_RMA, related functional tests
commit af6dcfdcb0ae88ccebbacfc4a3a2f2a0a1eebb49
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 11:13:55 2025 -0500
Tear out signal ops from include and dependencies
commit fc78420fe44b0be6667aa359dc8eec9f9ed6c306
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 10:46:37 2025 -0500
Remove debug header
commit 583edb968262bb14b4a72c62324760051016c0b6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 10:42:21 2025 -0500
Tear out collectives from include and dependencies
commit a47c80e4e2c1a12315598146b5a0f8f4133e78f3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:46:57 2025 -0500
Remove empty RC functions
commit 47e5387a42eec1b31fa152832cef0062311875ae
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:36:03 2025 -0500
Remove qe dumper and debug
commit 10baf8d2471a0d0ac4de141ba11ca96a9e7d6a4c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:32:06 2025 -0500
Remove helper_macros header since dependencies removed
commit 5a13783e7e3519444723308710452a25edb1e98f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:28:41 2025 -0500
Remove dev_mono_linear strategy
commit b5666de736ff8d274eff0eef038b761383cc9419
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:20:40 2025 -0500
Remove container strategies
commit 6a3adf5ca423fddf3e38f6a2746690054b706841
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:15:05 2025 -0500
Remove bitwise gtest and matrix container
commit 8346c73a6ac79b35a0372ac19222a81d9f287bd9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:10:18 2025 -0500
Remove array container
commit 7bb8e7efa0d35727f911bd9eccb4eb0b37dc9d59
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 23:46:33 2025 -0500
Remove DC transport files
commit 9f9de2338eff67d7984a8c8bf197e83bdc088bc4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 22:37:33 2025 -0500
Remove relative pathing for includes
commit 92fe8e5bd94a788785aa61e1aec5d023f451baf4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 22:07:21 2025 -0500
Remove todo notes
commit 7202d143147cea0240b80f89b95fb1e7a16401b1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 21:58:23 2025 -0500
Remove extra line
commit 615236311a6b54a4fae13e19f0ea7b9d4796c5d4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 21:49:29 2025 -0500
Merge backend classes
commit 7e9a7b1c46be201d8822b284b1f049d4c9737b2e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 14:19:27 2025 -0500
Remove USE_RO and USE_IPC conditions
commit adebaa285f29a8fdefe446b859e6df1957f81ae5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 13:52:36 2025 -0500
Tear out IPC call points
commit a9f912bbca0fc71efa3381980ae3da789f79d637
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:28:48 2025 -0500
Tear out hdp_policy
commit b8de7035a738076a3c8e7c05f307505834c14e60
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:19:29 2025 -0500
Convert backend_type to GPUIB only
commit 582cfcde612f86c3987cd411c29e4992e0125847
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:06:24 2025 -0500
Tear out IPC conduit
commit dcab3a2b268d1724a0649f82204d73884c3160f0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:01:55 2025 -0500
Tear out RO conduit
commit 7226902bc4a50a6223c665af2e0607607af77a6b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 09:57:25 2025 -0500
Tear out atomic and notifier files
commit 4295c43867294887b12711848a36b7d98e272066
Merge: d7809b3b 99942d91
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 09:13:31 2025 -0500
Merge branch 'gpu-ib-working-draft' into bpotter/gpuib_mini
commit 99942d919baedb27cf7d33962a35667f749d02ee
Merge: 42fa4e9f dc61fb61
Author: Yiltan <ytemucin@amd.com>
Date: Wed Apr 9 10:09:19 2025 -0400
Merge pull request #67 from Yiltan/gpu-ib-working-draft
Removed HDP code and error checking to ibv_* functions
commit d7809b3b5f86a6d5fdf704af36628be80a8e68ab
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 08:40:46 2025 -0500
Remove unused wrapper class
commit e12c08e6fa5caf08a2ce5ec1a1be84e4b3bf8dcc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 00:05:57 2025 -0500
Remove unused EBO spinlock
commit 483ec9dc43195a9322f630f3e164359d95d9d3bc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:56:15 2025 -0500
Remove slab heap
commit d0c0991d42eec0756c9e6778a315b1a94a467744
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:46:33 2025 -0500
Remove unused unit test for ipc
commit 4f1661199c911924f9b1d3349231ff11769476c8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:44:45 2025 -0500
Fix store_asm function and util memcpy funcs
commit dd72a4f4e28f4c2f0d99910a88e8457fa6564f7e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:05:42 2025 -0500
Replace wallClk code with hip function
commit 89e19320818902d4d7062f47afd7d39dcb8686c7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:44:11 2025 -0500
Remove unused __read_clock function
commit a3b67d91ef02bcfbe626f8f6fb8ab9ab1bca610d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:18:13 2025 -0500
Remove unused forward_list
commit f6767d8a48afc7f0a90193bfa58aa0dbf0c40d16
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:01:15 2025 -0500
Disable verification of functional tests
commit 6158e946e29989f22d7fdc922e41c5402f063617
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:00:52 2025 -0500
Increase functional test loop size to 200
commit dc61fb6102d30af41bc0dae6d9eb7ba40bc75394
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 12:55:32 2025 -0500
Added error checking to verbs functions
commit 8df72764da0447ecc91d9eb59fe01c3cd66b7f46
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 10:21:06 2025 -0500
removed unused file
commit f6beb9ef97e164b38645301a4dec075410eb41ee
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 10:19:07 2025 -0500
removed hdp comments
commit bf032889e27bb1611dfd07d0d69c5dbe45a93ccd
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 09:54:46 2025 -0500
fixed dc
commit 42fa4e9fe2137e595ba47e02d271887cc6a8dd28
Author: Yiltan <yiltan@amd.com>
Date: Thu Apr 3 14:17:54 2025 -0500
cant lock cq if on device mem
commit 27dd09ca5f92538f505223840b34c4fef70b812e
Author: Yiltan <yiltan@amd.com>
Date: Thu Apr 3 08:06:37 2025 -0500
null ptr
commit f5f0efe88869faa2e3549c11fd6082553a548ed9
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 15:25:44 2025 -0500
comment out hdp
commit 0b631d593cad7ed7ca58716d5678ee41dc0c43c2
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 12:50:00 2025 -0500
GPU_IB Compiles
commit 9ba9b1fb6299491b54dd9a328df4702931947a05
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 10:07:04 2025 -0500
Add GPU IB back
* Revert "Only issue a single completion per wavefront (#199)" (#205)
This reverts commit 90761d552392ca1f5261fec2e6a08455b0ebc368.
(cherry picked from commit 99b4c93e1f8c9177bf1c236b86732c1209847519)
* GDA Cmake modifications, move topology to gpu_ib specific folder
* Do not use ../thing.h
* Use WF_SIZE: AMDGCN_WAVEFRONT_SIZE is deprecated
* 2-way merge between context_ipc and context_gpuib
* Select MTU based on network config (#214)
* rocSHMEM GDA BNXT POC (#213)
* rocSHMEM GDA PoC for Thor 2 (233.2.76.0)
(cherry picked from commit d0d5c51528e362858f5dc38a46d8214ac519b044)
* Rename gpu_ib to gda
* Renaming part2: includes and cmakery
* Fix DISPATCH macro; use backend_comm when needed; some GPUDevices were
left
* Consolidate GDA_CHECK_NNULL/CHECK_ZERO/CHECK_HIP to look and feel
similar
* Update copyrights to the new style
* Rework default-ctx init, missing heap init, missing qpe field
* backend_gda: single init, use systematic naming for setup/cleanup,
prefix team structures,
* setup_wrk_psync must precede setup_teams etc
* silence recasting error
* Some remnants of GDADevice and missing friend classes, public some
fields, it compiles
* Fix redefinition of CHECK_HIP in functional testers; we still have a
duplicate definition that would be better consolidated into one
* typo in backend_type
* Undo unneeded change to functional test driver
* Add -lnuma
* ctx must be initialized after qps
* gda: Disable non-functional tests (#216)
* Do not try to run functional tests that are not implemented
* Revert "Increase functional test loop size to 200"
This reverts commit 6158e946e29989f22d7fdc922e41c5402f063617.
* Make a specific test case for gda
* Disabled further tests that do not currently pass with explanation as to
why disabled
(cherry picked from commit 27c5c6ff09f259e2b59fbe5934b88751ba47cbfc)
* gda_devel: teams with MPI initialization (#229)
* Fix missing communicator initialization
* Reenable team functional testers
---------
Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
(cherry picked from commit 8d700a986f5e64e40b45babddc8e84d8d8028dea)
* [GDA] Query for the correct GID index (#215)
* Added GID query code for CX7/Thor2 NICs
(cherry picked from commit 2bc0d6c719a0f43955cef1bbcec77261ae797e54)
* Reorder code to make ipc and gda more similar
* Do not double free Wrk_Sync, uniform styling with ipc
* Remove unused includes
* Abort when using not-implemented device functions
* BNXT Compiles
* Silence compiler warnings
* Cleanup unused .h
* Uniform indentation between ipc and gda
* gda: add cleanups, address todos
* Disable pingpong tests, enable defaultctxtest
* Reenable testing non-fetching amos
* build scripts: use a single script backed for all gda variants
enable configuring INSTALL_PREFIX and BUILD_TYPE from the command line
same order in all scripts
* fix: prevent double free in `GDADefaultContextProxy` with custom move assignment
* The default move assignment, invoked during initialization of
`default_context_proxy_`, caused the default context’s QPs to be freed
prematurely because the destructor of the moved-from temporary (an
xvalue) runs right after initialization.
* Undo changes to the amo standard tester during gda_devel dbaee371, as
they cause RO failures
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Yiltan <yiltan@amd.com>
Co-authored-by: avinashkethineedi <Avinash.kethineedi@amd.com>
Co-authored-by: bpotter <brandon.potter@amd.com>
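The `GDADefaultContextProxy` double-free fix described above can be illustrated with a minimal sketch. Everything here is hypothetical: `ContextProxySketch`, its raw `qps_` pointer, and the two counters are stand-ins invented for illustration, assuming the real proxy owns its QPs through something pointer-like. A defaulted move assignment would copy the pointer and leave the temporary still owning it; the custom version releases the old resources, takes ownership, and nulls the source so the temporary's destructor frees nothing.

```cpp
#include <cassert>
#include <utility>

// Illustration-only bookkeeping (not part of rocSHMEM).
static int live_qp_sets = 0;   // allocations currently owned by some proxy
static int release_calls = 0;  // how many times an allocation was freed

// Hypothetical stand-in for a proxy that owns a set of queue pairs (QPs).
class ContextProxySketch {
 public:
  ContextProxySketch() = default;
  explicit ContextProxySketch(int num_qps) : qps_(new int[num_qps]()) {
    ++live_qp_sets;
  }

  ContextProxySketch(const ContextProxySketch&) = delete;
  ContextProxySketch& operator=(const ContextProxySketch&) = delete;

  ContextProxySketch(ContextProxySketch&& other) noexcept
      : qps_(std::exchange(other.qps_, nullptr)) {}

  // Custom move assignment: free what we hold, take ownership, and null
  // the source so its destructor has nothing left to free.
  ContextProxySketch& operator=(ContextProxySketch&& other) noexcept {
    if (this != &other) {
      release();
      qps_ = std::exchange(other.qps_, nullptr);
    }
    return *this;
  }

  ~ContextProxySketch() { release(); }

  bool owns_qps() const { return qps_ != nullptr; }

 private:
  void release() {
    if (qps_ != nullptr) {
      delete[] qps_;
      qps_ = nullptr;
      --live_qp_sets;
      ++release_calls;
    }
  }

  int* qps_ = nullptr;
};

// Mirrors the initialization pattern: assign a temporary into an existing object.
inline bool init_then_check() {
  ContextProxySketch proxy;
  proxy = ContextProxySketch(4);  // the temporary is destroyed at the end of this statement
  // The QPs must still be alive and must not have been freed by the temporary.
  return proxy.owns_qps() && live_qp_sets == 1 && release_calls == 0;
}
```

After `init_then_check()` returns, the single allocation has been freed exactly once, by the surviving proxy rather than by the temporary.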
ammallya
pushed a commit
that referenced
this pull request
Jan 21, 2026
* Import gda_devel back into develop
Squashed commit of the following:
commit d9e2fed2f7e55d266c7dfcacc4641b92a3b008ed
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu Jul 24 14:50:47 2025 -0500
Only issue a single completion per wavefront (#199)
commit 6b6e41ef3c955d914c83cc77cecbf8c4ec6a363e
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Thu Jul 24 14:12:35 2025 -0400
non-fetching amos are implicit nbi, we do not need the terminal quiet. (#179)
commit 78feb0e15ba864b6bfd1b4ae3365e0312d7170c5
Author: Alsop, John <johnathan.alsop@amd.com>
Date: Tue Jul 8 10:25:43 2025 -0700
Relax ibgda synchronization (#191)
* rocshmem mcm: relax ibgda orderings
convert all SEQ_CST orderings in queue_pair to RELAXED except:
-system scope ring_doorbell access: required to flush push buffer
(unless data is uncached - in which case a waitcnt is sufficient)
-agent scope leader thread read in post_qpe_rma: unclear why this
is necessary, but when relaxed, the code breaks. either the waitcnt
or the L1inv associated with agent scope SEQ_CST is needed for
functionality.
* Undo changing atomic_signal_fence from SEQ_CST to RELAXED as this
appears to have no performance advantage and we are not entirely sure it is
correct
---------
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
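The ordering split described in the "Relax ibgda synchronization" commit can be sketched on the host with `std::atomic`. This is only an analogy under stated assumptions: `std::atomic` has no agent or system scopes, and `SendQueueSketch`, `reserve`, `ring_doorbell`, and `post_entries` are hypothetical names, not the `queue_pair` API. The idea shown is that bookkeeping counters only need atomicity (relaxed), while the doorbell write that publishes work keeps the strongest ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Host-side analogy only; the real code uses scoped GPU atomics.
struct SendQueueSketch {
  std::atomic<std::uint64_t> next_index{0};  // slot reservation counter
  std::atomic<std::uint64_t> doorbell{0};    // stands in for the NIC doorbell

  // Reserving slots needs atomicity but no ordering with other memory.
  std::uint64_t reserve(std::uint64_t count) {
    return next_index.fetch_add(count, std::memory_order_relaxed);
  }

  // The doorbell publishes previously written entries, so it keeps SEQ_CST.
  void ring_doorbell(std::uint64_t producer_index) {
    doorbell.store(producer_index, std::memory_order_seq_cst);
  }
};

// Reserves `count` slots, (notionally) fills them, then rings the doorbell.
inline std::uint64_t post_entries(SendQueueSketch& sq, std::uint64_t count) {
  const std::uint64_t first = sq.reserve(count);
  // ... entries [first, first + count) would be written here ...
  sq.ring_doorbell(first + count);
  return first;
}

// Small single-threaded scenario: two posts of 3 and 2 entries.
inline std::uint64_t doorbell_after_two_posts() {
  SendQueueSketch sq;
  post_entries(sq, 3);
  post_entries(sq, 2);
  return sq.doorbell.load(std::memory_order_seq_cst);
}
```

The sketch says nothing about the agent-scope leader read in `post_qpe_rma` that the commit message leaves at SEQ_CST; that case has no host-side equivalent here.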
commit 9eb45465775f5f00140788c65adceeabf83d4268
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Mon Jul 7 13:56:19 2025 -0500
Make gda_devel branch work without MPI library (#188)
* First cut on adding the no-mpi path to gpu_ib
more functions to follow.
add mpi_init_singleton stuff
* make gda compile with no-mpi support
* gda_device without mpi support
* fixes for functional tests
- disable the mpi_init_singleton tests in the unit tests.
There is no point in fixing them on this branch to adjust to the new structure/logic.
- replace MPI_Barrier with rocshmem_barrier_all in tester.cpp
- I missed one Allgather statement in gda_device.cpp; add the non-MPI
version for that call as well
* Update src/gpu_ib/gda_device.cpp
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
* Update tests/functional_tests/CMakeLists.txt
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
---------
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit 3766e4293c070efde091b9d1675aeef3cccdf701
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu Jun 26 19:12:49 2025 -0500
Check for counter load order update in send queue (#178)
commit 255c240b2d001cea13a3c8c77cc0a049dd598631
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Thu Jun 26 15:10:44 2025 -0500
Refactor Barrier_all and Sync_all to use default context (GDA) (#175)
- Removed context-specific implementations of barrier_all and sync_all
- Added barrier_all and sync_all to the default context implementation
- Updated functional tests to use the default context for barrier_all and sync_all
commit 1c5d004eb56f420ede1cc7cbf563c618a2d6c5d8
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Tue Jun 24 14:24:48 2025 -0400
Reenable Release by default (#168)
commit c7b90bc78a605da418912b51af339fb3747c3b74
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue Jun 24 12:20:22 2025 -0500
Fix issues with queue_pair (#167)
* Add amo fetch_add and non_fetch add self tester
* Validate both ways
* Intermediate debug for atomic hang
* Fixes for amo test
* Convert to release build
* Revert SYSTEM to AGENT for scope
* Restore tester arguments
* Make nonfetch amo into blocking call
commit caec3441855135510a4747b64d5d8ebc88a8eea0
Author: Aurelien Bouteiller <abouteil@amd.com>
Date: Mon Jun 23 22:30:00 2025 -0400
bugfix: prevent reuse of sqe items before they are ready
commit b5c474b7573029b84a7aeee417fc8fbe9402f227
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Tue Jun 17 09:17:24 2025 -0500
change default compilation mode for gda_devel (#162)
for the moment, switch to Debug builds being the default, since it seems
to be more stable with DeepEp
commit 2d771c8f335ffb552589f6f0b3cd60275c87506d
Author: Yiltan <ytemucin@amd.com>
Date: Thu Jun 12 16:08:32 2025 -0400
Add Broadcom support for gda_devel (#148)
* Added bnxt headers
* Updated bnxt headers to compile with rocSHMEM
* Preliminary BNXT Support
* Update direct verbs to 2025/05/30 drop
* Use umem_reg to create queues
commit 51cf8ee72bf18550947b9bde0926fc5f68900f46
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Tue May 20 17:01:39 2025 -0400
gpu_ib ionic: Address review comment (#137)
commit 822541e7f7ed56857185779d91a62f4fac362fbd
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 20 15:57:17 2025 -0500
Check RMA functional test data in GPU kernel (#91) (#132)
Co-authored-by: Yiltan <ytemucin@amd.com>
commit 5dc74b6fa605f7703b22cbf7035196bbd6ab306a
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Tue May 20 16:35:07 2025 -0400
gpu_ib ionic: add gpu_ib provider for ionic (#133)
Port gpu_ib ionic changes from earlier proof-of-concept codebase.
Build with GPUIB_IONIC=1 to enable ionic and disable mlx5.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
commit f64fc480e2d960bf7a88c80b1896d6980b0e9fc1
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Fri May 16 09:07:43 2025 -0400
gpu_ib: Cleanups to Mlx5 provider to ease Ionic integration (#129)
Keep both pd_orig and pd_parent.
Add some helpers for lane mask etc.
Add generic defines in a few places.
commit d546e43c71120544366f2fa4496ca1ee32a1ede4
Author: Andrew Boyer <andrew.boyer@amd.com>
Date: Thu May 15 14:07:33 2025 -0400
gpu_ib: Fix up putmem_wave() (#128)
Add a thread ID check to GPUIBContext::putmem_wave() so that only one
thread gets through.
Since the context layer checks, the QP layer doesn't need to. Thus
QueuePair::put_nbi() and QueuePair::put_nbi_wave() are the same and
can be combined.
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
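The "only one thread gets through" check for a wave-level put can be simulated on the host as below. This is a sketch under assumptions: a 64-lane wavefront, leader chosen as the lowest active lane, and hypothetical helper names (`lowest_active_lane`, `is_wave_leader`, `count_posts`). The actual check in `GPUIBContext::putmem_wave()` may be simpler, for example a comparison against lane 0.

```cpp
#include <cassert>
#include <cstdint>

// Returns the index of the lowest set bit in a 64-lane mask, or -1 if empty.
inline int lowest_active_lane(std::uint64_t active_mask) {
  for (int lane = 0; lane < 64; ++lane) {
    if (active_mask & (std::uint64_t{1} << lane)) return lane;
  }
  return -1;
}

// A lane is the leader when it is the lowest active lane of its wavefront.
inline bool is_wave_leader(int lane_id, std::uint64_t active_mask) {
  return lane_id == lowest_active_lane(active_mask);
}

// Simulates every lane of one wavefront calling a wave-level put and
// counts how many of them would actually post a work request.
inline int count_posts(std::uint64_t active_mask) {
  int posts = 0;
  for (int lane = 0; lane < 64; ++lane) {
    const bool active = ((active_mask >> lane) & 1u) != 0;
    if (active && is_wave_leader(lane, active_mask)) ++posts;
  }
  return posts;
}
```

Because the gate sits at the context layer, the queue-pair layer below it can assume a single caller, which is why the commit merges `QueuePair::put_nbi()` and `QueuePair::put_nbi_wave()`.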
commit cf6231593a5b6d9370c605bc0f63a8806baf73bc
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Thu May 15 11:41:21 2025 -0500
re-add code to select closest NIC to a GPU (#127)
commit e7f3911f173f42caf48d05d4ec41f69a1e4569fc
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Mon May 12 17:09:00 2025 -0500
Fix MPI_Comm bug (#123)
commit 866d52768b1131d9ba2b85c537e0a425039189a1
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Fri May 9 13:13:08 2025 -0500
Fix Barrier API implementation and add missing variants (#121)
- Fixed issues in the existing Barrier API
- Allocated sync buffers of team using the symmetric heap
- Added missing thread-level and wavefront-level Barrier APIs
- Updated functional tests to cover all Barrier variants
commit caa4dc3c4ed3330a98b69d88aba57699b1c135b4
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 16:58:58 2025 -0400
Missing variable in ibgda branch and use create_ctx to avoid default ctx (#120)
in num_pes and my_pe
commit 483636e380bbaf67d92ae386b8ab99156415a078
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Thu May 8 14:36:46 2025 -0500
Refactor several classes and bugfixes (#115)
* Merge backend connection and network classes
* Use agent scope instead of system scope for counters
* Remove monitor thread
commit 83e7a0487194c6ed34fcf9449e0f02e9d5934229
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 14:52:52 2025 -0400
Add verification, fix only rank0 runs the test (#114)
commit 3469aea496e0d7afcb01059969bbc5c99082fa0e
Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
Date: Thu May 8 10:55:40 2025 -0400
new tester: put to all pes from all lanes concurrently - ibgda (#113)
* Add put to all pes from all lanes concurrently
* This runs on ro 64(8x8) pes, the workload increases with the num_pes so it gets very slow at scale
* Adapt for ibgda branch
commit 3e10a287e8ff9f8b4e2d40bb4969b226049726f6
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Wed May 7 18:20:12 2025 -0500
Fix and extend Barrier_All API support (#110)
- Fixed issues in the existing Barrier_All API implementation
- Added missing thread-level and wavefront-level Barrier_All APIs
- Updated functional tests to cover all Barrier_All variants
commit c5a369c2247b67555eb773aa8b2c77d723e28104
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Wed May 7 11:45:12 2025 -0500
Serialize entrance into queue pair code by PE (#108)
commit 5e916dad8757fdd7cb7294aef9d1148074c367d6
Author: Yiltan <ytemucin@amd.com>
Date: Wed May 7 12:38:58 2025 -0400
Fix ibv_reg_mr when using subcommunicators (#104)
commit aa65c8a7ecdb495da30a08cedd336ae9256ce5b5
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Tue May 6 11:10:12 2025 -0500
add code for determining closest NIC to a GPU (#100)
add code for detecting the closest NIC given a GPU device ID.
The code is based on the same functionality in Transferbench, and has
been stripped down to the required functionality in rocSHMEM. (Note,
there is probably more code that could be removed or simplified).
There are two interfaces that are of interest:
- int GetClosestNicToGpu(int gpuIndex, char **dev_name): returns the
id of the NIC in the device list as well as the name of the device
(if dev_name is not a nullptr);
- void DisplayTopology(bool outputToCsv): prints out the entire
topology detected on the node. This does not happen automatically,
but could be integrated in the future with some debugging output
when the user sets an environment variable.
commit 1bd5c302759dcfd0201d549c2620d53f57cf011f
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 6 11:09:56 2025 -0500
Fix several bugs of gda_devel branch (#103)
* Revert "Use 32-bit counter values"
This reverts commit 464374e5f7157cb4124d01d662103056a04a933c.
* Call hipMemset after allocation on QueuePair members
* Undo previous relaxations and use SEQ_CST atomics
* Remove placement new on QueuePair creation
* Bugfix on outstanding wqe table off by one
commit 8f1fef97a809d829ee05495b0f55cf43e610b99f
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Tue May 6 11:09:40 2025 -0500
Remove unused code (#102)
* Remove unused code
* Remove unused connection method
commit dd675b459db961d1f366c33a95528c4946179c02
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri May 2 15:45:51 2025 -0500
Add AMO support
commit 025569c252b91ad4be46a70cd94ec2af117b9167
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 23:13:34 2025 -0500
Change names around
commit b80ccee955edecdda320b38d4c395ee9aeb4ae43
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:34:48 2025 -0500
Remove unused code
commit b2227a72817c130535bfad43e931878da3d799b1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:23:47 2025 -0500
Replace do-while with while
commit 464374e5f7157cb4124d01d662103056a04a933c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:11:22 2025 -0500
Use 32-bit counter values
commit 6e6c2c9587c8e57ea669b49ed5b8d40aa17da4e0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 22:08:22 2025 -0500
Relax synchronization
commit 5c95d2967b675d3cb568b2624923d1cefdb6d26e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:58:42 2025 -0500
Remove unused method
commit 91bdc47b4c2c488315aa7b0e27235cd6046032bb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:48:53 2025 -0500
Use __shfl for broadcast
commit 5a04575d731c37fed6ed8be7ad49483ee23781f1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:40:44 2025 -0500
Relax order
commit ccd29bb037a8a726351e9cde4f3df9e64242545f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:29:58 2025 -0500
Relax synchronization
commit 21f26f2d31549e3c9c71c2b0ddabe2548d2c59f6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 21:10:27 2025 -0500
Rename sq variables
commit d921a5165d63bed51f8fed6cf84a0c47f7df94f4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:37:54 2025 -0500
Rename variables in quiet
commit 8d83f6bfb9dac1315748f0a9337c993e4ce4609e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:27:59 2025 -0500
Rename quiet counter variables
commit 41d303d37fed401fd807fcf85af74637c3bcb68d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:24:15 2025 -0500
Refactor quiet
commit 6fdae5426a8cf30e2208af8a3f0e5e31c78674f6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 20:05:19 2025 -0500
Replace some lds broadcasts with __shfl
commit 5ed8835fa8e4008519dd3ca5abed7155a51ea825
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 19:42:47 2025 -0500
Use constant for wavefront size instead of literal
commit d9f24ff7f7bd87a574febd6a07e5d79fbe71b708
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 19:38:42 2025 -0500
Remove debug statements
commit 7fa040d11fe0f886b319e87b536748558dda8de8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 15:55:59 2025 -0500
Fixed several bugs - stable
commit a923e6eecacb9398031f41350e6a464895064479
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 12:10:29 2025 -0500
Fix bug in post_wqe_rma
commit d55ce6183a199ef06d6ff16f9aefa329e99e3875
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 08:22:17 2025 -0500
Use better variable name
commit 4536ccde50119e05abdc38450e19c665f21ecc8b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 30 08:18:42 2025 -0500
Remove atomics for cqe64 access
commit 48722eaf3145507bda7346cce14c6c62995aa342
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 29 22:40:28 2025 -0500
Use volatile on cqe polling
commit e08e52ab9476a564a015d27ba23c14690f0dd425
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 29 21:35:15 2025 -0500
Debug synchronization
commit e7ebad19140caf3f6ecba0d770f0a127cd3db421
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 11:18:25 2025 -0500
Minor changes
commit ed41a9635f3058b8b2ee7cd58486d54f3bd35d4f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 09:39:16 2025 -0500
Implement mt queues
commit f91cefd62058b44c79c382a746787145d11bf953
Merge: eb18b0c0 59908366
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 28 10:07:48 2025 -0500
Merge branch 'abouteil/gpuib_bare-dlmalloc' into bpotter/gpuib_bare-04_28_25-devel
commit 59908366d9c436ef4dd8c77038a9ce31da49f202
Author: Aurelien Bouteiller <abouteil@amd.com>
Date: Mon Apr 28 10:13:08 2025 -0400
dlmalloc: resolve drift with ibgda branch
commit fe527fa9bf8f604056151ab5baf617ff1d686be6
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Wed Apr 9 11:57:07 2025 -0400
Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers accordingly
* add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used
* Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit 8714dc647a2b5982812ff4405ad82ad43ebc509e
Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Date: Fri Mar 28 14:17:49 2025 -0400
Add dlmalloc_strat allocator strategy
Use mspace variant to ease encapsulation
Make pow2bins and dlmalloc cmake selectable
Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
commit eb18b0c0e22616f6154a8d02963ad28b63ec4733
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 27 15:56:54 2025 -0500
Use SND DBR offset
commit 114e8df3f0f265546e1a1f876a9af79b7f9aa547
Merge: ed7fb58a d192f5b6
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Sun Apr 27 11:09:03 2025 -0500
Merge pull request #74 from ROCm/ytemucin/gpuib_bare-04-25-25
Ytemucin/gpuib bare 04 25 25
commit d192f5b6164f9b4bb5305118688f262bc95993e6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 16:50:02 2025 -0500
Check default_ctx_ ptr before freeing
commit 60b641a2e5f1fe3ec2848366a5bc0daae7652bb9
Author: Avinash Kethineedi <avinash.kethineedi@amd.com>
Date: Mon Apr 14 09:18:57 2025 -0500
Update backend to use provided MPI communicator during library initialization (#79)
* Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD`
* Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs
- Return values from backend if initialized; otherwise, fall back to MPI_Singleton.
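The fallback behaviour described in this commit message can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the library's code: the Backend and MpiSingleton structs, the global pointer, and the function names my_pe and n_pes are hypothetical stand-ins, and no real MPI calls are made.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-ins; these are not the rocSHMEM types.
struct Backend {
  int my_pe;
  int num_pes;
};

struct MpiSingleton {
  int rank;
  int size;
};

// Assumption: the backend pointer stays null until library init completes.
static Backend *g_backend = nullptr;
// Assumption: the singleton holds rank/size obtained from MPI beforehand.
static MpiSingleton g_mpi{0, 1};

// Prefer the backend's values once it exists, otherwise use the singleton.
int my_pe() { return g_backend ? g_backend->my_pe : g_mpi.rank; }
int n_pes() { return g_backend ? g_backend->num_pes : g_mpi.size; }

}  // namespace sketch
```

With this shape, the host query functions give a usable answer both before and after initialization, which appears to be the intent of the change.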
commit 474929f8ae3fd254a740626ce50935a223992b6c
Author: Edgar Gabriel <edgargabriel@users.noreply.github.com>
Date: Mon Apr 14 12:02:09 2025 -0500
Revamp the uniqueId code to support subgroups of processes (#80)
* add code for bootstrapping
the bootstrapping code has been extracted from the MSCCLPP library,
which in part is based on code from NVIDIA. The code has been
modified to match the specific requirements of the rocSHMEM library.
* add code to use the new uniqueId bootstrapping
* adjust init_attr example
extend the rocshmem_init_attr example to use two disjoint groups
of processes, in order to trigger the new code path.
* add env variable for bootstrap timeout
* Update examples/rocshmem_init_attr_test.cc
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
* Update src/rocshmem.cpp
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
---------
Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com>
commit b123c12f10bfd170da8daf3fbfcfd48d153d8f45
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 25 11:48:59 2025 -0500
Required changes to compile with deepep
- three missing APIs (barriers and fence)
- Enable -fpic
commit ed7fb58aa54907d493d02e6492c08eb58e0b10ad
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 05:01:05 2025 -0500
Cleanup debug statements
commit b161c046f472ed4c4519b874bceffb9afb5c02e1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:52:21 2025 -0500
Disable tester and TicketMutex
commit e51a24ea9d494d472c0f5b759f0e275982940777
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:41:35 2025 -0500
Remove monitor thread
commit 36159551971025520af1f2207c65231eca8b65e5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:32:25 2025 -0500
Revert "Revert "Remove print statements""
This reverts commit 763fd7032f9c09a2f642184baa9cb927da414e64.
commit ce7db5f03b796c4e7e994b98b3e5a88922607a23
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:31:13 2025 -0500
Revert "Revert "Turn off debug""
This reverts commit c9e1c3b1c4300fb7f6b65ba9882f9651d4362221.
commit 8e0d801f8643566f02592a70e50b7de1b25322c7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:30:33 2025 -0500
Fix THE OTHER bug
commit c9e1c3b1c4300fb7f6b65ba9882f9651d4362221
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 04:02:33 2025 -0500
Revert "Turn off debug"
This reverts commit 03303d10ad2155911632888676e045baaea3c2ca.
commit 763fd7032f9c09a2f642184baa9cb927da414e64
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:53:16 2025 -0500
Revert "Remove print statements"
This reverts commit ae65f024a00e4dc416e3c6efd8f10e4665e0dbbd.
commit 03303d10ad2155911632888676e045baaea3c2ca
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 20:06:32 2025 -0500
Turn off debug
commit ae65f024a00e4dc416e3c6efd8f10e4665e0dbbd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:46:29 2025 -0500
Remove print statements
commit a63ceff9a74e8811b6749215859e0836ee11ae40
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:41:44 2025 -0500
Fixes THE bug
commit e5276bb50eefc717dca52cf726d3e14aaaa198c6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 03:04:24 2025 -0500
Undo tester changes
commit 6ce6be2618bc27cb8fdfc0a0c772c3f44952631a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:58:27 2025 -0500
Voila?
commit 29f9a063697302641a3a4f18d29eef9db633d966
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:24:12 2025 -0500
Add debug statements for dest_info
commit 89647908c7c1a7ceb25f242e41e6db6c8242b9f3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 02:08:36 2025 -0500
Flip ctx destroy
commit a703401ccd1409bfe7c73dccace002df2be4e10e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 01:56:13 2025 -0500
Move ctx out of shared memory
commit 10e458e76b3b79e3be847bd124d6b423e7d01874
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 23 01:43:32 2025 -0500
Add a second context create
commit 29ad27b160dbcbbda22ef0f246bd71bdd862ccd9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 20:04:33 2025 -0500
Simplify CQE checks
commit 65fefc2436adfa59146ce67d36e89cdd0c8eb2fc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 18:09:28 2025 -0500
Use DPRINTF instead of printf
commit d296987d01a326d26838cb16f07159a2b6ae23c0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 22 13:13:35 2025 -0500
Remove ibv_fork_init
commit f80c18e8c23ad9be784bf9f1699369dd4972eea0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 21:15:48 2025 -0500
Try to use hipHostMalloc
commit 89ca8ab6b710a8c2d257075cc83321bd6c90503f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 21:05:11 2025 -0500
Use hipHostMalloc instead of default allocator
commit a4ceb2c7c8717a7a09a4ff50b9d8681f21ad66b0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 16:05:20 2025 -0500
rkey/lkey debug
commit bfef39a60609626938d2fb58374e9dbc32b60292
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 15:32:46 2025 -0500
Convert rkey/lkey back to BE
commit 573a7391afd238fabe6be80093f4a2a0c95b1164
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 15:29:35 2025 -0500
rkey/lkey debug
commit ca5ad13b943028c1f561b5a37e0935dea135ee8e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 14:49:03 2025 -0500
Add monitor thread
commit a952a997ca80ca5363cc52c8a3753035d0377cb5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 11:50:31 2025 -0500
Add more debug messages
commit 2d2a7813f980ed549951f852e652a7a718bc5928
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 11:17:53 2025 -0500
Minor changes to debug statements
commit eef3a7fb22df67c0117c7909702d6e8b366b8b37
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sat Apr 19 10:41:42 2025 -0500
Allocate network queue pair memory in host memory
commit ea412f1de3a320a422d37b170f2bbd9727ec56e5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 01:29:37 2025 -0500
dbrec debugging
commit ad0be143f90c4727439046316491ccc9731e11f1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 01:02:19 2025 -0500
Dump qp debug info
commit 6c5224fc5a05d3fbac43db6c6127131eedd927da
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 18 00:33:29 2025 -0500
More debug info
commit 4063a883f3cd8dab48574688499e289a6ef9a668
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:48:16 2025 -0500
Debug information
commit a95ac6d3220571537dd8e1f13545d9b530202cea
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:20:06 2025 -0500
Change init attr cap
commit 774324af9a21f93777324a63eee6efc12d33aac3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:08:57 2025 -0500
Bugfix on param type
commit f1117d0444f8e6384b006d5bface2a3586ff7e68
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 23:05:13 2025 -0500
More debug
commit eabfa80f11ed65736f4ce07767c5e7e546d50c51
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 22:20:01 2025 -0500
Debug effort
commit 88f45b8e56fb8fdbad9d0923c65593288b4d1f58
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 14:14:25 2025 -0500
Remove unused functions
commit f2267c7ec5dfa5c6370ee23d80988cc8f32ca00d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 14:03:08 2025 -0500
Remove host-side calls into the qps
commit 2659ed54a2b26699a4f41a4ad609f2f1387f1ea3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:07:30 2025 -0500
Add device object file
commit d1a143a3e19c19b47a0440980c60e3afe72e6c7b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:07:16 2025 -0500
Add ticket mutex file
commit 93e6e87497a24e2520bd0a09e2122fe2435f43bc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 13:05:59 2025 -0500
Try to protect doorbell with mutex
commit 0d8f0de88c5bfed013a5edc83ae4bea098647c42
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 12:03:31 2025 -0500
Cleanup doorbell ringing code
commit d3acb6a090743ca70c386464cbbe2110353246f0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:43:03 2025 -0500
more doorbell prints
commit 46f9ad9d3e3228cdcaa35c8fe09ff4469999c782
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:33:01 2025 -0500
Add print statements
commit 8cb1ce8468348b17e14055182eac15bbddc12757
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:14:04 2025 -0500
Increase blueflame back to two reg and add prints
commit 6276f16edaac719ccc3a98c6013e399deb1a5210
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 17 10:02:52 2025 -0500
Add print statements
commit 27d2c1384de697cfdf0a91e7b516a8b305cb92ed
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 16 20:52:46 2025 -0500
Minor modifications to printf debug
commit 5e03fd5261e621916d190a231419b88e1870cd62
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 16 20:46:54 2025 -0500
Remove ipc unit tests
commit 548d040ef5dea7f415512774fffdfb151e10716a
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 19:33:53 2025 -0500
Add print statements
commit ebc1198c4e545bcbac1ea628fdbff3329aea663e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 16:25:00 2025 -0500
Remove optional doorbell ringing support
commit b487be0f0b23731276549a7e5d7badfe559082ab
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 14:40:51 2025 -0500
Only allocate space for one blueflame register
commit 3b9ba1321c4a525f6a60d29a39ab07756678d7ae
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 14:25:05 2025 -0500
Convert protected members to private
commit 52b561e4835a8f4cb6552f8a57a98773139dd011
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 13:53:25 2025 -0500
Fixes
commit dc0da4a2e406bc74a6a1ad7b847363717ef502ae
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 12:13:49 2025 -0500
Debug - omit address
commit acc9af949a7102c753f67efb223918a2c24d982d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 12:05:02 2025 -0500
Uncomment some code
commit 71387ed3b6d01403c516257c537b64a2cf244d20
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:39:41 2025 -0500
Modify print
commit f402241c2fa09d0e68cbcfdeaebfaa9f5c347ff0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:36:37 2025 -0500
Change tester arguments
commit d39a9402558fb2ca4aac02de880eb40463c09f6e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:34:18 2025 -0500
Add prints
commit ba1b432112168845fce2684d84e47c239eefa554
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:32:05 2025 -0500
Add print statements
commit 98cbfd1ab5143409db810d4a98868c8b50c36602
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:27:47 2025 -0500
Add device-side print
commit 19bab0ee0f49426b6068b2cecc4594c4ea9a1b5c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 11:18:04 2025 -0500
Add wqe debug host print
commit 6cb777e0a1f4ff54271cb6fd57bf3002e4b92629
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:55:13 2025 -0500
Initialize wqe fields without host post call
commit 83c8988c1381ef48581222f7eef64e9a547da1de
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:30:06 2025 -0500
Remove endian conversion since it's done on host
commit c967c2831efb8ea8df73b501b6cec5a724605abe
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 10:27:27 2025 -0500
Set rkey/lkey using backend
commit 3e9ecfca664cf3c4c53c9ab57c952b37500b9155
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 09:02:51 2025 -0500
bugfix endian
commit 2ceb456263d31972db5da64d15cc10a9a05922d9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:59:08 2025 -0500
endian conversion
commit cebc34e78506db5964cc14465e3290f8c2acd351
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:38:14 2025 -0500
Enable tester
commit 05ad6d72322cb50b980940974e27df3bfd00f295
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:31:39 2025 -0500
Add in rkey/lkey writes
commit 7340f209de4e00958224e639ad7d1b8cdfb685d7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:28:15 2025 -0500
Add rkey/lkey check
commit c22cf537c726c14d5a3a374b152a4d755e23f95f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 15 08:13:29 2025 -0500
Add documentation, pseudocode, and modify
commit d984f5b11a26360b55d34b5d46b15b6f012f870b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 21:32:20 2025 -0500
Finish removing fence
commit c4bd2fbccd523bc0120da99df102697a5ee4180f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 21:20:23 2025 -0500
Remove fence
commit 5ba16a45b233a739c52d2e2c0ea2cb63314b8812
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:53:42 2025 -0500
Style change
commit 23ef8626de0df20bc9fa420847ca64d7612d5e25
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:50:51 2025 -0500
Remove comments
commit 4f98c3b8a763d479de4ce6995b6faf6f38fed90e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 20:42:45 2025 -0500
Straight line code
commit 265c812ecb1aab73608e0f7e97a4ebe5b711e2b2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 18:01:42 2025 -0500
Remove singlethreadpolicy
commit 6472ce17427eb6ca4a7bada95499cdfdcf9fb82e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 17:49:00 2025 -0500
Minor fixes
commit f0ebda6a7b5f9232856831cd4367e2eb58d9ff1e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:49:32 2025 -0500
Style changes for backend
commit 8a5a46d939e9cdfc6d6064c7978adbac7b9576dc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:28:05 2025 -0500
Minor fixes
commit a914cb31cba2480b30fba737beb528f53ae59a75
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 14:18:19 2025 -0500
Remove inlining mechanism
commit 319649049827458464a887318b898be092745f08
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:59:43 2025 -0500
Remove unused header file
commit 40facc053b1317701a8bb1376ef4d9921531ba33
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:54:31 2025 -0500
Fix comment and variable name
commit 4358b0e4161e8f0a7558a3e3d3cb83d1d2754213
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:31:05 2025 -0500
Encapsulate members in queue_pair
commit 99347175b1d83d2590ea01fa255905598053fcb8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:19:17 2025 -0500
Style change
commit 45c258c703693cef3b33baf269c22da53da24be1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 13:16:06 2025 -0500
Cleanup for queue_pair class
commit 06aa050ffa85aa4ab8dcdc9df9e07a5675d7e0cc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:53:33 2025 -0500
Add documentation for segments
commit 1dbd02f4ab4860bebaa495cf5d31ab17dd38700e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:10:35 2025 -0500
Remove unused struct
commit d290cbf477878a84f9a1f4e5edcae41d6a047bac
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:07:33 2025 -0500
Remove method
commit df37aa233a3f616630585667e2e7313cc296e5d1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 12:04:16 2025 -0500
Remove unused variable
commit 0ade3759464a56c1830b59b78dff744846edf10b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:52:19 2025 -0500
Cleanup files
commit e9a20b1d762c702141fe386b13263db526a02920
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:35:56 2025 -0500
Style changes for queue_pair and segment_builder
commit f09ce8a7b9a4dae0158747199840d8a7c8ad16fd
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:16:14 2025 -0500
Remove weird + 1 offset
commit 964b4a1e99bbf43406a4d735606b7aacb43e553e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 11:13:49 2025 -0500
Rename sq fields
commit e5a2eb170a8cd05ba481cbb226528c8a6c8810ce
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 10:55:31 2025 -0500
Remove unused headers
commit 90e53b9240900bf9eeb2427a93a602ce30a4edff
Author: Brandon Potter <brandon.potter@amd.com>
Date: Mon Apr 14 10:49:30 2025 -0500
Cleanup gpu_ib context files
commit 93de9e9c1a8037f4f1f9570e90115132310b86ca
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 23:03:50 2025 -0500
Continue document MLX structures
commit b900305b0fe88cce5de4729d6b54c189caf2fe48
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:46:54 2025 -0500
Document gpu queue-pair MLX structures
commit a9e2ea1c78e753343759f46e295e97d0dd21b6a1
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:08:21 2025 -0500
Bugfix for host RDMA_WRITE WQEs
commit 330d4f47383c42cd3aeec250fdb88bd263932820
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 22:04:16 2025 -0500
Add host-side initial RDMA_WRITE WQEs back
commit f63d61363300ab60b91d11466634ac4b5799ce4e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:46:16 2025 -0500
Try to remove host-side post_wqe
commit 180d088b65680a8c352b6159e05f5d7f9eb83c9d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:39:40 2025 -0500
Always allocate queues in gpu memory
commit 8376221faabfd6eb00a379ec6d4ef1be9e03bc5e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:16:14 2025 -0500
Bugfix for connection class
commit 980827e582b6914f034bd773a2b48351ba310184
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 21:07:45 2025 -0500
Refactor connection class
commit 2206134e0e8eede6bb6ebb02be6da4dc33b9ddf2
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 20:53:28 2025 -0500
Refactor some files
commit d608c5c77ab23e4609e43917c8c2aa9d3ad79d1b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 20:18:05 2025 -0500
Update connection class
commit 5aa7017f473509585aa1acb702adb86391ddb89f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 19:12:06 2025 -0500
Cleanup connection and network classes
commit 0159d8d615364c24b2bee2ca31a76d1a5f44fbd9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 18:50:50 2025 -0500
Remove unused member
commit 4dd77e49ab2ac6fc8d6b4e380f16d2dfcd6f9de4
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 17:54:29 2025 -0500
Add uncached heap option
commit a62b0ccec7e0b19677d4dac3669392e8c6838921
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 17:05:53 2025 -0500
Device mem for cq/sq queues
commit 9877461f13fde11f7b647b84fd74208fca7dc4ad
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 16:49:40 2025 -0500
Change heap allocation policies
commit 73217f1a50edd38514092bc1a32827b53ee466ae
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 16:15:36 2025 -0500
Remove compile options and cleanup
commit 41f9349df1ce17a052981e0945c4b81309ae74bb
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:24:29 2025 -0500
Cleanup coalescer files
commit 0b4559e87e5bdb590cb5864f4978a2a36628cb8c
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:20:26 2025 -0500
Cleanup files
commit 70a654e1f4828f210eb70ea2fd49a0b4748d1bb7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 15:11:17 2025 -0500
Cleanup rocshmemgpu and team files
commit d0a5f62192d6d4d3f9d6bfa5e5699ca9c920e1d5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:56:34 2025 -0500
Cleanup gpu ib team files
commit 0e4fa1472af4b6d4b0de9b091ad1b74a65a781c8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:38:53 2025 -0500
Add inline and cleanup
commit 145095464e10de9d0fb78c8ccaab8b279b602808
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 14:25:27 2025 -0500
Cleanup file
commit 2e09aa27640e10d057da9d00f83ceca8d90efce5
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:50:20 2025 -0500
Cleanup host files
commit 7d870601f63652495d3ce792c3e2385788529275
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:18:50 2025 -0500
Minor style changes to context_device
commit 1ce38ae151fdf4603bfef0ae4af5c8bbc6dbf0f7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 13:03:30 2025 -0500
Remove unused constants
commit 55e3b6f8e96f979febe234d5e4b64d6ed5de2a8e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:57:57 2025 -0500
Remove unnecessary init functions
commit dbd208ea6e3526d80778d5fca0f07a6e6d5dc869
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:45:48 2025 -0500
Remove manage memory stubs
commit dcd655ca3c6ba7a3367fe7f1e2f59349b2010306
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:34:55 2025 -0500
Remove comment
commit 4e5fea4ca3789f3d5fb2b7334a124b79125138c7
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:23:41 2025 -0500
Remove unused ThreadImpl types
commit ff3ed3d4f5cd51a2e7f1cb95a4553887ff12ca2f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 12:03:34 2025 -0500
Move constant into different file
commit 90598f25b324f0ae899e47e56cc5eea1acb4b098
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:53:50 2025 -0500
Remove g_ret mechanisms
commit 7ec81c5ce211f287b51091b95df0cf0822cc4997
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:36:42 2025 -0500
Remove unused externSharedBytes method
commit b62944d2193ce5c026b266426fb9d3f8a0c57938
Author: Brandon Potter <brandon.potter@amd.com>
Date: Sun Apr 13 11:25:52 2025 -0500
Remove unused variables
commit b5cd7f266f3d942258b94522ef5408a17a988263
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 11 11:15:24 2025 -0500
Tear out internal references to removed atomics
commit db21ad6b145a903306fad6c76b87a3b4992556a9
Author: Brandon Potter <brandon.potter@amd.com>
Date: Fri Apr 11 10:57:54 2025 -0500
Remove unused atomic types
commit 123cfb5a9c2b9e5c59c5cb3e8d058878a570143c
Merge: 684dd1c4 c26503f3
Author: Brandon Potter <BKP@users.noreply.github.com>
Date: Fri Apr 11 10:18:00 2025 -0500
Merge pull request #71 from Yiltan/yiltan-cleanup-april-11
Yiltan cleanup april 11
commit c26503f345cf9f7412fc96184a12641985ad46de
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:09:02 2025 -0500
removed unused collective buffers
commit 4079afa7191c4bd538152f9b55e72b27464f8d34
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:02:05 2025 -0500
removed USE_SINGLE_NODE
commit 3dcb1edb7e79788a0505eb89cf64f7a38e000df3
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 10:00:38 2025 -0500
removed network impl off
commit a35d9e32b65138b26281bbd7c29da58a89b6cf23
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:42:38 2025 -0500
removed reliable connection into connection
commit 99177002760b2e598d3ec69a5ae69b20495a9c80
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:37:34 2025 -0500
remove rocshmem_calc.hpp
commit 2bd9fb4e99b38af6c661e28348c972aa75ca3adc
Author: Yiltan <yiltan@amd.com>
Date: Fri Apr 11 08:33:17 2025 -0500
Removed more unused files
commit 684dd1c43452982f8b1791aae868d744b9d61d91
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 16:58:20 2025 -0500
Remove straggler wait_until variants
commit 7358ce0b3e128894729bdf08638f2e55e99f9147
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 16:51:14 2025 -0500
Remove get variants
commit 3559a12eb5ed316c2fedb010d5db3667cf2bb215
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 14:42:32 2025 -0500
Remove unnecessary interfaces
commit 9836fdf63cefbd9d8cdaf028fa21f688e20d865f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 12:12:43 2025 -0500
Tear out SYNC, WG_RMA, related functional tests
commit b35b8d86dc7e24a837021a9a6901c48795e5904b
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 11:13:55 2025 -0500
Tear out signal ops from include and dependencies
commit 54cb94c46f561be054ed9cfdf592c023eed2f3c8
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 10:46:37 2025 -0500
Remove debug header
commit 99e1fee3554b680bdd194f6127ee16ff02165c87
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 10:42:21 2025 -0500
Tear out collectives from include and dependencies
commit 6cc6dfecdf0092507df51eea67531a7d4b844067
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:46:57 2025 -0500
Remove empty RC functions
commit 811a4b48d6a43f2381d43c9f8fdb020e1bd24916
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:36:03 2025 -0500
Remove qe dumper and debug
commit 785d15563d5a4ca50c832769d0c4ed46e60ddd69
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:32:06 2025 -0500
Remove helper_macros header since dependencies removed
commit 30ef452825c5d7dae949d2bead11ca4818e59784
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:28:41 2025 -0500
Remove dev_mono_linear strategy
commit ad8106bae31fffe8c731fd21007f7ca9bad9abe0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:20:40 2025 -0500
Remove container strategies
commit 2d3e883a1bf67c9fee171214832802f96bf2e9bf
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:15:05 2025 -0500
Remove bitwise gtest and matrix container
commit 54c6e819185237901745396b03d76230c4f75594
Author: Brandon Potter <brandon.potter@amd.com>
Date: Thu Apr 10 08:10:18 2025 -0500
Remove array container
commit f92ffc42822dc46e9e887eaff8608a666a2b0116
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 23:46:33 2025 -0500
Remove DC transport files
commit 9a89d0450e7629a97fec70b48b6b94d1d65eb9ff
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 22:37:33 2025 -0500
Remove relative pathing for includes
commit 5b8c7294d780dc08a11fa1dbc93b48a96a75c13f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 22:07:21 2025 -0500
Remove todo notes
commit 7e920054296a152a410bb576f680adb85063ae45
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 21:58:23 2025 -0500
Remove extra line
commit 53cbd975f97937f62f6f7673f8529124723c494e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 21:49:29 2025 -0500
Merge backend classes
commit d5f10791544ce14d7d6d7149bd926e7d034926dc
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 14:19:27 2025 -0500
Remove USE_RO and USE_IPC conditions
commit 3f60ae758fb8378f5d97f0663e603cda3d3d51f0
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 13:52:36 2025 -0500
Tear out IPC call points
commit 7b0239fc69a79c34fa5f6eecc46ac767ef9b65a3
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:28:48 2025 -0500
Tear out hdp_policy
commit db02bd53c6d2b92cb5bbd78268f9e6dd48e868fe
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:19:29 2025 -0500
Convert backend_type to GPUIB only
commit 86af41f0e38af3017376666c23231e03bccc7d67
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:06:24 2025 -0500
Tear out IPC conduit
commit 48cc0f326686af412ab6b37d8a208061c85bfc33
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 10:01:55 2025 -0500
Tear out RO conduit
commit ebdbeb08464633f79f8dd9c315d9b714565d1b83
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 09:57:25 2025 -0500
Tear out atomic and notifier files
commit 9c7d699b6fa03a974a6cfba0584e912a9026dcd2
Merge: 4d1213ef 62906a5f
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 09:13:31 2025 -0500
Merge branch 'gpu-ib-working-draft' into bpotter/gpuib_mini
commit 62906a5fbd1882a92c807057b49c6b929b6c829e
Merge: 81693634 d2326a15
Author: Yiltan <ytemucin@amd.com>
Date: Wed Apr 9 10:09:19 2025 -0400
Merge pull request #67 from Yiltan/gpu-ib-working-draft
Removed HDP code and error checking to ibv_* functions
commit 4d1213ef2d6d413883c67453295c7d5c88667a3e
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 08:40:46 2025 -0500
Remove unused wrapper class
commit 8d80c0dd18579c2d42d96ac6d91d3fe09ffb5f71
Author: Brandon Potter <brandon.potter@amd.com>
Date: Wed Apr 9 00:05:57 2025 -0500
Remove unused EBO spinlock
commit 282f6dbb6203a3cbed909cf2d660f5e148da4960
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:56:15 2025 -0500
Remove slab heap
commit f83b20bd713ae25313d93967c3c5877fa794f90d
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:46:33 2025 -0500
Remove unused unit test for ipc
commit 8eaf49e37cbccf5e5bd4e5743f81fe4264c23580
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:44:45 2025 -0500
Fix store_asm function and util memcpy funcs
commit 376357961cd69e2bc1fd96ea69f9e296a6114ce6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 23:05:42 2025 -0500
Replace wallClk code with hip function
commit 8c34400fd1dc040a95974e0bb5db362d7a6e7550
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:44:11 2025 -0500
Remove unused __read_clock function
commit eea7e817537451013ca8a498bd413833b464f766
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:18:13 2025 -0500
Remove unused forward_list
commit 8869129a122d7b1c42a3f00754960a331be051e6
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:01:15 2025 -0500
Disable verification of functional tests
commit 6e35940cdbb041cac2f1e03590be79934516b281
Author: Brandon Potter <brandon.potter@amd.com>
Date: Tue Apr 8 22:00:52 2025 -0500
Increase functional test loop size to 200
commit d2326a15cff0a17ab90c9e9885c4a5aff3de1137
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 12:55:32 2025 -0500
Added error checking to verbs functions
commit 3a1034d64f8a086687ce6143b392651f9c12b0b9
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 10:21:06 2025 -0500
removed unused file
commit e874bba7865806cbbf7a758c74115ec32e8b7867
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 10:19:07 2025 -0500
removed hdp comments
commit 45ee7e3c69afdb0679785ad63c84f586ee4df41c
Author: Yiltan <yiltan@amd.com>
Date: Mon Apr 7 09:54:46 2025 -0500
fixed dc
commit 81693634b31507ada6fb892e9ce645f29fb70841
Author: Yiltan <yiltan@amd.com>
Date: Thu Apr 3 14:17:54 2025 -0500
cant lock cq if on device mem
commit 34acee47d6ba5c6a9a5af669e8f5f855d479c292
Author: Yiltan <yiltan@amd.com>
Date: Thu Apr 3 08:06:37 2025 -0500
null ptr
commit a7431148884c5c782b1e301020ee8c30dee82643
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 15:25:44 2025 -0500
comment out hdp
commit a0865b4d1b77e2a55726af20fda628cf9ee33c94
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 12:50:00 2025 -0500
GPU_IB Compiles
commit a1940e0d99b9c59b879b4ba97cdcdd349e2c8396
Author: Yiltan <yiltan@amd.com>
Date: Wed Apr 2 10:07:04 2025 -0500
Add GPU IB back
* Revert "Only issue a single completion per wavefront (#199)" (#205)
This reverts commit d9e2fed2f7e55d266c7dfcacc4641b92a3b008ed.
(cherry picked from commit c931145560e357b267a9b693c56de6915458702f)
* GDA Cmake modifications, move topology to gpu_ib specific folder
* Do not use ../thing.h
* Use WF_SIZE: AMDGCN_WAVEFRONT_SIZE is deprecated
* 2-way merge between context_ipc and context_gpuib
* Select MTU based on network config (#214)
* rocSHMEM GDA BNXT POC (#213)
* rocSHMEM GDA PoC for Thor 2 (233.2.76.0)
(cherry picked from commit abe172f74d32c26ae714ee329088fcb39f07da60)
* Rename gpu_ib to gda
* Renaming part2: includes and cmakery
* Fix DISPATCH macro; use backend_comm when needed; some GPUDevices were
left
* Consolidate GDA_CHECK_NNULL/CHECK_ZERO/CHECK_HIP to look and feel
similar
* Update copyrights to the new style
* Rework default-ctx init, missing heap init, missing qpe field
* backend_gda: single init, use systematic naming for setup/cleanup,
prefix team structures
* setup_wrk_psync must precede setup_teams etc
* silence recasting error
* Some remnants of GDADevice and missing friend classes, make some
fields public, it compiles
* Fix redefinition of CHECK_HIP in functional testers; we still have a
duplicate definition, and it would probably be better to have only one
* typo in backend_type
* Undo unneeded change to functional test driver
* Add -lnuma
* ctx must be initialized after qps
* gda: Disable non-functional tests (#216)
* Do not try to run functional tests that are not implemented
* Revert "Increase functional test loop size to 200"
This reverts commit 6e35940cdbb041cac2f1e03590be79934516b281.
* Make a specific test case for gda
* Disabled further tests that do not currently pass with explanation as to
why disabled
(cherry picked from commit dc0b8e889621b8a6c4685c44394350947f9b547c)
* gda_devel: teams with MPI initialization (#229)
* Fix missing communicator initialization
* Reenable team functional testers
---------
Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
(cherry picked from commit 2c74a458600e848751b8b6f9890fd0e677c2a4c2)
* [GDA] Query for the correct GID index (#215)
* Added GID query code for CX7/Thor2 NICs
(cherry picked from commit 3b7c42d745756ed5866e501614d0d54ef8fc072f)
* Reorder code to make ipc and gda more similar
* Do not double free Wrk_Sync, uniform styling with ipc
* Remove unused includes
* Abort when using not-implemented device functions
* BNXT Compiles
* Silence compiler warnings
* Cleanup unused .h
* Uniform indentation between ipc and gda
* gda: add cleanups, address todos
* Disable pingpong tests, enable defaultctxtest
* Reenable testing non-fetching amos
* build scripts: use a single backing script for all gda variants;
enable configuring INSTALL_PREFIX and BUILD_TYPE from the command line;
same order in all scripts
* fix: prevent double free in `GDADefaultContextProxy` with custom move assignment
* The default move assignment, invoked during initialization of
`default_context_proxy_`, caused the default context’s QPs to be freed
prematurely because the destructor of the temporary (rvalue) runs right
after initialization.
* Undo changes to the amo standard tester during gda_devel dd675b45, as
they cause RO failures
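The double free described for `GDADefaultContextProxy` above can be illustrated with a minimal sketch. This is not the rocSHMEM code: `FakeQps`, `ContextProxy`, and `releases_after_assignment` are hypothetical names, and a release counter stands in for actually freeing QPs. The sketch assumes the proxy holds a raw pointer, in which case a defaulted move assignment would only copy the pointer and the temporary's destructor would release the resource while the target still refers to it. A custom move assignment that empties the source avoids this.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a set of queue pairs. It counts releases so a
// premature or double free is observable without any hardware.
struct FakeQps {
  int release_count = 0;
};

// Hypothetical proxy that owns a FakeQps* and releases it in its destructor.
class ContextProxy {
 public:
  ContextProxy() = default;
  explicit ContextProxy(FakeQps* qps) : qps_(qps) {}

  ContextProxy(const ContextProxy&) = delete;
  ContextProxy& operator=(const ContextProxy&) = delete;

  // Move construction transfers ownership and leaves the source empty.
  ContextProxy(ContextProxy&& other) noexcept
      : qps_(std::exchange(other.qps_, nullptr)) {}

  // Custom move assignment: release what we hold, take the source's
  // resource, and null the source so its destructor becomes a no-op.
  ContextProxy& operator=(ContextProxy&& other) noexcept {
    if (this != &other) {
      release();
      qps_ = std::exchange(other.qps_, nullptr);
    }
    return *this;
  }

  ~ContextProxy() { release(); }

  bool owns() const { return qps_ != nullptr; }

 private:
  void release() {
    if (qps_ != nullptr) {
      ++qps_->release_count;  // a real proxy would destroy its QPs here
      qps_ = nullptr;
    }
  }

  FakeQps* qps_ = nullptr;
};

// Mimics initializing a member by assigning from a temporary. The temporary
// is destroyed at the end of the assignment expression; with the custom move
// assignment it no longer owns anything, so nothing is released yet.
int releases_after_assignment(FakeQps& qps) {
  ContextProxy proxy;
  proxy = ContextProxy(&qps);
  return qps.release_count;
}
```

Calling `releases_after_assignment` on a fresh `FakeQps` reports no release while the proxy is alive, and exactly one release is recorded once the proxy goes out of scope.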
---------
Co-authored-by: Yiltan <ytemucin@amd.com>
Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Yiltan <yiltan@amd.com>
Co-authored-by: avinashkethineedi <Avinash.kethineedi@amd.com>
Co-authored-by: bpotter <brandon.potter@amd.com>
[ROCm/rocshmem commit: 69bd4bfe445a71712bf3aa9465ee3f6b535d0dbc]
ammallya
pushed a commit
that referenced
this pull request
Jan 30, 2026
ammallya
pushed a commit
that referenced
this pull request
Jan 30, 2026
[ROCm/rocjpeg commit: 9116f11]
Change-Id: I4f9ac6ce807d4d670a19ae84fe553eb3a7484d96
🔁 Imported from ROCm/rdc#58
🧑‍💻 Originally authored by @rocm-devops