pyverbs: Optimize CQ poll to use batch ibv_poll_cq #1720
Open
PetruMicu wants to merge 1 commit into linux-rdma:master from
Signed-off-by: Petru Micu <micu.petru2899@gmail.com>
Background
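ibv_poll_cq(cq, num_entries, wc) is the libibverbs API for retrieving work completions from a Completion Queue (CQ). The num_entries parameter exists specifically to let the caller retrieve multiple CQEs in a single call: ibv_poll_cq writes up to num_entries work completions into the caller-supplied wc array and returns how many it wrote (zero if the CQ is empty, negative on error).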
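The "up to num_entries, possibly fewer" contract can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: `FakeCQ` and its `poll_cq()` method are hypothetical names that do not call libibverbs, the length of the returned list plays the role of ibv_poll_cq's return value, and error returns are not modeled.

```python
from collections import deque


class FakeCQ:
    """Hypothetical in-memory stand-in for a Completion Queue.

    It only models the contract of ibv_poll_cq; it does not call libibverbs.
    """

    def __init__(self, completions):
        self._ring = deque(completions)  # completions already in the CQ

    def poll_cq(self, num_entries):
        # Like ibv_poll_cq: return at most num_entries completions, or fewer
        # (possibly none) when the CQ currently holds fewer.
        n = min(num_entries, len(self._ring))
        return [self._ring.popleft() for _ in range(n)]


cq = FakeCQ(["wc0", "wc1", "wc2"])
print(cq.poll_cq(2))  # ['wc0', 'wc1']
print(cq.poll_cq(2))  # ['wc2']
print(cq.poll_cq(2))  # []
```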
Problem
The previous implementation of CQ.poll(num_entries) did not use this batching capability. Instead, it called ibv_poll_cq in a loop with num_entries=1, retrieving one completion per call.
This approach has two problems:
Call overhead multiplied by N. Each ibv_poll_cq invocation carries overhead: the verbs dispatch, memory barriers required to read the CQ ring buffer, and any provider-level locking. Calling it N times means paying that cost N times, even when all N completions are already sitting in the CQ.
Misuse of the API. The num_entries parameter exists precisely to amortize that overhead across a batch. Walking the CQ ring once for N entries is fundamentally cheaper than walking it N times for 1 entry each, both in CPU cycles and in memory access patterns.
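To make the per-call cost concrete, here is a hedged Python model of the one-completion-per-call loop described above. It is not the pyverbs source (which is Cython and is not reproduced here); `FakeCQ`, `poll_cq()`, and `poll_one_at_a_time()` are hypothetical names, and stopping at the first empty poll is an assumption about the loop's exit condition.

```python
from collections import deque


class FakeCQ:
    """Hypothetical CQ stand-in (no libibverbs) that counts poll calls."""

    def __init__(self, completions):
        self._ring = deque(completions)
        self.calls = 0  # how many times poll_cq() was invoked

    def poll_cq(self, num_entries):
        # Models ibv_poll_cq: up to num_entries completions, possibly fewer.
        self.calls += 1
        n = min(num_entries, len(self._ring))
        return [self._ring.popleft() for _ in range(n)]


def poll_one_at_a_time(cq, num_entries):
    """Sketch of the pre-change pattern: one poll_cq(1) call per entry."""
    wcs = []
    for _ in range(num_entries):
        batch = cq.poll_cq(1)
        if not batch:  # assumption: stop at the first empty poll
            break
        wcs.extend(batch)
    return len(wcs), wcs


cq = FakeCQ([f"wc{i}" for i in range(8)])
npolled, wcs = poll_one_at_a_time(cq, 8)
print(npolled, cq.calls)  # 8 8
```

With eight completions already waiting, the model pays for eight separate calls, which is the "overhead multiplied by N" case.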
Change
Replace the loop with a single ibv_poll_cq call that passes the full num_entries count, together with a stack-allocated ibv_wc array that receives the completions.
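A hedged Python model of the batched call follows. In pyverbs the wc array is a C array declared in Cython; here a Python list stands in for it, and `FakeCQ`, `poll_cq()`, and `poll_batch()` are hypothetical names, not pyverbs API.

```python
from collections import deque


class FakeCQ:
    """Hypothetical CQ stand-in (no libibverbs) that counts poll calls."""

    def __init__(self, completions):
        self._ring = deque(completions)
        self.calls = 0

    def poll_cq(self, num_entries):
        # Models ibv_poll_cq: up to num_entries completions, possibly fewer.
        self.calls += 1
        n = min(num_entries, len(self._ring))
        return [self._ring.popleft() for _ in range(n)]


def poll_batch(cq, num_entries):
    """Sketch of the post-change pattern: a single call for the whole batch.

    The (npolled, wcs) tuple mirrors the return shape described for CQ.poll().
    """
    wcs = cq.poll_cq(num_entries)
    return len(wcs), wcs


full = FakeCQ([f"wc{i}" for i in range(8)])
print(poll_batch(full, 8)[0], full.calls)  # 8 1

partial = FakeCQ(["wc0", "wc1", "wc2"])
print(poll_batch(partial, 8)[0], partial.calls)  # 3 1
```

Whether the CQ holds all num_entries completions or only some of them, the batched version makes exactly one call and reports how many it got.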
The return value (npolled, wcs) is unchanged, so full API compatibility is preserved. Callers that handle partial results (e.g. _poll_cq() in tests/utils.py) continue to work correctly, since ibv_poll_cq already returns however many completions are available, up to num_entries.
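The following hedged sketch illustrates why a caller that tolerates partial batches is unaffected: it simply keeps asking for the remainder until it has everything it expects. It is not the `_poll_cq()` helper from tests/utils.py; `TrickleCQ`, `poll_cq()`, `poll_until()`, and the `max_polls` limit are hypothetical, and completion arrival is simulated with a per-call schedule.

```python
from collections import deque


class TrickleCQ:
    """Hypothetical CQ stand-in where completions arrive over time.

    arrivals[i] lists the completions that land just before the i-th poll.
    """

    def __init__(self, arrivals):
        self._arrivals = deque(arrivals)
        self._ring = deque()

    def poll_cq(self, num_entries):
        # Deliver this call's scheduled arrivals, then behave like
        # ibv_poll_cq: return up to num_entries completions, possibly none.
        if self._arrivals:
            self._ring.extend(self._arrivals.popleft())
        n = min(num_entries, len(self._ring))
        return [self._ring.popleft() for _ in range(n)]


def poll_until(cq, expected, max_polls=1000):
    """Hypothetical caller that tolerates partial batches.

    Asks only for the completions still missing, so it never over-drains,
    and gives up after max_polls calls.
    """
    wcs = []
    for _ in range(max_polls):
        if len(wcs) == expected:
            break
        wcs.extend(cq.poll_cq(expected - len(wcs)))
    if len(wcs) < expected:
        raise TimeoutError(f"got {len(wcs)} of {expected} completions")
    return wcs


cq = TrickleCQ([["wc0"], [], ["wc1", "wc2"], ["wc3"]])
print(poll_until(cq, 4))  # ['wc0', 'wc1', 'wc2', 'wc3']
```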
Impact
In high-throughput workloads where poll(N) is called with N > 1, this reduces the number of ibv_poll_cq invocations from N to 1, eliminating redundant memory barrier overhead and CQ ring traversals. The saving grows with the batch size, since the fixed per-call overhead is paid once per poll(N) instead of once per completion.