pyverbs: Optimize CQ poll to use batch ibv_poll_cq by PetruMicu · Pull Request #1720 · linux-rdma/rdma-core

PetruMicu · 2026-03-22T10:28:59Z

Background
ibv_poll_cq(cq, num_entries, wc) is the libibverbs API for retrieving work completions from a Completion Queue. The num_entries parameter exists specifically to allow the caller to retrieve multiple CQEs in a single call.

Problem
The previous implementation of CQ.poll(num_entries) did not use this batching capability. Instead, it called ibv_poll_cq in a loop with num_entries=1, retrieving one completion at a time:

while npolled < num_entries:
    rc = v.ibv_poll_cq(self.cq, 1, &wc)  # single entry, repeated N times
    ...
    npolled += 1

This approach has two problems:

Call overhead multiplied by N. Each ibv_poll_cq invocation carries overhead: the verbs dispatch, memory barriers required to read the CQ ring buffer, and any provider-level locking. Calling it N times means paying that cost N times, even when all N completions are already sitting in the CQ.

Misuse of the API. The num_entries parameter exists precisely to amortize that overhead across a batch. Walking the CQ ring once for N entries is fundamentally cheaper than walking it N times for 1 entry each, both in terms of CPU cycles and memory access patterns.
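The call-count argument above can be illustrated with a small pure-Python model. This is only a sketch: FakeCQ, poll_one_at_a_time and poll_batched are hypothetical names invented for illustration, not pyverbs or libibverbs code, and the old loop is assumed to stop as soon as a call returns no completion.

```python
from collections import deque


class FakeCQ:
    """Toy stand-in for a completion queue that counts how often it is polled."""

    def __init__(self, completions):
        self._ring = deque(completions)
        self.poll_calls = 0

    def poll_cq(self, num_entries):
        """Return up to num_entries completions, modelling one ibv_poll_cq call."""
        self.poll_calls += 1
        n = min(num_entries, len(self._ring))
        return [self._ring.popleft() for _ in range(n)]


def poll_one_at_a_time(cq, num_entries):
    """Old pattern: one call per completion (assumed to stop on an empty queue)."""
    wcs = []
    while len(wcs) < num_entries:
        got = cq.poll_cq(1)
        if not got:
            break
        wcs.extend(got)
    return len(wcs), wcs


def poll_batched(cq, num_entries):
    """New pattern: a single call asks for the whole batch."""
    wcs = cq.poll_cq(num_entries)
    return len(wcs), wcs


slow = FakeCQ(range(16))
fast = FakeCQ(range(16))
slow_result = poll_one_at_a_time(slow, 16)
fast_result = poll_batched(fast, 16)
print(slow.poll_calls, fast.poll_calls)  # 16 1
```

Both helpers return the same (npolled, wcs) pair; only the number of calls into the queue differs. The model says nothing about the real per-call cost, which depends on the provider.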

Change
Replace the loop with a single ibv_poll_cq call that passes the full num_entries count into a heap-allocated (malloc) ibv_wc array:

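wcs_c = <v.ibv_wc *>malloc(num_entries * sizeof(v.ibv_wc))
if wcs_c == NULL:
    raise MemoryError()
npolled = v.ibv_poll_cq(self.cq, num_entries, wcs_c)  # one call for the whole batch
for i in range(npolled):
    wcs.append(WC(...wcs_c[i]...))
free(wcs_c)  # release the temporary array once the WC objects are built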

The return value (npolled, wcs) is unchanged, so full API compatibility is preserved. Callers that handle partial results (e.g. _poll_cq() in tests/utils.py) continue to work correctly, since ibv_poll_cq already returns however many completions are available, up to num_entries.
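A caller that tolerates partial results might look roughly like the sketch below. It is not the _poll_cq() helper from tests/utils.py; poll_until and make_fake_poll are hypothetical names, and the only thing assumed about the poll callable is that it takes num_entries and returns an (npolled, wcs) pair.

```python
def poll_until(poll, count, max_attempts=1000):
    """Collect `count` completions from `poll`, tolerating partial batches.

    `poll` is any callable taking num_entries and returning (npolled, wcs).
    """
    collected = []
    attempts = 0
    while len(collected) < count:
        if attempts == max_attempts:
            raise TimeoutError(f"got {len(collected)} of {count} completions")
        attempts += 1
        # Ask only for what is still missing; fewer (or none) may come back.
        npolled, wcs = poll(count - len(collected))
        collected.extend(wcs[:npolled])
    return collected


def make_fake_poll(arrivals):
    """Hypothetical poll source: each call first sees the next chunk of arrivals."""
    pending = []
    chunks = iter(arrivals)

    def poll(num_entries):
        pending.extend(next(chunks, []))
        taken = pending[:num_entries]
        del pending[:num_entries]
        return len(taken), taken

    return poll


poll = make_fake_poll([["a", "b"], [], ["c", "d", "e"]])
result = poll_until(poll, 4)
print(result)  # ['a', 'b', 'c', 'd']
```

Because the loop re-requests only the remaining count, it behaves the same whether each poll returns one completion or a full batch.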

Impact
In high-throughput workloads where poll(N) is called with N > 1, this reduces the number of ibv_poll_cq invocations from up to N to 1, so the per-call dispatch, memory barrier, and locking costs are paid once per batch instead of once per completion. The saving grows with the batch size.

Signed-off-by: Petru Micu <micu.petru2899@gmail.com>

Optimize CQ poll to use batch ibv_poll_cq

69fc31b

Signed-off-by: Petru Micu <micu.petru2899@gmail.com>

PetruMicu wants to merge 1 commit into linux-rdma:master from PetruMicu:cq-batch-poll
