@EriJFle commented Oct 29, 2025

Hi,

This work introduces a CuPy-based GPU implementation of the Wilcoxon rank-sum test for rank_genes_groups().

Key changes

GPU acceleration: Rank computation, group rank sums, z-scores, and two-sided p-values are computed on the GPU (CuPy + cupyx.scipy.special.ndtr).
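As a rough illustration of the z-score and two-sided p-value step, here is a minimal CPU sketch using only the standard library. The function names `ndtr` and `wilcoxon_z_and_p` are hypothetical and not taken from this PR; `ndtr` here is a scalar stand-in for what `cupyx.scipy.special.ndtr` evaluates elementwise, and the tie correction is left out for brevity.

```python
import math


def ndtr(x: float) -> float:
    """Standard normal CDF (scalar stand-in for cupyx.scipy.special.ndtr)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def wilcoxon_z_and_p(rank_sum: float, n_group: int, n_rest: int):
    """Normal approximation for a rank-sum statistic (no tie correction).

    rank_sum: sum of the ranks of the cells in the group for one gene
    n_group:  number of cells in the group
    n_rest:   number of cells in the reference (all other cells)
    """
    n_total = n_group + n_rest
    expected = n_group * (n_total + 1) / 2.0
    std = math.sqrt(n_group * n_rest * (n_total + 1) / 12.0)
    z = (rank_sum - expected) / std
    # Two-sided p-value: twice the lower-tail probability of -|z|.
    p = 2.0 * ndtr(-abs(z))
    return z, p


# Three group cells holding the top three ranks out of six cells.
z, p = wilcoxon_z_and_p(rank_sum=15.0, n_group=3, n_rest=3)
```

On the GPU the same arithmetic would presumably be applied to whole arrays of rank sums at once rather than to scalars.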

Vectorized group operations: Replace per-group loops with a single matrix multiply group_matrix.T @ ranks to obtain all group rank sums at once.
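The matrix-multiply idea can be sketched as follows. This uses NumPy so it runs without a GPU; the assumption is that the CuPy version makes the same array calls. The helper name `group_rank_sums` and the integer `labels` encoding are illustrative, not the PR's actual code.

```python
import numpy as np


def group_rank_sums(ranks: np.ndarray, labels: np.ndarray, n_groups: int) -> np.ndarray:
    """Sum ranks per group for every gene with one matrix multiply.

    ranks:  (n_cells, n_genes) array of per-gene ranks
    labels: (n_cells,) integer group index of each cell, in [0, n_groups)
    Returns an (n_groups, n_genes) array of rank sums.
    """
    n_cells = ranks.shape[0]
    # One-hot membership matrix: group_matrix[i, g] == 1 if cell i is in group g.
    group_matrix = np.zeros((n_cells, n_groups), dtype=ranks.dtype)
    group_matrix[np.arange(n_cells), labels] = 1
    return group_matrix.T @ ranks


ranks = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
labels = np.array([0, 0, 1, 1])
sums = group_rank_sums(ranks, labels, n_groups=2)
```

Each row of the result is what a per-group loop would produce with `ranks[labels == g].sum(axis=0)`.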

GPU-native rank computation: Mid-ranks and the tie correction are implemented with CuPy primitives (cp.argsort, cp.cumsum) to mirror Scanpy's semantics.
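One way to get mid-ranks and a tie-correction factor from `argsort` and `cumsum` is sketched below for a single gene. It is written with NumPy so it runs on CPU, and the function name `midranks` is hypothetical. The tie factor is the standard `1 - sum(t^3 - t) / (n^3 - n)` over tie-group sizes `t`, which is assumed here to be the quantity the PR computes.

```python
import numpy as np


def midranks(x):
    """Return (average ranks, tie-correction factor) for a 1-D array."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    order = np.argsort(x)
    sorted_x = x[order]

    # Mark the first element of every run of equal values in sorted order.
    is_start = np.ones(n, dtype=bool)
    is_start[1:] = sorted_x[1:] != sorted_x[:-1]

    run_id = np.cumsum(is_start) - 1   # run index of each sorted element
    run_sizes = np.bincount(run_id)    # number of tied values per run
    run_end = np.cumsum(run_sizes)     # 1-based rank of the last element per run
    run_mid = run_end - (run_sizes - 1) / 2.0  # average rank within each run

    ranks = np.empty(n, dtype=np.float64)
    ranks[order] = run_mid[run_id]     # scatter back to the original order

    if n > 1:
        tie_corr = 1.0 - float((run_sizes**3 - run_sizes).sum()) / (n**3 - n)
    else:
        tie_corr = 1.0
    return ranks, tie_corr


ranks, tie_corr = midranks([10.0, 20.0, 20.0, 30.0])
```

The two tied values share rank 2.5, and the single tie pair lowers the correction factor below 1.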

Dynamic GPU memory management: _choose_chunk_size() queries cp.cuda.runtime.memGetInfo() for the currently free device memory and sizes gene chunks to fit, which reduces the risk of out-of-memory errors while keeping chunks as large as the device allows.
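The sizing logic might look roughly like the sketch below. This is not the PR's `_choose_chunk_size()`; the function name `chunk_size_sketch` and the parameters `bytes_per_value`, `n_buffers`, and `safety` are assumptions made for illustration. The free-memory figure is passed in as an argument so the example runs without a GPU.

```python
def chunk_size_sketch(
    free_bytes: int,
    n_cells: int,
    n_genes: int,
    bytes_per_value: int = 8,   # assumed float64 working arrays
    n_buffers: int = 4,         # assumed number of (n_cells, chunk) temporaries
    safety: float = 0.5,        # assumed fraction of free memory to use
) -> int:
    """Pick how many genes to process per chunk given free device memory."""
    bytes_per_gene = n_cells * bytes_per_value * n_buffers
    budget = int(free_bytes * safety)
    chunk = budget // max(bytes_per_gene, 1)
    # Always process at least one gene, and never more than exist.
    return max(1, min(n_genes, chunk))


# With CuPy, free_bytes would come from:
#   free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
chunk = chunk_size_sketch(free_bytes=1_000_000_000, n_cells=100_000, n_genes=20_000)
```

Keeping a safety margin leaves room for allocations the estimate does not account for, such as sort workspaces.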

Testing:
Added tests/test_rank_genes_groups_wilcoxon.py to check that the output matches Scanpy's rank_genes_groups with method="wilcoxon".
