feat(tl): add ancestral_linkage #52

Merged: colganwi merged 3 commits into main from feat/ancestral-linkage on Mar 27, 2026

@colganwi
Collaborator

Summary

  • Adds tl.ancestral_linkage to measure how closely related cells of different categories are on the lineage tree
  • Pairwise mode (target=None): computes a category × category linkage matrix stored in tdata.uns['{key}_linkage']
  • Single-target mode (target=<cat>): computes per-cell distance to the nearest cell of the given category, stored in tdata.obs['{target}_linkage']
  • Supports metric='path' (branch-length path distance) and metric='lca' (LCA depth)
  • Optional test='permutation' with parallel fork-based workers (n_threads)
  • Optional symmetrize for the pairwise matrix
  • by_tree=True adds per-tree breakdowns in the stats table
  • Adds tqdm as a package dependency (used for permutation progress)
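To make the two modes concrete, here is a minimal NumPy sketch that works from a precomputed cell × cell path-distance matrix (smaller means more related). It is not the pycea implementation: `single_target_linkage` and `pairwise_linkage` are hypothetical names, and the sketch assumes that a pairwise entry is the mean, over cells of the row category, of each cell's distance to its nearest other cell in the column category.

```python
import numpy as np


def single_target_linkage(D, labels, target):
    """Per-cell distance to the nearest *other* cell labelled `target` (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    mask = labels == target
    sub = D[:, mask]  # boolean indexing returns a copy, so it is safe to modify
    # A target cell's zero distance to itself must not count as its nearest neighbor.
    idx = np.flatnonzero(mask)
    sub[idx, np.arange(idx.size)] = np.inf
    return sub.min(axis=1)  # inf if no other target cell exists


def pairwise_linkage(D, labels):
    """Category x category matrix: mean nearest-neighbor distance from row to column category."""
    labels = np.asarray(labels)
    cats = np.unique(labels)
    M = np.empty((cats.size, cats.size))
    for j, tgt in enumerate(cats):
        nearest = single_target_linkage(D, labels, tgt)
        for i, src in enumerate(cats):
            M[i, j] = nearest[labels == src].mean()
    return cats, M


# Balanced tree ((0,1),(2,3)) with unit branch lengths: sisters are 2 apart, cousins 4.
D = np.array([[0, 2, 4, 4],
              [2, 0, 4, 4],
              [4, 4, 0, 2],
              [4, 4, 2, 0]], dtype=float)
labels = ["x", "x", "y", "y"]
cats, M = pairwise_linkage(D, labels)
print(cats)
print(M)
```

On this toy tree the sketch yields [[2, 4], [4, 2]]: each category sits closer to itself than to the other, which is the pattern a linkage matrix is meant to expose.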

Test plan

  • Run conda run -n pycea python -m pytest tests/test_ancestral_linkage.py (35 tests covering pairwise/single-target modes, known values, symmetrization, permutation tests, parallel execution, and edge cases)

🤖 Generated with Claude Code

…atedness

Computes pairwise or single-target linkage scores between cell categories
using path distance or LCA depth on the lineage tree. Supports permutation
testing, parallel execution (fork-based), symmetrization, and per-tree stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 98.34254% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.72%. Comparing base (5725107) to head (ed8ce9b).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pycea/tl/ancestral_linkage.py | 98.33% | 6 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   92.91%   93.72%   +0.81%     
==========================================
  Files          34       35       +1     
  Lines        2554     2916     +362     
==========================================
+ Hits         2373     2733     +360     
- Misses        181      183       +2     
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/pycea/tl/__init__.py | 100.00% <100.00%> (ø) |
| src/pycea/tl/ancestral_linkage.py | 98.33% <98.33%> (ø) |

... and 2 files with indirect coverage changes

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a6a745aaa

```python
# Choose strategy: Dijkstra handles the natural "closest" direction for each metric
is_named = isinstance(aggregate, str)
use_dijkstra = is_named and ((aggregate == "min" and metric == "path") or (aggregate == "max" and metric == "lca"))
```

P1: Compute lca+max scores from all targets, not nearest path

Routing aggregate='max' with metric='lca' through the Dijkstra shortcut is not generally correct. Dijkstra picks the target leaf with the minimum path distance, but maximizing LCA depth depends on both path length and target depth, because LCA depth = (d_src + d_tgt - path)/2. When leaves have unequal depths (non-ultrametric trees), the target with the deepest LCA can be farther away by path, so this branch underestimates linkage in the pairwise, single-target, and permutation computations.
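The reviewer's point can be checked on a toy non-ultrametric tree. In the plain-Python sketch below (the tree, node names, and helpers are hypothetical and unrelated to pycea's internals), source `s` shares a parent with a distant, deep target `t1`, while a shallow target `t2` hangs directly off the root. The nearest target by path is `t2`, yet the deepest LCA is reached through `t1`, which is exactly the case a nearest-path shortcut misses.

```python
# Hypothetical toy tree, encoded as child -> (parent, branch_length).
# R is the root; s and t1 share parent u; t2 hangs directly off the root.
PARENT = {
    "u": ("R", 1.0),
    "s": ("u", 1.0),
    "t1": ("u", 5.0),
    "t2": ("R", 0.5),
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node][0]
        chain.append(node)
    return chain


def depth(node):
    """Sum of branch lengths from the root down to `node`."""
    total = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
    return total


def lca(a, b):
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are in different trees")


def path_distance(a, b):
    return depth(a) + depth(b) - 2 * depth(lca(a, b))


def lca_depth_from_path(a, b):
    # LCA depth recovered from leaf depths and path length: (d_src + d_tgt - path) / 2
    return (depth(a) + depth(b) - path_distance(a, b)) / 2


src, targets = "s", ["t1", "t2"]
nearest_by_path = min(targets, key=lambda t: path_distance(src, t))    # what a nearest-path search finds
deepest_lca = max(targets, key=lambda t: lca_depth_from_path(src, t))  # what lca+max actually needs
print(nearest_by_path, deepest_lca)  # t2 t1
```

Here the leaf depths are 2, 6, and 0.5, so the tree is not ultrametric; on a tree where every leaf has the same depth, the smallest path always coincides with the deepest LCA.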

Comment on lines +83 to +86
```python
if "tree_distances" in tdata.obsp:
    D = tdata.obsp["tree_distances"]
    if isinstance(D, np.ndarray):
        precomputed = D
```

P1: Validate cached tree_distances before reusing

The all-pairs path reuses tdata.obsp['tree_distances'] whenever it exists as a dense array, but it never verifies that this cache was computed with the same metric, depth_key, or tree selection. If users previously ran tree_distance with different parameters, this function silently consumes stale distances and returns incorrect linkage values for mean/max/custom aggregates.
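One way to act on this suggestion is sketched below, with plain dicts standing in for `tdata.obsp` and `tdata.uns`: record the parameters next to the cached matrix and reuse it only on an exact match. The `tree_distances_params` key and the `get_cached_distances` helper are hypothetical, not pycea's cache layout; the follow-up commit in this PR took the simpler route of no longer reading the cache at all.

```python
import numpy as np


def get_cached_distances(obsp, uns, *, metric, depth_key, tree):
    """Return the cached distance matrix only if it was built with identical parameters.

    `obsp` and `uns` are dict-like stand-ins; the "tree_distances_params" key is hypothetical.
    """
    wanted = {"metric": metric, "depth_key": depth_key, "tree": tree}
    D = obsp.get("tree_distances")
    if isinstance(D, np.ndarray) and uns.get("tree_distances_params") == wanted:
        return D
    return None  # stale or missing: the caller must recompute


obsp = {"tree_distances": np.zeros((3, 3))}
uns = {"tree_distances_params": {"metric": "path", "depth_key": "depth", "tree": None}}
hit = get_cached_distances(obsp, uns, metric="path", depth_key="depth", tree=None)
miss = get_cached_distances(obsp, uns, metric="lca", depth_key="depth", tree=None)
print(hit is not None, miss is None)  # True True
```

The design choice is to treat any mismatch, including a missing fingerprint, as a cache miss, so results can only get slower, never silently wrong.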

Comment on lines +192 to +193
```python
else:  # min
    sym = np.minimum(arr, arr_T)
```

P2: Raise on invalid symmetrize mode

Unknown symmetrize values currently fall into the else branch and are treated as 'min', so a typo (for example, 'meen') silently changes analysis output instead of failing fast. This makes results hard to trust because invalid user input produces a valid-looking but incorrect matrix.
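A fail-fast version of this branch might look like the sketch below. It assumes the accepted modes are 'mean', 'max', and 'min' ('min' appears in the quoted snippet and the 'meen' typo implies 'mean'; 'max' is a guess), and the standalone `symmetrize` function is hypothetical rather than pycea's code.

```python
import numpy as np

VALID_MODES = ("mean", "max", "min")  # assumed set of accepted modes


def symmetrize(arr, mode):
    """Symmetrize a square linkage matrix, failing fast on unknown modes (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    arr_T = arr.T
    if mode == "mean":
        return (arr + arr_T) / 2
    if mode == "max":
        return np.maximum(arr, arr_T)
    if mode == "min":
        return np.minimum(arr, arr_T)
    # No silent fallback: a typo such as "meen" is reported instead of being treated as "min".
    raise ValueError(f"symmetrize must be one of {VALID_MODES}, got {mode!r}")


M = np.array([[0.0, 2.0], [4.0, 0.0]])
print(symmetrize(M, "mean"))
```

With an explicit branch per mode, the final `raise` is reachable only for invalid input, so a typo surfaces as an error rather than as a plausible-looking matrix.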

colganwi and others added 2 commits March 24, 2026 14:31
…d permutation test

Adds alternative='two-sided' to support two-tailed p-values. Default None
preserves existing one-sided behavior (more-related direction).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
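For readers unfamiliar with the distinction, here is a hedged sketch of how one-sided and two-sided permutation p-values are commonly computed from a null distribution of linkage scores (for example, scores recomputed after shuffling category labels). The function, its `more_related` argument, and the doubling rule for the two-sided case are assumptions for illustration, not necessarily the formula this commit implements. For metric='path' a smaller score means more related, so the one-sided test looks at the lower tail; for metric='lca' it would look at the upper tail.

```python
import numpy as np


def permutation_pvalue(observed, null, *, more_related="lower", alternative=None):
    """Permutation p-value with the +1 correction (hypothetical helper).

    more_related: "lower" for path distance, "upper" for LCA depth.
    alternative: None for one-sided (more-related direction) or "two-sided".
    """
    null = np.asarray(null, dtype=float)
    n = null.size
    p_lower = (1 + np.sum(null <= observed)) / (n + 1)
    p_upper = (1 + np.sum(null >= observed)) / (n + 1)
    if alternative is None:
        return p_lower if more_related == "lower" else p_upper
    if alternative == "two-sided":
        # Double the smaller tail, capped at 1.
        return min(1.0, 2 * min(p_lower, p_upper))
    raise ValueError(f"alternative must be None or 'two-sided', got {alternative!r}")


# 19 null scores (1..19) and an observed path score smaller than all of them.
null = np.arange(1, 20)
print(permutation_pvalue(0.5, null))                           # 0.05
print(permutation_pvalue(0.5, null, alternative="two-sided"))  # 0.1
```

The +1 in numerator and denominator counts the observed labelling as one of the permutations, so the p-value can never be exactly zero.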
- Remove stale-cache usage in _all_pairs_scores: tdata.obsp['tree_distances']
  was reused without checking metric/depth_key/tree, potentially returning
  incorrect linkage values for mean/custom aggregates.
- Guard lca+max Dijkstra shortcut with an ultrametric check: on non-ultrametric
  trees the nearest-path target is not always the deepest-LCA target; raises
  ValueError with actionable message instead of silently underestimating linkage.
  Adds TODO for a future non-ultrametric fast path.
- Raise ValueError on unknown symmetrize values instead of silently falling
  through to 'min'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
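The ultrametric guard described in the second bullet can be sketched as follows. The helper names, the tolerances, and passing leaf depths as a flat array are assumptions; the strategy condition mirrors the `use_dijkstra` snippet quoted in the review, and the error text is illustrative, not pycea's actual message.

```python
import numpy as np


def is_ultrametric(leaf_depths, rtol=1e-8, atol=1e-12):
    """True if every leaf sits at the same root-to-leaf depth (within tolerance)."""
    d = np.asarray(leaf_depths, dtype=float)
    if d.size == 0:
        return True
    return bool(np.allclose(d, d[0], rtol=rtol, atol=atol))


def choose_strategy(aggregate, metric, leaf_depths):
    """Pick the Dijkstra shortcut or the all-pairs fallback (hypothetical helper)."""
    is_named = isinstance(aggregate, str)
    use_dijkstra = is_named and (
        (aggregate == "min" and metric == "path") or (aggregate == "max" and metric == "lca")
    )
    # Nearest-by-path equals deepest-LCA only when all leaves share one depth.
    if use_dijkstra and metric == "lca" and not is_ultrametric(leaf_depths):
        raise ValueError(
            "aggregate='max' with metric='lca' requires an ultrametric tree; "
            "choose a different aggregate for this tree."
        )
    return "dijkstra" if use_dijkstra else "all_pairs"


print(choose_strategy("max", "lca", [3.0, 3.0, 3.0]))  # dijkstra
print(choose_strategy("mean", "lca", [2.0, 6.0]))      # all_pairs
```

Raising rather than silently switching to the all-pairs path keeps the cost of the fast path predictable, which matches the commit's choice of an actionable error over a quiet fallback.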
@colganwi merged commit aa117e1 into main on Mar 27, 2026
8 checks passed