
chore: typos, formatting and refactor#145

Open
selmanozleyen wants to merge 30 commits into scverse:main from selmanozleyen:fix/typos-n-cleanup

Conversation

@selmanozleyen
Member

@selmanozleyen selmanozleyen commented Feb 11, 2026

Hi, a few things have been bugging me, so I wanted to clean some stuff up and point out some inconsistencies without mixing it into other, unrelated PRs. Here is a contained PR with some of those fixes. I will revert any commits you would rather undo.

Here is an AI-generated report of the changes by commit:

Changes

  • Breaking: Rename add_adatas to add_anndatas for consistent naming (selmanozleyen@2558b40, selmanozleyen@304dcbb and a60774c)

    • Renamed DatasetCollection.add_adatas -> DatasetCollection.add_anndatas
    • Updated the docstrings and tests accordingly. The fix could also go the other way, i.e., renaming in the opposite direction.
  • Fix "Wether" typo in io.py (5232e12)

    • DatasetCollection.is_empty docstring: "Wether" -> "Whether"
  • Define _collection_added as a class attribute in loader.py (5d14e5c)

  • Fix "who" typo in loader.py (d841e13)

    • Docstring in Loader.use_collection: "The collection who on-disk datasets" -> "The collection whose on-disk datasets"
  • Ruff format test_dataset.py (36af588)

    • Reformatted a lambda expression's parameter list in test_dataset.py (whitespace-only change by ruff)

Other mypy fixes, which I don't feel strongly about:

  • 4958749 -- add torch.* and h5py.* to mypy ignore_missing_imports
  • 830f2d4 -- fix Mapping.copy() call in write_sharded callback (dict(dataset_kwargs))
  • 6d6067a -- wrap categories in pd.Index for Categorical.from_codes
  • 12830d5 -- replace match/case with if/elif for better mypy narrowing in _create_chunks_for_shuffling
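The Mapping copy and Categorical fixes follow common mypy-friendly patterns; a minimal sketch with hypothetical values (not the actual annbatch code):

```python
from collections.abc import Mapping

import pandas as pd

# Mapping declares no .copy(), so dict(...) is the type-safe way to copy one
dataset_kwargs: Mapping[str, int] = {"chunk_size": 512}
kwargs_copy = dict(dataset_kwargs)
kwargs_copy["chunk_size"] = 1024
assert dataset_kwargs["chunk_size"] == 512  # original mapping untouched

# from_codes is typed to take an index-like for categories, so wrapping a
# plain sequence in pd.Index satisfies mypy without changing behavior
codes = [0, 1, 0, 2]
cat = pd.Categorical.from_codes(codes, categories=pd.Index(["a", "b", "c"]))
assert list(cat) == ["a", "b", "a", "c"]
```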

@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 91.50%. Comparing base (d47240f) to head (50293fb).

Files with missing lines | Patch % | Lines
src/annbatch/io.py | 90.90% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
- Coverage   93.71%   91.50%   -2.21%     
==========================================
  Files          11       11              
  Lines         811      812       +1     
==========================================
- Hits          760      743      -17     
- Misses         51       69      +18     
Files with missing lines | Coverage Δ
src/annbatch/loader.py | 89.29% <100.00%> (-3.32%) ⬇️
src/annbatch/io.py | 93.08% <90.90%> (ø)

... and 2 files with indirect coverage changes


@selmanozleyen selmanozleyen marked this pull request as draft February 12, 2026 10:03
@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@selmanozleyen selmanozleyen marked this pull request as ready for review February 12, 2026 16:13
@selmanozleyen selmanozleyen self-assigned this Feb 19, 2026
Collaborator

@ilan-gold ilan-gold left a comment


Just more places for consistency. Why go with anndata over adata? I'm genuinely asking for feedback: I think adata is generally used for the actual object, i.e., adata = read_h5ad, but not anndata = read_h5ad.

adatas = []
categoricals_in_all_adatas: dict[str, pd.Index] = {}
for i, path in tqdm(enumerate(paths), desc="loading"):
    adata = load_adata(path)
Collaborator


I would also move this argument to load_anndata.

@selmanozleyen
Member Author

> Just more places for consistency. Why go with anndata over adata? I'm really asking for feedback, I think adata is used for the actual object generally i.e., adata = read_h5ad but not anndata = read_h5ad

For me both were the same, so I wanted to minimize the breaking impact, and I guessed the Loader was used more. I also thought, as you said, that anndata reads like a class name while adata reads like an instance name: we take any AnnData, not one specific adata.

I would go with whatever is more widely adopted in the ecosystem, but I don't have any insight into that.

@felix0097
Collaborator

LGTM!

Another minor fix: could you provide the total number of iterations here? It gets lost by using enumerate. Just set total=len(paths):

for i, path in tqdm(enumerate(paths), desc="loading"):

This has been bugging me for ages.
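The suggested fix in context, roughly (the paths list here is hypothetical):

```python
from tqdm import tqdm

paths = ["a.h5ad", "b.h5ad", "c.h5ad"]  # hypothetical list of dataset paths

# enumerate() returns an iterator with no __len__, so tqdm cannot infer the
# total on its own; passing total=len(paths) restores a proper progress bar
for i, path in tqdm(enumerate(paths), total=len(paths), desc="loading"):
    pass  # load each dataset here in the real code
```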

Moreover, the progress bar descriptions should get updated:
Here, I would use Creating collection
Here, I would use: Extending collection
Here, I would use Lazy loading anndatas
Here, I would start with a capital letter as well

Would be cool if you could add this to this PR. Otherwise, I can create a separate PR myself @selmanozleyen

@ilan-gold
Collaborator

ilan-gold commented Feb 25, 2026

> Also thought, as you said, "anndata is a class name vs adata is an instance name". Like we take any anndata not a specific adata.

"anndata" is the library name. "AnnData" is the class name. In any case, some spatial data folks concur with adata:

https://scverse.zulipchat.com/#narrow/channel/315789-data-structures/topic/API.20Terminology/near/575771345
