
chore: typos, formatting and refactor#145

Open
selmanozleyen wants to merge 30 commits into scverse:main from selmanozleyen:fix/typos-n-cleanup

Conversation

@selmanozleyen
Member

@selmanozleyen selmanozleyen commented Feb 11, 2026

Hi, a few things have been bugging me, so I wanted to clean some stuff up and point out some inconsistencies without mixing it into other, unrelated PRs. Here is a contained PR with some of those fixes. I will revert any commits you would rather undo.

Here is an AI-generated report of the changes by commit:

Changes

  • Breaking: Rename add_adatas to add_anndatas for consistent naming (selmanozleyen@2558b40, selmanozleyen@304dcbb and a60774c)

    • Renamed DatasetCollection.add_adatas -> DatasetCollection.add_anndatas
    • Updated the docstrings and tests accordingly. The fix could also go the other way, i.e., renaming in the opposite direction.
  • Fix "Wether" typo in io.py (5232e12)

    • DatasetCollection.is_empty docstring: "Wether" -> "Whether"
  • Define _collection_added as a class attribute in loader.py (5d14e5c)

  • Fix "who" typo in loader.py (d841e13)

    • Docstring in Loader.use_collection: "The collection who on-disk datasets" -> "The collection whose on-disk datasets"
  • Ruff format test_dataset.py (36af588)

    • Reformatted a lambda expression's parameter list in test_dataset.py (whitespace-only change by ruff)

Other mypy fixes, which I don't feel strongly about:

  • 4958749 -- add torch.* and h5py.* to mypy ignore_missing_imports
  • 830f2d4 -- fix Mapping.copy() call in write_sharded callback (dict(dataset_kwargs))
  • 6d6067a -- wrap categories in pd.Index for Categorical.from_codes
  • 12830d5 -- replace match/case with if/elif for better mypy narrowing in _create_chunks_for_shuffling
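The Mapping copy and Categorical fixes follow common mypy-friendly patterns; a minimal sketch with hypothetical values (not the actual annbatch code):

```python
from collections.abc import Mapping

import pandas as pd

# Mapping declares no .copy(), so dict(...) is the type-safe way to copy one
dataset_kwargs: Mapping[str, int] = {"chunk_size": 512}
kwargs_copy = dict(dataset_kwargs)
kwargs_copy["chunk_size"] = 1024
assert dataset_kwargs["chunk_size"] == 512  # original mapping untouched

# from_codes is typed to take an index-like for categories, so wrapping a
# plain sequence in pd.Index satisfies mypy without changing behavior
codes = [0, 1, 0, 2]
cat = pd.Categorical.from_codes(codes, categories=pd.Index(["a", "b", "c"]))
assert list(cat) == ["a", "b", "a", "c"]
```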

@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 91.50%. Comparing base (d47240f) to head (50293fb).

Files with missing lines | Patch % | Lines
src/annbatch/io.py | 90.90% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
- Coverage   93.71%   91.50%   -2.21%     
==========================================
  Files          11       11              
  Lines         811      812       +1     
==========================================
- Hits          760      743      -17     
- Misses         51       69      +18     
Files with missing lines | Coverage Δ
src/annbatch/loader.py | 89.29% <100.00%> (-3.32%) ⬇️
src/annbatch/io.py | 93.08% <90.90%> (ø)

... and 2 files with indirect coverage changes


@selmanozleyen selmanozleyen marked this pull request as draft February 12, 2026 10:03
@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@selmanozleyen selmanozleyen marked this pull request as ready for review February 12, 2026 16:13
@selmanozleyen selmanozleyen self-assigned this Feb 19, 2026
Collaborator

@ilan-gold ilan-gold left a comment


Just more places for consistency. Why go with anndata over adata? I'm genuinely asking for feedback: I think adata is generally used for the actual object, i.e., adata = read_h5ad, but not anndata = read_h5ad.

adatas = []
categoricals_in_all_adatas: dict[str, pd.Index] = {}
for i, path in tqdm(enumerate(paths), desc="loading"):
    adata = load_adata(path)
Collaborator


I would also move this argument to load_anndata.

@selmanozleyen
Member Author

> Just more places for consistency. Why go with anndata over adata? I'm really asking for feedback, I think adata is used for the actual object generally i.e., adata = read_h5ad but not anndata = read_h5ad

For me both were the same, so I wanted to minimize the breaking impact, and I guessed the Loader was used more. I also thought, as you said, that anndata reads like a class name while adata reads like an instance name: we take any AnnData, not one specific adata.

I would go with whatever is more widely adopted in the ecosystem, but I don't have any insight into that.

@felix0097
Collaborator

LGTM!

Another minor fix: could you provide the total number of iterations here? It gets lost by using enumerate. Just set total=len(paths):

for i, path in tqdm(enumerate(paths), desc="loading"):

This has been bugging me for ages.
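The suggested fix in context, roughly (the paths list here is hypothetical):

```python
from tqdm import tqdm

paths = ["a.h5ad", "b.h5ad", "c.h5ad"]  # hypothetical list of dataset paths

# enumerate() returns an iterator with no __len__, so tqdm cannot infer the
# total on its own; passing total=len(paths) restores a proper progress bar
for i, path in tqdm(enumerate(paths), total=len(paths), desc="loading"):
    pass  # load each dataset here in the real code
```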

Moreover, the progress bar descriptions should get updated:
Here, I would use Creating collection
Here, I would use: Extending collection
Here, I would use Lazy loading anndatas
Here, I would start with a capital letter as well

Would be cool if you could add this to this PR. Otherwise, I can create a separate PR myself @selmanozleyen

@ilan-gold
Collaborator

ilan-gold commented Feb 25, 2026

> Also thought, as you said, "anndata is a class name vs adata is an instance name". Like we take any anndata not a specific adata.

"anndata" is the library name. "AnnData" is the class name. In any case, some spatial data folks concur with adata:

https://scverse.zulipchat.com/#narrow/channel/315789-data-structures/topic/API.20Terminology/near/575771345
