fix: sampler docs by ilan-gold · Pull Request #125 · scverse/annbatch

ilan-gold · 2026-01-28T09:15:52Z

No description provided.

codecov · 2026-01-28T09:19:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.80%. Comparing base (c9a5971) to head (a4c94e9).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #125   +/-   ##
=======================================
  Coverage   90.80%   90.80%           
=======================================
  Files          10       10           
  Lines         805      805           
=======================================
  Hits          731      731           
  Misses         74       74

Files with missing lines	Coverage Δ
src/annbatch/abc/sampler.py	`100.00% <ø> (ø)`

felix0097

@selmanozleyen I would definitely add an example here showing how to actually implement this. A very simple example should answer:

- How do I write my own sampler? (One example class, e.g. a weighted sampler or a fully random sampler.)
- How do I plug it into the rest of the code base, i.e. how do I actually use it?

Moreover, did you check how this behaves on the full Tahoe dataset, e.g. with fully random sampling? With the old code, memory usage blew up significantly. If this is still the case, we should add a warning/caveats section.

cc @ilan-gold
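
To make the request above concrete, here is a minimal sketch of what a "fully random" chunk sampler could look like. It is self-contained and does not use annbatch: `ToyLoadRequest`, `ToyRandomChunkSampler`, and all of their parameters are hypothetical names invented for illustration, not the interface in `src/annbatch/abc/sampler.py`.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyLoadRequest:
    """Hypothetical stand-in for a load request (NOT annbatch's LoadRequest)."""

    chunks: list[slice]  # contiguous observation ranges to read from disk
    batch_indices: list[list[int]]  # in-memory row indices making up each batch


class ToyRandomChunkSampler:
    """Visits every chunk once, in random order, a few chunks per request."""

    def __init__(self, n_obs, chunk_size, chunks_per_request, batch_size, seed=0):
        self.n_obs = n_obs
        self.chunk_size = chunk_size
        self.chunks_per_request = chunks_per_request
        self.batch_size = batch_size
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        # Shuffle chunk start positions so chunks are visited in random order.
        starts = list(range(0, self.n_obs, self.chunk_size))
        rng.shuffle(starts)
        for i in range(0, len(starts), self.chunks_per_request):
            group = starts[i : i + self.chunks_per_request]
            chunks = [slice(s, min(s + self.chunk_size, self.n_obs)) for s in group]
            # Shuffle the rows loaded by this request and cut them into batches.
            n_loaded = sum(c.stop - c.start for c in chunks)
            order = list(range(n_loaded))
            rng.shuffle(order)
            batches = [
                order[j : j + self.batch_size]
                for j in range(0, n_loaded, self.batch_size)
            ]
            yield ToyLoadRequest(chunks=chunks, batch_indices=batches)


sampler = ToyRandomChunkSampler(
    n_obs=1000, chunk_size=100, chunks_per_request=4, batch_size=50
)
for request in sampler:
    print(len(request.chunks), "chunks,", len(request.batch_indices), "batches")
```

In this sketch at most `chunk_size * chunks_per_request` observations are held in memory per request, which is the knob to watch for the memory concern raised above. Whether the real implementation behaves the same way on the full dataset would still need to be measured.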

felix0097 · 2026-01-28T15:05:49Z

@ilan-gold & @selmanozleyen I've added a detailed doc page on how to implement a custom sampler. Feel free to comment/edit

felix0097 · 2026-01-28T15:56:57Z

ToDo: Update docs to reflect #127

ilan-gold · 2026-02-05T10:36:19Z

+  │  0-99   │ 100-199 │ 200-299 │ 300-399 │ 400-499 │ 500-599 │ 600-699 │ 700-799 │ 800-899 │ 900-999 │
+  └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
+
+  LoadRequest with chunks = [slice(200,300), slice(700,800), slice(0,100), slice(500,600)]:


Maybe good to unalign the chunk boundaries to show that alignment with virtual chunk boundaries is not necessary?

I've added a comment to clarify this 👍

I think these should be unaligned still, which I am now more convinced of because it's a virtual concatenation of datasets, not zarr chunks (see below)
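
A small sketch of why alignment is not needed when the index space is a virtual concatenation of datasets: a requested slice can simply be cut wherever it crosses a dataset boundary. The helper `split_by_dataset` and the dataset sizes below are hypothetical and only illustrate the idea; this is not annbatch code.

```python
from bisect import bisect_right
from itertools import accumulate


def split_by_dataset(chunk, dataset_sizes):
    """Map a slice over the virtual concatenation to (dataset index, local slice) pairs."""
    # offsets[i] is the global index at which dataset i starts.
    offsets = [0, *accumulate(dataset_sizes)]
    if chunk.start < 0 or chunk.stop > offsets[-1]:
        raise ValueError("chunk lies outside the concatenated datasets")
    pieces = []
    start = chunk.start
    while start < chunk.stop:
        # Find the dataset that contains the global index `start`.
        ds = bisect_right(offsets, start) - 1
        # Stop at the end of the chunk or the end of this dataset, whichever is first.
        stop = min(chunk.stop, offsets[ds + 1])
        pieces.append((ds, slice(start - offsets[ds], stop - offsets[ds])))
        start = stop
    return pieces


# Three datasets of unequal size, virtually concatenated into indices 0..999.
sizes = [250, 400, 350]
print(split_by_dataset(slice(180, 320), sizes))
# → [(0, slice(180, 250, None)), (1, slice(0, 70, None))]
```

The chunk `slice(180, 320)` straddles the boundary at 250, so it becomes one read from dataset 0 and one from dataset 1.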

ilan-gold · 2026-02-06T12:42:16Z

+  ┌─────────────────────────────────────────────────────────────────────────┐
+  │  0   1   2   3  ...  99 100 101  ...  199 200  ...  299 300  ...  399   │
+  │                                                                         │
+  │  [Chunk 200-299]    [Chunk 700-799]   [Chunk 0-99]   [Chunk 500-599]    │


Another reason I'd like the chunks to be a bit more complex: we order in-memory by on-disk dataset, so Chunk 0-99 definitely goes first. We could loosen things, but that's how it works right now.

See the note at the bottom of https://annbatch--125.org.readthedocs.build/en/125/generated/annbatch.types.LoadRequest.html#annbatch.types.LoadRequest
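
To illustrate the ordering point, here is a sketch of where each requested chunk would land in memory if loaded rows are laid out by on-disk position rather than by request order. The helper `in_memory_layout` is hypothetical, and sorting by chunk start is an assumption standing in for "ordered by on-disk dataset"; the actual rule is the one described in the linked `LoadRequest` note.

```python
def in_memory_layout(chunks):
    """Return {(start, stop): in-memory slice}, assuming rows are ordered by on-disk position."""
    layout = {}
    cursor = 0
    # Assumption: chunks are placed in memory in order of their on-disk start index.
    for chunk in sorted(chunks, key=lambda c: c.start):
        n_rows = chunk.stop - chunk.start
        layout[(chunk.start, chunk.stop)] = slice(cursor, cursor + n_rows)
        cursor += n_rows
    return layout


requested = [slice(200, 300), slice(700, 800), slice(0, 100), slice(500, 600)]
for on_disk, in_mem in in_memory_layout(requested).items():
    print(on_disk, "->", (in_mem.start, in_mem.stop))
# (0, 100) -> (0, 100)
# (200, 300) -> (100, 200)
# (500, 600) -> (200, 300)
# (700, 800) -> (300, 400)
```

Although `slice(0, 100)` is third in the request, it occupies the first 100 in-memory rows, which is why a diagram showing chunks in request order would be misleading.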

ilan-gold · 2026-02-06T12:43:06Z

+
+  Batch 1 (4 observations):
+  ┌───────────────────────────────────────────────────────────────────┐
+  │  indices [0, 50, 150, 250]                                        │


Still think these should be actually random indices

ilan-gold · 2026-02-06T12:47:16Z

+│                                                                                  │
+│     Disk (sequential reads per chunk)                Memory (shuffled together)  │
+│  ┌───────────────┐                                ┌──────────────────────┐       │
+│  │ Chunk 0: 0-3  │  ═══════════╗                  │  8  2 11  0  5  9    │       │


Here again, I would label this "Dataset 0 virtual-concatenation indices: 0-3".

fix: sampler docs

b882f60

ilan-gold added the skip-gpu-ci label (Whether gpu ci should be skipped) Jan 28, 2026

ilan-gold requested a review from felix0097 January 28, 2026 09:25

felix0097 reviewed Jan 28, 2026

felix0097 added 2 commits January 28, 2026 16:03

Add PyCharm related files

07d4461

Add documentation about custom samplers

a13aca0

Clarify docs for chunking

709afe0

Add details about splits param

e75dbc9

ilan-gold commented Jan 29, 2026

Comment thread docs/custom-sampler.md Outdated

Comment thread docs/custom-sampler.md Outdated

Comment thread docs/custom-sampler.md

Comment thread docs/custom-sampler.md Outdated

Comment thread docs/custom-sampler.md Outdated

felix0097 and others added 4 commits January 29, 2026 15:03

Update docs/custom-sampler.md

61550d3

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

da17212

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Rewrite best practice section

157e5a7

Merge branch 'ig/docs_sampler' of github.com:laminlabs/arrayloaders i…

7c9a045

…nto ig/docs_sampler

ilan-gold commented Feb 5, 2026

felix0097 and others added 13 commits February 5, 2026 12:57

Update docs/custom-sampler.md

4256f36

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update README.md

809cd44

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

737d005

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

c6ead7f

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

1800367

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Remove (optional) comment from validate

1c2d2ba

Update docs/custom-sampler.md

0b0f40e

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

395740a

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Clarify docs

46d0a44

Apply suggestion from @ilan-gold

31fb1b4

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

6ed76cd

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

b19a935

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

b4d1d58

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

felix0097 and others added 16 commits February 5, 2026 16:33

Apply suggestion from @ilan-gold

05c2658

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

7bb00aa

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

89bab36

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

b795d22

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

7cf0845

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

596a0cd

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

dd157f2

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

a434a94

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

60147de

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Apply suggestion from @ilan-gold

b3c8a7e

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Clean up docs

ef83106

Merge branch 'ig/docs_sampler' of github.com:laminlabs/arrayloaders i…

9477123

…nto ig/docs_sampler

Fix readthedocs

63572dd

Fix readthedocs2

4f9c851

chore: more general solution

15dff0b

Merge branch 'main' into ig/docs_sampler

cdfbc0d

ilan-gold commented Feb 6, 2026

ilan-gold and others added 8 commits February 6, 2026 13:48

fix: correct intersphinx

8549f27

Update docs/custom-sampler.md

51bd365

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

3135a48

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

1ddaf47

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

82bb03b

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

3fc358f

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

6e73045

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update docs/custom-sampler.md

a4c94e9

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

felix0097 self-assigned this Feb 10, 2026

2 participants
