Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #125   +/-   ##
=======================================
  Coverage   90.80%   90.80%
=======================================
  Files          10       10
  Lines         805      805
=======================================
  Hits          731      731
  Misses         74       74
felix0097 left a comment
I would definitely add an example here @selmanozleyen on how to actually implement this. One very simple example:
- How do I write out my own sampler (one example class, e.g. weighted sampler, or fully random sampler)
- How do I plug this into the rest of your code base e.g. how to actually use this
Moreover, did you check how this behaves on the full Tahoe dataset, e.g. with fully random sampling? With the old code, memory usage blew up significantly. If that is still the case, we should add a warning/caveats section.
cc @ilan-gold
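For reference, a minimal sketch of the kind of "fully random sampler" example requested above. It is not the library's implementation: the `LoadRequest` dataclass here is a hypothetical stand-in that only carries the `chunks` field shown in the docs diagram, and `RandomChunkSampler` with its parameters is invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LoadRequest:
    """Hypothetical stand-in, NOT the real class: only `chunks` is modelled."""

    chunks: list[slice]  # contiguous row ranges to read from disk in one load


class RandomChunkSampler:
    """Illustrative sampler: visits fixed-size chunks in random order."""

    def __init__(self, n_obs: int, chunk_size: int, chunks_per_load: int, seed: int = 0):
        self.n_obs = n_obs
        self.chunk_size = chunk_size
        self.chunks_per_load = chunks_per_load
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[LoadRequest]:
        # Start offset of every chunk, visited in a random order.
        starts = list(range(0, self.n_obs, self.chunk_size))
        self.rng.shuffle(starts)
        # Group several chunks into one load request.
        for i in range(0, len(starts), self.chunks_per_load):
            group = starts[i : i + self.chunks_per_load]
            yield LoadRequest(
                chunks=[slice(s, min(s + self.chunk_size, self.n_obs)) for s in group]
            )


sampler = RandomChunkSampler(n_obs=1000, chunk_size=100, chunks_per_load=4)
requests = list(sampler)
for request in requests:
    print(request.chunks)
```

Each observation is read exactly once per pass, and only the order of chunks is randomized. How such a class plugs into the loader is left out on purpose, since that depends on the library's actual sampler interface.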
@ilan-gold & @selmanozleyen I've added a detailed doc page on how to implement a custom sampler. Feel free to comment/edit.
ToDo: Update docs to reflect #127
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
…nto ig/docs_sampler
│  0-99   │ 100-199 │ 200-299 │ 300-399 │ 400-499 │ 500-599 │ 600-699 │ 700-799 │ 800-899 │ 900-999 │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

LoadRequest with chunks = [slice(200,300), slice(700,800), slice(0,100), slice(500,600)]:
Maybe good to unalign the chunk boundaries to show that alignment with virtual chunk boundaries is not necessary?
I've added a comment to clarify this 👍
I still think these should be unaligned. I'm now more convinced of that because this is a virtual concatenation of datasets, not zarr chunks (see below).
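To illustrate the point about unaligned boundaries, here is a small sketch of how a requested slice over a virtual concatenation could be split into per-dataset pieces. The function name `split_by_dataset` and the equal dataset sizes are assumptions made for the example, not the library's code.

```python
from bisect import bisect_right
from itertools import accumulate


def split_by_dataset(chunk: slice, dataset_sizes: list[int]) -> list[tuple[int, slice]]:
    """Map a global slice onto (dataset_index, local_slice) pieces."""
    # offsets[i] is the global index at which dataset i starts.
    offsets = [0, *accumulate(dataset_sizes)]
    pieces = []
    start, stop = chunk.start, min(chunk.stop, offsets[-1])
    while start < stop:
        ds = bisect_right(offsets, start) - 1  # dataset that contains `start`
        ds_stop = min(stop, offsets[ds + 1])   # clip at that dataset's end
        pieces.append((ds, slice(start - offsets[ds], ds_stop - offsets[ds])))
        start = ds_stop
    return pieces


# Ten datasets of 100 observations each, as in the diagram (sizes are assumed).
sizes = [100] * 10
pieces = split_by_dataset(slice(250, 420), sizes)
print(pieces)
# [(2, slice(50, 100, None)), (3, slice(0, 100, None)), (4, slice(0, 20, None))]
```

The request 250-420 does not line up with any dataset boundary, yet it resolves cleanly into three local reads, which is why the diagram does not need aligned chunks.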
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
…nto ig/docs_sampler
┌─────────────────────────────────────────────────────────────────────────┐
│ 0 1 2 3 ... 99 100 101 ... 199 200 ... 299 300 ... 399 │
│ │
│ [Chunk 200-299] [Chunk 700-799] [Chunk 0-99] [Chunk 500-599] │
Another reason I'd like the chunks to be a bit more complex: we order in-memory by on-disk dataset, so Chunk 0-99 definitely goes first. We could loosen things, but that's how it works right now.
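A rough sketch of what that ordering implies for the diagram, assuming that sorting the requested slices by their global start is a fair approximation of "ordered by on-disk dataset". The helper `in_memory_layout` is hypothetical.

```python
def in_memory_layout(chunks: list[slice]) -> list[tuple[slice, range]]:
    """Pair each requested chunk with the in-memory rows it would occupy."""
    layout = []
    offset = 0
    # Assumption: in-memory order follows on-disk order, not request order.
    for chunk in sorted(chunks, key=lambda c: c.start):
        length = chunk.stop - chunk.start
        layout.append((chunk, range(offset, offset + length)))
        offset += length
    return layout


requested = [slice(200, 300), slice(700, 800), slice(0, 100), slice(500, 600)]
layout = in_memory_layout(requested)
for chunk, rows in layout:
    print(f"disk {chunk.start}-{chunk.stop - 1} -> memory {rows.start}-{rows.stop - 1}")
# disk 0-99 -> memory 0-99
# disk 200-299 -> memory 100-199
# disk 500-599 -> memory 200-299
# disk 700-799 -> memory 300-399
```

Under this assumption, the diagram row would read [Chunk 0-99] [Chunk 200-299] [Chunk 500-599] [Chunk 700-799], regardless of the order in which the chunks were listed in the request.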
Batch 1 (4 observations):
┌───────────────────────────────────────────────────────────────────┐
│ indices [0, 50, 150, 250] │
Still think these should be actually random indices
│ │
│ Disk (sequential reads per chunk) Memory (shuffled together) │
│ ┌───────────────┐ ┌──────────────────────┐ │
│ │ Chunk 0: 0-3 │ ═══════════╗ │ 8 2 11 0 5 9 │ │
Here again, would do Dataset 0 virtual-concatenation indices: 0-3
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>