Support multiple data providers #207

nfnt · 2025-12-09T14:52:35Z

Handle the case of multiple data providers providing the same dataset. We assume that a dataset name uniquely identifies a dataset, so every provider advertising that name serves the same data. With that assumption, we can inform workers of all data providers for a dataset and have them choose between them when downloading data slices. Right now, workers choose randomly if multiple data providers are available.
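As an illustration of the random choice described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `ProviderId`, `random_index`, and `choose_provider` are hypothetical names, and the real worker may well use a proper RNG crate instead of the hasher-based seed used here.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical stand-in for a data provider's peer identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProviderId(pub String);

/// Returns a pseudo-random index below `len` (which must be non-zero).
/// The seed comes from the randomly keyed hasher state in the standard
/// library, which is good enough for spreading load, not for cryptography.
fn random_index(len: usize) -> usize {
    let seed = RandomState::new().build_hasher().finish();
    (seed % len as u64) as usize
}

/// Picks one of the known providers for a dataset at random.
/// Returns `None` if no provider is known.
pub fn choose_provider(providers: &[ProviderId]) -> Option<&ProviderId> {
    if providers.is_empty() {
        return None;
    }
    providers.get(random_index(providers.len()))
}
```

A worker would call `choose_provider(&providers)` once per data slice, so repeated downloads spread across all providers it was told about.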

Tested locally by having 2 data providers with the MNIST dataset. Workers pulled slices from both of them.

Remarks:
While this allows for simple load balancing when pulling data, it won't provide any fault tolerance if data providers are no longer available: the scheduler determines the available data providers only once. For fault tolerance, this needs to be done for each data slice request from workers. Furthermore, data providers add themselves to the DHT using start_providing. After a provider is gone, it takes a while until its record is removed from the DHT. We would have to account for this by checking whether the data providers reported by the DHT are actually available.
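One way the fault tolerance mentioned in these remarks could look on the worker side is sketched below. This is an assumption, not something this PR implements: `fetch_slice_with_fallback` and the `fetch` callback are hypothetical, and providers are represented as plain strings. The idea is to treat the DHT's provider list as possibly stale and move on to the next provider when one cannot be reached.

```rust
/// Tries each provider in turn and returns the first slice that downloads.
/// `fetch` is a hypothetical callback that downloads the slice from one provider.
/// A stale DHT record simply shows up as an `Err` from `fetch` and is skipped.
pub fn fetch_slice_with_fallback<F>(providers: &[String], mut fetch: F) -> Result<Vec<u8>, String>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    let mut last_err = String::from("no data providers available");
    for provider in providers {
        match fetch(provider.as_str()) {
            Ok(bytes) => return Ok(bytes),
            // Remember why this provider failed, then try the next one.
            Err(e) => last_err = format!("{provider}: {e}"),
        }
    }
    Err(last_err)
}
```

For this to help, the provider list passed in would have to be refreshed per slice request rather than determined once by the scheduler, as the remarks point out.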

codecov · 2025-12-09T14:59:31Z

Codecov Report

❌ Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/scheduler/src/bin/hypha-scheduler.rs | 0.00% | 19 Missing ⚠️ |
| crates/worker/src/executor/bridge.rs | 0.00% | 7 Missing ⚠️ |
| crates/scheduler/src/scheduling/data_scheduler.rs | 0.00% | 5 Missing ⚠️ |
| crates/data/src/bin/hypha-data.rs | 0.00% | 1 Missing ⚠️ |


orlandohohmeier · 2025-12-09T15:00:24Z

crates/data/src/bin/hypha-data.rs

+    // Only a single record is stored for this key in the DHT.
+    // I.e. if there are multiple data providers with the same dataset,
+    // we might overwrite an existing record.
+    // For now, we assume that a dataset name is always used for the same record.
+    // With that assumption, overwriting existing records is okay.
+    // We might need to change this in the future.


This is captured in: #203

github-actions · 2025-12-11T07:42:18Z

🎉 This PR is included in version 1.0.0-alpha.39 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

nfnt requested review from l45k and orlandohohmeier December 9, 2025 14:52

nfnt force-pushed the nfnt/support-multiple-data-providers branch from 851ad9d to 8283838 Compare December 10, 2025 13:43

orlandohohmeier approved these changes Dec 10, 2025

nfnt merged commit db773ec into alpha Dec 11, 2025
8 of 9 checks passed

nfnt deleted the nfnt/support-multiple-data-providers branch December 11, 2025 07:36

github-actions bot added the released on @alpha label Dec 11, 2025
