Description
This proposal is based on the fact that verifiers can work on any completed contribution in any order. As a result, we can scale the number of verifiers up or down during the round if there's a need. It also eliminates the verifier drop logic.
Key parts:
- treat a verifier as a computational unit in a pool or a queue (like a message queue)
- a contribution is sent to a verifier with a timeout. If the verifier gets stuck, the contribution is sent to another verifier
- verifiers can join or leave the pool at any time
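The key parts above can be sketched as a small coordinator-side pool. This is only an illustration of the idea, not the existing coordinator code: `VerifierPool`, its method names, and the explicit `now` argument (a simulated clock, used so the example is deterministic) are all assumptions made for this sketch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VerifierPool:
    """Hypothetical coordinator-side pool; every name here is illustrative."""
    timeout: float
    pending: deque = field(default_factory=deque)   # contributions awaiting a verifier
    idle: deque = field(default_factory=deque)      # verifiers ready for work
    in_flight: dict = field(default_factory=dict)   # verifier -> (contribution, deadline)
    verified: list = field(default_factory=list)

    def join(self, verifier, now):
        # Verifiers can join at any time, including mid-round.
        if verifier in self.idle or verifier in self.in_flight:
            return
        self.idle.append(verifier)
        self._dispatch(now)

    def leave(self, verifier, now):
        # A departing verifier's task simply goes back to the queue.
        if verifier in self.idle:
            self.idle.remove(verifier)
        task = self.in_flight.pop(verifier, None)
        if task is not None:
            self.pending.appendleft(task[0])
        self._dispatch(now)

    def submit(self, contribution, now):
        self.pending.append(contribution)
        self._dispatch(now)

    def complete(self, verifier, now):
        task = self.in_flight.pop(verifier, None)
        if task is None:
            return  # late result from a verifier that already timed out
        self.verified.append(task[0])
        self.idle.append(verifier)
        self._dispatch(now)

    def tick(self, now):
        # Requeue contributions whose verifier missed the deadline.
        for verifier, (contribution, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[verifier]
                self.pending.appendleft(contribution)
        self._dispatch(now)

    def _dispatch(self, now):
        # Hand out work while there is both a task and a free verifier.
        while self.pending and self.idle:
            verifier = self.idle.popleft()
            contribution = self.pending.popleft()
            self.in_flight[verifier] = (contribution, now + self.timeout)
```

In this sketch a stuck verifier needs no dedicated drop handling: its contribution is requeued by `tick`, and the verifier itself only returns to the pool if it calls `join` again.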
Current design:
- verifiers join the queue and are assigned to the round
- when the round starts, the coordinator assigns tasks to each verifier
- if one of the verifiers gets stuck, another verifier can't pick up the work
- no way to change the number of verifiers during the round
- there's a need to handle verifier drop logic
How to reuse existing solutions:
We already have reliability score checks for contributors, which estimate a contributor's ability to complete the round before the round starts. They are implemented in a way very similar to the setup logic: the contributor receives a challenge, does the computation, and sends the result back. The computation is close to the actual contribution process, so it lets us estimate how long this contributor would take to complete the round.

So what is the difference between the setup and the reliability check? It's the communication protocol. In the setup, the contributor (and similarly the verifier) uses HTTP, periodically requesting the state of the round. The contributor then checks its assigned tasks, decides which task to download next, and uploads the result. As you can see, a lot of the setup logic is handled by the contributor and the verifier.

In contrast, reliability checks are implemented over WebSocket, and the implementation explores the idea of a simpler contributor/verifier. Thanks to the bidirectional protocol, the coordinator can send work to the contributor during reliability checks, and the contributor sends the result back when it's done. The verifier part of reliability checks is not implemented yet, but the design is similar.
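To illustrate the push-based flow described above, here is a minimal sketch in which the coordinator sends work and a "thin" verifier only replies. It does not use the real reliability check code or a real WebSocket connection: a pair of `asyncio.Queue` objects stands in for the bidirectional channel, and the message fields (`id`, `payload`, `valid`), the `None` shutdown signal, and the placeholder `verify` function are all assumptions.

```python
import asyncio


async def thin_verifier(inbox: asyncio.Queue, outbox: asyncio.Queue, verify):
    """Worker with no scheduling logic: receive a task, verify it, reply."""
    while True:
        task = await inbox.get()
        if task is None:  # hypothetical shutdown signal
            break
        await outbox.put({"id": task["id"], "valid": verify(task["payload"])})


async def coordinator(tasks, verify):
    # Two queues stand in for one bidirectional connection.
    to_verifier, from_verifier = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(thin_verifier(to_verifier, from_verifier, verify))
    results = {}
    for task in tasks:
        await to_verifier.put(task)        # coordinator pushes the work
        reply = await from_verifier.get()  # verifier pushes the result back
        results[reply["id"]] = reply["valid"]
    await to_verifier.put(None)
    await worker
    return results


def run_demo():
    tasks = [{"id": 1, "payload": b"ok"}, {"id": 2, "payload": b""}]
    # Placeholder check standing in for real contribution verification.
    return asyncio.run(coordinator(tasks, verify=lambda payload: len(payload) > 0))
```

The point of the sketch is that the verifier never polls round state or chooses its own task; all scheduling decisions stay with the coordinator.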
To summarise:
I'd like to propose using a bidirectional protocol to address the verifier scalability problem, and to use the reliability checks implementation as a foundation.