Scalable verifiers proposal #299

@ibaryshnikov

This proposal is based on the fact that verifiers can work on any completed contribution in any order. As a result, we can scale the number of verifiers up or down during the round as needed. It also removes the need for verifier drop logic.
Key parts:

  • treat each verifier as a computational unit in a pool or a queue (like a message queue)
  • a contribution is sent to a verifier with a timeout. If the verifier gets stuck, the contribution is sent to another verifier
  • verifiers can join or leave the pool at any time
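
The key parts above can be sketched as a small coordinator-side pool. This is only an illustration, not existing code: `VerifierPool`, its method names, and the logical `now` clock are all hypothetical, and a real implementation would send the work over the network instead of recording it in a dict.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VerifierPool:
    """Hypothetical pool of verifiers; every name here is illustrative."""
    timeout: float
    pending: deque = field(default_factory=deque)   # contributions awaiting a verifier
    idle: deque = field(default_factory=deque)      # verifiers ready for work
    in_flight: dict = field(default_factory=dict)   # contribution -> (verifier, deadline)
    verified: list = field(default_factory=list)    # contributions verified so far

    def join(self, verifier, now):
        """A verifier joins the pool at any point during the round."""
        self.idle.append(verifier)
        self._dispatch(now)

    def leave(self, verifier, now):
        """A verifier leaves; anything it held goes back to the queue."""
        if verifier in self.idle:
            self.idle.remove(verifier)
        for contribution, (owner, _) in list(self.in_flight.items()):
            if owner == verifier:
                del self.in_flight[contribution]
                self.pending.appendleft(contribution)
        self._dispatch(now)

    def submit(self, contribution, now):
        """A completed contribution becomes available for verification."""
        self.pending.append(contribution)
        self._dispatch(now)

    def complete(self, verifier, contribution, now):
        """Accept a result only from the verifier that currently owns the work."""
        owner, _ = self.in_flight.get(contribution, (None, None))
        if owner != verifier:
            return False  # stale result, e.g. from a verifier that timed out
        del self.in_flight[contribution]
        self.verified.append(contribution)
        self.idle.append(verifier)
        self._dispatch(now)
        return True

    def tick(self, now):
        """Requeue contributions whose verifier missed the deadline."""
        for contribution, (_, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[contribution]
                self.pending.appendleft(contribution)
        self._dispatch(now)

    def _dispatch(self, now):
        # Pair pending contributions with idle verifiers, in any order.
        while self.pending and self.idle:
            contribution = self.pending.popleft()
            verifier = self.idle.popleft()
            self.in_flight[contribution] = (verifier, now + self.timeout)
```

In this sketch a verifier that times out is simply not returned to the idle queue, so no separate drop logic is required: its work is requeued by `tick`, and it has to call `join` again to receive more.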

Current design:

  • verifiers join the queue and are assigned to the round
  • when the round starts, the coordinator assigns tasks to each verifier
  • if one of the verifiers gets stuck, another verifier can't pick up its work
  • there's no way to change the number of verifiers during the round
  • verifier drop logic has to be handled explicitly

How to reuse existing solutions:
We already have reliability score checks for contributors, which estimate their ability to complete the round before it starts. They are implemented much like the setup logic: the contributor receives a challenge, does the computation, and sends the result back. The computation is close to the actual contribution process, so it lets us estimate how long this contributor would take to complete the round.

So what's the difference between the setup and the reliability check? It's the communication protocol. In the setup, the contributor (and similarly the verifier) uses HTTP and periodically requests the state of the round. The contributor then checks its assigned tasks, decides which task to download next, and uploads the result. As you can see, a lot of the setup logic is handled by the contributor and the verifier.

In contrast, reliability checks are implemented over WebSocket, and the implementation explores the idea of a simpler contributor/verifier. Thanks to the bidirectional protocol, the coordinator can send work to the contributor during reliability checks, and the contributor sends the result back when it's done. The verifier part of the reliability checks is not implemented yet, but the design is similar.
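
To make the push model concrete, here is a minimal sketch of a coordinator sending work to a thin verifier over a bidirectional channel. It rests on several assumptions: a pair of `asyncio.Queue` objects stands in for the WebSocket connection, the JSON message types (`verify`, `result`, `shutdown`) and their fields are made up for illustration, and the "verification" is a placeholder check rather than the real computation.

```python
import asyncio
import json


async def verifier(inbox, outbox):
    """Thin worker: waits for pushed work and replies with a result."""
    while True:
        message = json.loads(await inbox.get())
        if message["type"] == "shutdown":
            break
        # Placeholder for the real verification computation.
        valid = len(message["payload"]) > 0
        await outbox.put(json.dumps(
            {"type": "result", "id": message["id"], "valid": valid}))


async def coordinator(contributions, timeout=1.0):
    """Pushes each contribution to the verifier and waits for the reply."""
    # The two queues are a stand-in for one bidirectional connection.
    to_verifier, from_verifier = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(verifier(to_verifier, from_verifier))
    results = {}
    for contribution_id, payload in contributions.items():
        await to_verifier.put(json.dumps(
            {"type": "verify", "id": contribution_id, "payload": payload}))
        # A reply that misses the timeout raises asyncio.TimeoutError;
        # a real coordinator would hand the work to another verifier here.
        reply = json.loads(
            await asyncio.wait_for(from_verifier.get(), timeout))
        results[reply["id"]] = reply["valid"]
    await to_verifier.put(json.dumps({"type": "shutdown"}))
    await worker
    return results


def run_demo():
    # Hypothetical contributions: one non-empty payload, one empty.
    return asyncio.run(coordinator({"c1": "abc", "c2": ""}))
```

The verifier here never asks for the round state or chooses a task; all scheduling decisions stay with the coordinator, which is the simplification the reliability checks explore.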

To summarise:
I'd like to propose using a bidirectional protocol to address the verifier scalability problem, with the reliability checks implementation as a foundation.
