High-availability validator setup
This Request for Proposals is closed, meaning we are not looking for any more proposals on this topic at the moment.
- Status: Closed
- Proposer: mmagician
- Projects you think this work could be useful for: Polkadot & Kusama Validators
Project Description 📄
Summary
Validator leader selection via Raft consensus. Inspired by internal discussions & certus.one active-active validator setup.
Present state
Currently, the recommended setup is to have one active node per validator. The main advantage of this approach is that it removes the danger of equivocation, thus preventing slashing. The key drawback is the lack of a ready backup machine to takeover the responsibility of producing blocks, voting on finality etc. in case the primary one fails.
The drawback can be somewhat remedied by having a backup node in sync, but without access to the session keys necessary for authoring blocks. The process of replacing the keys, however, is manual. Furthermore, the session keys cannot be replaced mid-session and this introduces a time delay for when the validator is active again.
Solution
Rather than relying on one validator node to perform the work, what if we had multiple nodes equally capable of signing messages, yet still avoiding the risk of equivocation? The proposed approach relies on recognising a distinction between signing a message and submitting it.
Since all our validator nodes are trusted and we don't worry about censorship resistance, we can leverage a leader-follower model to ensure high availibility. Raft is a good candidate - it offers a simple consensus mechanism, fault-tolerance and performance. It ensures only one leader is ever in charge of interacting with the blockchain, while the followers simply receive the state updates and automatically replace the leader in case of a failure.