A game changer for building robust distributed systems


By Tanya Petersen, EPFL
Thursday, 09 November, 2023


A game changer for building robust distributed systems

Consensus is one of the fundamental problems in distributed systems. It allows a group of machines to maintain multiple copies of data and update them consistently, even when a fraction of the machines might fail.

Take the example of three servers that need to store three copies of data and keep track of any updates to information so that all three servers remain consistent. If one server fails, the remaining two must keep the data consistent and allow updates to continue normally as if there was no failure.

Current state-of-the-art consensus protocols to achieve consensus rely on one computer node being designated a leader at any given time, continually supervising and handling any updates to data. If the leader fails another node wakes up and takes over, but there’s a challenge. How long should another node wait before taking over from an unresponsive leader?

“If the leader fails or the network is bad, the problem with the classic consensus protocols is that there’s the very tricky question of how you decide how big or small the timeout should be,” explained Professor Bryan Ford, Head of the Decentralized and Distributed Systems Laboratory (DEDIS) in EPFL’s School of Computer and Communications Sciences (IC). “If you set it too big, then when a leader fails, you might be waiting a long time and the system is just dead. On the other hand, consider if you set the timeout too short — this is where the real disaster can happen.

“Suppose the old leader hasn’t failed, suppose the network is just a little slower than you thought it was, the next leader comes and tries to take over, but the way all the existing protocols work, the new leader’s actions will cancel what the old leader’s actions did so it can no longer finish what it was doing and all its work is wasted. These kinds of issues can cause major reliability problems and these leader-based protocols can fail entirely if there’s a deliberate denial of service attack,” he continued.

To overcome these challenges, DEDIS researchers have been investigating a rarely used class of consensus algorithms, known as asynchronous consensus protocols. Unlike current leader-based protocols, their asynchronous cousins are not vulnerable to leader failures and denial-of-service attacks. But there’s a big trade-off — prior asynchronous protocols are much less efficient under normal conditions, and that’s one reason they are almost never deployed.

For the first time, Ford said, their QuePaxa protocol changes this dynamic. “We’ve come up with a win-win. What is new and unique to QuePaxa is that it’s an asynchronous consensus protocol that finally achieves efficiency equivalent to the widely deployed leader-based protocols under normal network conditions. QuePaxa is just as fast, efficient, low latency and low cost in terms of network bandwidth, under normal conditions.”

The new algorithm is designed in such a way that one leader at a time is usually expected to lead the task of making progress, but a second leader can come in and help in the same round without interfering with the first one. A third leader could even join and help the other two finish the work more quickly. There will be some redundancy of effort, but the non-leaders don’t destructively interfere. Short delays don’t cause leaders to cancel each other’s work as with current protocols.

Another advantage of QuePaxa is that it is also extremely robust under bad conditions such as noisy networks, high communication delays, unpredictably- varying network delays or deliberate denial-of-service attacks.

“Under these conditions existing consensus protocols will just die completely. QuePaxa will keep going; it’s much more robust,” he continued. “In any place where there are significant concerns about performance, reliability or vulnerability to these kinds of attacks I believe this is a game changer for robustness reasons and this should be the new standard consensus protocol.”

The DEDIS team has already built an open source prototype of QuePaxa, which is available on the well-known GitHub repository. The new protocol has already gone through an artefact evaluation review process at SOSP, where peer reviewers have tested its capabilities.

Image credit: iStock.com/shulz

Related Articles

You are here

The benefits of using cellular services and Wi-Fi to keep track of valuable goods.

A guide to implementing remote monitoring

Remote monitoring enables people and companies to collect data from locations where conventional...

Nanowire networks learn and remember like a human brain

Scientists at the University of Sydney have demonstrated nanowire networks that can exhibit both...


  • All content Copyright © 2024 Westwick-Farrow Pty Ltd