I survived David Beazley’s weeklong course on the Raft consensus algorithm that powers technologies like Kubernetes, MongoDB, and Neo4j.
Image from https://raft.github.io/
The Raft Consensus Algorithm is a way for a gaggle of computers to agree on a sequence of events, or a “log” of events. Raft is useful for things like databases – once your data spans multiple computers, you will have consistency issues. Imagine a group of friends who want to go on vacation. They each have an itinerary in mind, but if each one has a different idea for what to do on a given day, there will be chaos. In this scenario, each friend is a computer and the itinerary is the sequence of events that they’re trying to keep synchronized.
In broad swaths, Raft is about keeping those logs synchronized through a democratic election process. Initially, every server starts out as a peer and then after a time, one will decide to solicit votes from their peers to become a leader. If a majority vote for that peer, then that server will become a leader and only then will the group as a whole become operational.
Going back to the vacation itinerary analogy, everyone starts out on the same page until one of the friends takes the initiative to start planning. Once everyone votes a leader in, people can start to add items to the itinerary by telling the leader, who then replicates those items to its followers.
Having a majority of servers operational and acknowledging the leader is crucial to this algorithm due to the fact that servers may go down, or the network may break down. This algorithm is intended to maintain availability in the face of failure. So only data that has been “committed” by a majority of the servers is acknowledged by the leader. If less than a majority of peers has been able to write an “itinerary item,” then that item is considered un-committed and can be overwritten by the next item that comes in.
There are countless more details that go into the algorithm, like timeouts triggering new elections in case a leader goes down, log replication rules for log entries that conflict with each other in case there are rogue leaders, and carefully tracking term and item numbers to validate that a leader is, in fact, legitimate, as well as concerns about how to interface with external applications.
The class, which Beazley calls “Rafting Trip” (https://dabeaz.com/raft.html), consisted of 30-60 minute long discussion sessions followed by up to 2 hours of coding, with a break in the middle around 1pm central time, and then another afternoon session. All in all, days went from 9:30am to 5:30pm. I was grateful to have my wife (who is also juggling work!) and my dad watching the kids while I focused on the class; otherwise this would have been impossible.
It was a humbling experience. For comparison, Beazley likened the complexity to writing an operating system by the last day (today), or even harder, and he would know since he taught the operating systems course at a university level.
Beazley’s instruction style was informal, but also probing. He treated us all as colleagues simply exploring an interesting and devilishly complex problem – he even coded a Raft implementation himself alongside us. It was really refreshing to be able to focus on a problem in depth with some like-minded programmers instead of playing kick-the-can with technical debt because of business needs.
I decided to attempt to use Rust to implement the challenge solutions, which was certainly enlightening, if exceedingly painful (in a good way, I promise).
Every time I choose Rust for a challenging programming problem it does its best to make me regret it, and then I come out with a stronger conviction to use it again next time. Rust’s appeal is mostly masochistic; convince me otherwise.
Since the algorithm has to do with multiple systems in play, I not only sharpened up on my multithreaded programming, but also on fundamentals of networking. Unfortunately, I didn’t get far enough along to have my multithreaded programs talk to each other, but I was able to get Udp packets sent from program to program and finally exercise my brain on the differences between sharing state within a thread (reference counted: Rc
), across threads (atomically reference counted: Arc
), and various mechanisms for sharing or mutating data like RefCell
, Mutex
, and RwLock
, which are for interior mutability, mutually exclusive threadsafe access, and concurrent threadsafe access, respectively.
My strategy when implementing Raft’s election process was to focus on a Channel
trait that I could mock out in order to mimic messages being sent. The thought was to make the system testable by faking interactions between servers, which I called RaftBuddy
s because my brain was melting at the time and it amused me. Later, I could implement the Channel
trait to actually talk to the network. Alas, everyone has a plan until the plan punches you in the face with a faulty Raft implementation.
// My poor vestigial Channel implementation
use crate::raft_message::RaftMessage;
#[derive(PartialEq, Eq, Clone, Default)]
pub struct RaftChannel {
pub queue: Vec<RaftMessage>,
}
pub trait Channel {
fn push(&mut self, message: RaftMessage);
fn pop(&mut self) -> Option<RaftMessage>;
fn all_messages(&mut self) -> Vec<RaftMessage>;
}
impl Channel for RaftChannel {
fn push(&mut self, message: RaftMessage) {
todo!()
}
fn pop(&mut self) -> Option<RaftMessage> {
todo!()
}
fn all_messages(&mut self) -> Vec<RaftMessage> {
todo!()
}
}
#[derive(PartialEq, Eq, Clone, Default)]
struct TestRaftChannel {
queue: Vec<RaftMessage>,
}
impl Channel for TestRaftChannel {
fn push(&mut self, message: RaftMessage) {
self.queue.push(message);
}
fn pop(&mut self) -> Option<RaftMessage> {
self.queue.pop()
}
fn all_messages(&mut self) -> Vec<RaftMessage> {
self.queue.clone()
}
}
(Do I actually think all_messages
is a good idea as a public interface? No, but it made testing easier and I was more focused on desperately shoving the wreckage of my Raft implementation around, like a college student trying to make his dorm room presentable before his parents visit.)
I got as far as leader elections requiring a majority before the end of the week. A lot of it was managing my energy levels. Oftentimes, I would just lie on the floor and cry think about the problem. I thought this was going to be difficult coming in, and I still underestimated how difficult it was.
If I were to go back in time, I would probably make a point to do the projects in Ruby. It was hard enough to complete the projects, but learning the proper data structures to do multithreaded Rust as well definitely distracted from the Raft algorithm itself. However, now that I’ve stumbled through an initial attempt, I am pretty happy with what I learned and if I were to redo the course, I’d gladly do it in Rust again.
On that point, it did seem like Rust’s type safety, when applied to multithreaded code, did end up having some advantages over the dynamically typed Python code that many of my classmates were using. While we all wrote multithreaded code, Rust’s enforcement of the appropriate data types gave me a lot more trouble upfront, whereas other participants in the class seemed to have more issues debugging issues with shared mutable state. So I’m not sure a Ruby rewrite would be much easier – rather, it would just present a different set of problems.
That’s why when David asked what we would do differently, my only real feedback was that I would have started a month ago. The algorithm is so heinously complex that I don’t think any particular change would have been any easier. I’m certain I would have faced just as many issues in the end.
This course and the algorithm were both ridiculously difficult, humbling, and extremely painful. I would highly recommend taking it if you ever have a chance, and now I’m eyeing David Beazley’s other courses as well.
Links:
Leave a Reply