Distributed Systems - Reaching Consensus, Part 1: Paxos
@SudhirNakka07|December 28, 2024 (9m ago)8 views
Introduction

This is Part 1 of a two-part series on consensus algorithms. If you’re looking for Raft, see Part 2: /2024/raft.
In the realm of distributed systems, achieving agreement among multiple nodes is one of the most fundamental challenges. Consensus algorithms ensure data consistency and availability in the presence of failures. This part focuses exclusively on Paxos.
Understanding the Consensus Problem
What is Consensus?
Consensus in distributed systems refers to the process by which multiple nodes agree on a single value or course of action despite potential failures or differences in their initial states.
A consensus algorithm must satisfy three critical properties:
- Agreement: All non-faulty nodes must decide on the same value
- Validity: The decided value must be one that was proposed by some node
- Termination: All non-faulty nodes must eventually reach a decision
The CAP Theorem and Trade-offs
The CAP Theorem (Consistency, Availability, Partition Tolerance) shapes how consensus algorithms are designed and deployed. In a fully asynchronous system, it is impossible to guarantee consensus with even a single process failure. Practical algorithms make timing/failure-detection assumptions.
Paxos: The Foundational Algorithm
Origins and Design Philosophy
Paxos, developed by Leslie Lamport, was the first rigorously proven consensus algorithm. It assumes an asynchronous network and crash-stop failures (no Byzantine failures). Paxos guarantees a single chosen value and that non-faulty processes learn it.
How Paxos Works
Paxos defines three roles:
- Proposers: Initiate proposals
- Acceptors: Vote and determine chosen value
- Learners: Learn the chosen value
The algorithm operates in multiple phases:
Phase 1: Prepare
- A proposer selects a unique proposal number and sends a "prepare" request to a majority of acceptors
- Acceptors respond with a promise not to accept proposals with lower numbers
- If an acceptor has previously accepted a proposal, it includes that information in its response

Phase 2: Accept
- If the proposer receives promises from a majority of acceptors, it sends an "accept" request
- The accept request contains the proposal number and either the proposer's value or the highest-numbered value from the prepare responses
- Acceptors accept the proposal if it matches their promise

Learning Phase
Once a majority of acceptors have accepted a proposal, the value is considered chosen and is communicated to all learners.

Multi-Paxos
While basic Paxos solves consensus for a single value, Multi-Paxos extends it to handle a sequence of decisions, essential for replicated state machines.
Advantages and Challenges of Paxos
Advantages:
- Theoretically robust with formal correctness proofs
- Handles network delays, message loss, and node failures gracefully
- Foundation for many distributed systems
Challenges:
- Complex to understand and implement correctly
- Requires careful handling of edge cases and failure scenarios
Continue to Part 2: Raft → /2024/raft