Sudhir Nakka

Distributed Systems - Reaching Consensus, Part 1: Paxos

December 28, 2024 (9m ago)8 views

Introduction

Paxos diagram showing Proposers, Acceptors, and Learners across Prepare, Accept, and Learning phases

This is Part 1 of a two-part series on consensus algorithms. If you’re looking for Raft, see Part 2: /2024/raft.

In the realm of distributed systems, achieving agreement among multiple nodes is one of the most fundamental challenges. Consensus algorithms ensure data consistency and availability in the presence of failures. This part focuses exclusively on Paxos.

Understanding the Consensus Problem

What is Consensus?

Consensus in distributed systems refers to the process by which multiple nodes agree on a single value or course of action despite potential failures or differences in their initial states.

A consensus algorithm must satisfy three critical properties:

  1. Agreement: All non-faulty nodes must decide on the same value
  2. Validity: The decided value must be one that was proposed by some node
  3. Termination: All non-faulty nodes must eventually reach a decision

The CAP Theorem and Trade-offs

The CAP Theorem (Consistency, Availability, Partition Tolerance) shapes how consensus algorithms are designed and deployed. In a fully asynchronous system, it is impossible to guarantee consensus with even a single process failure. Practical algorithms make timing/failure-detection assumptions.

Paxos: The Foundational Algorithm

Origins and Design Philosophy

Paxos, developed by Leslie Lamport, was the first rigorously proven consensus algorithm. It assumes an asynchronous network and crash-stop failures (no Byzantine failures). Paxos guarantees a single chosen value and that non-faulty processes learn it.

How Paxos Works

Paxos defines three roles:

The algorithm operates in multiple phases:

Phase 1: Prepare

  1. A proposer selects a unique proposal number and sends a "prepare" request to a majority of acceptors
  2. Acceptors respond with a promise not to accept proposals with lower numbers
  3. If an acceptor has previously accepted a proposal, it includes that information in its response
Paxos diagram showing Proposers

Phase 2: Accept

  1. If the proposer receives promises from a majority of acceptors, it sends an "accept" request
  2. The accept request contains the proposal number and either the proposer's value or the highest-numbered value from the prepare responses
  3. Acceptors accept the proposal if it matches their promise
Paxos diagram showing Acceptors

Learning Phase

Once a majority of acceptors have accepted a proposal, the value is considered chosen and is communicated to all learners.

Paxos diagram showing Learners

Multi-Paxos

While basic Paxos solves consensus for a single value, Multi-Paxos extends it to handle a sequence of decisions, essential for replicated state machines.

Advantages and Challenges of Paxos

Advantages:

Challenges:

Continue to Part 2: Raft → /2024/raft