Non-Blocking Two Phase Commit

Regular 2PC is blocking because we need to wait for all nodes to agree that an operation is commit-able. There are massive performance implications to read-only transactions (could fix with snapshot reads) and lock contention. We can fix this by relying on Paxos.

We make the coordinator as well as each participant a Paxos group with its own shard, and then we can just wait until all groups agree to an operation.

2PC on Paxos

  1. Client requests multi-key operation at coordinator
  2. Coordinator logs request
  3. Coordinator sends prepare
  4. Participants decide to commit/abort, and log result
  5. Coordinator sends a commit/abort
  6. Participants record results

Multi-Key Transactions in a KV Store

Assuming reader-writer locking scheme, and that the application code runs on the client:

The basic approach is to:

Caution: Deadlocks

Deadlocks are very easy to trigger when performing operations across shards. A general solution is to always kill things that need to wait. For instance, with a checking/savings account at a bank:

Deadlocks can be an issue in sharded systems when moving shards across groups.

Google's Bigtable in Retrospect

Jeff Dean of Google said that not supporting distributed transactions was the biggest mistake in the the design of Bigtable. Incremental updates make it a very important feature, so users really wanted them and often tried to implement it themselves on top of Bigtable

Spanner, Google's multi-datacenter KV store uses 2PC over Paxos, and is one of the backbones of Google's ad service.