Two Phase Commit

Category: Distributed Systems

Two Phase Commit

ACID

For updates that span multiple keys, or even multiple updates across different storage systems, we need all-or-nothing semantics so errors can be properly handled. Two-phase commit (2PC) is a protocol that ensures distributed transactions, i.e. groups of operations, are atomic, consistent, isolated, and durable (ACID).

Term Description
Atomic operations appear to either happen as a group, or not at all
Durable Operations that complete stay completed
Isolation Other transactions don't see the results until of earlier transactions unless they were already committed
Consistency linearizability (or some other consistency model)

Two Phase Locking (2PL) - Consistency and Isolation

In 2PL, locks are acquired on all structures touched during the transaction, and are only released upon commit or abort. This provides isolation and consistency for multi-key transactions.

- start transaction -
Phase 1: acquire locks
- commit or abort -
Phase 2: release locks

Redo Logging - Atomicity and Durability

Log all changes to disk, followed by a log commit. If there is a crash before the log commit, abandon the transaction. If it was committed in the log, we can just redo the changes.

Deadlock

Deadlock is when two or more transactions are waiting for locks held by each other in a cycle. To solve this you can stop one of the transactions to break the cycle.

Deadlock prevention is generally a better idea, and you can achieve it by always ordering lock acquisition consistently.

Distributed Transactions

From the two generals problem, it is theoretically impossible to agree on performing some action at the same time. Instead, we agree in virtual time when an operation happens.

Atomic Commit Protocol (ACP)

  • Every node arrives at the same decision
  • Once a node decides, it never changes
  • Transaction is committed only if all nodes vote yes
  • If all processes vote yes the transaction is usually committed
  • If all failures are eventually repaired, the transaction is eventually either committed or aborted

2PC in Detail

2PC is a blocking protocol, meaning that it makes no progress if some participants are unavailable. It has fault tolerance, but is not highly available, which is a fundamental limit of the protocol.

  • For a given transaction a central coordinator sends a prepare
  • Participants commit to commit
  • Acquires locks, prevent/delay conflicting operations
  • Abort if deadlock or if any of the operations cannot be completed
  • Central coordinator decides and tells everyone, then releases all locks

Handling Failures

Participant Fails Before Sending Response

You can maintain a timer on the coordinator to retry prepares. If some threshold is reached, just log a no and abort

If the participant then comes back online, they will need to ask the coordinator for the decision, at which point the coordinator sends an abort to the participant

Participant Fails Before After Sending Vote

If the participant crashes immediately after sending their response. Then either they come back online before the commit is sent, at which point the protocol continues, or they will need to check their log and request the decision from the coordinator, which will resent the commit and the protocol continues.

Coordinator Fails Before Sending Prepare

They would have logged the prepare request, so when they come back online and execute the transaction.

Coordinator Fails After Sending Prepare

If the coordinator fails after sending prepares, but before receiving responses, they must have logged the prepared already, and they need to be resent.

Roles

  • Participants: Nodes that must update data relevant to the transaction
  • Coordinator: node responsible for executing the protocol (might also be a participant)

Messages

  • Prepare: Can you commit the transaction?
  • Commit: commit the transaction
  • Abort: abort the transaction