Add 2PC implementation plan with corrected protocol
Documents the full pipelined 2PC protocol for coordinator and participant,
including the persistence barrier, serializable isolation (participant holds
MutTxId across all calls in a coordinator transaction), two-phase participant
response (immediate result + deferred PREPARED after durability), abort
paths, commitlog format, and replay semantics.
Identifies the open problem: MutTxId is !Send but must be held across
multiple HTTP requests on the participant side.
The TPC-C benchmark on branch `origin/phoebe/tpcc/reducer-return-value` (public submodule) uses non-atomic HTTP calls for cross-database operations. We need 2PC so distributed transactions either commit on both databases or neither. Pipelined 2PC is chosen because it avoids blocking on persistence during lock-holding, and the codebase already separates in-memory commit from durability.

## Protocol (Corrected)

### Participant happy path:

1. Receive CALL from coordinator (reducer name + args)
2. Execute reducer (write lock held)
3. Return result to coordinator (write lock still held, transaction still open)
4. Possibly receive more CALLs from coordinator (same transaction, same write lock)
5. Receive END_CALLS from coordinator ("no more reducer calls in this transaction")
6. Commit in-memory (release write lock)
7. Send PREPARE to durability worker
8. **Barrier up** -- no more durability requests go through
9. In background: wait for PREPARE to be durable
10. Once durable: send PREPARED to coordinator
11. Wait for COMMIT or ABORT from coordinator
12. Receive COMMIT
13. Send COMMIT to durability worker
14. **Barrier down** -- flush buffered requests
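
The participant steps above can be read as a small state machine. The Rust sketch below is illustrative only: the `Event` and `State` names are hypothetical (they are not types from the codebase), and the abort transitions are left out to keep the happy path visible.

```rust
/// Hypothetical events mirroring the messages in the participant steps.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Call,           // CALL from coordinator
    EndCalls,       // END_CALLS from coordinator
    PrepareDurable, // durability worker reports that PREPARE is durable
    Commit,         // COMMIT from coordinator
}

/// Hypothetical participant states; step numbers refer to the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Executing,       // steps 1-5: write lock held, transaction open
    AwaitingDurable, // steps 6-9: committed in memory, PREPARE queued, barrier up
    Prepared,        // steps 10-11: PREPARED sent, waiting for the decision
    Committed,       // steps 12-14: COMMIT queued, barrier down
}

/// Advance the participant by one event, rejecting out-of-order messages.
fn step(state: State, event: Event) -> Result<State, String> {
    use Event::*;
    use State::*;
    match (state, event) {
        // Repeated CALLs stay inside the same open transaction.
        (Idle, Call) | (Executing, Call) => Ok(Executing),
        (Executing, EndCalls) => Ok(AwaitingDurable),
        (AwaitingDurable, PrepareDurable) => Ok(Prepared),
        (Prepared, Commit) => Ok(Committed),
        (s, e) => Err(format!("unexpected {:?} in state {:?}", e, s)),
    }
}

/// Fold a sequence of events, starting from `Idle`.
fn run(events: &[Event]) -> Result<State, String> {
    events.iter().try_fold(State::Idle, |s, &e| step(s, e))
}
```

Feeding `run` the sequence CALL, CALL, END_CALLS, PrepareDurable, COMMIT ends in `Committed`, while a COMMIT that arrives before END_CALLS is rejected.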

### Coordinator happy path:

1. Execute reducer, calling participant reducers along the way (participants hold write locks, return results, but don't commit)
2. Reducer succeeds
3. Send END_CALLS to all participants (they can now commit in-memory)
6. **Barrier up** -- no more durability requests go through
7. Wait for coordinator's own PREPARE to be durable
8. Wait for all participants to report PREPARED
9. Send COMMIT to all participants
10. Send COMMIT to durability worker
11. **Barrier down** -- flush buffered requests
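
The commit decision in steps 7 to 9 depends on two conditions: the coordinator's own PREPARE is durable, and every participant has reported PREPARED. A minimal Rust sketch of that bookkeeping follows; `CoordinatorTx`, `Decision`, and the string participant IDs are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of checking whether COMMIT may be sent yet.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Commit,
    KeepWaiting,
}

/// Hypothetical per-transaction bookkeeping on the coordinator.
struct CoordinatorTx {
    participants: HashSet<String>, // everyone that received END_CALLS
    prepared: HashSet<String>,     // participants that reported PREPARED
    own_prepare_durable: bool,     // step 7
}

impl CoordinatorTx {
    fn new(participants: &[&str]) -> Self {
        CoordinatorTx {
            participants: participants.iter().map(|p| p.to_string()).collect(),
            prepared: HashSet::new(),
            own_prepare_durable: false,
        }
    }

    /// Step 7: the coordinator's own PREPARE became durable.
    fn on_own_prepare_durable(&mut self) {
        self.own_prepare_durable = true;
    }

    /// Step 8: record a PREPARED notification, ignoring unknown senders.
    fn on_prepared(&mut self, participant: &str) {
        if self.participants.contains(participant) {
            self.prepared.insert(participant.to_string());
        }
    }

    /// Step 9 may proceed only when both conditions hold.
    fn decide(&self) -> Decision {
        if self.own_prepare_durable && self.prepared == self.participants {
            Decision::Commit
        } else {
            Decision::KeepWaiting
        }
    }
}
```

The order in which the two conditions are satisfied does not matter; `decide` only reports `Commit` once both hold.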

### Key correctness properties:

- **Serializable isolation**: Participant holds write lock from CALL through END_CALLS. Multiple CALLs from the same coordinator transaction execute within the same MutTxId on the participant. The second call sees the first call's writes.
- **Persistence barrier**: After PREPARE is sent to durability (step 7/8 on participant, step 5/6 on coordinator), no speculative transactions can reach the durability worker until COMMIT or ABORT. Anything sent to the durability worker can eventually become persistent, so the barrier is required.
- **Two responses from participant**: The immediate result (step 3) and the later PREPARED notification (step 10). The coordinator collects both: results during reducer execution, PREPARED notifications before deciding COMMIT.
- **Pipelining benefit**: Locks are held only during reducer execution (steps 1-6), not during persistence (steps 7-14). The persistence and 2PC handshake happen after locks are released on both sides.
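
The persistence barrier can be pictured as a gate in front of the durability worker. The Rust sketch below is a simplified model under stated assumptions: `DurabilityGate` is a hypothetical type, requests are plain strings, and the `sent` vector stands in for the durability worker's queue. It flushes buffered requests on COMMIT and discards them on ABORT, following the coordinator abort path described in this document.

```rust
/// Hypothetical gate in front of the durability worker.
#[derive(Debug, Default)]
struct DurabilityGate {
    barrier_up: bool,
    buffered: Vec<String>, // requests held back while the barrier is up
    sent: Vec<String>,     // stands in for the durability worker's queue
}

impl DurabilityGate {
    /// Submit an ordinary (speculative) transaction for durability.
    fn submit(&mut self, request: &str) {
        if self.barrier_up {
            self.buffered.push(request.to_string());
        } else {
            self.sent.push(request.to_string());
        }
    }

    /// Send PREPARE, then raise the barrier.
    fn prepare(&mut self, tx: &str) {
        self.sent.push(format!("PREPARE {}", tx));
        self.barrier_up = true;
    }

    /// Send COMMIT or ABORT, then lower the barrier.
    fn resolve(&mut self, tx: &str, commit: bool) {
        let verdict = if commit { "COMMIT" } else { "ABORT" };
        self.sent.push(format!("{} {}", verdict, tx));
        self.barrier_up = false;
        if commit {
            // Barrier down: flush buffered requests in their original order.
            self.sent.append(&mut self.buffered);
        } else {
            // Assumption taken from the abort path: buffered requests are discarded.
            self.buffered.clear();
        }
    }
}
```

Because nothing passes the gate between `prepare` and `resolve`, the worker's queue always has the decision record directly after the PREPARE.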

### Abort paths:

**Coordinator's reducer fails (step 2):**
- Send ABORT to all participants (they still hold write locks)
- Participants roll back their MutTxId (release write lock, no changes)
- No PREPARE was sent, no barrier needed

**Participant's reducer fails (step 2):**
- Participant returns error to coordinator
- Coordinator's reducer fails (propagates error)
- Coordinator sends ABORT to all other participants that succeeded
- Those participants roll back their MutTxId

**Coordinator's PREPARE persists but a participant's PREPARE fails to persist:**
- Participant cannot send PREPARED
- Coordinator times out waiting for PREPARED
- Coordinator sends ABORT to all participants
- Coordinator inverts its own in-memory state, discards buffered durability requests

**Crash during protocol:**
- See proposal §8 for recovery rules
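
The timeout-driven abort (a participant never reports PREPARED) could be sketched as follows, using a standard-library channel as a stand-in for incoming PREPARED notifications. The function name `await_prepared`, the `Decision` enum, and the string participant IDs are hypothetical; the real transport is HTTP, and crash recovery is out of scope here.

```rust
use std::collections::HashSet;
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Hypothetical coordinator decision after the waiting period.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Commit,
    Abort,
}

/// Wait until every expected participant has reported PREPARED,
/// or decide ABORT once the overall timeout elapses.
fn await_prepared(
    rx: &Receiver<String>,
    expected: &HashSet<String>,
    timeout: Duration,
) -> Decision {
    let deadline = Instant::now() + timeout;
    let mut seen = HashSet::new();
    while seen.len() < expected.len() {
        // Time left before the deadline (zero once it has passed).
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(id) if expected.contains(&id) => {
                seen.insert(id);
            }
            Ok(_) => {} // notification from an unknown sender: ignore
            Err(_) => return Decision::Abort, // timed out or channel closed
        }
    }
    Decision::Commit
}
```

On `Abort`, the coordinator would then send ABORT to all participants and undo its own in-memory state, as listed above.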

### Open problem: MutTxId is !Send

The participant holds MutTxId across multiple HTTP requests (CALL, more CALLs, END_CALLS). MutTxId is !Send (holds SharedWriteGuard). Options:

1. **Dedicated blocking thread per participant transaction**: spawn_blocking holds the MutTxId, communicates via channels. HTTP handlers send messages, blocking thread processes them.
2. **Session-based protocol**: Participant creates a session on first CALL, routes subsequent CALLs and END_CALLS to the same thread/task that holds the MutTxId.
3. **Batch all calls**: Coordinator sends all reducer calls + args in a single request. Participant executes them all, returns all results, then commits. Single HTTP round-trip, no cross-request MutTxId holding.

Option 3 is simplest but prevents the coordinator from making decisions between calls. Option 1 is most general. TBD.
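
To make Option 1 concrete, here is a minimal sketch using only the standard library. It is not the codebase's implementation: `std::thread::spawn` stands in for `spawn_blocking`, a `Mutex<Vec<String>>` stands in for the datastore, its `MutexGuard` (which is also `!Send`) stands in for the guard inside MutTxId, and `Command` and `spawn_participant_tx` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical commands an HTTP handler would forward to the transaction thread.
enum Command {
    Call { row: String, reply: Sender<usize> },
    EndCalls { reply: Sender<usize> },
    Abort,
}

/// Spawn a thread that owns the write guard for the whole transaction.
/// The guard is created and dropped on this thread, so it never has to be `Send`.
fn spawn_participant_tx(db: Arc<Mutex<Vec<String>>>) -> Sender<Command> {
    let (tx, rx) = channel::<Command>();
    thread::spawn(move || {
        // Write lock is taken when the transaction thread starts.
        let mut guard = db.lock().unwrap();
        let start_len = guard.len();
        for cmd in rx {
            match cmd {
                Command::Call { row, reply } => {
                    // Later CALLs see earlier writes: same guard, same data.
                    guard.push(row);
                    let _ = reply.send(guard.len());
                }
                Command::EndCalls { reply } => {
                    let _ = reply.send(guard.len());
                    return; // guard dropped here: in-memory commit, lock released
                }
                Command::Abort => {
                    guard.truncate(start_len); // roll back this transaction's writes
                    return;
                }
            }
        }
        // All senders dropped without END_CALLS: treat as an abort.
        guard.truncate(start_len);
    });
    tx
}
```

Each HTTP handler only needs the `Sender<Command>`, which is `Send`, so handlers can run on any task while the guard stays pinned to its thread. The cost is one blocked thread per open participant transaction.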

## Commitlog format

- PREPARE record: includes all row changes (inserts/deletes)
- COMMIT record: follows PREPARE, marks transaction as committed
- ABORT record: follows PREPARE, marks transaction as aborted
- No other records can appear between PREPARE and COMMIT/ABORT in the durable log (persistence barrier enforces this)
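
The ordering rule above can be stated as a check over a sequence of records. In this sketch the `Record` enum, the `u64` transaction IDs, and the string row representation are all assumptions for illustration; the actual on-disk encoding is not specified here.

```rust
/// Hypothetical in-memory view of commitlog records.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Record {
    Tx(u64),                                // ordinary non-2PC transaction
    Prepare { tx: u64, rows: Vec<String> }, // all row changes (inserts/deletes)
    Commit(u64),
    Abort(u64),
}

/// Verify that every PREPARE is followed directly by its own COMMIT or ABORT.
/// A PREPARE at the very end of the log is allowed (transaction still undecided).
fn check_log(log: &[Record]) -> Result<(), String> {
    let mut pending: Option<u64> = None; // tx ID of an undecided PREPARE, if any
    for (i, rec) in log.iter().enumerate() {
        match (pending, rec) {
            (None, Record::Tx(_)) => {}
            (None, Record::Prepare { tx, .. }) => pending = Some(*tx),
            (Some(p), Record::Commit(tx)) | (Some(p), Record::Abort(tx)) if p == *tx => {
                pending = None
            }
            _ => return Err(format!("record {} violates the PREPARE ordering rule", i)),
        }
    }
    Ok(())
}
```

A log such as PREPARE 2, Tx 3, COMMIT 2 is rejected, which is the interleaving the persistence barrier is meant to rule out.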

## Replay semantics

On replay, when encountering a PREPARE:
- Do not apply it to the datastore
- Read the next record:
  - COMMIT: apply the PREPARE's changes
  - ABORT: skip the PREPARE
  - No next record (crash): transaction is still in progress, wait for coordinator or timeout and abort
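
The replay rules above might look like this in code. This is a simplified sketch: it models a log that contains only 2PC records, uses a hypothetical `Record` enum with string rows, and reports an undecided trailing PREPARE as `in_doubt` rather than contacting the coordinator.

```rust
/// Hypothetical commitlog records (2PC records only, for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Record {
    Prepare { tx: u64, rows: Vec<String> },
    Commit(u64),
    Abort(u64),
}

/// Result of replaying a log.
#[derive(Debug, PartialEq, Eq)]
struct Replayed {
    applied: Vec<String>,  // row changes applied to the datastore
    in_doubt: Option<u64>, // PREPARE with no decision record after it
}

fn replay(log: &[Record]) -> Result<Replayed, String> {
    let mut applied = Vec::new();
    let mut iter = log.iter();
    while let Some(rec) = iter.next() {
        match rec {
            // Do not apply the PREPARE yet; look at the next record first.
            Record::Prepare { tx, rows } => match iter.next() {
                Some(Record::Commit(c)) if c == tx => applied.extend(rows.iter().cloned()),
                Some(Record::Abort(a)) if a == tx => {} // skip the PREPARE
                None => {
                    // Crash before the decision: the transaction is still in progress.
                    return Ok(Replayed { applied, in_doubt: Some(*tx) });
                }
                Some(other) => {
                    return Err(format!("unexpected {:?} after PREPARE {}", other, tx))
                }
            },
            other => return Err(format!("{:?} without a preceding PREPARE", other)),
        }
    }
    Ok(Replayed { applied, in_doubt: None })
}
```

A caller that gets `in_doubt: Some(tx)` would then wait for the coordinator's decision, or time out and abort, as described in the last bullet.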