raft: revamp async log storage protocol #124440

Description

@pav-kv

The async log storage protocol (etcd-io/raft#8) can be improved in a few aspects:

  • ABA problem prevention can be achieved with a more straightforward algorithm once we have "last accepted term" tracking [raft: advance commit index safely #122690].
  • The need to prevent the ABA problem can be eliminated in the first place if we delegate the unstable log to the storage engine synchronously, and assume that only flush/fsync is asynchronous [raft: clean unstable log early #122438].
  • We can remove the various raftpb.Message fields that are used only by this local protocol, which unclutters Message and makes it smaller.

To support the new log write protocol, we are missing the notion of "leader term": the term of the leader with whom our log is consistent. By raft invariants, all writes to, and acknowledgements from, log storage are ordered by (leader term, index). Today, we approximate the "leader term" with the last entry ID, but this complicates the protocol. The approximation is also hard to reuse for Replication Admission Control, because it requires remembering the unacknowledged entry IDs, and these are cleared from the unstable data structure as soon as the entries are written.
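
To illustrate the (leader term, index) ordering described above, here is a minimal Go sketch. The names `LogMark`, `ackTracker`, `Write`, and `Ack` are hypothetical and chosen for this example only; they are not claimed to match the types in the raft package. The sketch assumes that storage acknowledgements carry the same mark as the write they confirm, and that an acknowledgement from an older leader term can simply be dropped.

```go
package main

// LogMark is a hypothetical log position qualified by the term of the leader
// whose log it is consistent with.
type LogMark struct {
	Term  uint64 // leader term under which the write was issued
	Index uint64 // index of the last entry covered by the write
}

// After reports whether m is ordered strictly after other by (Term, Index).
func (m LogMark) After(other LogMark) bool {
	return m.Term > other.Term ||
		(m.Term == other.Term && m.Index > other.Index)
}

// ackTracker is a hypothetical tracker of writes handed to log storage and
// of the acknowledgements received back from it.
type ackTracker struct {
	accepted LogMark // mark of the latest write handed to storage
	durable  LogMark // mark of the latest acknowledged (synced) write
}

// Write records a write handed to storage. It returns false if the write
// does not advance the (Term, Index) order.
func (t *ackTracker) Write(m LogMark) bool {
	if !m.After(t.accepted) {
		return false
	}
	t.accepted = m
	return true
}

// Ack records a storage acknowledgement. An ack issued under an older leader
// term is ignored: the indices it covers may have been overwritten by a newer
// leader, which is the ABA hazard. An ack beyond the accepted mark is ignored
// as well, since no such write was issued.
func (t *ackTracker) Ack(m LogMark) bool {
	if m.Term != t.accepted.Term || m.Index > t.accepted.Index {
		return false
	}
	if !m.After(t.durable) {
		return false
	}
	t.durable = m
	return true
}
```

In this sketch, a write at (term 1, index 10) followed by a write at (term 2, index 5) leaves the tracker at the term-2 mark. A late acknowledgement for (term 1, index 10) is then rejected without any need to remember individual entry IDs, because the leader term alone shows that it is stale.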

The plan is to support the "leader term" tracking:


Epic: CRDB-37515, CRDB-46488
Jira issue: CRDB-38897

Metadata

Labels

  • A-kv-replication: Relating to Raft, consensus, and coordination.
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
  • T-kv: KV Team
