A distributed key-value project built with OpenRaft 0.10.0-alpha.15 and SurrealKV.
- Real `openraft::Raft` runtime is wired in.
- OpenRaft storage traits are implemented: `RaftLogStorage`, `RaftLogReader`, `RaftStateMachine`, `RaftSnapshotBuilder`.
- Config-driven multi-node bootstrap is available.
- 3-node election and replication integration tests are available behind a feature flag for CI stability.
- English Usage Guide: `docs/USAGE_GUIDE.md`
- Chinese Usage Guide: `docs/USAGE_GUIDE_CN.md`
src/
  app.rs                      # RaftNode wrapper (startup/write/read/shutdown)
  config.rs                   # Config model + strict validation
  main.rs                     # Process entrypoint
  storage.rs                  # SurrealStorage core
  storage_raft_impl.rs        # OpenRaft trait implementations
  network/                    # gRPC client/server
tests/
  common/cluster_harness.rs   # Shared 3-node integration harness
  cluster_observability.rs    # Election + replication visibility integration tests
See config.toml.example.
Important fields in `[cluster]`:
- `bootstrap`: whether this node performs `initialize`
- `expected_voters`: expected voter count (self + peers)
- `peers`: peer list (`node_id`, `addr`)
Strict validation rules in `src/config.rs`:
- duplicate peer `node_id` is rejected
- duplicate peer `addr` is rejected
- local node must not appear in `peers`
- peer `addr` must not equal local `listen_addr`
- `expected_voters` must be `> 0` (if set)
- `expected_voters` must equal `1 + peers.len()` (strict, regardless of `bootstrap`)
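The rules above can be sketched as a single validation pass. This is a minimal illustration, not the code in `src/config.rs`: the `ClusterConfig` and `Peer` structs, the `u64` node id type, the local `node_id` field, and the `validate` function are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// One entry of the `peers` list (field names follow the README).
#[derive(Debug, Clone)]
pub struct Peer {
    pub node_id: u64, // id type is an assumption
    pub addr: String,
}

/// Hypothetical, simplified view of one node's cluster settings.
#[derive(Debug, Clone)]
pub struct ClusterConfig {
    pub node_id: u64,        // local node id (assumed field)
    pub listen_addr: String, // local listen address
    pub bootstrap: bool,     // not consulted: the voter-count rule ignores it
    pub expected_voters: Option<usize>,
    pub peers: Vec<Peer>,
}

/// Applies the listed rules and returns the first violation found.
pub fn validate(cfg: &ClusterConfig) -> Result<(), String> {
    let mut ids = HashSet::new();
    let mut addrs = HashSet::new();
    for p in &cfg.peers {
        if !ids.insert(p.node_id) {
            return Err(format!("duplicate peer node_id {}", p.node_id));
        }
        if !addrs.insert(p.addr.as_str()) {
            return Err(format!("duplicate peer addr {}", p.addr));
        }
        if p.node_id == cfg.node_id {
            return Err(format!("local node {} must not appear in peers", p.node_id));
        }
        if p.addr == cfg.listen_addr {
            return Err(format!("peer addr {} equals local listen_addr", p.addr));
        }
    }
    if let Some(n) = cfg.expected_voters {
        if n == 0 {
            return Err("expected_voters must be > 0".to_string());
        }
        let required = 1 + cfg.peers.len();
        if n != required {
            return Err(format!("expected_voters is {} but 1 + peers.len() is {}", n, required));
        }
    }
    Ok(())
}
```

Returning on the first violation keeps the sketch short; a real validator might collect every violation so that an operator can fix a config file in one pass.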
Single node:
`cargo run -- --config config.toml.example`
Multi-node:
- Prepare one config file per node.
- Set `cluster.bootstrap = true` on exactly one node.
- Keep `cluster.expected_voters` consistent across all nodes.
- Ensure all peer addresses are reachable.
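Two of the multi-node rules above span several config files, so a single node cannot check them on its own. The sketch below shows one way a deployment script might verify them before starting a cluster. `NodeSettings` and `check_cluster` are hypothetical names, not part of this project, and peer reachability is not checked here.

```rust
/// Hypothetical per-node summary of the two `[cluster]` fields checked here.
#[derive(Debug, Clone)]
pub struct NodeSettings {
    pub node_id: u64, // id type is an assumption
    pub bootstrap: bool,
    pub expected_voters: usize,
}

/// Checks the cross-node rules: exactly one bootstrap node, and the same
/// `expected_voters` value on every node.
pub fn check_cluster(nodes: &[NodeSettings]) -> Result<(), String> {
    let bootstrappers: Vec<u64> = nodes
        .iter()
        .filter(|n| n.bootstrap)
        .map(|n| n.node_id)
        .collect();
    if bootstrappers.len() != 1 {
        return Err(format!(
            "expected exactly one bootstrap node, found {:?}",
            bootstrappers
        ));
    }
    // `nodes` is non-empty here because exactly one bootstrap node exists.
    let first = nodes[0].expected_voters;
    if let Some(n) = nodes.iter().find(|n| n.expected_voters != first) {
        return Err(format!(
            "node {} has expected_voters = {}, but node {} has {}",
            n.node_id, n.expected_voters, nodes[0].node_id, first
        ));
    }
    Ok(())
}
```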
Regular tests:
`cargo test`
Feature-gated 3-node integration tests:
`cargo test --features integration-cluster --test cluster_observability`
Run one integration case:
`cargo test --features integration-cluster --test cluster_observability test_three_node_election_observability`
`cargo test --features integration-cluster --test cluster_observability test_three_node_replication_visibility`
The merge pipeline uses stable `MERGE_*` codes from `src/merge/error_codes.rs`.
| Error Code | Meaning | Typical Trigger |
|---|---|---|
| `MERGE_BASELINE_MISSING` | No baseline checkpoint metadata in snapshot state | merge starts before baseline is persisted |
| `MERGE_BASELINE_PATH_REQUIRED` | Strict mode requires explicit `checkpoint_path` | legacy/incomplete metadata without path |
| `MERGE_BASELINE_PATH_MISSING` | `checkpoint_path` is set but directory does not exist | checkpoint files removed/corrupted |
| `MERGE_INJECTED_FAILURE` | Test-only injected backend failure | integration/unit fault injection |
| `MERGE_UNKNOWN` | Fallback code when parser cannot extract a known prefix | non-standard merge error format |
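The fallback behaviour described for `MERGE_UNKNOWN` can be illustrated with a small prefix parser. This is a sketch only: the `MergeErrorCode` enum, its method names, and the assumed message shape (`CODE` or `CODE: detail`) are illustrative and may differ from what `src/merge/error_codes.rs` actually does.

```rust
/// Hypothetical enum mirroring the codes in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeErrorCode {
    BaselineMissing,
    BaselinePathRequired,
    BaselinePathMissing,
    InjectedFailure,
    Unknown,
}

impl MergeErrorCode {
    /// Stable string form, suitable for use as a metric label value.
    pub fn as_str(self) -> &'static str {
        match self {
            MergeErrorCode::BaselineMissing => "MERGE_BASELINE_MISSING",
            MergeErrorCode::BaselinePathRequired => "MERGE_BASELINE_PATH_REQUIRED",
            MergeErrorCode::BaselinePathMissing => "MERGE_BASELINE_PATH_MISSING",
            MergeErrorCode::InjectedFailure => "MERGE_INJECTED_FAILURE",
            MergeErrorCode::Unknown => "MERGE_UNKNOWN",
        }
    }

    /// Extracts a known code from the start of an error message.
    /// Assumes the message is either the bare code or "CODE: detail";
    /// anything else falls back to `Unknown`.
    pub fn from_message(msg: &str) -> Self {
        const KNOWN: [MergeErrorCode; 4] = [
            MergeErrorCode::BaselineMissing,
            MergeErrorCode::BaselinePathRequired,
            MergeErrorCode::BaselinePathMissing,
            MergeErrorCode::InjectedFailure,
        ];
        let trimmed = msg.trim_start();
        KNOWN
            .iter()
            .copied()
            .find(|code| match trimmed.strip_prefix(code.as_str()) {
                // Require the code to end at the message end or at a ':'
                // so that a longer identifier is not matched by accident.
                Some(rest) => rest.is_empty() || rest.starts_with(':'),
                None => false,
            })
            .unwrap_or(MergeErrorCode::Unknown)
    }
}
```

Mapping every unrecognized message to a single fallback value keeps the set of `error_code` label values bounded, which matters when the code is attached to metrics.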
`raft_snapshot_merge_failed_total` and failed `raft_snapshot_merge_duration_ms` samples carry:
- `trigger`: merge trigger reason (`chain_length`, `delta_bytes`, `checkpoint_interval`)
- `error_code`: stable merge failure code (from table above)
- `node_id`, `retries`, `result=failed`
Example query dimensions for dashboards/alerts:
- high rate of `error_code="MERGE_BASELINE_PATH_MISSING"`
- repeated failures on same `node_id` with increasing `retries`
- spike on a specific `trigger`
MIT OR Apache-2.0