Integration/qauld ctl #883

Draft
dastansam wants to merge 41 commits into main from integration/qauld-ctl

Conversation

@dastansam

Contributor

No description provided.

dastansam added 30 commits June 2, 2026 20:24
Implements the primitives described in plan.md for rotating the Noise
KK session between two peers, without wiring any triggers yet. All
behaviour is exercised only from unit tests in this phase; message
dispatch plumbing (periodic trigger, volume counter, grace-window
tick) arrives in Phase 2.

Design choices baked in:
- Session-id collision resolution: lower new_session_id wins
  (symmetric, local, no PeerId ordering required).
- No signature field on RotateHandshakeSecond — Noise KK already
  authenticates both endpoints via their static keys.
- Grace period default 1 h (configurable via CryptoRotation).
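
The collision rule above can be sketched as a pure comparison. This is a minimal illustration, not the code in noise.rs: the `CollisionOutcome` enum and `resolve_collision` function are hypothetical names, and the handling of two equal ids is an assumption, since the commit message does not say what happens in that case.

```rust
/// Hypothetical outcome type; the names are not taken from the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum CollisionOutcome {
    /// Our proposed id is lower: keep our pending rotation.
    KeepOurs,
    /// The peer's id is lower: abandon ours and answer theirs.
    AdoptTheirs,
    /// Both ids are equal; not covered by the commit message (assumption).
    Tie,
}

/// Decide which of two concurrent rotations survives.
/// `ours` is the new_session_id we proposed, `theirs` the one received.
/// Both peers evaluate the same rule locally, so no PeerId ordering is needed.
pub fn resolve_collision(ours: u32, theirs: u32) -> CollisionOutcome {
    if ours < theirs {
        CollisionOutcome::KeepOurs
    } else if theirs < ours {
        CollisionOutcome::AdoptTheirs
    } else {
        CollisionOutcome::Tie
    }
}
```

Because the rule is symmetric, a peer that computes `KeepOurs` can rely on the other side computing `AdoptTheirs` for the same pair of ids.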

Protobuf
  crypto_net.proto: RotateHandshakeFirst / RotateHandshakeSecond
  messages and matching oneof variants on CryptoserviceContainer.

Config
  storage::configuration::CryptoRotation with enabled=false default,
  added to the Configuration struct as crypto_rotation.
  Upgrade migration and config_persistence test updated.

Storage
  New per-user sled tree "rotation_meta" on CryptoAccount.
  RotationMeta { primary_session_id, pending_initiated_session_id,
  draining_session_id, draining_until, draining_remaining_volume }.
  Get/save/delete helpers, delete_state for abandoning a rotation,
  and a test_account() helper for tests that bypass global state.

Primitives (services/crypto/noise.rs)
  rotate_initiate          — create fresh session_id, KK step 1,
                             record pending_initiated on meta.
  rotate_complete_responder — handle incoming rotate_first; on
                             collision, lower session_id wins; on
                             nonce mismatch, abandon; on success
                             emit rotate_second and move primary
                             into the grace window.
  rotate_finalize_initiator — handle rotate_second for our pending;
                             KK step 2; flip primary.
  drain_expired_rotations  — scan rotation_meta and retire any
                             draining session past its deadline or
                             with zero draining_remaining_volume.
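
A rough sketch of the drain decision for a single meta row follows. The field names come from the RotationMeta description above; the field types, the millisecond unit, the inclusive deadline comparison, and the `drain_if_expired` helper are assumptions for illustration. The real primitive scans the sled tree and also removes the retired crypto state.

```rust
/// Field names follow the commit message; the types are assumed.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct RotationMeta {
    pub primary_session_id: Option<u32>,
    pub pending_initiated_session_id: Option<u32>,
    pub draining_session_id: Option<u32>,
    /// Deadline as a millisecond timestamp (assumed unit).
    pub draining_until: Option<u64>,
    pub draining_remaining_volume: u64,
}

/// Hypothetical helper: retire the draining session when its deadline has
/// been reached or its volume budget is zero. Returns the retired id.
pub fn drain_if_expired(meta: &mut RotationMeta, now_ms: u64) -> Option<u32> {
    // Nothing to do when only a primary session exists.
    let draining = meta.draining_session_id?;
    let time_expired = meta
        .draining_until
        .map_or(false, |deadline| now_ms >= deadline);
    let volume_exhausted = meta.draining_remaining_volume == 0;
    if !(time_expired || volume_exhausted) {
        return None;
    }
    // Clear the draining slot; the primary fields stay untouched.
    meta.draining_session_id = None;
    meta.draining_until = None;
    meta.draining_remaining_volume = 0;
    Some(draining)
}
```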

Sessionmanager gets a log-and-drop stub for the new oneof variants;
Phase 2 replaces it with real dispatch.

Tests (6, all pass)
  rotation_meta_roundtrip, rotation_meta_keyed_per_peer,
  drain_leaves_unexpired, drain_retires_time_expired,
  drain_retires_volume_exhausted, drain_noop_on_primary_only_meta.

End-to-end rotation tests (clean rotation, collision, late message
within/past grace, replayed nonce) are deferred to Phase 2 / Phase 4
integration tests because the primitives depend on global Users,
Configuration, and CRYPTOSTORAGE state — constructing those is a
libqaul-init operation, not a unit-test operation.

No behaviour change for existing peers: rotate_* frames are never
sent (trigger wiring lands Phase 2), and incoming rotate_* frames
are logged and dropped for now.
Turns the Phase 1 primitives into a live feature. Rotation is still
gated behind `CryptoRotation::enabled` (default false), so unchanged
defaults give byte-identical behaviour to main for existing peers.

What fires rotation now
  - Outbound send: `Crypto::encrypt` post-hook checks session age vs
    `period_seconds` and `index_nonce_out` vs `volume_messages`; on
    trigger, calls `rotate_initiate` and sends the resulting
    `RotateHandshakeFirst` as a `CryptoserviceContainer` through the
    normal `Messaging::pack_and_send_encrypted_data` path, encrypted
    under the currently-primary session.
  - Inbound receive: `Crypto::decrypt` post-hook checks
    `highest_index_nonce_in` vs `volume_messages` for messages
    arriving on the primary and fires a rotation symmetrically.
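
The trigger check described above might look roughly like the following. `period_seconds` and `volume_messages` are the config names used in this commit; the `RotationTriggers` struct, the `should_rotate` function, the integer widths, and the `>=` comparisons are assumptions. Treating a zero `established_at` as "never time-trigger" is modelled on the CryptoState note in this same commit.

```rust
/// Hypothetical subset of the rotation config relevant to triggering.
pub struct RotationTriggers {
    pub period_seconds: u64,
    pub volume_messages: u64,
}

/// Return true when either the age or the volume threshold is reached.
/// `nonce_index` stands in for index_nonce_out on the send side or
/// highest_index_nonce_in on the receive side.
pub fn should_rotate(
    cfg: &RotationTriggers,
    established_at_ms: u64,
    now_ms: u64,
    nonce_index: u64,
) -> bool {
    // A zero timestamp marks a legacy row that never re-handshaked,
    // so the time-based trigger is skipped for it.
    let time_due = established_at_ms != 0
        && now_ms.saturating_sub(established_at_ms)
            >= cfg.period_seconds.saturating_mul(1_000);
    let volume_due = nonce_index >= cfg.volume_messages;
    time_due || volume_due
}
```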

Dispatch of incoming rotation frames
  - `sessionmanager::process_rotate_first` calls
    `rotate_complete_responder`, then encrypts the resulting
    `RotateHandshakeSecond` **under the now-draining old session**
    (the initiator hasn't promoted yet) and sends it.
  - `sessionmanager::process_rotate_second` calls
    `rotate_finalize_initiator` to flip primary on the initiator side.
  - Two new helpers — `create_rotate_first_message` and
    `create_rotate_second_message` — mirror the existing
    `create_second_handshake_message` wrapper pattern.

Primary-session resolution
  - `Crypto::resolve_primary_state` consults `rotation_meta` so the
    post-rotation window (where a responder briefly has two Transport
    rows for the same peer) sends subsequent user traffic on the new
    primary, not whichever row `get_state` happens to find first.
  - `Crypto::encrypt` now uses `resolve_primary_state`; the decrypt
    path is unchanged (it already looks up by `message.session_id`).
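
As an illustration of the lookup order, the sketch below resolves a primary row from an in-memory slice. `StateRow` and `resolve_primary` are hypothetical stand-ins; the real code reads sled trees, and "first row in the slice" only approximates whatever `get_state` happens to return.

```rust
/// Hypothetical stand-in for a stored Transport-state row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StateRow {
    pub session_id: u32,
}

/// Prefer the row that rotation_meta designates as primary; fall back to
/// the first available row when there is no meta or the meta is stale.
pub fn resolve_primary(rows: &[StateRow], meta_primary: Option<u32>) -> Option<&StateRow> {
    meta_primary
        .and_then(|id| rows.iter().find(|row| row.session_id == id))
        .or_else(|| rows.first())
}
```

Falling back instead of returning `None` on stale meta keeps sends working even if the meta row and the state rows briefly disagree.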

Draining grace on the decrypt side
  - `Crypto::after_decrypt_rotation` decrements
    `draining_remaining_volume` on each successfully decrypted
    Transport message that arrives on the draining session, so the
    grace budget is honoured per message (separate from the time
    deadline handled by the drain ticker).

CryptoState
  - New `established_at: u64` (ms) with `#[serde(default)]` so
    existing on-disk rows deserialise with 0 and therefore never
    trip the time-based trigger until they re-handshake. Set on KK
    step-2 completion on both sides.

Periodic drain
  - New `rotation_ticker` (60 s) added to both `run`/`event_loop` and
    the `start_instance` loop. On tick, iterates
    `UserAccounts::get_all_users()` and calls
    `CryptoNoise::drain_expired_rotations` per account, gated on
    `cfg.crypto_rotation.enabled`.

Deferred to a follow-up
  - End-to-end integration tests (clean rotation, collision, late
    message within/past grace, replayed nonce). These require
    standing up global `Users`, `Configuration`, and `CRYPTOSTORAGE`
    state, which is a libqaul-init operation; tests belong in a
    dedicated integration harness and will land as Task 11 in a
    follow-up commit.

All 27 existing lib tests still pass.
Six new tests exercising the helpers introduced by Phase 2:

  resolve_primary_state
    - resolve_primary_prefers_meta_designated_row — when
      rotation_meta names a primary and both Transport rows exist,
      the meta-designated one is returned (the post-responder-step
      ambiguity fix).
    - resolve_primary_falls_back_without_meta — legacy get_state
      path when no rotation activity has happened.
    - resolve_primary_ignores_missing_state_for_meta_primary —
      stale-meta safety: fall back to get_state rather than
      returning None.

  after_decrypt_rotation
    - after_decrypt_decrements_draining_volume — a message
      decrypted on the draining session decrements
      `draining_remaining_volume` by exactly one; primary fields
      remain untouched.
    - after_decrypt_saturates_at_zero — saturating_sub prevents
      underflow when the budget is already exhausted.
    - after_decrypt_noop_on_unrelated_session — a session_id that
      matches neither primary nor draining is ignored.
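
The three after_decrypt_rotation behaviours listed above can be pictured with a small sketch. `DrainBudget` and `after_decrypt` are hypothetical names and the field types are assumed; only the decrement-by-one, saturating, and ignore-unrelated semantics come from the test descriptions.

```rust
/// Hypothetical slice of rotation_meta used by the decrypt post-hook.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DrainBudget {
    pub primary_session_id: u32,
    pub draining_session_id: Option<u32>,
    pub draining_remaining_volume: u64,
}

/// Charge one message against the grace budget when it arrived on the
/// draining session. Returns true if the budget was touched.
pub fn after_decrypt(meta: &mut DrainBudget, session_id: u32) -> bool {
    if meta.draining_session_id != Some(session_id) {
        // Primary or unrelated session: nothing to account for.
        return false;
    }
    // saturating_sub keeps an exhausted budget at zero instead of wrapping.
    meta.draining_remaining_volume = meta.draining_remaining_volume.saturating_sub(1);
    true
}
```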

To drive `Configuration::get()` from these tests without the full
libqaul init chain, add `Configuration::init_for_tests(cfg)` — a
`#[cfg(test)]` idempotent installer for the `CONFIG` InitCell.
`Configuration::default()` could not be used: `Internet::default`
reads `DEFCONFIGS` which is only populated by `Libqaul::new`, so the
test fixture builds the Configuration struct literally from the
sub-modules' self-contained defaults.

Full end-to-end rotation tests (clean rotation across two in-
process peers running the real Noise handshake, collision-loss
path, replayed nonce rejection, grace-window expiry in the face of
live traffic) require `Users::init`, `DataBase::init`, and
`CryptoStorage::init` against tempdirs — a non-trivial fixture that
belongs in plan.md's Phase 4 local-mesh integration harness rather
than here.

All 33 libqaul lib tests pass.
Exposes the Phase 1/2 CryptoRotation settings to clients via a
standard module-scoped RPC, and a qaul-cli sub-command set. No
event surface yet — a `RotationEvent` log (`Rotated`,
`GraceExpired`) is a plausible Phase 3 follow-up but is split from
this commit to keep the diff focused.

Protobuf
  - rpc/qaul_rpc.proto: `CRYPTO = 16` in the Modules enum.
  - services/crypto/crypto_rpc.proto (new): `Crypto` oneof
    container with `GetConfigRequest`, `GetConfigResponse`,
    `SetConfigRequest`, `SetConfigResponse`. Every SetConfigRequest
    field is `optional`, so clients send *partial* updates —
    libqaul treats unset fields as "leave untouched".

libqaul
  - `Crypto::rpc(data, user_id, request_id)` (services/crypto/mod.rs):
    decodes the Crypto container, routes GetConfig/SetConfig to
    `handle_get_config` / `handle_set_config`.
  - `handle_set_config` validates each numeric field (rejecting
    zero with a per-field error message — rotating on every
    message, or retiring draining on first message, are near-
    certain client mistakes), applies only the present fields,
    persists via `Configuration::save()`, and echoes the post-
    update config in `SetConfigResponse.applied`.
  - `rpc/mod.rs`: dispatches `Ok(Modules::Crypto)` to
    `Crypto::rpc`.
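
A sketch of the partial-update and zero-rejection logic described for `handle_set_config`, under stated assumptions: plain Rust structs stand in for the generated protobuf types, `grace_seconds` and `grace_volume` are hypothetical field names inferred from the CLI flags, and `apply_set_config` is an illustrative helper that skips persistence.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CryptoRotation {
    pub enabled: bool,
    pub period_seconds: u64,
    pub volume_messages: u64,
    pub grace_seconds: u64, // hypothetical name
    pub grace_volume: u64,  // hypothetical name
}

/// Every field is optional; `None` means "leave untouched".
#[derive(Debug, Default)]
pub struct SetConfigRequest {
    pub enabled: Option<bool>,
    pub period_seconds: Option<u64>,
    pub volume_messages: Option<u64>,
    pub grace_seconds: Option<u64>,
    pub grace_volume: Option<u64>,
}

/// Validate first, then apply only the fields that are present.
/// On error the current config is left as it was.
pub fn apply_set_config(
    current: &CryptoRotation,
    req: &SetConfigRequest,
) -> Result<CryptoRotation, String> {
    let numeric = [
        ("period_seconds", req.period_seconds),
        ("volume_messages", req.volume_messages),
        ("grace_seconds", req.grace_seconds),
        ("grace_volume", req.grace_volume),
    ];
    for &(name, value) in numeric.iter() {
        if value == Some(0) {
            return Err(format!("{} must be greater than zero", name));
        }
    }
    let mut next = current.clone();
    if let Some(v) = req.enabled { next.enabled = v; }
    if let Some(v) = req.period_seconds { next.period_seconds = v; }
    if let Some(v) = req.volume_messages { next.volume_messages = v; }
    if let Some(v) = req.grace_seconds { next.grace_seconds = v; }
    if let Some(v) = req.grace_volume { next.grace_volume = v; }
    Ok(next)
}
```

Validating every field before mutating anything is what lets a rejected request leave the stored config untouched.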

CLI
  - `clients/cli/src/crypto.rs` (new): `crypto config`,
    `crypto config enable|disable|period <s>|volume <n>|grace <s>
    |grace-volume <n>`, plus `Crypto::rpc` render for both
    GetConfigResponse and SetConfigResponse.
  - Wired into `cli.rs`, `main.rs`, and the `rpc.rs` response
    dispatch.

Tests (all 36 lib tests pass)
  - `rpc_get_config_returns_installed_config` — round trip through
    the real `Rpc` send/receive channel; verifies the response
    matches the installed CryptoRotation fields.
  - `rpc_set_config_partial_update_preserves_other_fields` —
    sends a SetConfigRequest with only `period_seconds`, asserts
    `success=true`, `applied.period_seconds` updated, every other
    field unchanged. Reverts before releasing the test lock.
  - `rpc_set_config_rejects_zero_fields` — asserts validation
    path: `success=false`, error mentions the offending field,
    config left untouched.

  A module-scoped `CONFIG_LOCK: Mutex<()>` serialises tests that
  mutate the process-global `CONFIG` InitCell so they don't race
  Phase 2's after_decrypt_rotation tests, which also read config.

Remaining for a future Phase 3 bump (deferred)
  - Event surface (Rotated / GraceExpired / MessageDroppedPastGrace)
    — needs a ring-buffer event log + emission points at
    `rotate_finalize_initiator`, `drain_expired_rotations`, and the
    past-grace decrypt path. Does not share code with this commit;
    splitting keeps the diff focused.
Completes the Phase 3 split by exposing the three rotation events
from plan.md (`Rotated`, `GraceExpired`, `MessageDroppedPastGrace`)
to clients via a process-global ring buffer log queried through
the Crypto RPC module.

Protobuf
  - crypto_rpc.proto: `RotationEventKind` enum, `RotationEvent`
    message, `GetRotationEventsRequest { since_ms, limit }`,
    `GetRotationEventsResponse { events }`. New variants on the
    `Crypto` oneof.

libqaul
  - services/crypto/events.rs (new): MAX_EVENTS=256 ring buffer in
    a lazy `InitCell<RwLock<VecDeque<RotationEvent>>>`, `record()`
    with oldest-eviction, `query(since_ms, limit)` with oldest→
    newest ordering. Test-only `clear_for_tests()` resets the log
    between assertions.
  - Three emission sites in `CryptoNoise`:
      - `rotate_finalize_initiator` → `Rotated`
      - `drain_expired_rotations` → `GraceExpired` + stamps
        `last_retired_session_id`/`last_retired_at` on the meta.
      - decrypt "session not found" branch → `MessageDroppedPastGrace`
        when the incoming `session_id` matches `last_retired_*`.
  - `RotationMeta` gets `last_retired_session_id: Option<u32>` and
    `last_retired_at: Option<u64>` (both `#[serde(default)]` so
    existing on-disk rows deserialise cleanly). `Default` derived
    so the many struct-literal sites can use `..Default::default()`.
  - `Crypto::rpc` gains the `GetRotationEventsRequest` arm, routed
    to `handle_get_events` which maps the internal `events::*`
    types onto the proto shapes.
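
The ring-buffer behaviour can be sketched without the global cell and lock. MAX_EVENTS, `record`, `query(since_ms, limit)`, and the three event kinds are named in this commit; the struct layout, the inclusive `since_ms` comparison, and applying `limit` to the oldest matching events are assumptions.

```rust
use std::collections::VecDeque;

pub const MAX_EVENTS: usize = 256;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RotationEventKind {
    Rotated,
    GraceExpired,
    MessageDroppedPastGrace,
}

/// Reduced event shape; the real message carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RotationEvent {
    pub timestamp_ms: u64,
    pub kind: RotationEventKind,
}

#[derive(Debug, Default)]
pub struct EventLog {
    events: VecDeque<RotationEvent>,
}

impl EventLog {
    /// Append an event, evicting the oldest one once the cap is reached.
    pub fn record(&mut self, event: RotationEvent) {
        if self.events.len() == MAX_EVENTS {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    /// Return up to `limit` events at or after `since_ms`, oldest first.
    pub fn query(&self, since_ms: u64, limit: usize) -> Vec<RotationEvent> {
        self.events
            .iter()
            .filter(|event| event.timestamp_ms >= since_ms)
            .take(limit)
            .cloned()
            .collect()
    }

    pub fn len(&self) -> usize {
        self.events.len()
    }
}
```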

CLI (clients/cli/src/crypto.rs)
  - `crypto events [limit]` subcommand fires a
    `GetRotationEventsRequest` and prints a table with five columns
    (timestamp_ms, kind, remote_id, primary, draining).

Tests (40 lib tests total, all pass)
  - `event_log_caps_at_max_events` — oldest evicted on overflow.
  - `event_log_query_filters_and_limits` — `since_ms` filter and
    `limit` cap.
  - `drain_emits_grace_expired_and_stamps_meta` — drain path emits
    the event and stamps `last_retired_*`.
  - `rpc_get_events_returns_recorded_events` — end-to-end round
    trip through `Rpc::send_message` / `receive_from_libqaul`.

Tests that mutate the event log hold a dedicated `EVENT_LOG_LOCK`;
`rpc_get_events_returns_recorded_events` additionally holds
`CONFIG_LOCK` (acquired first) to avoid lock-ordering inversions
with Phase 3 config-mutation tests.

Defaults unchanged — `CryptoRotation::enabled = false` still ships
dormant, so no event is emitted on a stock installation.
Adds a TriggerRotationRequest/Response pair to crypto_rpc.proto and
refactors the trigger-fire path into a shared perform_rotation helper so
the manual RPC and the automatic time/volume triggers share send code.
handle_trigger_rotation resolves the default user, validates the remote
PeerId, and reports the previous/new session ids back to the caller.

Mirrors the existing rust/clients/cli crypto commands into qauld-ctl
(config / enable / disable / set / rotate / events) with JSON output so
the pytest integration harness can drive rotation scenarios.

Unit-tests cover the disabled-config and invalid-remote-id rejection
paths; the end-to-end rotation path requires a live libqaul stack and
lives in the upcoming Phase 4 multi-node tests.
Adds the first of five multi-node rotation scenarios from plan.md Phase 4.
Also extends the pytest Node helper with crypto_config / set_crypto_config
/ rotate_with / crypto_events so subsequent scenarios can reuse the
driving code.

The test converges a line-5 mesh, pins rotation config so automatic
triggers cannot fire, then forces a rotation mid-stream between the two
endpoints. It asserts no message loss across pre-rotation, straddling,
and post-rotation traffic and that both peers log a Rotated event whose
draining_session_id matches the sender's previous primary.

Requires meshnet-lab (Linux netns + sudo); not runnable on CI or on
macOS dev machines.
Partitions the recipient off the mesh by swapping to a line-5 variant
that omits the last link, forces a rotation on the still-connected
sender, emits traffic while the peer is unreachable, then heals the
mesh. Asserts all messages land, both peers log matching Rotated
events, and the new primary session id is reflected on both sides.

Topology swap (rather than kill_node) keeps qauld alive on both ends
so this exercises the messaging buffer / DTN path rather than state
reload on the recipient. The restart scenario is tested separately.
Third Phase 4 scenario: two peers rotate with a 15 s grace window on
the recipient, then the drain ticker (60 s interval) must retire the
old draining session and emit a GraceExpired event for the previous
primary. Also asserts that post-rotation traffic on the new primary
delivers end-to-end, confirming that draining the old state did not
disturb the live session.

The module docstring notes why the sibling MessageDroppedPastGrace
event stays in unit-test scope — reproducing it in a live mesh would
require injecting ciphertext on an already-retired session, which no
public API exposes.
Fourth Phase 4 scenario. Both peers trip rotation concurrently from a
thread pool, then both emit bi-directional traffic across the collision
window. Asserts both peers log a Rotated event and every message in
both directions (pre-collision, during-collision, post-collision) is
delivered exactly once.

The collision-resolution rule (lower new_session_id wins, loser drops
its HalfOutgoing and adopts the winner's incoming rotate_first) is the
gnarliest rotation edge case in a DTN-tolerant system; this test pins
the observable convergence contract.
Fifth and final Phase 4 scenario: establish, rotate, then stop qauld
on every namespace and restart while the sled database and config
persist on disk. After reconvergence the test sends on the post-
rotation session in both directions and asserts delivery succeeds —
failure would mean either CryptoState or rotation_meta did not
round-trip through storage and the sender had to fall back to a new
handshake.

The in-memory rotation event ring buffer does not survive restart
(documented), so the test does not assert on crypto_events after
start_qaul.
Adds a UserInfo.capabilities bitset (router_net_info.proto) and an
in-memory Capabilities::{ROTATION, LOCAL, supports} API in
router::users. Local accounts stamp Capabilities::LOCAL into their
User row on create / on Router::init-time reload; incoming UserInfo
updates the remote peer's advertised caps through a new
add_with_check_caps / add_with_caps path.

Crypto::perform_rotation now refuses to rotate with any peer that
has not advertised Capabilities::ROTATION. Without the gate, a
legacy binary on the other end would silently drop the
RotateHandshakeFirst frame and leave the initiator stuck on a
dangling HalfOutgoing row — returning early here lets the caller
keep using the existing legacy session instead.

Also adds Users::{set_capabilities_for_tests, init_for_tests} so
unit tests can simulate UserInfo arrivals without running the full
routing stack, plus three phase5 unit tests covering the gate
rejection, gate acceptance, and bitmask semantics.
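
A minimal sketch of the bitmask gate, assuming a `u32` bitset. The bit position of ROTATION, the reading of LOCAL as "the set this binary advertises", and the `may_rotate_with` helper are illustrative assumptions, not taken from router::users.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Capabilities(pub u32);

impl Capabilities {
    /// Bit position is assumed for illustration.
    pub const ROTATION: u32 = 1 << 0;
    /// Assumed to be the full set the local binary advertises.
    pub const LOCAL: u32 = Self::ROTATION;

    /// True when every bit in `flag` is present in this set.
    pub fn supports(self, flag: u32) -> bool {
        (self.0 & flag) == flag
    }
}

/// Gate used before rotating: a peer that never advertised ROTATION
/// (for example a legacy binary with an empty bitset) is skipped.
pub fn may_rotate_with(peer: Capabilities) -> bool {
    peer.supports(Capabilities::ROTATION)
}
```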

Defaults for the Phase 5 rollout are already in place: Phase 1
shipped `crypto_rotation.enabled = false` by default, and the
capability advertisement is a constant-at-compile-time bitset this
binary always includes. Flipping the default to `true` and
enabling on test nodes are operational steps.
Adds docs/protocols/Noise-Session-Rotation.md alongside the existing
messaging and BLE protocol docs. Captures the design separately from
plan.md (which mixes design and delivery): goals, why full session
rotation rather than a per-message ratchet, trigger model, the three
wire frames, receiver routing, rotation_meta layout, the capability
negotiation that gates mixed-version peers, the event surface, the
operator/RPC surface, threat model, and rollback procedure.

References the implementation files and the integration test scenarios
so the doc and the code can be navigated together.
Document the connection precedence (--socket / $QAULD_SOCKET / --dir),
the three modes (single-shot / shell / subscribe), the JSON flag, the
command-group index, two recipes, and where logging knobs live. The
prior README was empty.
…stderr)

Make single-shot mode usable by automated harnesses (meshnet-lab):

- Default to silent: the "connected to qauld" banner now requires
  -v/--verbose and is routed to stderr, so `qauld-ctl --json ... | jq` works.
- Add -t/--timeout <secs> (default 10s) wrapping both the preflight and
  the actual response wait via tokio::time::timeout. A non-responding
  daemon no longer hangs forever.
- Surface response-side problems as non-zero exits:
  * malformed RPC envelope, closed connection mid-response, and
    socket read errors now return Err from run()
  * "unprocessable RPC <module> message" arms return Err instead of
    logging at error level and exiting 0
  * SetPasswordResponse with success=false returns Err
  * connections InternetNodesList AddErrorInvalid / RemoveErrorNotFound
    return Err, and the json envelope now includes a `success` field
  * dtn print_status returns Err on status=false
- Route human-readable error messages (User not found, "No user account
  created yet", connections error labels, dtn failure banner) to stderr
  via eprintln!.

Subscribe mode also gates its preamble lines behind --verbose so piping
the event stream produces clean output.
The transports decoder came in from feature/transport-trait after the
Phase 0 sweep, so its 'unprocessable response', 'empty response', and
'transport update FAILED' branches still exited 0 and routed errors
to log/stdout. Bring it in line: errors return Err, failures go to
stderr.
After the Phase 1 merge wave, the ctl branch now covers every per-feature
subcommand. Add account update, transports, crypto, and dtn-V2
recipes; document the script-friendly defaults from Phase 0
(--verbose, --timeout, --json, exit codes, stderr discipline).
Fills the proto-coverage gaps so every routable Modules::* variant
has a CLI surface.

- ble {info|start|stop|discovered} -> qaul_rpc_ble. Info / Discovered
  round-trip; Start/Stop are fire-and-forget. Decodes BleDeviceInfo
  into either JSON or a plain-text capability dump.
- rtc {list|request|accept|decline|end} -> qaul_rpc_rtc, behind a
  new cargo feature 'rtc' that mirrors libqaul's feature gate.
  Frame-level RtcOutgoing/RtcIncoming intentionally omitted (not a
  single-shot fit).
- auth {login|logout|status} -> qaul_rpc_authentication. Login emits
  AuthRequest and prints the AuthChallenge; the multi-round-trip
  challenge-response is documented as a known gap (needs shell mode
  or the embedded transport from Phase 3). Logout/Status currently
  return Err because the proto has no dedicated messages for them.

Phase 0 hardening (errors return Err, stderr discipline, --json) is
applied throughout.
Refactor the single-shot dispatch so the same command path works
over either a Unix socket (default) or an in-process libqaul
instance.

- New module rust/clients/qauld-ctl/src/transport/ with:
  * RpcTransport trait — async request(envelope, timeout, expect)
  * SocketTransport — today's framed Unix-socket client
  * EmbeddedTransport — links libqaul, starts an instance in this
    process, bridges via Rpc::send_to_libqaul / receive_from_libqaul
- New cargo feature 'embedded' switches the transport at build time.
  Default build is unchanged (socket).
- main.rs picks the transport based on cfg; run() now takes
  &mut dyn RpcTransport.
- preflight + dispatch + response-decoding all flow through the
  trait, so each subcommand works identically in either mode.
- Shell mode still opens a SocketTransport per command (matches
  prior semantics).

Smoke: cargo build -p qauld-ctl --features embedded produces a
self-contained binary that runs single-shot commands against a fresh
storage dir with no separate daemon — account update / users list /
crypto config / dtn config all round-trip correctly.
Two quality-of-life additions:

- 'qauld-ctl completions <shell>' prints a clap_complete script for
  bash / zsh / fish / powershell / elvish to stdout. Pipe it to the
  right file for your shell.
- 'qauld-ctl run [--qauld-path <path>]' spawns qauld as a child,
  prefixes its stdout/stderr lines with '[qauld]' on our stderr,
  and propagates Ctrl-C with a 5s graceful-shutdown window before
  SIGKILL. Useful for dev iteration: 'qauld-ctl run' in one terminal,
  any number of 'qauld-ctl <cmd>' invocations in another.
Move the RPC plumbing into a new workspace member at rust/qauld-rpc/
so a future TUI binary can depend on it without forking qauld-ctl.

Migrated:
  - RpcCommand trait (the protocol-level contract)
  - RpcTransport trait + SocketTransport + EmbeddedTransport
  - QaulRpc envelope encoding helpers
  - id_string_to_bin / uuid_string_to_bin

qauld-ctl now re-exports RpcCommand and the helpers from qauld-rpc
to keep existing intra-crate import paths working. Per-subcommand
RpcCommand impls (clap-derived) stay in qauld-ctl.

The 'embedded' feature on qauld-ctl now forwards to qauld-rpc's
matching feature. Default and --features embedded builds verified;
end-to-end regression smoke against a fresh daemon still works.
New binary at rust/clients/qauld-tui depending on the shared
qauld-rpc crate.

Layout:
  - Top: tab bar (Users / Feed) + node-id header
  - Middle: list view for the active tab
  - Bottom: live subscribe event log
  - Footer: contextual key hints

Tabs:
  - Users: list known users with name, id, connectivity, profile
    version, bio. Periodic refresh (--refresh, default 3s).
  - Feed: list feed messages. Press 's' to open a modal, type a
    message, Enter to send, Esc to cancel.

The subscribe stream runs in a background tokio task and pushes
formatted event lines into the events panel asynchronously
(chat.message, peers.connected, dtn.delivery_response, etc.).

Keys: Tab/Shift-Tab switch tabs, Up/Down move cursor, r refresh,
s compose (Feed only), q or Ctrl-C quit. Standard TTY required;
gracefully exits with a clear hint when the daemon isn't running.
Sending from the TUI's compose modal triggered libqaul's
'user account id couldn't be decoded:
InvalidMultihash(Error { kind: Io(Kind(UnexpectedEof)) })'
because send_feed was emitting QaulRpc { user_id: Vec::new(), ... }.
libqaul's feed handler tries to decode user_id as a libp2p PeerId
multihash, which fails on empty bytes.

Fix: fetch_default_user now returns both a display label and the
raw PeerId bytes; App caches the bytes (default_user_id) on every
refresh; send_feed takes a &[u8] user_id parameter and refuses to
send with an empty id, surfacing a clear error to the events log
instead of letting the daemon emit a cryptic decode error.
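
The guard itself is small. The sketch below shows the idea with a hypothetical `validate_user_id` helper and an illustrative error string; the real `send_feed` signature and message may differ.

```rust
/// Reject an empty user id before any RPC is built, so the caller gets a
/// readable error rather than a multihash decode failure from the daemon.
pub fn validate_user_id(user_id: &[u8]) -> Result<&[u8], String> {
    if user_id.is_empty() {
        return Err("no default user account available; cannot send".to_string());
    }
    Ok(user_id)
}
```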
Anchors the InvalidMultihash(UnexpectedEof) fix: send_feed with an
empty user_id must short-circuit before opening a socket and return
a clear error referencing the missing default account.

Verified end-to-end against a live daemon as well: qauld-ctl feed
send (same wire pattern as the TUI fix) produced zero
'couldn't be decoded' errors in qauld.log, while a deliberate
empty-user_id reproducer triggered exactly one — confirming the
error path is specifically tied to the empty-user_id flow the
fix eliminates.
Replaces the separate qauld-tui workspace member with a 'tui'
subcommand on qauld-ctl, gated by a new 'tui' cargo feature
(default on). One binary now covers the single-shot CLI, the
shell REPL, the subscribe stream, the supervised 'run' mode, and
the terminal UI; users only need to install / explain one tool.

Layout changes:
  - rust/clients/qauld-tui/src/{app,data,ui}.rs moved to
    rust/clients/qauld-ctl/src/tui/{app,data,ui}.rs (git mv
    preserves blame).
  - qauld-tui's main.rs body folded into qauld-ctl/src/tui/mod.rs
    as a pub async fn run(cli, refresh_secs). The TUI now reuses
    qauld-ctl's top-level connection flags (--socket / --dir /
    --timeout) instead of duplicating them, and only --refresh is
    new on TuiArgs.
  - qauld-tui dropped from rust/Cargo.toml workspace members and
    the directory removed.

Cargo features:
  - default = ["tui"] — out of the box you get the TUI.
  - cargo build -p qauld-ctl --no-default-features produces a
    scripts-only binary with no ratatui / crossterm linked. Every
    other subcommand is unaffected.

Verified:
  - cargo build -p qauld-ctl (default) builds clean.
  - cargo build -p qauld-ctl --no-default-features builds clean
    and the resulting --help has no 'tui' subcommand.
  - cargo test -p qauld-ctl --bin qauld-ctl
    send_feed_refuses_empty_user_id passes (the regression test
    moved with data.rs and is still wired up).

TESTING-TUI.md added next to README.md with the updated invocation
patterns ('qauld-ctl --dir X tui [--refresh N]').
50c810f landed on main and switched the wire-level profile version
to uint32. The TUI's UserRow mirror still held u64 and refused to
take the new generated type. Match it here.
The initiator currently produces exactly one ciphertext from
encrypt_noise_kk_handshake_1 and then has to either block or drop
further chat messages until the responder sends KK msg 2. Under
DTN the responder may be offline for hours or days, so this is a
real UX hit.

The proposal lets the initiator queue N > 1 encrypted payloads on
top of msg 1, using the partial handshake CipherState captured at
that point. The responder drains them once it processes msg 1.

Lives in docs/proposals/ rather than a plan.md at the repo root,
since feat/crypto-session-rotation already uses plan.md for its
own design.
Adds HandshakeExtraPayload to crypto_net.proto plus a new
handshake_extra variant in the CryptoserviceContainer oneof. The
dispatcher in sessionmanager logs incoming frames at trace for
now; the real handler arrives once the receive-side decrypt path
lands.

Extends CryptoState with pre_cipher_out, pre_index_out,
pre_cipher_in, pre_index_in_highest, pre_index_in_seen, and
pre_bytes_accounted. Each new field carries #[serde(default)]
because CryptoState is bincoded into the existing crypto_state
sled tree — without per-field defaults, deserializing pre-existing
rows would error out. No on-disk migration; existing sessions
decode with zeros and behave as today until they're replaced by
a fresh handshake.

The single CryptoState construction site in
noise.rs::create_crypto_state sets the new fields to their
defaults explicitly.
New struct in libqaul::storage::configuration with the
conservative defaults from the proposal: enabled=false, 64
messages, 1 MiB aggregate, 24 h orphan TTL, 7 d pre-completion
deadline.
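
For orientation, the defaults listed above could be expressed as follows. Only `enabled` and `max_pre_messages` are names that appear in this PR; the other field names, the integer widths, and the choice of seconds as the time unit are assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HandshakeExtras {
    pub enabled: bool,
    pub max_pre_messages: u32,
    pub max_pre_bytes: u64,                   // hypothetical name
    pub orphan_ttl_seconds: u64,              // hypothetical name
    pub pre_completion_deadline_seconds: u64, // hypothetical name
}

impl Default for HandshakeExtras {
    fn default() -> Self {
        Self {
            enabled: false,
            max_pre_messages: 64,
            max_pre_bytes: 1024 * 1024,                       // 1 MiB aggregate
            orphan_ttl_seconds: 24 * 60 * 60,                 // 24 h
            pre_completion_deadline_seconds: 7 * 24 * 60 * 60, // 7 d
        }
    }
}
```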

Configuration::handshake_extras is #[serde(default)] so existing
config.yaml files continue to load without manual edits. The
v2.0.0-rc.5 upgrade path also seeds the field with defaults
explicitly.

Touches the two cross-crate Configuration fixtures that hand-roll
their own struct (libqaul/tests/config_persistence.rs and
qaul-sim/src/integration.rs) so cargo build --tests stays green.
Two new helpers on CryptoNoise:

  encrypt_noise_kk_handshake_extra — initiator side. Reads
    pre_cipher_out, builds a CipherState at pre_index_out,
    encrypts, persists the incremented index plus byte
    accounting.

  decrypt_noise_kk_handshake_extra — responder side. Reads
    pre_cipher_in, range-checks pre_index against
    HandshakeExtras::max_pre_messages, drops duplicates via the
    pre_index_in_seen bitmap, AEADs the ciphertext. On success
    updates the bitmap, the highest-seen index, and the byte
    accounting.

The cipher snapshot itself is captured by calling
HandshakeState::get_ciphers right after write_message_vec /
read_message_vec for KK msg 1. get_ciphers wraps
SymmetricState::split, which is HKDF over the chaining key and
works at any handshake step. After msg 1 both sides land on the
same ck and therefore the same (c1, c2); we take c1 (initiator-
to-responder direction, by Noise convention) and stash its key
bytes in pre_cipher_out / pre_cipher_in. The post-msg-2 split
derives from a different ck, so the extras key stays independent
from the eventual transport keys.

Both primitives take state: &crate::QaulState first, mirroring
the rest of the instance-based crypto API. decrypt looks up
max_pre_messages via Configuration::get(state) so an operator
can tune the cap at runtime.

bitmap_test / bitmap_set are private helpers; pre_index_in_seen
stays packed at one bit per index. Length is bounded by the
caller's max_pre_messages check, so they don't need their own
range guard.
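
A sketch of the packed seen-bitmap and the checks around it. `bitmap_test` and `bitmap_set` are the helper names from this commit, but their signatures, the `Vec<u8>` storage, and the least-significant-bit-first layout are assumptions. `accept_pre_index` is a hypothetical wrapper that omits the AEAD step, which in the real path sits between the duplicate check and the bitmap update.

```rust
/// Test bit `index` in a bitmap packed at one bit per index.
pub fn bitmap_test(bitmap: &[u8], index: u32) -> bool {
    let byte = (index / 8) as usize;
    let mask = 1u8 << (index % 8);
    bitmap.get(byte).map_or(false, |b| (*b & mask) != 0)
}

/// Set bit `index`, growing the bitmap on demand. The caller is expected
/// to have range-checked `index` already.
pub fn bitmap_set(bitmap: &mut Vec<u8>, index: u32) {
    let byte = (index / 8) as usize;
    if bitmap.len() <= byte {
        bitmap.resize(byte + 1, 0);
    }
    bitmap[byte] |= 1u8 << (index % 8);
}

/// Range check, duplicate check, then mark as seen. Returns false when
/// the payload at `pre_index` should be dropped.
pub fn accept_pre_index(bitmap: &mut Vec<u8>, pre_index: u32, max_pre_messages: u32) -> bool {
    if pre_index >= max_pre_messages {
        return false; // above the cap: rejected before any decryption
    }
    if bitmap_test(bitmap, pre_index) {
        return false; // duplicate pre_index
    }
    bitmap_set(bitmap, pre_index);
    true
}
```

Because the cap is checked first, the bitmap never grows beyond `max_pre_messages` bits, which is why the helpers themselves can skip a range guard.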
dastansam added 11 commits June 6, 2026 17:19
Six tests cover the new encrypt/decrypt helpers in isolation by
hand-installing a shared pre_cipher key into a CryptoState.
End-to-end coverage (a real KK msg 1 round-trip with extras
spliced in) lands once Crypto::encrypt and Crypto::decrypt are
wired up.

  extras_round_trip_single — single payload encrypts and
    decrypts; pre_index_out advances; pre_bytes_accounted
    updates.
  extras_decrypt_out_of_order — index 1 received before index 0,
    both decrypt, seen-bitmap reflects both.
  extras_decrypt_drops_duplicate_pre_index — second decrypt at
    the same pre_index returns None, per the duplicates-dropped
    rule.
  extras_decrypt_rejects_index_above_cap — pre_index >=
    max_pre_messages short-circuits before AEAD; bitmap stays
    empty.
  extras_decrypt_returns_none_when_pre_cipher_in_missing —
    orphan case (msg 1 not yet processed). Primitive is
    passive; the orphan buffer in the receive-path work is what
    retries.
  extras_encrypt_returns_none_when_pre_cipher_out_missing —
    guard against a caller that forgot to capture the snapshot
    at msg 1.

Test scaffolding adds CryptoStorage::test_account() — anonymous
in-memory sled databases per call so tests don't need a full
QaulState.database. Each test owns a fresh QaulState via
new_for_simulation() so config and RPC channels stay isolated.

A first-pass analytics view for DTN custody storage. Lives behind the
'DTN' tab next to Users and Feed.

Surfaces:
  - DTN state (used MB / message count / unconfirmed count) refreshed
    on the normal tick, with the cap from DtnConfig rendered alongside.
  - A rolling sparkline of the unconfirmed-count over the last 60
    samples so spikes are visible at a glance.
  - The configured custodian users (DtnConfigResponse.users) in a
    selectable table.
  - A live event log fed by routing dtn.delivery_response events out
    of the existing subscribe stream into a DTN-specific deque,
    leaving the general events panel untouched.

To enable structured routing, the subscribe channel now carries an
EventLine { topic, text } instead of a raw String, and the formatter
gained a dtn.delivery_response arm (accepted/rejected status,
storage node, signature short, reason).
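
The rolling 60-sample window behind the unconfirmed-count sparkline could be kept in a small ring buffer along these lines. This is a sketch under assumptions: `RollingSamples` and its methods are hypothetical names, and the actual TUI code may store its samples differently.

```rust
use std::collections::VecDeque;

/// Fixed-capacity window of samples; the oldest sample is evicted first.
struct RollingSamples {
    samples: VecDeque<u64>,
    capacity: usize,
}

impl RollingSamples {
    fn new(capacity: usize) -> Self {
        Self {
            samples: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Records one sample per refresh tick, dropping the oldest when full.
    fn push(&mut self, value: u64) {
        if self.capacity == 0 {
            return;
        }
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(value);
    }

    /// Copies the window out, oldest first, for a widget that wants a slice.
    fn to_vec(&self) -> Vec<u64> {
        self.samples.iter().copied().collect()
    }
}
```

A window created with `RollingSamples::new(60)` then holds at most the last 60 refresh samples, matching the sparkline described above.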

Adds a fourth tab next to Users / Feed / DTN that surfaces
per-transport reachability for this node.

Surfaces:
  - Three KPI cards (LAN / Internet / BLE) each with a peer-count
    headline and a rolling sparkline of that count over the last 60
    refresh samples. LAN also shows a 'local' subline when the
    daemon reports same-node peers.
  - A peers table populated from Router::ConnectionsRequest: one row
    per (peer, transport) pair, showing module, base58 user id,
    hop count to that peer, and best-connection RTT.
  - A 'Peer events' panel that pulls the live peers.connected
    (and reserved peers.disconnected) events out of the subscribe
    stream into a network-specific deque.

Routing logic in App::push_event_line now dispatches by topic so
DTN delivery responses, peer events, and everything else each land
in their own panel — the general Events panel stays clean.
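
Topic-based dispatch of this kind might look roughly like the sketch below. The `EventLine { topic, text }` shape and the three topic strings come from the commit messages; the field names on `App`, the per-panel cap, and the eviction policy are assumptions made for illustration.

```rust
use std::collections::VecDeque;

struct EventLine {
    topic: String,
    text: String,
}

/// Hypothetical slice of the TUI state: one bounded deque per panel.
struct App {
    dtn_events: VecDeque<String>,
    peer_events: VecDeque<String>,
    events: VecDeque<String>,
    cap: usize,
}

impl App {
    fn new(cap: usize) -> Self {
        Self {
            dtn_events: VecDeque::new(),
            peer_events: VecDeque::new(),
            events: VecDeque::new(),
            cap,
        }
    }

    /// Routes a line to the panel that owns its topic; anything
    /// unrecognised falls through to the general Events panel.
    fn push_event_line(&mut self, line: EventLine) {
        let cap = self.cap;
        let target = match line.topic.as_str() {
            "dtn.delivery_response" => &mut self.dtn_events,
            "peers.connected" | "peers.disconnected" => &mut self.peer_events,
            _ => &mut self.events,
        };
        if target.len() >= cap {
            target.pop_front(); // keep each panel bounded
        }
        target.push_back(line.text);
    }
}
```

Adding a panel for another topic then means one more deque and one more match arm, without touching the general Events path.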

Adds a fifth tab next to Users / Feed / DTN / Network for Noise
session rotation telemetry. Pure poll-based — the crypto module
exposes rotation events via GetRotationEventsRequest rather than a
subscribe topic, so we just fetch on the normal refresh tick and
advance a since_ms floor to avoid refetching.

Surfaces:
  - Config card: master switch (green/red), period/volume triggers,
    grace settings.
  - Counts strip: tally of buffered events by kind (rotated /
    grace_expired / dropped_past_grace), with dropped events shown
    in red when non-zero so silent decrypt failures don't hide.
  - Rotation events table (newest first): timestamp, kind colored
    by severity, remote peer (short id), primary and draining
    session ids.

App::append_crypto_events tracks the newest timestamp_ms it has
seen and the next fetch passes that as since_ms, so the buffer
grows by delta rather than re-pulling the whole log.
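
The since_ms bookkeeping can be illustrated with a small sketch. The `timestamp_ms` field and the `append_crypto_events` name are taken from the message above; the `CryptoView` struct, the `kind` string, and the assumption that the daemon returns only events newer than the floor are hypothetical.

```rust
#[derive(Clone, Debug, PartialEq)]
struct RotationEvent {
    timestamp_ms: u64,
    kind: String,
}

/// Hypothetical holder for the Crypto tab's buffer and its fetch floor.
#[derive(Default)]
struct CryptoView {
    events: Vec<RotationEvent>,
    since_ms: u64,
}

impl CryptoView {
    /// Appends a freshly fetched batch and raises the floor to the newest
    /// timestamp seen so far, so the next request can pass `since_ms`
    /// and receive only the delta.
    fn append_crypto_events(&mut self, batch: Vec<RotationEvent>) {
        for event in batch {
            self.since_ms = self.since_ms.max(event.timestamp_ms);
            self.events.push(event);
        }
    }
}
```

Taking the maximum rather than the last element keeps the floor correct even if a batch arrives out of timestamp order.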

Plumbing-only: a new TOPIC_PEERS_DISCONNECTED constant and an
emit_peer_disconnected helper that mirrors emit_peer_connected
(same PeerEvent wire shape, different topic string). The two
helpers now share an emit_peer_event implementation.

No call sites yet. The prune-policy decision (staleness threshold,
per-transport vs global, gossip semantics) is a separate design
question, but having the wire surface in place means:

- qauld-tui (and any future client) can bind 'peers.disconnected'
  today; they'll start receiving events as soon as a prune call
  site fires emit_peer_disconnected.
- The prune logic, when it lands, doesn't have to touch the
  subscribe layer.

Includes a mirroring unit test and a doc-comment update on
PeerEvent in subscribe.proto so the wire-level docs explain the
two topics together.

Adds a new 'crypto.rotation' subscribe topic so push-based clients
(qauld-tui's Crypto tab) see rotation events within milliseconds
instead of waiting up to 3 s for a poll tick. Payload reuses the existing
qaul.rpc.crypto.RotationEvent proto — no new wire type.

libqaul:
  - TOPIC_CRYPTO_ROTATION constant + emit_crypto_rotation helper
    in rpc/subscribe.rs, mirroring the peers / dtn emitters.
  - New events::record_and_emit(Option<&QaulState>, event) helper
    that records in the in-memory log AND pushes the subscribe
    event in one call. Production sites use it; tests that don't
    have a QaulState continue to call events::record directly.
  - The three production record sites switched over:
    * decrypt's past-grace drop branch (MessageDroppedPastGrace)
    * rotate_finalize_initiator success (Rotated)
    * drain_expired_rotations grace retirement (GraceExpired)
  - drain_expired_rotations now takes Option<&QaulState>; lib.rs's
    rotation ticker passes Some, internal unit tests pass None.
  - New crypto_rotation_event_is_delivered_to_subscribers unit
    test verifies the wire shape.

qauld-tui:
  - EventLine gained a structured 'parsed' field so subscribe
    payloads can carry typed data alongside the rendered string.
  - format_event recognises crypto.rotation and parses the proto
    into a CryptoRotationEvent.
  - App::push_event_line merges crypto.rotation push events into
    the typed crypto_events buffer with (timestamp_ms, kind,
    primary, draining) dedup, so push + poll converge on the same
    view without double-counting.
  - The 3s poll path stays as a backstop for events that fired
    before the subscribe stream was up, and as a fallback when
    the stream drops.
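
The push/poll convergence relies on the (timestamp_ms, kind, primary, draining) dedup key. A minimal sketch of that merge follows, assuming `u32` session ids, a string `kind`, and a hypothetical `CryptoEvents` container; the real `CryptoRotationEvent` may carry different field types.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq)]
struct CryptoRotationEvent {
    timestamp_ms: u64,
    kind: String,
    remote: String, // not part of the dedup key
    primary: u32,   // assumed id type
    draining: u32,  // assumed id type
}

type DedupKey = (u64, String, u32, u32);

/// Hypothetical buffer shared by the subscribe (push) and poll paths.
#[derive(Default)]
struct CryptoEvents {
    events: Vec<CryptoRotationEvent>,
    seen: HashSet<DedupKey>,
}

impl CryptoEvents {
    /// Inserts the event unless an event with the same key is already
    /// buffered. Returns true if the event was new.
    fn merge(&mut self, event: CryptoRotationEvent) -> bool {
        let key = (
            event.timestamp_ms,
            event.kind.clone(),
            event.primary,
            event.draining,
        );
        if !self.seen.insert(key) {
            return false; // already delivered by the other path
        }
        self.events.push(event);
        true
    }
}
```

Both the push handler and the poll backstop can call `merge`, so an event that arrives over both paths is counted once.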

Two interactive affordances that scale with the tables.

Detail drawer (Enter on any row):
  - Fullscreen modal listing the selected row's labelled fields,
    untruncated. Fixes values being cut to a short_id everywhere,
    so users can copy full peer ids, signatures, bios, etc.
  - Per-tab schema via App::selected_detail() returning labelled
    (key, value) pairs.
  - Esc / Enter / q dismiss.

Filter ('/'):
  - / opens a text input; rows substring-match (case-insensitive)
    against a concatenation of the tab's relevant fields.
  - Cursor clamps to the filtered count and resets on each
    keystroke so the user is always on the first visible row.
  - Filter persists while navigating; Esc clears it and exits the
    filter mode; Tab switching also clears it (each tab starts
    fresh).
  - Each table's title shows 'filtered N/M for "foo"' when a
    filter is active.

Internals: App grew filtered_users / filtered_feed /
filtered_dtn_custodians / filtered_peers / filtered_crypto_events
iterator helpers so the render fns and selected_detail consume the
same view. InputMode gained Filtering and Viewing variants; the
key handler treats them as exclusive modes that take precedence
over the Normal-mode bindings.
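
The filter behaviour described above (case-insensitive substring match over concatenated fields, cursor clamping, and the 'filtered N/M' title) might be sketched as follows. All function names here are hypothetical and each row is modelled as a plain list of strings; the real filtered_* helpers operate on the App's typed rows.

```rust
/// True if `filter` occurs, ignoring case, in the row's joined fields.
/// An empty filter matches every row.
fn matches_filter(fields: &[String], filter: &str) -> bool {
    if filter.is_empty() {
        return true;
    }
    fields.join(" ").to_lowercase().contains(&filter.to_lowercase())
}

/// Indices of the rows that survive the filter, in original order.
fn filtered_indices(rows: &[Vec<String>], filter: &str) -> Vec<usize> {
    rows.iter()
        .enumerate()
        .filter(|(_, row)| matches_filter(row.as_slice(), filter))
        .map(|(i, _)| i)
        .collect()
}

/// Keeps the cursor inside the visible range (0 when nothing is visible).
fn clamp_cursor(cursor: usize, visible: usize) -> usize {
    if visible == 0 {
        0
    } else {
        cursor.min(visible - 1)
    }
}

/// Table title, with the filter summary appended while a filter is active.
fn table_title(name: &str, shown: usize, total: usize, filter: &str) -> String {
    if filter.is_empty() {
        name.to_string()
    } else {
        format!("{} (filtered {}/{} for \"{}\")", name, shown, total, filter)
    }
}
```

Having the render functions and the detail drawer both go through one filtered view keeps the highlighted row and the row shown in the drawer in agreement.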

Catches the generated file up to the PeerEvent docstring change in
subscribe.proto (10b2f2e); the source change was committed but the
build output wasn't regenerated and re-committed at the same time.

The merge brought app.rs over from integration where it referenced
'crate::data' (sibling at the qauld-tui bin root). After the move
into qauld-ctl/src/tui/, data is one level up via super, not at the
crate root.

The crypto-session-rotation merge ended up with two copies of
`CryptoStorage::test_account()` — one introduced during conflict
resolution against main's no-rotation_meta version, the other from
crypto-rotation's own implementation that also adds the rotation_meta
tree. Drop the earlier copy so libqaul tests compile.