Addsci #21

Merged
Josephrp merged 22 commits into dev from addsci
Mar 16, 2026
Conversation

@Josephrp Josephrp commented Mar 15, 2026

Greptile Summary

This PR wires up the XET ("scientific") workspace transport layer: it introduces per-workspace XetFolderRuntime ownership, a _xet_transport_registry on the session manager, a shared XET discovery graph (CAS client, bloom filter, catalog, LPD, multicast, gossip, flooding), MediaStreamManager integration, and canonical internal status models (CanonicalTorrentStatus / CanonicalGlobalStats). It also ships several correctness fixes: BEP 44 mutable-signing now uses the spec-compliant bencoded buffer, DHT index decoding is fixed, XetDeduplication.close() is made idempotent, _cache_info no longer races the dedup connection with a raw sqlite3.connect(), tonic?: URI parsing is corrected, and inline debug log blocks are removed from session.py.

  • Critical bug — key_manager silently discarded: AsyncSessionManager.__init__ assigns self.key_manager = key_manager from the parameter at line 3490, then unconditionally overwrites it with self.key_manager: Optional[Any] = None at line 3565. Any caller that passes a live key manager (e.g., the daemon startup path) will have it silently replaced with None, breaking XET identity, allowlist loading, and node-ID generation.
  • Breaking change — BEP 44 signature format: The signing message was changed from a non-standard raw concatenation (salt + seq.to_bytes(8, "big") + data) to the correct BEP 44 bencoded buffer. This is spec-correct, but all mutable data stored by previous builds will fail signature verification after upgrade. There is no migration or fallback path.
  • DESIGNATED mode always downgrades at start: XetSyncManager.start() calls _has_verified_designated_source() before any peer connections exist. The check iterates active torrent peer managers, which are empty at workspace-open time, and always returns False — causing every DESIGNATED-mode workspace to silently downgrade to BEST_EFFORT on startup.

Confidence Score: 2/5

  • Not safe to merge in current state — the key_manager double-assignment silently breaks XET identity/signing for any daemon deployment that wires up a key manager.
  • Two logic bugs with runtime impact (key_manager overwrite, DESIGNATED mode always downgrading), plus a backward-incompatible cryptographic format change with no migration path. The rest of the PR contains solid improvements (BEP 44 fix, idempotent close, correct URI parsing, removed debug logs), but the three issues above need to be addressed before merging.
  • Files needing attention: ccbt/session/session.py (key_manager double assignment), ccbt/discovery/dht_storage.py (BEP 44 migration), ccbt/session/xet_sync_manager.py (DESIGNATED mode startup check)

Important Files Changed

| Filename | Overview |
| --- | --- |
| ccbt/session/session.py | Adds XET transport registry, discovery graph initialization, media stream manager, and key manager support, but the key_manager parameter is overwritten with None later in __init__, silently breaking XET identity and signing. |
| ccbt/discovery/dht_storage.py | Fixes BEP 44 mutable signing/verification to use the correct bencoded buffer format, but this is a backward-incompatible change that will invalidate all previously stored mutable data signatures with no migration path. |
| ccbt/session/xet_sync_manager.py | Adds mode-downgrade guardrails for CONSENSUS/BROADCAST/DESIGNATED at startup, peer discovery registration, pending-updates snapshot, and file metadata tracking, but the DESIGNATED verification check runs before peer managers are ready, causing guaranteed downgrade to BEST_EFFORT on first start. |
| ccbt/models.py | Adds CanonicalTorrentStatus/CanonicalGlobalStats models, MediaConfig, min_peers_before_dht field (default 10, down from 50), XetSyncConfig transport flags, and field validators for auth_scope and sync_mode. |
| ccbt/session/xet_realtime_sync.py | Replaces inline SHA-256 hashing with _build_file_metadata delegation, skips .git/.xet directories, and implements real CAS-backed peer discovery for pending chunks; a significant improvement over the previous stub. |
| ccbt/discovery/dht_indexing.py | Fixes the decode path in query_index to bencode-decode bytes before passing them to decode_storage_value; previously raw bytes were passed directly, which would fail for bencoded values. |
| ccbt/storage/xet_deduplication.py | Makes close() idempotent (sets self.db = None after close) and adds get_recent_chunks() for cache introspection; resolves the double-close issue flagged in prior review. |
| ccbt/executor/xet_executor.py | Replaces raw sqlite3.connect() in _cache_info with the async XetDeduplication context manager; resolves the race condition flagged in prior review. |
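
The idempotent close() described for ccbt/storage/xet_deduplication.py can be illustrated with a minimal sketch. The class below is a hypothetical, synchronous stand-in built on the standard sqlite3 module; it is not the project's XetDeduplication class, which may well be async and structured differently.

```python
import sqlite3
from typing import Optional


class DedupStore:
    """Hypothetical stand-in for XetDeduplication; not the project's class."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db: Optional[sqlite3.Connection] = sqlite3.connect(path)

    def close(self) -> None:
        # Returning early when the handle is already cleared makes
        # repeated close() calls a harmless no-op.
        if self.db is None:
            return
        self.db.close()
        self.db = None


store = DedupStore()
store.close()
store.close()  # second call returns early; the handle is already None
```

Clearing the handle also gives other methods a cheap way to detect a closed store, which is presumably what lets the executor share the connection instead of opening its own.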

Sequence Diagram

sequenceDiagram
    participant Daemon
    participant SessionManager as AsyncSessionManager
    participant XetFolder
    participant XetSyncManager
    participant CASClient as P2PCASClient
    participant XetRealtimeSync

    Daemon->>SessionManager: __init__(output_dir, key_manager)
    Note over SessionManager: key_manager param stored at line 3490<br/>then overwritten with None at line 3565
    Daemon->>SessionManager: open_xet_workspace(tonic_source, DESIGNATED)
    SessionManager->>SessionManager: _ensure_xet_discovery_graph()
    SessionManager->>XetFolder: construct XetFolder(session_manager)
    SessionManager->>XetFolder: start()
    XetFolder->>XetSyncManager: start()
    Note over XetSyncManager: _has_verified_designated_source() runs<br/>before any peer connections exist<br/>Always downgrades to BEST_EFFORT
    XetFolder->>XetRealtimeSync: start()
    loop sync cycle every 30s
        XetRealtimeSync->>XetFolder: _build_file_metadata(file_path)
        XetRealtimeSync->>XetSyncManager: get_pending_updates_snapshot()
        XetRealtimeSync->>CASClient: find_chunks_peers_batch(hashes)
        CASClient-->>XetRealtimeSync: peer list per chunk
        XetRealtimeSync->>XetSyncManager: register_discovered_peer(peer)
    end
Comments Outside Diff (1)

  1. ccbt/session/session.py, lines 154-155

    Silent DHT behavior change: now initializes for every .torrent file when enable_dht=True

    Previously DHT only initialized when the caller passed --enable-dht explicitly or the torrent was a magnet link. The guard dht_explicitly_requested or is_magnet_link has been removed, so now every .torrent file download starts a DHT client when config.discovery.enable_dht=True (the default).

    For users who have enable_dht=True in config (the default) and are downloading many .torrent files, this will:

    • Start a DHT client for every torrent session
    • Potentially generate heavy DHT traffic per-session, risking IP rate-limiting/blacklisting from public DHT nodes

    The commit message and inline comment describe this as an intentional change ("allow DHT as fallback for all torrents"), but the default config value enable_dht=True means all users get this behavior change transparently. If the intent is to allow opt-in, consider introducing a separate enable_dht_for_regular_torrents config key, or document the intentional change in the PR description so it can be deliberately reviewed.
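
A minimal sketch of the opt-in guard suggested above, assuming a hypothetical enable_dht_for_regular_torrents key. DiscoveryConfig and should_start_dht are illustrative names, not code from the PR; the point is only that the removed condition (dht_explicitly_requested or is_magnet_link) stays in force unless the operator opts in.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryConfig:
    enable_dht: bool = True
    # Hypothetical opt-in key proposed in this comment; not an existing option.
    enable_dht_for_regular_torrents: bool = False


def should_start_dht(
    config: DiscoveryConfig,
    *,
    dht_explicitly_requested: bool,
    is_magnet_link: bool,
) -> bool:
    """Decide whether a torrent session should start a DHT client."""
    if not config.enable_dht:
        return False
    # Previous behavior: only explicit requests and magnet links use DHT.
    if dht_explicitly_requested or is_magnet_link:
        return True
    # New behavior becomes opt-in rather than a silent default.
    return config.enable_dht_for_regular_torrents
```

With the defaults shown, a plain .torrent download does not start DHT, while a magnet link or an explicit request still does.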

Last reviewed commit: 4290693

Greptile also left 3 inline comments on this PR.

@Josephrp

@greptileai, comprehensively review this PR:

Comment thread .gitignore Outdated
Comment thread .github/workflows/release.yml Outdated
Comment thread .github/workflows/release.yml Outdated
Comment thread .github/workflows/release.yml Outdated
Comment thread ccbt/session/checkpointing.py Outdated
Comment thread .github/workflows/compatibility.yml Outdated
Comment thread .github/workflows/test.yml Outdated
Josephrp and others added 4 commits March 15, 2026 23:45
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tonic <joseph@legml.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tonic <joseph@legml.ai>
@Josephrp

@greptileai, comprehensively update your review based on the changes above:

Comment thread ccbt/security/xet_allowlist.py
Comment thread ccbt/daemon/ipc_server.py
Comment thread ccbt/discovery/dht.py Outdated
Comment thread ccbt/daemon/ipc_server.py
Comment thread ccbt/session/media_stream_runtime.py
@Josephrp

@greptileai, comprehensively update your review based on the above:

Comment thread ccbt/session/media_stream_manager.py Outdated
Comment thread ccbt/storage/xet_folder_manager.py
Comment thread ccbt/utils/media_launcher.py
Comment thread ccbt/session/media_stream_runtime.py Outdated
@Josephrp

@greptileai, comprehensively update your review based on the latest changes and the entire PR accordingly:

Comment thread ccbt/discovery/dht.py Outdated
Comment thread ci_precommit_logs/pytest_batch_019.txt Outdated
Comment thread ccbt/session/media_stream_runtime.py Outdated
@Josephrp

@greptileai: comprehensively review your assessment of this PR. This PR introduces a lot of changes; some may seem out of scope, but they are included in this PR, so review them as if they are in scope. Review every touched file. Review every intent and method and the corresponding data model. Review out-of-diff and within-diff changes. Assess the quality of this PR. Comprehensively update your assessment of this PR and produce a complete assessment of this PR:

Comment thread ccbt/storage/xet_folder_manager.py
Comment thread ccbt/storage/xet_folder_manager.py
Comment thread ccbt/executor/xet_executor.py Outdated
Comment thread ccbt/session/xet_metadata_resolver.py
Comment thread ccbt/session/session.py
Comment thread ccbt/models.py
@Josephrp

@greptileai: comprehensively review your assessment of this PR. This PR introduces a lot of changes; some may seem out of scope, but they are included in this PR, so review them as if they are in scope. Review every touched file. Review every intent and method and the corresponding data model. Review out-of-diff and within-diff changes. Assess the quality of this PR. Comprehensively update your assessment of this PR and produce a complete assessment of this PR:

Comment thread ccbt/session/media_stream_runtime.py
Comment thread ccbt/session/media_stream_runtime.py
Comment thread ccbt/session/xet_metadata_resolver.py
Comment thread ccbt/utils/media_launcher.py
@Josephrp

@greptileai: comprehensively review your assessment of this PR. This PR introduces a lot of changes; some may seem out of scope, but they are included in this PR, so review them as if they are in scope. Review every touched file. Review every intent and method and the corresponding data model. Review out-of-diff and within-diff changes. Assess the quality of this PR. Comprehensively update your assessment of this PR, produce a complete assessment of this PR, double-check your work to ensure you are not missing false positives, and review this PR comprehensively:

Comment thread ccbt/interface/widgets/media_playback_widget.py
Comment thread ccbt/session/media_stream_runtime.py
Comment thread ccbt/interface/widgets/media_playback_widget.py Outdated
Comment thread ccbt/daemon/ipc_server.py
@Josephrp

@greptileai: comprehensively review your assessment of this PR. This PR introduces a lot of changes; some may seem out of scope, but they are included in this PR, so review them as if they are in scope. Review every touched file. Review every intent and method and the corresponding data model. Review out-of-diff and within-diff changes. Assess the quality of this PR. Comprehensively update your assessment of this PR, produce a complete assessment of this PR, double-check your work to ensure you are not missing false positives, and review this PR comprehensively:

@Josephrp Josephrp merged commit d6506dd into dev Mar 16, 2026
Comment thread ccbt/session/session.py
self.udp_tracker_client: Optional[Any] = None
# Queue manager for priority-based torrent scheduling
self.queue_manager: Optional[Any] = None
self.key_manager: Optional[Any] = None

key_manager parameter silently overwritten with None

self.key_manager = key_manager is assigned from the constructor parameter at line 3490, but then unconditionally overwritten with None at this line — discarding whatever was passed in.

Because key_manager is used in _build_xet_node_id, _ensure_xet_discovery_graph (XET allowlist creation), and XetAllowlist instantiation, any caller that passes a live key manager (e.g., the daemon startup) will silently get no signing/identity capability.

Suggested change
- self.key_manager: Optional[Any] = None
+ self.key_manager = self.key_manager or None  # preserved from __init__ param

Or simply remove this line — the assignment at line 3490 is sufficient.
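
A minimal sketch of the "remove the line" option. SessionManagerSketch is a hypothetical, heavily simplified stand-in for AsyncSessionManager, not the real constructor; it only shows the attribute being annotated and assigned once from the parameter, with no later reset.

```python
from typing import Any, Optional


class SessionManagerSketch:
    """Hypothetical, simplified stand-in for AsyncSessionManager.__init__."""

    def __init__(self, output_dir: str = ".", key_manager: Optional[Any] = None) -> None:
        self.output_dir = output_dir
        # Annotate and assign once, directly from the constructor parameter.
        self.key_manager: Optional[Any] = key_manager
        # Other optional components still default to None.
        self.udp_tracker_client: Optional[Any] = None
        self.queue_manager: Optional[Any] = None
        # Deliberately no second `self.key_manager = None` down here:
        # that reset is what discarded the caller's key manager.


km = object()  # placeholder for a live key manager
manager = SessionManagerSketch(key_manager=km)
print(manager.key_manager is km)
```

A regression test that constructs the manager with a sentinel object and checks identity, as in the last three lines, would catch a reintroduced overwrite.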

Comment on lines 204 to +207
      msg = "Cryptography library not available for signing"
      raise RuntimeError(msg)

- # Build message to sign: salt + seq + v (data)
- # BEP 44: sig = sign(salt + seq + v)
- message = salt + seq.to_bytes(8, "big") + data
+ # Build message to sign per BEP 44: bencoded-style buffer (salt + seq + v)

BEP 44 signature format change breaks all previously stored mutable data

The old signing message was salt + seq.to_bytes(8, "big") + data — a non-standard raw concatenation. The new format is the correct BEP 44 bencoded buffer (4:salt<len>:<salt>3:seqi<seq>e1:v<len>:<v>).

While the new format is spec-compliant, this is a backward-incompatible change: any mutable DHT record written by a previous build will fail verify_mutable_data_signature after this update because the signing message they were signed with no longer matches. Nodes that have not yet upgraded will also reject signatures from upgraded nodes and vice-versa.

There is no migration path in this PR. Consider either:

  1. Bumping the mutable store version/salt so old records are naturally discarded, or
  2. Adding a fallback verification attempt with the legacy format during a transition window.
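
Option 2 could look roughly like the sketch below. It builds the BEP 44 signing buffer, keeps the legacy concatenation quoted above, and tries both during a transition window. All function names are hypothetical, the value is assumed to be already bencoded and passed identically to both formats, and fake_verify is a SHA-256 stand-in for a real Ed25519 check so the example stays dependency-free.

```python
import hashlib
from typing import Callable


def bep44_signing_buffer(salt: bytes, seq: int, bencoded_value: bytes) -> bytes:
    """Build the BEP 44 buffer to sign; the salt entry is omitted when salt is empty."""
    prefix = b"4:salt%d:%s" % (len(salt), salt) if salt else b""
    return prefix + b"3:seqi%de1:v" % seq + bencoded_value


def legacy_signing_message(salt: bytes, seq: int, data: bytes) -> bytes:
    """Non-standard format used by earlier builds, as quoted in this comment."""
    return salt + seq.to_bytes(8, "big") + data


def verify_with_fallback(
    verify: Callable[[bytes, bytes], bool],
    salt: bytes,
    seq: int,
    value: bytes,
    signature: bytes,
    *,
    allow_legacy: bool,
) -> bool:
    """Accept spec-compliant signatures, and legacy ones only while allowed."""
    if verify(bep44_signing_buffer(salt, seq, value), signature):
        return True
    return allow_legacy and verify(legacy_signing_message(salt, seq, value), signature)


def fake_verify(message: bytes, signature: bytes) -> bool:
    # Stand-in for a public-key check; NOT a real signature scheme.
    return hashlib.sha256(message).digest() == signature


print(bep44_signing_buffer(b"foobar", 1, b"12:Hello World!"))
# b'4:salt6:foobar3:seqi1e1:v12:Hello World!'
```

The allow_legacy flag is meant to be driven by configuration or a cutoff date, so the fallback can be switched off once old records have expired from the DHT.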

Comment on lines +183 to +207
                if peer_manager.is_peer_xet_authorized(peer_id, None):
                    return True
        return False

    async def start(self) -> None:
        """Start the sync manager."""
        if self._running:
            return

        if self.sync_mode == SyncMode.CONSENSUS:
            self.logger.warning(
                "Consensus mode is not transport-backed yet; downgrading to best_effort"
            )
            self.sync_mode = SyncMode.BEST_EFFORT
            self.set_last_error(
                "Consensus mode is disabled until transport-backed RPCs exist"
            )
        elif self.sync_mode == SyncMode.BROADCAST:
            if not self._has_healthy_propagation_backend():
                self.logger.warning(
                    "Broadcast mode has no healthy propagation backend; downgrading to best_effort"
                )
                self.sync_mode = SyncMode.BEST_EFFORT
                self.set_last_error(
                    "Broadcast mode requires at least one healthy propagation backend"

DESIGNATED mode always downgrades to BEST_EFFORT on first start

_has_verified_designated_source() checks peer_manager.is_peer_xet_authorized() across all active torrent sessions. When a workspace is first opened — which happens before any peer connections are established — self.session_manager.torrents will be empty (or contain sessions with uninitialized peer managers). The check therefore always returns False, and every DESIGNATED-mode workspace silently downgrades to BEST_EFFORT at startup, regardless of configuration.

The last_error is set to a string that implies a legitimacy problem with the peers, which would be confusing to operators.

Consider deferring this authorization check to the first sync cycle (inside _sync_loop) rather than enforcing it as a hard prerequisite in start(). The downgrade should only happen if the peers remain unverifiable after a configurable warm-up period.
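
A minimal sketch of the deferred check, written as a pure decision function that a sync loop could call each cycle. The SyncMode values, the function name, and the 60-second default are assumptions for illustration; the project's enum and configuration keys may differ.

```python
import enum


class SyncMode(enum.Enum):
    # Illustrative values; the project's SyncMode enum may differ.
    DESIGNATED = "designated"
    BEST_EFFORT = "best_effort"


def resolve_designated_mode(
    source_verified: bool,
    elapsed_s: float,
    warmup_s: float = 60.0,  # hypothetical configurable warm-up period
) -> SyncMode:
    """Keep DESIGNATED during warm-up; downgrade only if still unverified afterwards."""
    if source_verified or elapsed_s < warmup_s:
        return SyncMode.DESIGNATED
    return SyncMode.BEST_EFFORT


# At workspace-open time no peers exist yet, but the mode is preserved.
print(resolve_designated_mode(source_verified=False, elapsed_s=0.0))
# After the warm-up has passed without verification, the downgrade applies.
print(resolve_designated_mode(source_verified=False, elapsed_s=120.0))
```

Keeping the decision separate from start() also makes it straightforward to set a last_error that says verification timed out, rather than one that implies the peers are illegitimate.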
