
v2.3.0#12

Merged
dominicletz merged 20 commits into main from letz/v2.3.0 on May 24, 2026

Conversation

@dominicletz
Member

@dominicletz dominicletz commented May 22, 2026

v2.3.0

Relay node release: Kademlia/replication, Edge v2 + WebSocket tickets, dio_network / dio_ticket RPC, logging and snap fixes, plus CI and mix lint fixes on the most recent commits.

Demo screenshots

There are still no embedded images in this description (GitHub markdown needs https://github.com/user-attachments/... URLs from an upload). Screenshots were regenerated on the Cursor agent from the GET /api HTML reference (Network.RpcHttp / Network.RpcDocs), highlighting dio_network, dio_ticket, and dio_message — the closest shipped “UI” for this workstream.

Next step (human, in browser): open the latest automation comment linked below, download the two PNGs from the agent artifact paths listed there, then Edit this PR description and drag-drop the files into the editor so GitHub uploads them (still no git commits required).

Automation comment: #12 (comment)

Verification

GitHub Actions Build and lint (mix lint) should pass on the latest letz/v2.3.0 commits (CI installs autotools for libsecp256k1; Dialyzer/Credo fixes are on the branch).

@dominicletz dominicletz added the cursor-waiting-for-ci cursor-automation.com workflow column label May 22, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the Kademlia DHT implementation to use a geometric ring based on on-chain node registry data, replacing the previous bucket-based approach. It introduces Dynamo-style quorum for data replication and consolidates ticket validation logic into a shared submission module. Feedback from the review identifies a potential process leak due to an infinite timeout in RPC calls, a risk of startup failure caused by blocking network I/O in init/1, and a performance bottleneck in the node synchronization logic that could be mitigated by using database transactions.
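The ring and quorum ideas in this summary can be illustrated with a small sketch. It assumes SHA-256 positions on a 256-bit ring, a replica set made of the first n nodes clockwise from a key, and the usual Dynamo rule that read and write quorums must overlap (r + w > n). None of this is taken from the PR: the hashing behind KademliaRing.key/1, the real replica count, and the module and function names below are all hypothetical.

```elixir
defmodule RingQuorumSketch do
  # Hypothetical sketch, not the PR's KademliaRing implementation.

  # Map any binary to a 256-bit integer position on the ring (SHA-256 assumed).
  def key(binary) when is_binary(binary) do
    <<pos::unsigned-big-integer-size(256)>> = :crypto.hash(:sha256, binary)
    pos
  end

  # Return up to n distinct nodes: start at the first node whose ring position
  # is >= the item's position, continue clockwise, and wrap around the ring.
  def replicas(nodes, item_key, n) do
    ring = Enum.sort_by(nodes, &key/1)
    target = key(item_key)
    {at_or_after, before} = Enum.split_with(ring, fn node -> key(node) >= target end)
    Enum.take(at_or_after ++ before, n)
  end

  # Dynamo-style rule: every read quorum must intersect every write quorum.
  def overlapping_quorum?(n, r, w), do: r + w > n

  # A write counts as successful once at least w replicas acknowledged it.
  def write_ok?(acks, w), do: Enum.count(acks, &(&1 == :ok)) >= w
end
```

With n = 3, choosing r = w = 2 guarantees that any read quorum overlaps the most recent successful write quorum.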

Comment thread lib/kademlia_light.ex Outdated
try do
  # Don't need to use GenServerDbg.call here because we're regularly expecting timeouts
- GenServer.call(pid, {:rpc, call}, 2000)
+ GenServer.call(pid, {:rpc, call}, :infinity)

high

Using an :infinity timeout in GenServer.call here is risky. If a peer handler process becomes unresponsive (e.g., due to network issues or deadlocks), the background process spawned by rpc_with_cutoff will hang indefinitely. This can lead to a process leak and eventually cause the caller of rpc/2 (the list version) to hang as well. Consider using a large but finite timeout (e.g., 30 seconds) to ensure these processes eventually terminate.

      GenServer.call(pid, {:rpc, call}, 30_000)
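To make the trade-off concrete: GenServer.call/3 with a finite timeout exits the caller when the timeout elapses, so the spawned process must catch that exit if it should return an error value instead of crashing. A minimal sketch, assuming the {:rpc, call} request shape from the diff; the module name, the error tuples, and the attribute value (30 seconds, as suggested above) are illustrative, not the PR's code.

```elixir
defmodule RpcTimeoutSketch do
  # Hypothetical value mirroring the 30-second suggestion in the review.
  @rpc_peer_call_timeout 30_000

  # Call a peer process with a finite timeout. GenServer.call/3 exits the
  # caller when the timeout elapses (or when the peer is dead); catching the
  # exit turns it into a return value so the worker always terminates.
  def rpc_call_result(pid, call, timeout \\ @rpc_peer_call_timeout) do
    try do
      {:ok, GenServer.call(pid, {:rpc, call}, timeout)}
    catch
      :exit, {:timeout, _} -> {:error, :timeout}
      :exit, reason -> {:error, {:exit, reason}}
    end
  end
end
```

The second catch clause also covers a peer that has already died (a :noproc exit), which would otherwise crash the worker just like a timeout.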

Comment thread lib/kademlia_light.ex
Comment on lines +188 to 191
case KademliaSql.sync_registry_nodes() do
:error -> KademliaSql.ensure_self_node_for_init()
:ok -> :ok
end

medium

Performing network I/O via KademliaSql.sync_registry_nodes() inside init/1 can block the GenServer startup. If the registry contract call is slow or the network is unavailable, it may cause the supervisor to time out and fail to start the node. It is recommended to move this initialization logic to handle_continue/2 to ensure the GenServer starts promptly.
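The handle_continue/2 pattern the reviewer recommends looks roughly like this. A minimal sketch: sync_fun is a hypothetical stand-in for KademliaSql.sync_registry_nodes/0, injected so the example runs on its own, and the :status call exists only to observe the result.

```elixir
defmodule BootstrapSketch do
  use GenServer

  # sync_fun is a hypothetical zero-arity function standing in for the slow
  # registry sync; it is passed in so the sketch has no external dependencies.
  def start_link(sync_fun), do: GenServer.start_link(__MODULE__, sync_fun)

  @impl true
  def init(sync_fun) do
    # Only cheap local setup here. Returning {:continue, :bootstrap} lets
    # start_link/1 return to the supervisor before any network I/O runs.
    {:ok, %{sync_fun: sync_fun, synced: :pending}, {:continue, :bootstrap}}
  end

  @impl true
  def handle_continue(:bootstrap, state) do
    # Runs immediately after init/1, before any mailbox message is handled.
    result = state.sync_fun.()
    {:noreply, %{state | synced: result}}
  end

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.synced, state}
end
```

Because the continue callback runs before the server processes any other message, a GenServer.call issued right after start_link waits until the bootstrap finishes instead of observing a half-initialized state.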

Comment thread lib/model/kademliasql.ex
Comment on lines +106 to +126
for addr <- removed do
KademliaLight.redistribute_removed_node(addr)
query!("DELETE FROM p2p_nodes WHERE address = ?1", [addr])
end

for address <- addresses do
ring_key = KademliaRing.key(address)

query!(
"""
INSERT INTO p2p_nodes (address, ring_key, on_chain, synced_at)
VALUES (?1, ?2, 1, ?3)
ON CONFLICT(address) DO UPDATE SET
on_chain = 1,
ring_key = excluded.ring_key,
synced_at = excluded.synced_at
""",
[address, ring_key, now]
)
end


medium

The sync_registry_nodes function performs multiple database deletions and insertions within loops. In SQLite, executing many individual write queries can be slow because each one is wrapped in its own transaction by default. Wrapping the entire synchronization logic in a single transaction would significantly improve performance and ensure atomicity.

cursoragent and others added 2 commits May 22, 2026 19:01
GitHub-hosted runners do not ship autoreconf by default; the libsecp256k1
dependency runs autogen.sh during mix compile and failed without autoconf,
automake, and libtool. Run apt-get update before install alongside the
existing Boost packages.

Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
- Assert non-empty replica list in quorum test (avoid expensive length/1).
- Drop unreachable catch-all branches Dialyzer flagged in Kademlia discovery
  and ticket submission.
- Declare DiodeClient.Object.Ticket.t/0 and server_sign/2 return typing so
  specs referencing Ticket.t() are valid.
- Ignore a known false positive invalid_call on EdgeV2 ticketv2 RPC path where
  Dialyzer narrows shared defp success typing incorrectly.

Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
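The first bullet above refers to checking that a list is non-empty without length/1, which walks the whole list. A minimal illustration (the module name is hypothetical); in an ExUnit test the same idea is usually written as assert [_ | _] = replicas or assert replicas != [].

```elixir
defmodule NonEmptySketch do
  # length/1 traverses the entire list, so `length(list) > 0` costs O(n).
  # Matching on [_ | _] inspects only the first cons cell, which is O(1).
  def non_empty?([_ | _]), do: true
  def non_empty?(_other), do: false
end
```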
@cursor-automation cursor-automation Bot added cursor-demo cursor-automation.com workflow column cursor-waiting-for-human cursor-automation.com workflow column and removed cursor-waiting-for-ci cursor-automation.com workflow column cursor-demo cursor-automation.com workflow column labels May 22, 2026
@dominicletz
Member Author

@cursoragent review the Gemini comments and address them as appropriate

@cursor

cursor Bot commented May 22, 2026

Here is what I implemented for the three Gemini notes (pushed on letz/v2.3.0, PR #12 updated).

1. Finite GenServer.call timeout (high)

rpc_call_result/2 no longer uses :infinity. It uses a module attribute @rpc_peer_call_timeout set to 30 seconds, matching the review suggestion so a stuck PeerHandler cannot block the rpc_with_cutoff child forever.

2. Registry sync outside init/1 (medium)

KademliaSql.sync_registry_nodes/0 and the interval timers were moved from init/1 into handle_continue(:bootstrap, state); init/1 now only sets up ETS, the LRU, legacy file cleanup, and an initial load_ring / write_ets_ring. Startup returns quickly, and the chain/registry work runs immediately afterward (still before any other message is handled, as :continue guarantees).

3. Single SQLite transaction for sync writes (medium)

After scheduling redistribution for removed nodes (unchanged side effects, still outside the transaction), all DELETE / INSERT / ensure_self_node / refresh_known_good_all work runs inside a BEGIN IMMEDIATE / COMMIT block, with ROLLBACK on failure, followed by the existing GenServer.cast(KademliaLight, :reload_ring) as before.
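The transaction wrapping described in item 3 can be sketched as a small helper. This is an illustration under assumptions, not the PR's code: query is a hypothetical one-argument function that executes a SQL statement (in the real module it would wrap the existing query!/2), and fun performs the batched writes. BEGIN IMMEDIATE takes SQLite's write lock up front, so lock contention surfaces at BEGIN rather than partway through the batch.

```elixir
defmodule TxSketch do
  # Hypothetical helper: runs fun between BEGIN IMMEDIATE and COMMIT. If fun
  # (or the COMMIT itself) raises, it issues ROLLBACK and re-raises the error.
  def with_transaction(query, fun) when is_function(query, 1) and is_function(fun, 0) do
    query.("BEGIN IMMEDIATE")

    try do
      result = fun.()
      query.("COMMIT")
      {:ok, result}
    rescue
      error ->
        query.("ROLLBACK")
        reraise error, __STACKTRACE__
    end
  end
end
```

Re-raising keeps the caller's failure behavior unchanged while guaranteeing the connection is not left inside an open transaction.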

Tests: mix is not available in this environment’s PATH, so I could not run the suite here. Running mix test (at least the kademlia / kademliasql tests) locally is recommended.


…saction

- Use a 30s GenServer.call timeout to PeerHandler in rpc_call_result instead
  of :infinity so spawned rpc_with_cutoff tasks cannot hang forever.
- Run registry sync and interval timers from handle_continue(:bootstrap) so
  init returns quickly without blocking on chain I/O.
- Wrap sync_registry_nodes DELETE/INSERT/ensure_self/refresh writes in a
  single SQLite transaction for atomicity and fewer implicit commits.

Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
@dominicletz dominicletz added cursor-waiting-for-ci cursor-automation.com workflow column and removed cursor-waiting-for-human cursor-automation.com workflow column labels May 22, 2026
@dominicletz
Member Author

Agent note (screenshots)

  • Prior state: PR #12 had an empty description, so there were no embedded demo screenshots yet.
  • “Computer use”: This agent session does not have a Cursor computer-use browser control integration. Instead, the API docs HTML was rendered with mix run and captured with headless Chrome against file:// / http://127.0.0.1 (subset page for dio_network + dio_ticket + dio_message).
  • Why images are not inlined here: ManagePullRequest could not attach images to the description (GitHub reported the description is not agent-managed for that path). Third-party anonymous image hosts used from CI were unavailable/blocked, and GitHub does not accept raw-binary gists. The PNGs therefore stay out of git; please drag them from the Cursor agent artifacts into the PR description if you want them hosted on user-images.githubusercontent.com.

Regenerate the HTML locally (then screenshot in any browser):

mix run -e 'html = EEx.eval_file("lib/network/api_docs.html.eex", docs: Network.RpcDocs.all()); File.write!("api-docs.html", html)'
xdg-open api-docs.html   # or open in Chrome; jump to #dio_network / #dio_ticket / #dio_message

Fixes CI `mix format --check-formatted` failure on blank line placement
after the rescue clause in reload_ring/0.

Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
@cursor-automation cursor-automation Bot added cursor-demo cursor-automation.com workflow column and removed cursor-waiting-for-ci cursor-automation.com workflow column labels May 22, 2026
@cursor-automation cursor-automation Bot added cursor-waiting-for-human cursor-automation.com workflow column and removed cursor-demo cursor-automation.com workflow column labels May 22, 2026
@dominicletz
Member Author

Demo screenshots (automation refresh)

The PR description still contains no embedded demo images (no ![…](https://github.com/user-attachments…), no <img src=…>). It only explains how to attach files manually.

What was captured (again): this branch does not ship a standalone product GUI. The closest built-in “UI” is the HTML JSON-RPC reference served by GET /api (Network.RpcHttp + Network.RpcDocs). Headless Chrome was used to screenshot the parts most relevant to the Edge / ticket / network work: dio_network, dio_ticket, and dio_message.

Artifact | Contents
pr12-feature-slice.png | Subset page: the three endpoints + example rail
pr12-dio_network.png | Full docs layout with focus on dio_network

Paths on the Cursor agent (download from the session / attach via GitHub UI — not committed to letz/v2.3.0):

  • /opt/cursor/artifacts/pr12-feature-slice.png
  • /opt/cursor/artifacts/pr12-dio_network.png

Why they are not inlined here: GitHub only serves PR markdown images from uploads such as https://github.com/user-attachments/assets/…. Creating those URLs requires the github.com web editor (drag-and-drop) or a browser session cookie flow; GITHUB_TOKEN + gh pr edit cannot upload binaries, and the Cursor ManagePullRequest image path is rejected for this PR (“description is not agent-managed”). Per your instructions, release assets / orphan branches / third-party image hosts were not used.

Regenerate locally:

mix run -e 'html = EEx.eval_file("lib/network/api_docs.html.eex", docs: Network.RpcDocs.all()); File.write!("api-docs.html", html)'

Open api-docs.html in a browser and capture #dio_network, #dio_ticket, and #dio_message.

@dominicletz dominicletz merged commit 9d2acad into main May 24, 2026
2 checks passed
@dominicletz dominicletz deleted the letz/v2.3.0 branch May 24, 2026 10:43

Labels

cursor-waiting-for-human cursor-automation.com workflow column
