Summary
The socket REPL on mobile targets (confirmed on Android) is generally usable, and reconnecting does work in normal conditions.
However, there is a teardown race: if a client disconnects too quickly after an evaluation completes (even after seeing the prompt again), the REPL can become wedged.
In that state:
- the original REPL session remains blocked waiting on a promise
- it still holds the global *compiler-state lock
- subsequent REPL connections are accepted by the socket server but block before they can evaluate anything
So this is not a general “reconnection is broken” issue. It is a disconnect / teardown race that can poison later sessions.
Environment
- Repo: Tensegritics/ClojureDart
- Tested against: b62e1f9
- Platform where reproduced: Android device (flutter run -d <android-device>)
- A separate desktop REPL parsing bug was also confirmed independently, but this report covers only the teardown race on mobile.
Reproduction
Control case: reconnect works
- Start the app with:
clj -M:cljd flutter -d <android-device>
- Connect to the printed REPL port.
- Evaluate:
- Wait a few seconds after the prompt returns.
- Close the socket.
- Reconnect.
- Evaluate:
Result: works correctly. Banner is shown again, evaluation returns 30.
Failing case: immediate disconnect wedges later sessions
- Start the app with:
clj -M:cljd flutter -d <android-device>
- Connect to the REPL socket.
- Evaluate either a trivial form or a slightly more active one, e.g. (pick!).
- As soon as the response/prompt returns, close the socket immediately.
- Reconnect to the REPL port.
- Try to evaluate another trivial form:
Result:
- reconnect may still show the banner
- the next evaluation hangs and produces no response
- later sessions are also blocked
Evidence
Thread dump
After reproducing the failing case, jstack shows:
"Clojure Connection CLJD repl 1" ... WAITING (parking)
    at clojure.core$promise$reify__8625.deref(core.clj:7257)
    at cljd.build$repl$fn__6008.invoke(build.clj:259)
    - locked <0x...> (a clojure.lang.Atom)
"Clojure Connection CLJD repl 2" ... BLOCKED (on object monitor)
    at cljd.build$repl$fn__6008.invoke(build.clj:258)
    - waiting to lock <0x...> (a clojure.lang.Atom)
This strongly suggests the first REPL connection is waiting on @p while still holding the *compiler-state lock, and all later sessions block on that same lock.
Process log
The process log also shows a broken pipe while handling REPL output:
Exception in thread "Thread-18" java.net.SocketException: Broken pipe
    ...
    at cljd.build$compile_cli$fn__6192.invoke(build.clj:586)
Likely root cause
In clj/src/cljd/build.clj, repl currently does this:
(locking *compiler-state
  (let [p (promise)
        _ (swap! *repl-states assoc-in [repltag :ack!] #(deliver p %))
        _ (eval-to-repl repltag expr-or-throwable *compiler-state trigger-reload p)
        str-or-throwable @p]
    ...))
So the code waits for the ack promise inside locking *compiler-state.
If the client disconnects during teardown and the ack path never completes cleanly, the REPL session can remain blocked on @p while still holding the lock. That then blocks all later sessions.
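The blocking pattern described above can be modelled in isolation. The following is a minimal sketch, not the real cljd.build code: `lock`, `ack`, `acquired`, `session-1` and `session-2` are hypothetical names, with `lock` standing in for *compiler-state and `ack` for the promise p. It only illustrates that a thread waiting on an undelivered promise inside `locking` keeps every other thread out of the same monitor.

```clojure
;; Hypothetical stand-ins, not the real cljd.build code:
;; `lock` plays the role of *compiler-state, `ack` the role of the promise p.
(def lock (Object.))
(def ack (promise))            ; not delivered yet, like an ack lost on disconnect
(def acquired (promise))       ; lets the demo know session 1 holds the lock

;; "Session 1": takes the lock, then waits for the ack while still holding it.
(def session-1
  (future (locking lock
            (deliver acquired true)
            @ack)))

@acquired                      ; wait until session 1 really owns the lock

;; "Session 2": needs the lock only briefly, but cannot get it.
(def session-2
  (future (locking lock :evaluated)))

;; Bounded deref so the demo itself cannot hang.
(def session-2-result (deref session-2 500 :blocked))
(println "session 2 while ack is pending:" session-2-result)   ; :blocked

;; Once the ack arrives, session 1 releases the lock and session 2 completes.
(deliver ack :done)
(def session-2-after-ack (deref session-2 2000 :blocked))
(println "session 2 after ack:" session-2-after-ack)           ; :evaluated
```

In the real failure the ack apparently never arrives, so the equivalent of session 2 would stay blocked indefinitely. When run as a script rather than at a REPL, calling (shutdown-agents) at the end lets the JVM exit promptly.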
Proposed fix
Move the @p wait outside the locking *compiler-state block so the global compiler-state lock is only held during setup / compile / trigger-reload, not during the potentially unbounded wait for the ack:
(let [p (promise)
      _ (locking *compiler-state
          (swap! *repl-states assoc-in [repltag :ack!] #(deliver p %))
          (eval-to-repl repltag expr-or-throwable *compiler-state trigger-reload p))
      str-or-throwable @p]
  ...)
This does not solve the fact that a disconnected session may remain stuck, but it should prevent one stuck session from poisoning all later ones.
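The effect of the narrower lock scope can be checked with a small model. This is a sketch under stated assumptions, not a patch: `lock`, `eval-session`, `lost-ack` and `ok-ack` are hypothetical names, and the `timeout-ms` argument is an extra idea that is not part of the proposed fix. It shows one way the stuck session itself could eventually give up, using the three-argument form of `deref`.

```clojure
;; Hypothetical model; `lock` stands in for *compiler-state.
(def lock (Object.))

(defn eval-session
  "Models one REPL evaluation: the lock covers only the setup step, and the
   wait for `ack` happens after the lock has been released.
   `timeout-ms` is an added assumption, not part of the proposed fix."
  [ack timeout-ms]
  (locking lock
    :setup-done)                        ; stands in for swap! + eval-to-repl
  (deref ack timeout-ms :ack-timeout))  ; wait outside the lock

;; A session whose ack is never delivered, as after an abrupt disconnect.
(def lost-ack (promise))
(def stuck (future (eval-session lost-ack 500)))

;; A later session still gets through while the first one is waiting.
(def ok-ack (promise))
(deliver ok-ack "ok")
(def later-result (eval-session ok-ack 1000))
(println "later session:" later-result)     ; "ok"

;; With the bounded wait, the stuck session gives up instead of waiting forever.
(def stuck-result @stuck)
(println "stuck session:" stuck-result)      ; :ack-timeout
```

Whether a timeout suits the real REPL depends on how long a legitimate evaluation may take, so any concrete value would need to be chosen with that in mind.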
Expected behavior
If a client disconnects abruptly or very quickly after an evaluation, that specific REPL session may fail, but later REPL sessions should still be able to connect and evaluate forms.
Actual behavior
A quick disconnect can wedge one REPL session in a way that blocks later sessions globally.