
Fix bridge auto-connect flap; guild allowlist; mention-strip; HYDRA_SPAWN_MODEL #44

Open: kwliang1 wants to merge 5 commits into sf8193:main from kwliang1:fix/bridge-autoconnect-gate


@kwliang1

Collaborator

Four related changes from a fresh-install setup session, each in its own commit. Item 1 below (the bridge auto-connect gate) is the load-bearing fix.

1. fix: gate bridge daemon-connect on HYDRA_BRIDGE_AUTOCONNECT (b1e61ed)

The scenario

Set up hydra on a brand-new machine following the README. Installed the plugin once:

claude plugin install discord@claude-plugins-official

Pointed byte at the default config dir (CLAUDE_CONFIG_DIR=~/.claude). Daemon started fine. Byte connected fine. DMs round-tripped. Spawned sub-sessions in threads.

Then opened a second Claude Code session in another window — completely unrelated work — and Discord replies stopped. Bot would 👀-ack messages, show a typing indicator, then nothing. The daemon log showed an endless reconnect storm:

daemon: bridge registered for session main
daemon: bridge disconnected for session main
daemon: replacing bridge for session main
daemon: bridge registered for session main
daemon: bridge disconnected for session main
...

The root cause

A Claude Code plugin's MCP server (bridge.ts) boots whenever the plugin is enabled in CLAUDE_CONFIG_DIR. bridge.ts unconditionally called connectSocket() on startup and registered with the daemon as:

const SESSION_ID = process.env.HYDRA_SESSION_ID ?? 'main'

Every Claude session sharing the byte config dir — second IDE window, another conversation, sometimes even an editor extension — also booted a bridge and claimed main. The daemon's register handler interprets a duplicate registration as a restart and .end()s the existing socket. The kicked bridge then reconnect-loops on its 5s timer, gets accepted, kicks the other one. Forever.

The reply tool fires a tool_call over the socket. If the next disconnect lands before tool_result comes back, the reply silently fails. From Discord's side: typing indicator, nothing.
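The following minimal TypeScript model makes the last-wins behavior concrete. It assumes the daemon keeps a `Map` from session ID to socket and calls `.end()` on the old socket when a duplicate `register` arrives. `FakeSocket`, `BridgeRegistry`, and the two bridge names are hypothetical stand-ins, not the daemon's actual code, and the 5s reconnect timer is collapsed into a loop.

```typescript
// Hypothetical stand-in for a net.Socket; only end() matters here.
class FakeSocket {
  ended = false;
  constructor(readonly name: string) {}
  end(): void {
    this.ended = true;
  }
}

// Hypothetical model of the daemon's last-wins register handling.
class BridgeRegistry {
  private bridges = new Map<string, FakeSocket>();
  readonly log: string[] = [];

  // A duplicate registration is treated as a restart, so the
  // existing socket for that session ID is closed.
  register(sessionId: string, sock: FakeSocket): void {
    const existing = this.bridges.get(sessionId);
    if (existing !== undefined && existing !== sock) {
      this.log.push(`replacing bridge for session ${sessionId}`);
      existing.end();
    }
    this.bridges.set(sessionId, sock);
    this.log.push(`bridge registered for session ${sessionId}`);
  }

  owner(sessionId: string): string | undefined {
    return this.bridges.get(sessionId)?.name;
  }
}

// Two Claude sessions sharing one config dir both default to 'main'.
const registry = new BridgeRegistry();
let byte = new FakeSocket("byte");
let other = new FakeSocket("other-window");
registry.register("main", byte);
registry.register("main", other); // kicks byte

// Each kicked bridge reconnects with a fresh socket (the 5s timer in
// the real bridge), which kicks the other one in turn.
for (let round = 0; round < 3; round++) {
  if (byte.ended) {
    byte = new FakeSocket("byte");
    registry.register("main", byte);
  }
  if (other.ended) {
    other = new FakeSocket("other-window");
    registry.register("main", other);
  }
}
```

Every round adds two `replacing bridge` lines, which matches the log above. With only byte allowed to connect, a single `main` registration never triggers a replacement.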

Why the existing setup works

start-byte-v2.sh uses CLAUDE_CONFIG_DIR=~/.claude-personal — a dedicated dir nothing else uses. So no other Claude session has the plugin loaded. It avoids the bug; it doesn't fix it. Anyone who sets CLAUDE_CONFIG_DIR=~/.claude and forgets to keep their other Claude windows out of that dir hits the loop immediately.

The fix

Bridge no longer auto-connects. Only connects when HYDRA_BRIDGE_AUTOCONNECT=1 is in the environment.

  • start-byte-v2.sh / start-slack-byte.sh set it.
  • daemon/session-lifecycle.ts sets it for every spawned sub-session, alongside HYDRA_SESSION_ID.
  • Every other Claude session that has the plugin loaded still gets the skills (/discord:access, /discord:configure) and the MCP tool definitions. The bridge sits dormant. No socket, no main registration, no flap.
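The gate can be sketched as follows. The sketch assumes the bridge reads `HYDRA_BRIDGE_AUTOCONNECT` and `HYDRA_SESSION_ID` from its environment and treats only the exact value `"1"` as enabled. `shouldAutoConnect`, `bridgeSessionId`, `spawnEnv`, and `bridgeStartup` are hypothetical helper names, not functions from bridge.ts or session-lifecycle.ts. The env is passed in explicitly so the logic can be exercised without a real process.

```typescript
type Env = Record<string, string | undefined>;

// bridge.ts side (hypothetical): connect only when explicitly asked to.
function shouldAutoConnect(env: Env): boolean {
  return env.HYDRA_BRIDGE_AUTOCONNECT === "1";
}

// Same fallback as the existing SESSION_ID line.
function bridgeSessionId(env: Env): string {
  return env.HYDRA_SESSION_ID ?? "main";
}

// session-lifecycle.ts side (hypothetical): every spawned sub-session
// gets both variables, on top of the parent environment.
function spawnEnv(base: Env, sessionId: string): Env {
  return { ...base, HYDRA_SESSION_ID: sessionId, HYDRA_BRIDGE_AUTOCONNECT: "1" };
}

// Startup decision: the session ID the bridge would register as,
// or null if it should stay dormant (no socket, no registration).
function bridgeStartup(env: Env): string | null {
  return shouldAutoConnect(env) ? bridgeSessionId(env) : null;
}
```

Inside the bridge, a check such as `if (bridgeStartup(process.env) !== null) connectSocket()` would replace the unconditional call.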

Alternatives considered

  • Per-session ID derived from something stable. Nothing usable: SESSION_ID is overwritten by Claude Code, and Claude session IDs change on restart, which would break byte-reclaim-on-bounce.
  • Daemon-side first-wins instead of last-wins. Stops the flap but breaks the legitimate "byte died, new byte should reclaim" path unless we add bridge heartbeats and dead-bridge detection. Bigger change, same outcome.
  • Detect --channels flag and only connect then. No clean way to inspect parent process args from the MCP subprocess.

Env-var gate: three lines in bridge.ts, one in session-lifecycle.ts, one each in the two byte launchers. Surgical.

Migration

Anyone running a hand-rolled byte launcher (not start-byte-v2.sh) must add export HYDRA_BRIDGE_AUTOCONNECT=1 before caffeinate claude --channels …. Without it, byte boots silently and never registers with the daemon. The README has been updated with this requirement.


2. feat: guild-level access policy fallback (9d345cc)

Adds guilds: Record<string, GroupPolicy> to access.json. Applies to any channel in the guild that has no explicit groups entry; per-channel groups still wins. Threads guildId through InboundMessage from both gateways (Discord = msg.guildId, Slack = msg.team).

Lets you opt in a whole server in one entry instead of enumerating every channel.
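The lookup order can be sketched as below. The sketch assumes `groups` is keyed by channel ID and `guilds` by guild ID (the Slack team ID on that gateway). The `GroupPolicy`, `AccessConfig`, and `InboundMessage` shapes are hypothetical trimmed-down versions. Only the field names `groups`, `guilds`, `requireMention`, and `guildId` come from this PR.

```typescript
interface GroupPolicy {
  requireMention?: boolean;
}

interface AccessConfig {
  groups?: Record<string, GroupPolicy>; // assumed: keyed by channel ID
  guilds?: Record<string, GroupPolicy>; // assumed: keyed by guild / team ID
}

interface InboundMessage {
  channelId: string;
  guildId?: string; // Discord: msg.guildId, Slack: msg.team
  content: string;
}

// A per-channel entry wins; otherwise fall back to the guild entry.
// undefined means the channel is not opted in.
function resolvePolicy(access: AccessConfig, msg: InboundMessage): GroupPolicy | undefined {
  const perChannel = access.groups?.[msg.channelId];
  if (perChannel !== undefined) return perChannel;
  if (msg.guildId === undefined) return undefined;
  return access.guilds?.[msg.guildId];
}
```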

3. fix: strip leading bot mention before daemon command regex (2f6b0b4)

Daemon command interceptors (spawn:, kill:, list sessions, /sessions, help, etc.) all anchor with ^. In a guild channel where requireMention is true, msg.content arrives as <@1234> list sessions, so none of the regexes match and the commands fall through to byte. A one-line normalization at the top of the isAllowed block fixes all of them at once.
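The normalization could look like the sketch below. It assumes Discord's `<@id>` mention form and the legacy `<@!id>` nickname form. `stripLeadingMention` and the `list sessions` regex are hypothetical, because the PR does not show its exact pattern.

```typescript
// Matches one leading Discord user mention plus surrounding whitespace.
const LEADING_MENTION = /^\s*<@!?\d+>\s*/;

function stripLeadingMention(content: string): string {
  return content.replace(LEADING_MENTION, "");
}

// Hypothetical interceptor, anchored with ^ like the daemon's.
const LIST_SESSIONS = /^list sessions\b/i;

function isListSessions(content: string): boolean {
  return LIST_SESSIONS.test(stripLeadingMention(content));
}
```

Slack user mentions use alphanumeric IDs such as `<@U024BE7LH>`. If the same normalization should apply on the Slack gateway, the `\d+` in this pattern would need widening.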

4. feat: make spawn model configurable via HYDRA_SPAWN_MODEL (a9c4fde)

Default stays claude-opus-4-6[1m] to preserve the maintainer's setup. The env var override lets fresh installs without 1M-context credits drop the [1m] suffix and avoid the silent 402 → no-reply failure mode in spawned sessions.
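The override can be sketched minimally as below. The sketch assumes the daemon reads `HYDRA_SPAWN_MODEL` from its environment (for example, loaded from its `.env`) and falls back to the current default. `resolveSpawnModel` is a hypothetical name. Treating a blank value as unset is also an assumption, because the PR does not specify that case.

```typescript
type Env = Record<string, string | undefined>;

// Default taken from the PR text.
const DEFAULT_SPAWN_MODEL = "claude-opus-4-6[1m]";

// Unset or blank falls back to the default (assumed behavior).
function resolveSpawnModel(env: Env): string {
  const override = env.HYDRA_SPAWN_MODEL?.trim();
  return override ? override : DEFAULT_SPAWN_MODEL;
}
```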

Test plan

  • Fresh install on a machine without ~/.claude-personal — byte and a second Claude window in ~/.claude coexist without flap.
  • @bot list sessions in a guild channel hits the daemon interceptor, not byte.
  • guilds: { <id>: { requireMention: true } } in access.json enables every channel in that server.
  • HYDRA_SPAWN_MODEL=claude-opus-4-8 in the daemon's .env spawns sub-sessions on 4-8 instead of 4-6[1m].
  • Removing HYDRA_BRIDGE_AUTOCONNECT from start-byte-v2.sh reproduces the silent no-register failure mode described under Migration.

kwliang1 and others added 5 commits June 26, 2026 17:07
feat: guild-level access policy fallback (9d345cc)

Add `guilds: Record<string, GroupPolicy>` to access.json. Applies to any
channel in the guild that has no explicit `groups` entry; per-channel
`groups` still wins. Threads `guildId` through InboundMessage from both
gateways (Discord = msg.guildId, Slack = msg.team).

Opt in a whole server in one entry instead of enumerating every channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix: strip leading bot mention before daemon command regex (2f6b0b4)

Daemon command interceptors (spawn:, kill:, list sessions, /sessions,
help, etc.) anchor with ^. In a guild channel where requireMention is
true, the user must @bot to pass the gate, so msg.content arrives as
`<@1234> list sessions` and none of the regexes match. Commands then
fall through to the byte session, which may or may not handle them
consistently.

One-line normalization at the top of the isAllowed block strips a
leading mention so the same regex hits in DMs and guild channels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat: make spawn model configurable via HYDRA_SPAWN_MODEL (a9c4fde)

Default stays `claude-opus-4-6[1m]` to preserve the maintainer's setup,
but the env var override means fresh installs without 1M-context credits
can drop the [1m] suffix and avoid the silent 402 → no-reply failure
mode in spawned sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix: gate bridge daemon-connect on HYDRA_BRIDGE_AUTOCONNECT (b1e61ed)

The bug
-------
A Claude Code plugin's MCP server boots whenever the plugin is enabled
in CLAUDE_CONFIG_DIR. bridge.ts (the plugin's MCP server) unconditionally
called connectSocket() on startup and registered with the daemon as
`SESSION_ID = process.env.HYDRA_SESSION_ID ?? 'main'`.

Result: every Claude Code session that happens to share the byte
config dir — a separate IDE window, another `claude` conversation,
even an editor extension — also boots a bridge that claims `main`.
The daemon's register handler treats a duplicate registration as a
restart and `.end()`s the existing socket. The kicked bridge then
reconnect-loops (5s timer), gets accepted, kicks the other one, and
the two flap forever. Symptom: typing indicators in Discord but no
replies (the reply tool's tool_call is interrupted by the next
disconnect).

The maintainer's setup hides this by using a dedicated
`CLAUDE_CONFIG_DIR=~/.claude-personal` for byte alone, so no other
session has the plugin loaded. That's avoidance, not a fix — fresh
installs that just follow the README and use `~/.claude` for everything
hit the loop the moment they open a second Claude window.

The fix
-------
Bridge no longer auto-connects. Connect only when
`HYDRA_BRIDGE_AUTOCONNECT=1` is in the environment.

- `start-byte-v2.sh` / `start-slack-byte.sh` set it.
- The daemon's session spawner (`session-lifecycle.ts`) sets it for
  every spawned sub-session (alongside `HYDRA_SESSION_ID`).
- Every other Claude session that has the plugin loaded still gets the
  skills (`/discord:access`, `/discord:configure`) and the MCP tool
  definitions, but the bridge sits dormant — no socket, no `main`
  registration, no flap.

Why an env var, not (a) per-session ID or (b) daemon-side first-wins
-------------------------------------------------------------------
- Per-session ID: bridge has no stable signal to derive one from. We
  can't use SESSION_ID (Claude Code overwrites that), and Claude
  session IDs change on restart so they'd defeat byte-reclaim-on-bounce.
- Daemon-side first-wins: stops the flap, but breaks the legitimate
  "byte died, new byte should reclaim main" path unless we add bridge
  heartbeats and dead-bridge detection. Bigger change for the same
  effect.

Env-var gate is three lines in bridge.ts, one in session-lifecycle.ts,
one each in the byte launchers. Surgical.

Migration
---------
Anyone running a hand-rolled byte launcher (not `start-byte-v2.sh`)
must add `export HYDRA_BRIDGE_AUTOCONNECT=1` before invoking
`caffeinate claude --channels …`. Without it, byte boots silently and
never registers with the daemon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without `--no-chrome`, the session boots into an interactive "use my browser?"
prompt on the first Claude Code launch in a new config dir, blocking on stdin
forever. No bridge registration, no replies, silent dead bot. Hit this
twice in a fresh setup; the Esc-to-dismiss recovery isn't obvious because
the prompt only renders if you attach to the tmux session.

`--no-chrome` is the safe default — Playwright and other browser-test
tooling don't depend on Claude-in-Chrome. Flip per-session if you
actually want the integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>