Gently 0.22: web-first operator surface, self-managed auth, and observable permissioned autonomy by pskeshu · Pull Request #22 · gently-project/gently

pskeshu · 2026-05-30T19:28:29Z

🌱 This is our first PR opening Gently to outside review — and the first time we're asking the wider community for help. We'd genuinely love yours.

We're getting Gently production-ready for our first user, while making the harness more autonomous and closed-loop between perception and the orchestrator. If you do only one thing: install it, run it offline, and tell us how it feels (see What we'd love your feedback on below). Rough edges are expected — pointing at them is exactly the help we need.

This epoch makes the browser Gently's single control surface and turns the agent into a first-class, observable, permissioned operator: you drive the microscope from an in-page chat, watch from any LAN browser, and see exactly what the agent does on its own and why. It retires the Node/Ink TUI and the desktop napari/Qt windows that were freezing the asyncio loop on the shared instrument PC, adds self-managed accounts with a single-driver control lock, and closes the perception→agent→acquisition loop so the "adaptive" microscope can actually adapt. This opens the 0.22 line (version bumped to 0.22.0.dev0); the prior epoch is tagged v0.21.0.

What we'd love your feedback on

Primary ask: please actually install Gently and run it offline, then drive the web UI. The hands-on interface feedback is the main thing we want; everything below is ordered by how much it helps us.

Installation & first run (please do this for real). Set up a fresh environment (venv or uv), install Gently, and launch it offline (python launch_gently.py --offline). Tell us exactly where the README / offline-guide steps snag: a missing dependency, an unclear step, an env var (GENTLY_STORAGE_PATH) that isn't explained, anything that made you guess. Offline mode runs the full conversational agent + web UI with no microscope attached, so you don't need hardware.
The interface (the part we care about most). Once it's up, drive the web UI: open the home page, chat with the agent, and click through the tabs. Tell us what's confusing, what's missing, and what felt good. Concrete reactions ("I didn't know what this tab was for", "I expected chat to do X") are gold.
What we built technically (one paragraph). Gently now does agentic perception: the agent reads the live Perceiver (developmental-stage calls per embryo per timepoint), and a new wake-router (gently/app/wake_router.py) lets it act on developmental events. When perception detects a stage transition, arrest, or hatching, the router coalesces/throttles those events and wakes the agent for an autonomous reasoning turn between user messages, closing the perceive → decide → acquire loop.
Known limitation / open question (we'd value your take). We can't yet easily test the orchestrator's perception-driven reasoning offline without a fresh live run, even though we have rich recorded sessions on disk (volumes, perception traces, captured event streams). We're designing an offline replay harness that republishes recorded perception/lifecycle events onto the agent's EventBus on a controllable clock (built on the existing gently/eval/ EventReplay/EventCapture substrate) to exercise the real wake-router + agent without hardware. How do you think about testing agentic patterns like this? What would make you trust that an offline replay faithfully represents a live run?

Where we'd especially value help

Beyond the line-by-line review checklist, three areas where outside perspective matters most:

The timelapse orchestrator (delicate — actively being streamlined). gently/app/orchestration/timelapse.py drives per-embryo acquisition + perception off wall-clock scheduling (datetime.now() / asyncio.sleep) and has accreted complexity around burst acquisition, per-frame races, cadence, and photodose. It's the most timing-sensitive, safety-critical part of the loop — and the thing we most want to simplify. Ideas for a cleaner scheduling model (and an injectable clock) are very welcome.
Closing the perception → decide → acquire loop (autonomy). The wake-router + permissioned autonomy (gently/app/wake_router.py) is new: perception events wake the agent to reason and adjust acquisition between user messages. We want this loop genuinely closed and safe — feedback on the wake triggers, coalescing/throttling, the AUTO-mode hard backstop, and the photodose envelope (see What needs review) is high-value.
Testing agentic patterns offline. We can't yet easily exercise the orchestrator's perception-driven reasoning without a live run, despite rich recorded sessions. We've written up an offline replay harness — docs/EVAL.md, built on the existing gently/eval/ capture/replay substrate. How do you test and trust realtime agent reasoning under simulated conditions? That question is wide open.
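One possible shape for the injectable clock asked about above: every `datetime.now()` / `asyncio.sleep` call in the scheduler goes through a small clock object, so a test (or the replay harness) can advance time deterministically. A minimal sketch under stated assumptions; `Clock`, `SystemClock`, `FakeClock`, and `run_schedule` are hypothetical names, not code that exists in timelapse.py, and the sketch uses the event loop's monotonic clock rather than wall-clock time:

```python
import asyncio
from typing import Awaitable, Callable, List, Protocol


class Clock(Protocol):
    """Anything that can report the current time and wait (hypothetical interface)."""

    def now(self) -> float: ...

    async def sleep(self, seconds: float) -> None: ...


class SystemClock:
    """Production clock: the event loop's monotonic time plus a real asyncio.sleep."""

    def now(self) -> float:
        return asyncio.get_running_loop().time()

    async def sleep(self, seconds: float) -> None:
        await asyncio.sleep(seconds)


class FakeClock:
    """Test clock: sleeping advances virtual time instantly."""

    def __init__(self, start: float = 0.0) -> None:
        self._t = start

    def now(self) -> float:
        return self._t

    async def sleep(self, seconds: float) -> None:
        self._t += max(0.0, seconds)
        await asyncio.sleep(0)  # still yield, so other tasks get a turn


async def run_schedule(
    clock: Clock,
    interval_s: float,
    n_frames: int,
    acquire: Callable[[int], Awaitable[None]],
) -> List[float]:
    """Fire `acquire` on a fixed grid of absolute times and return the fire times."""
    fired: List[float] = []
    start = clock.now()
    for i in range(n_frames):
        target = start + i * interval_s
        await clock.sleep(target - clock.now())  # absolute targets: slow frames don't drift
        fired.append(clock.now())
        await acquire(i)
    return fired
```

Scheduling against absolute targets (start + i * interval) rather than sleeping a fixed interval after each acquisition keeps slow frames from accumulating drift, and the same `run_schedule` runs unchanged on either clock.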

Where this is headed

The orchestrator today reasons largely turn-by-turn. We're building toward richer cognitive capabilities — memory, expectations, hypothesis tracking, longer-horizon planning — surfaced and coordinated through a meta-orchestrator: a higher layer that supervises the per-instrument orchestrator(s), holds campaign-level intent, and is where the closed perceive → decide → acquire loop becomes a deliberate, inspectable reasoning process rather than reflexive wakes. If this is your area (cognitive architectures for autonomous microscopy, multi-orchestrator coordination), we'd love to think it through with you.

What we wanted

Gently had two competing, fragile operator surfaces. napari's Qt event loop could synchronously freeze the agent/web asyncio loop mid-tool-call, and the Ink TUI required a Node build step while giving only one person a view. We wanted the browser to be the only console: drive the agent from an in-page chat, view volumes/images in the existing WebGL projection viewer instead of popping desktop windows, let multiple people watch, and give the headless device layer a readable terminal UI of its own.

Once viewing became open to anyone on the LAN, we needed an access model — but one with no external SSO or database dependency, working on plain HTTP. The principle was "viewing is open; login is an elevation to control, not a gate on the page," with default-deny on hardware-moving routes. That also forced us to close a stored-XSS hole in the events table now that arbitrary perception/agent text is viewable by anonymous watchers.

Perception had been a fire-and-forget loop the conversational agent could not see (it subscribed to a STAGE_DETECTED event the perception path never emits), and there was no working tool to change a running timelapse's cadence. We wanted to close that loop: let the agent read the live Perceiver's per-embryo stage/stability/arrest state, give it real live knobs (interval, per-embryo cadence, photodose budget), and let it act between user messages on decision-moment events. Because every action spends light on a living sample, autonomy is deliberately opt-in and gated (OFF/ASK/AUTO), with a human approval round-trip in ASK and a hard registry backstop on irreversible tools.

Underneath, the file-based Gently3 stores needed hardening: the very files /resume depends on were written with an unlink-then-rename pattern that left a crash window; the new Home tab made campaign loading O(N²); and resuming a session restored embryo state but left the UI looking empty.

Finally, the web UI itself needed to feel like a real instrument console: a calm landing page, a dockable resizable chat panel, an open view-only path, and — most consequentially — imagery that faithfully shows the whole embryo in the three-orthogonal-view layout the perceiver sees.

What got made

Web-first migration & napari retirement

launch_gently.py no longer spawns Node/Ink at all — it starts the in-process uvicorn viz server, prints a terminal banner (URL / device status / storage / logs), auto-opens the browser, and blocks on asyncio.Event().wait() until Ctrl-C. The agent is driven from a floating in-page chat over /ws/agent; volumes and images render in the existing WebGL viewer instead of desktop windows.

--no-browser flag added; the Node/dist requirement and interactive picker removed (run_ink_picker is left in place as dead code, for reference).
view_volume rewritten to call viz_server.open_volume_in_browser → broadcasts an open_volume WS message → ProjectionViewer.open() (WebGL raymarcher), with file_path mapped back to embryo+timepoint; batch_lightsheet now pushes each image via agent.push_viz.
Deleted gently/app/tools/data_tools.py (4 Databroker tools) and examples/example_napari_visualization.py (522 lines), with no dangling references.
@-tool / /-command autocomplete added to the web composer, fed by a new AgentBridge.get_tools_json() sent in the connect frame; tool rows show args + a one-line result_summary with an error heuristic.
Room-light SwitchBot toggle wired device-layer → client → /api/devices/room_light → a hidden-until-available header button.
New gently/hardware/console_ui.py: a TTY/Unicode-safe terminal UI (Windows VT, NO_COLOR, cp1252 ASCII fallback) for the device layer's step progress and a plain-language startup-failure panel.

Auth, roles & control gating

A dependency-free AccountStore (accounts.py) stores users in <storage>/auth/users.yaml with PBKDF2-HMAC-SHA256 hashes (200k iterations) and signs stateless session cookies. auth.py gained an account-mode branch to resolve_role: when users exist, identity comes from the signed cookie (operator/admin → control, everyone else → view); otherwise it falls back to legacy loopback / X-Gently-Token mode.
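For reviewers checking the hashing, PBKDF2 password storage with only the standard library looks roughly like this. A minimal sketch: the `iterations$salt$hash` hex encoding and the function names are assumptions, and the on-disk format in accounts.py may differ:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # matches the iteration count described above


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    """Return 'iterations$salt_hex$hash_hex' (hypothetical storage format)."""
    salt = secrets.token_bytes(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iteration count; compare in constant time."""
    try:
        iters_s, salt_hex, hash_hex = stored.split("$")
        iterations = int(iters_s)
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(hash_hex)
    except ValueError:  # malformed entry
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to the hash lets the count be raised later without invalidating existing users.yaml entries.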

Stateless cookie: HMAC-SHA256 over username|expiry, urlsafe-b64, 1-week TTL, HttpOnly + SameSite=Lax, Secure only over HTTPS (so it survives plain-HTTP LAN); secret.key created on first run.
auth_routes.py: /login page plus /api/auth/login|logout|me and an admin-only POST /api/auth/users; launch_gently.py bootstraps a random-password admin on first run and prints it once.
REST hardware/mutation routes gated by Depends(require_control) (the perception chat POST newly gated); /ws/agent and /ws enforce role from the cookie — viewers can watch but cannot take the single-driver lock or send marking messages.
events.js XSS fix: escapeHtml applied before injecting <mark> highlight tags at every innerHTML site.
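The stateless cookie described above can be sketched with hmac, hashlib, and base64 alone. A minimal sketch: the `payload.signature` layout, the helper names, and the injectable `now` are assumptions, and the real token format in accounts.py may differ. The HttpOnly / SameSite=Lax / Secure attributes are set on the response, not in the value, so they don't appear here:

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

TTL_S = 7 * 24 * 3600  # 1-week TTL


def new_secret_key() -> bytes:
    """Random signing key (in the PR this is persisted as secret.key on first run)."""
    return secrets.token_bytes(32)


def _b64(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode("ascii")


def _sign(secret: bytes, payload: bytes) -> bytes:
    return hmac.new(secret, payload, hashlib.sha256).digest()


def make_session_cookie(secret: bytes, username: str, now: Optional[float] = None) -> str:
    """Cookie value: b64(username|expiry) + '.' + b64(HMAC-SHA256 of that payload)."""
    expiry = int((time.time() if now is None else now) + TTL_S)
    payload = f"{username}|{expiry}".encode("utf-8")
    return f"{_b64(payload)}.{_b64(_sign(secret, payload))}"


def read_session_cookie(secret: bytes, cookie: str, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the signature checks out and the cookie is unexpired."""
    try:
        payload_b64, sig_b64 = cookie.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    if not hmac.compare_digest(sig, _sign(secret, payload)):
        return None  # tampered, or signed with a different key
    try:
        username, expiry_s = payload.decode("utf-8").rsplit("|", 1)
        expiry = int(expiry_s)
    except ValueError:
        return None
    if (time.time() if now is None else now) >= expiry:
        return None  # expired
    return username
```

In a purely stateless scheme like this sketch, logout only clears the browser's copy; a copied cookie stays valid until its expiry unless the signing key is rotated.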

Agent↔perception integration & permissioned autonomy

A new WakeRouter (gently/app/wake_router.py) subscribes to critical events plus DETECTOR_EVALUATED, filters for real developmental transitions/arrest (dropping role=test pseudo-stages, recheck-skips, and the no_object sentinel), coalesces bursts (20s) and throttles non-critical wakes (120s, critical bypasses), then fires agent.run_wake_turn.
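The coalesce/throttle policy can be read as a small function of event times. A minimal sketch under stated assumptions: `WakeGate` is a hypothetical name, times are passed in explicitly instead of read from a clock, events are assumed to be already filtered, and in this sketch critical events bypass only the 120s throttle, not the 20s coalescing window; the real router's semantics may differ:

```python
from typing import Any, List, Optional, Tuple

COALESCE_S = 20.0   # burst window
THROTTLE_S = 120.0  # minimum spacing between non-critical wakes


class WakeGate:
    """Coalesce bursts of perception events and throttle non-critical wakes."""

    def __init__(self, coalesce_s: float = COALESCE_S, throttle_s: float = THROTTLE_S) -> None:
        self.coalesce_s = coalesce_s
        self.throttle_s = throttle_s
        self._window_start: Optional[float] = None
        self._pending: List[Tuple[Any, bool]] = []
        self._last_wake = float("-inf")

    def offer(self, t: float, event: Any, critical: bool = False) -> None:
        """Record an event that already passed filtering (no test stages, no no_object)."""
        if self._window_start is None:
            self._window_start = t  # the first event opens the coalescing window
        self._pending.append((event, critical))

    def poll(self, t: float) -> Optional[List[Any]]:
        """Return the batch to wake the agent with, or None if it is not time yet."""
        if self._window_start is None or t - self._window_start < self.coalesce_s:
            return None
        critical = any(is_critical for _, is_critical in self._pending)
        if not critical and t - self._last_wake < self.throttle_s:
            return None  # throttled: keep accumulating until the spacing allows a wake
        batch = [event for event, _ in self._pending]
        self._pending, self._window_start, self._last_wake = [], None, t
        return batch
```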

An asyncio.Lock turn-lock serializes user and wake turns over the shared conversation; bridge.stream_response now aclose()s the generator in a finally so the lock always releases on cancel/error.
A hard backstop in registry.py blocks {set_laser_power, remove_embryo, stop_timelapse} whenever _autonomous_active is set, regardless of mode; wiring verified end-to-end.
New tools: set_autonomy (off/ask/auto), modify_timelapse_interval, set_embryo_cadence, set_photodose_budget/get_photodose_status, read-only get_recent_perceptions; get_stage_history/predict_hatching now prefer the live Perceiver.
ASK mode round-trips an Approve/Modify/Skip picker (300s timeout → skip); a deterministic ## Perception (live) snapshot is injected into the system prompt bypassing the cache.
Autonomous turns are observable: bracketed with autonomous_start/stream_end, broadcast to all web clients, and persisted distinctly as "Gently · autonomous" in chat_display.json.
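The registry backstop amounts to a deny-list consulted at dispatch time, independent of the OFF/ASK/AUTO mode. A minimal sketch; `ToolRegistry`, `autonomous_turn`, and `AutonomyBlocked` are illustrative names, not the registry.py API, and the sketch relies on the turn-lock to keep user and wake turns from overlapping while the flag is set:

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, Dict

# Tools an unsupervised turn may never call, whatever the autonomy mode says.
IRREVERSIBLE_TOOLS = frozenset({"set_laser_power", "remove_embryo", "stop_timelapse"})


class AutonomyBlocked(PermissionError):
    pass


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Awaitable[Any]]] = {}
        self._autonomous_active = False

    def register(self, name: str, fn: Callable[..., Awaitable[Any]]) -> None:
        self._tools[name] = fn

    async def call(self, name: str, **kwargs: Any) -> Any:
        # Checked on every call, so a mis-set mode cannot bypass it.
        if self._autonomous_active and name in IRREVERSIBLE_TOOLS:
            raise AutonomyBlocked(f"{name} is blocked during autonomous turns")
        return await self._tools[name](**kwargs)

    @asynccontextmanager
    async def autonomous_turn(self) -> AsyncIterator["ToolRegistry"]:
        self._autonomous_active = True
        try:
            yield self
        finally:
            self._autonomous_active = False  # always cleared, even if the turn errors
```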

Storage, session resume & performance

FileStore._write_yaml and save_conversation now fsync the temp file and use os.replace() (atomic, overwrites on Windows), closing the crash window on the files /resume depends on. FileContextStore._read_yaml gained a parse cache keyed by (mtime, size) handing out deepcopies, collapsing campaign-tree builds from O(N²) to O(N).
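The fsync + os.replace() write and the (mtime, size) parse cache can be sketched as follows. A minimal sketch: it uses JSON rather than YAML so it needs only the standard library, and `atomic_write_json` / `cached_read_json` are hypothetical names:

```python
import copy
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def atomic_write_json(path: str, data: Any) -> None:
    """Write to a temp file in the same directory, fsync it, then os.replace() it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # bytes are on disk before the rename
        os.replace(tmp, path)  # atomic; also overwrites an existing target on Windows
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


_cache: Dict[str, Tuple[Tuple[int, int], Any]] = {}


def cached_read_json(path: str) -> Any:
    """Parse once per (mtime, size); hand out deep copies so callers can't mutate the cache."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(path)
    if hit is None or hit[0] != key:
        with open(path, encoding="utf-8") as f:
            hit = (key, json.load(f))
        _cache[path] = hit
    return copy.deepcopy(hit[1])
```

Two caveats any mtime+size cache shares: a same-size rewrite within the filesystem's timestamp resolution is invisible to it, and on POSIX full durability of the rename also needs an fsync of the containing directory.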

New browser resume flow: GET /api/sessions + GET /api/sessions/{id} read the live FileStore; control-gated POST /api/sessions/{id}/resume calls agent.resume_session and broadcasts session_changed.
rehydrate_session repopulates the in-memory image store from on-disk projections and rebuilds the tracker's detection_reasoning/projection_uids/per-embryo stage from predictions.jsonl; /ws connect overrides the stale tracker session id with the live agent session.
Resumed sessions derive a best-effort chat transcript from conversation_history when chat_display.json is missing.
Projection responses switched to a content-aware mtime+size ETag (was immutable + uid) so regenerated JPEGs refresh; new cheap helpers (recent_session_ids, list_embryo_ids, list_projection_timepoints) power the Home aggregator with a component-wise path-traversal guard.
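A component-wise containment check matters because a string-prefix test accepts sibling directories: `/data/sessions_old/x` starts with `/data/sessions`. A minimal sketch with pathlib; `safe_join` and `naive_is_inside` are hypothetical names, not the guard's actual code:

```python
from pathlib import Path


def naive_is_inside(root: str, path: str) -> bool:
    """The buggy version: sibling-prefix directories pass."""
    return path.startswith(root)


def safe_join(root: Path, *parts: str) -> Path:
    """Resolve root/parts and refuse anything that escapes root."""
    root_resolved = root.resolve()
    candidate = root_resolved.joinpath(*parts).resolve()
    # Compare whole path components, not strings.
    if candidate != root_resolved and root_resolved not in candidate.parents:
        raise ValueError(f"path escapes storage root: {candidate}")
    return candidate
```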

UI surfaces: Home, chat panel, login, imaging

A new HomeApp (home.js) plus a Home tab is the default landing page with three read-only cards (recent sessions with Resume, recent plans with progress chips, a recent-images strip). The chat was rewritten from a floating FAB popup into a docked <aside> side panel inside a new .app-shell flex row.

Chat panel: overlay-by-default slide-over (no reflow) with opt-in pin-to-dock (a real pushing column), left-edge pointer-capture drag-resize (clamped, persisted to localStorage, double-click reset), header "Agent" toggle with connection dot + unseen-activity badge, Ctrl/Cmd+J, a sticky ASK-approval slot, and pin-to-bottom autoscroll with a "↓ N new" pill.
imaging.py: 3D volumes now render the full three-orthogonal-view layout via projection_three_view (XY|YZ over XZ) instead of the old A|B side-by-side flat XY max — the wrong image for biologists.
Two new routes in sessions.py: GET /api/home/recent-images (walks the store on disk, no YAML parse / no pixel decode, scan budget default 200 / clamp 500) and GET /api/sessions/{id}/projection (component-wise ancestor check defeating sibling-prefix dirs).
Wizard no longer auto-runs on WS connect (gated behind server.wizard_autorun, default off; launched on demand from Home); login.html gained a "Continue without signing in →" view-only escape hatch.
ProjectionViewer gained a ResizeObserver + gently:layout-changed listener (the 3D WebGL canvas previously never resized); filmstrip got a side reasoning panel and an object-position:left-center thumbnail crop.
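The XY|YZ-over-XZ arrangement can be sketched with NumPy maximum-intensity projections. This is an illustration, not projection_three_view itself: it assumes a (Z, Y, X) volume with isotropic voxels (no anisotropy rescaling), and `three_view_mip` is a hypothetical name:

```python
import numpy as np


def three_view_mip(volume: np.ndarray, gap: int = 0, fill: float = 0) -> np.ndarray:
    """Max projections laid out as [XY | YZ] over [XZ]; the bottom-right corner stays `fill`."""
    if volume.ndim != 3:
        raise ValueError("expected a (Z, Y, X) volume")
    nz, ny, nx = volume.shape
    xy = volume.max(axis=0)    # (Y, X): view down the Z axis
    yz = volume.max(axis=2).T  # (Y, Z): side view, rows aligned with the XY rows
    xz = volume.max(axis=1)    # (Z, X): side view, columns aligned with the XY columns
    canvas = np.full((ny + gap + nz, nx + gap + nz), fill, dtype=volume.dtype)
    canvas[:ny, :nx] = xy
    canvas[:ny, nx + gap:] = yz
    canvas[ny + gap:, :nx] = xz
    return canvas
```

Aligning shared axes (Y rows between XY and YZ, X columns between XY and XZ) is what lets a reader trace one structure across all three views.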

What needs review

High risk

Autonomy backstop completeness — the hard backstop blocks only {set_laser_power, remove_embryo, stop_timelapse}. During an AUTO turn the agent can still call modify_parameters (per-embryo 488 power within the 2–6% clamp, slices, exposure), modify_timelapse_interval/set_embryo_cadence (interval as low as 1s = high dose), and queue_burst. set_photodose_budget is the only dose ceiling and is DISABLED by default. Confirm an unsupervised AUTO agent's photodose envelope matches intent.
Resume state consistency (per-session subsystems not re-pointed): interaction_logger, event_capture, decision_log, the timeline manager, and AgentMemory are bound to the original session_id at agent construction and are NOT re-initialized on resume. After a browser resume, subsequent activity can still log into the OLD session folder, silently splitting records across two directories. This is the most material correctness gap in the storage theme.
Control lock does not gate take_control — the observer gate covers only chat/command/cancel; any observer can seize the wheel with no auth/confirmation, and wizard choice responses are ungated. Two clients can race during the startup wizard on the single shared bridge; disconnect cleanup only clears _choice_futures when the last client leaves, so a mid-flight choice future from a departed client can linger.

Medium risk

Low risk

Try it

# Collaborators
git fetch origin
git checkout 0.22-dev

Then run the two processes:

# Device layer (separate process, gets its own terminal UI)
python start_device_layer.py

# Agent + web (starts the in-process viz server and opens the browser)
python launch_gently.py

Auth note: on first run the server bootstraps a random-password admin and prints the one-time password once in the startup banner — capture it then. Set GENTLY_NO_AUTH=1 to disable accounts entirely (legacy mode: loopback gets control, remote callers need X-Gently-Token). Use --no-browser to skip auto-opening the browser.

Notes

napari is fully retired from the agent runtime. It remains only in diagnostics/examples and as deprecated guarded shims (sam_detection.show_in_napari, the multi_embryo plan); no live agent path opens a Qt window.
No SQLite. All data stays in file-based stores under D:\Gently3\ (human-browsable YAML/JSONL/TIF). This epoch hardened those writes (atomic os.replace + fsync) and added a parse cache.

…tages EmbryoState now carries position_coarse (bottom-camera detection or manual map placement) and position_fine (future SPIM-objective alignment) as separate fields; stage_position becomes a derived property (fine ?? coarse) so every existing call site keeps working. FileStore round-trips both stages; legacy position_x/position_y on disk backfill into coarse on read. Phase 1 of the Map-as-embryo-home arc -- schema only, no UI changes yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 3e41058)
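The coarse/fine split with a derived stage_position can be pictured as follows. A minimal sketch: the field names follow the text, but the tuple representation, `from_dict`, and `set_coarse` are illustrative assumptions about the real EmbryoState; `set_coarse` mirrors the later PUT /api/embryos/{id}/position behaviour (update coarse, clear fine):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

XY = Tuple[float, float]


@dataclass
class EmbryoState:
    embryo_id: str
    position_coarse: Optional[XY] = None  # bottom-camera detection or manual map placement
    position_fine: Optional[XY] = None    # SPIM-objective alignment, when available

    @property
    def stage_position(self) -> Optional[XY]:
        """fine ?? coarse: prefer the aligned position, fall back to the sighting."""
        return self.position_fine if self.position_fine is not None else self.position_coarse

    def set_coarse(self, xy: XY) -> None:
        """Operator moved the embryo, so the old fine alignment no longer applies."""
        self.position_coarse = xy
        self.position_fine = None

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "EmbryoState":
        coarse = d.get("position_coarse")
        # Legacy records stored a single position_x / position_y pair: backfill into coarse.
        if coarse is None and "position_x" in d and "position_y" in d:
            coarse = (float(d["position_x"]), float(d["position_y"]))
        fine = d.get("position_fine")
        return cls(
            embryo_id=d["embryo_id"],
            position_coarse=tuple(coarse) if coarse is not None else None,
            position_fine=tuple(fine) if fine is not None else None,
        )
```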

Adds an EMBRYOS_UPDATE event type and wires ExperimentState's mutations (add_embryo / remove_embryo / assign_nickname / batch clear / editor finish) to publish a full embryo-list snapshot through the agent. The viz server's existing wildcard subscription forwards it to all browser clients, so Phase 3 can render embryos on the Map without polling. ExperimentState stays bus-agnostic via an on_embryos_changed observer hook; the agent wires the publisher at init. Phase 2 of the Map-as-embryo-home arc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 617e54c)
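Keeping ExperimentState bus-agnostic through an observer hook looks roughly like this. A minimal sketch; only `on_embryos_changed`, `add_embryo`, `remove_embryo`, and EMBRYOS_UPDATE come from the text, while the single-callable hook, the dict-based embryo records, and the error handling are assumptions:

```python
import logging
from typing import Callable, Dict, List, Optional

Snapshot = List[Dict[str, object]]
log = logging.getLogger(__name__)


class ExperimentState:
    def __init__(self) -> None:
        self.embryos: Dict[str, Dict[str, object]] = {}
        # The agent assigns a publisher here at init; this class never imports the EventBus.
        self.on_embryos_changed: Optional[Callable[[Snapshot], None]] = None

    def _notify(self) -> None:
        if self.on_embryos_changed is None:
            return
        snapshot = [dict(e) for e in self.embryos.values()]  # full list, not a delta
        try:
            self.on_embryos_changed(snapshot)
        except Exception:
            log.exception("embryos observer failed")  # never fail the mutation itself

    def add_embryo(self, embryo_id: str, **fields: object) -> None:
        self.embryos[embryo_id] = {"embryo_id": embryo_id, **fields}
        self._notify()

    def remove_embryo(self, embryo_id: str) -> None:
        if self.embryos.pop(embryo_id, None) is not None:
            self._notify()
```

Sending the full snapshot rather than a delta is what lets a late-joining browser render correctly from any single EMBRYOS_UPDATE.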

Adds an embryo layer to the Map between the axes and the live stage marker. Each embryo renders at its resolved XY (fine if SPIM-aligned, else coarse): coarse-only as an outlined lavender ring, fine-calibrated as a filled disc, both labelled with the embryo number. The layer is a pure read of EMBRYOS_UPDATE events plus an initial /api/embryos/current snapshot so a Map page opened mid-session shows existing embryos without waiting for the next mutation. Read-only at this phase; click / drag / delete will land in Phase 5. Phase 3 of the Map-as-embryo-home arc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 144d9fc)

…ol routes Adds Phase 4 of the Map-as-embryo-home arc.

Auth (4a)
---------
New gently/ui/web/auth.py introduces a two-role model: localhost is always control, remote callers default to view and need X-Gently-Token matching GENTLY_CONTROL_TOKEN to upgrade. Bottom-camera stream start/stop POST routes now Depends(require_control), so a remote browser can watch the stage but cannot drive hardware until an operator provisions the shared token.

Marking canvas seeded (4b)
--------------------------
VisualizationServer.start_marking_session takes initial_markers (pixel positions from SAM); they're seeded into the session state and the marking_image broadcast so the canvas opens with SAM detections pre-placed. wait_for_marking now also computes stage_x_um / stage_y_um from the operator-confirmed pixel positions, so callers can drop the result straight into agent.experiment.add_embryo. marking.js renders the seeded markers immediately and adapts the instruction string.

detect_embryos -> web (4c)
--------------------------
The agent tool now SAM-detects with open_editor=False (napari path bypassed), then hands off to the web Marking canvas via agent.viz_server.start_marking_session(initial_markers=...) and awaits wait_for_marking. Confirmed embryos land in agent.experiment, which broadcasts EMBRYOS_UPDATE -> Devices > Map shows them as coarse rings. Falls back gracefully if viz_server is unavailable or the operator never confirms. edit_embryos / manual_mark_embryos still use napari; deferred to a later phase. gently/ui/napari_viewer.py kept intact for offline use.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 4fbb9ed)

Map becomes the home for embryos rather than a viewer.

Backend (data.py)
-----------------
PUT /api/embryos/{id}/position {x, y} updates position_coarse and CLEARS position_fine -- the operator overriding the sighting invalidates any prior SPIM-objective alignment derived from the old coarse, so it must be re-run. DELETE /api/embryos/{id} removes via ExperimentState. Both endpoints Depends(require_control), so only the diSPIM box (or a remote session with X-Gently-Token) can mutate the embryo list. Both fire EMBRYOS_UPDATE through the observer hook for live Map refresh.

Frontend (devices.js + main.css)
--------------------------------
First click on an embryo selects it (dashed lavender ring, brighter label -- the "picked up" state). Click on empty map space drops it there with a confirm prompt; Delete/Backspace removes with confirm; Escape clears the selection. New embryos still go through the bottom-camera Marking canvas -- the Map is a schematic, not a satellite, so adding without a visual reference would be guessing. Keyboard handler is tab-aware (Devices tab + Map view only) and ignores keystrokes while an input/textarea/select has focus so it doesn't hijack the chat composer.

Smoke-tested end-to-end via ASGI: PUT clears fine correctly, DELETE fires notify, error paths return 400 / 404 / 503. Phase 5 of the Map-as-embryo-home arc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 8f6553e)

Tiger persists JoystickEnabled in non-volatile card settings. If a prior session ever ran SaveCardSettings while the joystick happened to be off, every subsequent boot inherits that state and the physical controller is dead. We don't run SaveCardSettings ourselves, so the only way to recover the joystick was a manual property write -- and there was no way to know the state had drifted until the operator tried to use it. DiSPIMXYStage gains enable_joystick(True) that writes JoystickEnabled + verifies read-back (same pattern as set_firmware_limits). device_layer calls it at boot right after the firmware soft limits are applied. Failure is non-fatal: agent can still drive the stage; we just log loud so the operator knows the joystick is unavailable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 808fe81)

Two improvements to the bottom-camera live thumbnail on the Map.

Crosshair (FOV reticle)
-----------------------
A centre crosshair anchored to the image, not the viewer rect. SVG sibling of <img>; an inner <g> receives the same translate/scale as the image (in viewBox units, via the SVG transform attribute), so the lines track the FOV centre through zoom/pan instead of staying pinned to the container centre. Transform sits on <g> rather than the SVG element so the renderer re-rasterises at each zoom step -- otherwise 1px strokes get bitmap-scaled and go blurry. vector-effect: non-scaling-stroke keeps them 1px at any zoom. Default colour amber (var(--map-warm)).

Zoom / pan
----------
Scroll-wheel over the camera stage zooms in/out (1x to 8x, ~15% per notch) centred under the cursor. Click and drag pans when zoomed. Double-click resets to 1x. Pan is clamped so the image centre stays inside the visible window. Stream stop also resets the transform so the next session starts at 1x. wheel listener is passive:false so the page doesn't scroll under the operator's hand.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit f7a13d6)
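Zooming "centred under the cursor" means keeping the image point under the pointer fixed while the scale changes. The viewer is JavaScript; this Python sketch only shows the arithmetic, assuming a screen = translate + scale × image transform with its origin at the stage's top-left corner. The pan clamp is our reading of "image centre stays inside the visible window", and all names are hypothetical:

```python
from typing import Tuple

MIN_SCALE, MAX_SCALE, STEP = 1.0, 8.0, 1.15  # 1x..8x, ~15% per wheel notch


def zoom_at(scale: float, tx: float, ty: float, cx: float, cy: float,
            notches: int) -> Tuple[float, float, float]:
    """Return (scale, tx, ty) after `notches` wheel steps (positive = zoom in) at cursor (cx, cy)."""
    new_scale = min(MAX_SCALE, max(MIN_SCALE, scale * STEP ** notches))
    k = new_scale / scale
    # The image point under the cursor is (c - t) / scale; keeping it at c gives t' = c - k * (c - t).
    return new_scale, cx - k * (cx - tx), cy - k * (cy - ty)


def clamp_pan(scale: float, tx: float, ty: float, w: float, h: float) -> Tuple[float, float]:
    """Keep the image centre, t + scale * (w/2, h/2), inside the w x h viewport."""
    tx = min(w - scale * w / 2, max(-scale * w / 2, tx))
    ty = min(h - scale * h / 2, max(-scale * h / 2, ty))
    return tx, ty
```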

Substrate for testing candidate orchestrator architectures without running real hardware. Three layers, all under gently/eval/:

EventCapture
  Wildcard-subscribes to an EventBus; appends every Event to a per-session events.jsonl (D:/Gently3/sessions/{id}/events.jsonl). High-volume telemetry (DEVICE_STATE_UPDATE, BOTTOM_CAMERA_FRAME) is filtered out by default so 12-hour sessions don't drown the meaningful events under polling noise. Auto-starts in agent init. Handles non-JSON-native payloads (numpy, Path, datetime, set, Enum, dataclass, bytes) via a fallback serialiser.

EventReplay
  Reads events.jsonl back; publishes via EventBus.publish_event() so original timestamps survive (candidates can reason about historical cadence). Fast mode (no sleep) and real-time mode with optional time_scale. event_types() for cheap pre-flight histogramming.

DecisionLog + Decision + DecisionTrigger
  Per-session decisions.jsonl record. Each Decision captures WHY the agent woke up (trigger + detail), WHAT it saw (context summary, recent event ids, prompt hash), WHAT it did (tool calls, response text), and HOW it went (duration, error). Substrate for diffing candidate decisions later.

ShadowRunner + OrchestratorCandidate + NoOpCandidate
  Candidates subscribe to an EventBus alongside production but their decisions are LOGGED, not enacted -- never permitted to touch hardware by construction. ShadowRunner hosts a set of candidates, isolates candidate failures from each other and from the live bus. NoOpCandidate ships as worked-example and proof-of-life.

scripts/replay_session.py
  CLI: replay a session by id-prefix, with optional --candidate attachment, --real-time + --time-scale, and --histogram pre-flight.

15 unit tests in tests/test_eval.py covering capture filter, non-JSON payloads, thread safety, replay round-trip (event_type / source / data / correlation_id / timestamp all preserved), real-time cadence, time_scale, malformed-line tolerance, decision-log round-trip, shadow forwarding to multiple candidates, candidate-failure isolation, and event-type whitelisting. Phase 6 of the Map-as-embryo-home arc, unlocking offline iteration on the world-model + decision-moment work (operator-action events, wake triggers, tiered context).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit d69cc21)
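Real-time replay with a time_scale comes down to sleeping for the scaled gap between recorded timestamps. A minimal sketch, assuming each JSONL line carries a numeric `timestamp` in seconds and that `publish` is any callable standing in for EventBus.publish_event(); `replay_events` is a hypothetical name, and the sleep function is injectable so tests need not wait:

```python
import json
import time
from typing import Callable, Iterable, Optional


def replay_events(
    lines: Iterable[str],
    publish: Callable[[dict], None],
    real_time: bool = False,
    time_scale: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Publish recorded events in order and return how many were published.

    time_scale > 1 replays faster than recorded; fast mode (real_time=False) never sleeps.
    Malformed lines are skipped rather than aborting the replay.
    """
    prev_ts: Optional[float] = None
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
            ts = float(event["timestamp"])
        except (ValueError, KeyError, TypeError):
            continue
        if real_time and prev_ts is not None and ts > prev_ts:
            sleep((ts - prev_ts) / time_scale)
        prev_ts = ts
        publish(event)  # the original timestamp travels with the event
        count += 1
    return count
```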

Completes the second half of the shadow-mode substrate. The agent now writes a Decision row to a per-session decisions.jsonl every time ConversationManager.call_claude returns, on success or error. Pairs with the events.jsonl from Phase 6a so a candidate replay can be diffed against production turn-by-turn.

What a production Decision captures
-----------------------------------
- trigger: always USER_MESSAGE for now (event/tick triggers land with the wake-router phase)
- trigger_detail: user message excerpt (200 chars)
- tool_calls: aggregated across the multi-step tool loop; every tool_use block Claude emitted during this turn
- response_text: final assistant text
- prompt_hash: short SHA-256 of (system_prompt, conversation_history) snapshotted BEFORE the tool loop appends to history. Same hash = same input; safe to compare candidate decisions against this one.
- duration_ms: wall time of the whole turn
- error: set on the failure path; the exception still re-raises to the caller so the existing error UX is unchanged

Wiring
------
- gently/eval/decision_log.py: new prompt_hash() helper (shared by production + candidates so the fingerprint format stays consistent)
- gently/harness/conversation.py: ConversationManager gains decision_log field; call_claude collects tool_use blocks across every Claude round, then writes one Decision in both success and except branches. Best-effort: a DecisionLog write failure never breaks the live agent.
- gently/app/agent.py: _init_decision_log opens session_dir/decisions.jsonl and assigns to self.conversation.decision_log; stop_decision_log mirrors stop_event_capture for shutdown cleanliness.
- tests/test_eval.py: +5 tests: prompt_hash stability and shape-tolerance; success path captures tool_calls + response + prompt_hash + duration; error path captures error + re-raises; no-log path is a clean no-op.

Phase 6f. 20/20 tests green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 75d7c9d)
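A stable prompt fingerprint needs a canonical serialisation before hashing; otherwise dict key order or whitespace would change the hash for identical input. A minimal sketch of what a helper like prompt_hash() could look like; the canonical-JSON encoding, the `default=str` fallback, and the 16-hex-character truncation are assumptions, not the helper's documented format:

```python
import hashlib
import json
from typing import Any, List


def prompt_hash(system_prompt: str, conversation_history: List[Any], length: int = 16) -> str:
    """Short SHA-256 over a canonical JSON encoding of (system_prompt, history)."""
    canonical = json.dumps(
        [system_prompt, conversation_history],
        sort_keys=True,         # dict key order must not affect the hash
        separators=(",", ":"),  # nor whitespace
        ensure_ascii=False,
        default=str,            # tolerate non-JSON-native values
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Whatever the exact encoding, the hash has to be taken over the history as it stood before the tool loop appends to it, as the commit notes; otherwise production and candidate fingerprints for the same input would diverge.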

Two pieces of the closed-loop paradigm, tightly coupled.

Operator-action events (vocabulary)
-----------------------------------
Three new EventType values for human-driven mutations. They're distinct from EMBRYOS_UPDATE because they carry INTENT, not just state delta: candidates can reason about "the operator just did X" without typing that fact into chat.

  OPERATOR_EDITED_EMBRYO     PUT /api/embryos/{id}/position
                             payload: embryo_id + old/new coarse + fine_position_invalidated
  OPERATOR_REMOVED_EMBRYO    DELETE /api/embryos/{id}
                             payload: embryo_id + last_position
  OPERATOR_MARKED_EMBRYOS    detect_embryos web-editor finish
                             payload: embryo_ids + count + stage_origin + pre_edit_count

Map-edit routes publish via server.agent_bridge.agent._event_bus. detect_embryos publishes only when the operator actually confirmed via the web canvas (operator_marked flag); if the editor was skipped, the SAM list still landed in experiment.embryos but it wasn't operator-confirmed, so no operator event.

ReactiveCandidate (first real candidate)
----------------------------------------
gently/eval/candidates.py: pure-rule shadow orchestrator with a tiny world model (embryos + last stage + last error). Reacts to:

  EMBRYOS_UPDATE             ingest, silent
  STAGE_MOVED                ingest, silent
  OPERATOR_EDITED_EMBRYO     propose recalibrate_embryo if fine was invalidated
  OPERATOR_MARKED_EMBRYOS    propose calibrate_all_embryos for the new set
  OPERATOR_REMOVED_EMBRYO    propose forget_embryo for cache tidy-up
  ERROR_OCCURRED             escalate first occurrence, suppress same msg within 30s

The thesis being tested: a rule-based responder can do the routine bookkeeping that today only happens when the operator chats with Claude. Shadow mode will tell us how often that thesis holds in practice.

Tests
-----
+7 ReactiveCandidate tests covering silent ingest, conditional recalibrate, marked-set proposal, removal tidy-up, error escalate/suppress, and a full event-stream-through-replay smoke that proves the captured jsonl alone is sufficient input to drive a candidate to a decision log. 27/27 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 0a97563)
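The ERROR_OCCURRED rule (escalate the first occurrence, suppress the same message within 30s) is a small dedup window keyed on the message text. A minimal sketch with a hypothetical `ErrorDeduper` class and explicit timestamps; re-arming the window from the last escalation rather than the last sighting is our assumption about the real rule:

```python
from typing import Dict


class ErrorDeduper:
    """Escalate an error message once, then suppress repeats for window_s seconds."""

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self._last_escalated: Dict[str, float] = {}

    def should_escalate(self, message: str, t: float) -> bool:
        last = self._last_escalated.get(message)
        if last is not None and t - last < self.window_s:
            return False  # same message inside the window: suppress
        self._last_escalated[message] = t
        return True
```

Re-arming from the last escalation means a persistent fault resurfaces every 30s instead of going permanently quiet after the first report.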

Distillation of the design conversation that produced the paradigm/closed-loop branch:

- The four orchestrator roles and which one creates the friction
- Web/chat reconciliation patterns A/B/C/D
- Why 'turn' is the wrong unit and 'decision moment' is right
- Wake-router model (events + schedule + user input)
- Tiered world model (snapshot / digest / pull tools / lazy summariser)
- Five testing primitives ranked by payoff
- Coarse-vs-fine schema as 'measurement provenance'
- Map as collaborative world model
- Revolutionary trajectories: plans-as-goals, compounding learning, collaborative world model, reverse-mode microscopy, continuous shadow
- What is built (commit table), what is not yet, and the open questions for the next iteration

Future-self / new-collaborator reference, not a transcript.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 938baf8)

The console output from launch_gently.py is the most informative surface in the system -- calibration progress, plan executor state changes, perception decisions, MMCore callbacks. Until now it lived only in the terminal and in the on-disk gently_*.log. This bridge fans it onto the EventBus too, so the Events page in the viz server mirrors what the operator would otherwise have to alt-tab to a terminal to see.

Backend
-------
gently/core/log_bridge.py
  LogToBusHandler(logging.Handler) -- emit() publishes EventType.LOG_RECORD with {level, level_name, logger, message, module, func, line, ts_ms, exc_text?}. A per-thread re-entry guard prevents a subscriber's own log call from spawning a cascade. Loggers in gently.core.event_bus and gently.core.log_bridge are never bridged.
  configure_log_bridge() reads three env vars:
    GENTLY_LOG_BUS                     off/on (default on)
    GENTLY_LOG_BUS_LEVEL               threshold (default INFO)
    GENTLY_LOG_BUS_INCLUDE_THIRDPARTY  also bridge aiohttp/uvicorn/bluesky/anthropic/httpx/httpcore (default off -- keeps the page readable; durable copy still in gently_*.log)
  Idempotent: re-attaching is a no-op.

gently/core/event_bus.py
  EventType.LOG_RECORD added, plus inclusion in _NO_HISTORY_TYPES (log records can fire hundreds per minute and would crowd out the bounded history deque used for "meaningful" events).

launch_gently.py
  configure_log_bridge() runs right after configure_logging() in main(). Single line, env-controlled, no API changes.

Frontend
--------
gently/ui/web/static/js/events.js
  addEventToTable() branches on LOG_RECORD vs everything else. Log rows render with a level-coloured badge (DEBUG / INFO / WARN / ERROR), the logger name greyed before the message, and click-to-expand reveals the full payload including stack traces.

gently/ui/web/static/css/main.css
  Four new .event-type-badge.log-{debug,info,warn,error} classes matching the existing badge palette. Monospace font for log message, red tint for exception lines.

websocket.js already forwards everything except DEVICE_STATE_UPDATE / BOTTOM_CAMERA_FRAME to the events table; LOG_RECORD inherits that behaviour automatically.

Tests
-----
tests/test_log_bridge.py: 10 tests covering pass-through, level threshold, exception capture, re-entry guard, bridge-internal logger skip, GENTLY_LOG_BUS=off path, default attach, idempotency, third-party exclusion default, opt-in third-party. 10/10 green; full paradigm suite 42/42.

Phase 9. On paradigm/closed-loop only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit 6318691)
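A minimal sketch of a handler like the one described above, using only the standard library. Assumptions: the bus is modelled as a plain `publish(event_type, payload)` callable and the event type as the string `"LOG_RECORD"` rather than Gently's `EventType` enum; the env-var handling in `configure_log_bridge()` is omitted. The level threshold is left to `logging` itself via the handler's level.

```python
import logging
import threading
from typing import Any, Callable, Dict

# Hypothetical stand-in for EventBus.publish(event_type, payload).
Publish = Callable[[str, Dict[str, Any]], None]

# Loggers that must never be bridged (from the commit message).
_INTERNAL_LOGGERS = ("gently.core.event_bus", "gently.core.log_bridge")


class LogToBusHandler(logging.Handler):
    """Forward log records to an event bus without recursing."""

    def __init__(self, publish: Publish, level: int = logging.INFO) -> None:
        super().__init__(level=level)  # logging applies the threshold before emit()
        self._publish = publish
        self._local = threading.local()  # per-thread re-entry flag

    def emit(self, record: logging.LogRecord) -> None:
        if record.name.startswith(_INTERNAL_LOGGERS):
            return
        # A subscriber that logs while we publish would re-enter emit();
        # drop that nested record instead of cascading.
        if getattr(self._local, "active", False):
            return
        self._local.active = True
        try:
            payload: Dict[str, Any] = {
                "level": record.levelno,
                "level_name": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "module": record.module,
                "func": record.funcName,
                "line": record.lineno,
                "ts_ms": int(record.created * 1000),
            }
            if record.exc_info:
                payload["exc_text"] = logging.Formatter().formatException(record.exc_info)
            self._publish("LOG_RECORD", payload)
        except Exception:
            self.handleError(record)
        finally:
            self._local.active = False
```

Keeping the guard in `threading.local()` means a log call from another thread is still bridged normally; only the nested call on the publishing thread is dropped.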

SwitchBot Bot (WoHand) as a Bluesky/ophyd-protocol device over `bleak` (set('on'|'off'|'press') -> Status, read/describe). Controls the diSPIM room light used for bottom-camera imaging. Adds a standalone FastAPI test GUI under diagnostics/ (buttons + morse blinker). Dep in requirements_device.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit a790b02)

Bluesky/ophyd-protocol device for the ACUITYnano Peltier controller. set(target) blocks until "[ SYSTEM LOCKED ]"; read() reports water temp, setpoint, state. Serial + MQTT transports plus a mock backend for hardware-free testing. Deps in requirements_device.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 5f7912e)

… the agent Config-gated registration of the SwitchBot and ACUITYnano devices alongside the MMCore devices. Adds /api/temperature/{set,status} REST endpoints + client methods, and set_temperature/get_temperature agent tools so the agent can hold or shift sample temperature (C. elegans development rate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 249ae4a)

New EventType.EMBRYO_TERMINATED fires whenever an embryo's imaging stops for any reason (no_object terminal, configured stop condition, errors, user removal). The orchestrator emits it from both the no_object terminal path and the per-condition stop check. TimelapseStateTracker handles it by marking the embryo complete and carrying the completion_reason through for the UI. Single source of truth for "an embryo has stopped" — downstream listeners (filmstrip terminated badge, summary stats) now have one event to subscribe to instead of polling embryo state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ive view The compact SPIM thumb in the metrics strip is now a button. Click opens a draggable, fixed-position popout (~560×480) that mirrors the same live frame stream with bigger imagery, the embryo label, and a close affordance. Hover/focus on the thumb shows a ⤢ chip hinting the interaction; the chip is hidden until a frame actually arrives. The popout reuses SpimLivePreview's apply-on-render plumbing so no new stream is opened — same data path, second render target. Keeps the calibration profile compact by default while letting the operator pull a properly-sized view when they need to read fine structure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The board's metric columns are restructured around developmental time, the actual question this view answers ("is this embryo on pace?"):
- 'clock' — elapsed wall-clock time in the current stage
- 'stereo' — stereotypic developmental position at 20 °C reference
- 'pace' — clock / stereo ratio; 1.0× means on reference pace

These replace 'confidence' (never populated meaningfully) and 'rate' (misleading for slow embryos). 'eta' is now hatch-time, pace-corrected.

Migration: dashboardConfig loaded from localStorage runs an idempotent filter that drops 'confidence' / 'rate' from the saved column list and inserts the three new columns in the right slots. Existing user configs upgrade silently on next load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Opens the v0.21 development cycle on this branch. Targets per the KANBAN roadmap: cross-session resume, sacrificial vocab alias, campaign template loader (Path B), LDM Phase 1 MVP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Modern SwitchBot Bot firmware (≥ v4.x) replies to press/on/off with a 3-byte status frame: 0x05 + battery% + flag bits. Older firmware returned the bare 0x01 success byte. The strict 0x01-only check raised SwitchBotError for any current-production Bot even though the press had landed (visible on the controlled load). Widen _RESP_OK to accept either prefix. Both indicate the command reached the actuator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
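For illustration, the widened check might look like the following. `_RESP_OK` and `SwitchBotError` are the names the commit mentions; `check_action_response` is a hypothetical helper, and treating `_RESP_OK` as a tuple of accepted first bytes is an assumption.

```python
# Assumed shape of _RESP_OK: older firmware answers a bare 0x01,
# v4.x firmware a 3-byte frame whose first byte is 0x05.
_RESP_OK = (0x01, 0x05)


class SwitchBotError(RuntimeError):
    """Raised when the Bot reports a failed command."""


def check_action_response(resp: bytes) -> None:
    """Accept either success prefix; anything else is a real failure."""
    if not resp or resp[0] not in _RESP_OK:
        shown = resp.hex(" ") if resp else "<empty>"
        raise SwitchBotError(f"unexpected response: {shown}")
```

Only the first byte is inspected: the remaining bytes of a v4.x action reply are not a reliable battery reading.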

Wires the BLE-attached SwitchBot Bot that toggles the room light into the device-layer config so DeviceLayerServer registers it on boot. Plans address it via `bps.mv(room_light, 'on')`. MAC is the bot already mounted on the rig. Reached over BLE via the TP-Link UB500 dongle on this desktop — RSSI -70 dBm, well within reliable range. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a dedicated status query (BLE 0x57 0x02) that returns battery percentage and firmware version without touching the actuator. Result is cached on the device instance and surfaced through read() / describe() as `<name>_battery_pct` and `<name>_firmware`, so the device-state stream picks them up automatically once polled. Verified on a Bot v4.2 over the TP-Link UB500: response `01 64 42 64 00 00 00 66 00 10 00 00 00` parses as battery 100%, firmware 0x42 (v4.2). Importantly, action-command responses are NOT used as a battery source — their byte-1 field looks like battery (an empirically 0x48-shaped value) but isn't: the dedicated query on the same bot reads 100%, so byte 1 of an action response is some other firmware-internal counter. Documented inline so the next reverse-engineer doesn't fall into the same trap. Periodic polling cadence is left to the caller; hourly is plenty for a battery that moves over months. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
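A sketch of decoding the status reply, assuming the byte layout implied by the single verified frame quoted above: result byte, then battery %, then a firmware byte read as major/minor nibbles. `parse_status_response` and `BotStatus` are hypothetical names. It deliberately accepts only status-query replies, never action replies.

```python
from dataclasses import dataclass

STATUS_QUERY = bytes([0x57, 0x02])  # the dedicated status query from the commit


@dataclass(frozen=True)
class BotStatus:
    battery_pct: int
    firmware: str


def parse_status_response(resp: bytes) -> BotStatus:
    """Decode a reply to STATUS_QUERY (assumed layout, see lead-in).

    byte 0 = result code (0x01), byte 1 = battery %, byte 2 = firmware,
    with the firmware byte split into nibbles (0x42 -> "4.2").
    """
    if len(resp) < 3 or resp[0] != 0x01:
        raise ValueError(f"not a successful status reply: {resp.hex(' ')}")
    battery = resp[1]
    fw = resp[2]
    return BotStatus(battery_pct=battery, firmware=f"{fw >> 4}.{fw & 0x0F}")


status = parse_status_response(bytes.fromhex("01 64 42 64 00 00 00 66 00 10 00 00 00"))
print(status)  # BotStatus(battery_pct=100, firmware='4.2')
```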

Begin the TUI->web convergence:
- Floating agent-chat window in the web UI (agent-chat.js/.css, wired into index.html) connecting to /ws/agent: streaming text/thinking/tool calls, choice pickers, applied-spec cards, slash-command routing. All untrusted text is escaped before insertion.
- Single-driver control lock in agent_ws.py: only the holder may drive the agent; other clients are observers with a "Take control" banner. Fixes the latent shared-conversation corruption when >1 client connects.
- launch_gently.py no longer spawns the Node TUI. It starts the agent + viz server, prints a launch banner (URL, device status, storage, Ctrl-C), auto-opens the browser (--no-browser to suppress), and serves until interrupted. Removes the Node/dist requirement; --resume resolves to latest (interactive picking deferred to the browser). TUI source kept in-tree (reversible).

Auth not yet added: the browser is now the only control path and is unauthenticated on the LAN. Bind to 127.0.0.1 or trust the LAN until self-managed accounts land.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
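The single-driver arbitration can be sketched as a tiny state object. This is a hypothetical model, not agent_ws.py: client IDs are opaque strings, and it assumes the first client to connect becomes the driver and that "Take control" hands the lock over unconditionally.

```python
from typing import Optional


class ControlLock:
    """One driver at a time; every other client is an observer (sketch)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def connect(self, client_id: str) -> str:
        # Assumption: the first client to connect becomes the driver.
        if self.holder is None:
            self.holder = client_id
        return "driver" if self.holder == client_id else "observer"

    def can_drive(self, client_id: str) -> bool:
        return self.holder == client_id

    def take_control(self, client_id: str) -> Optional[str]:
        """Hand the lock to client_id; return the previous holder to notify."""
        previous, self.holder = self.holder, client_id
        return previous if previous != client_id else None

    def disconnect(self, client_id: str) -> None:
        # Releasing on disconnect stops a closed tab from holding the rig.
        if self.holder == client_id:
            self.holder = None
```

Every drive action is checked with `can_drive()`, so observers can still receive the stream without being able to mutate the shared conversation.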

Consolidated plan from the codebase audit: robustness gaps, biologist-UX gaps, complexity audit (legitimate vs refactorable + ~4000 lines of dead duplicate code), frontend audit, startup/topology, multi-user auth + single-driver control arbitration, a 5-day plan, the web-only convergence roadmap (milestones A-F), and progress to date. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hat/control-lock/launcher work)

Auth — self-managed accounts gating the control surface:
- accounts.py: file-backed user store (PBKDF2 password hashing, HMAC-signed session cookies, first-run admin bootstrap); roles viewer/operator/admin.
- auth.py: resolve_role recognizes the session cookie in account mode (operator/admin -> control, else view); legacy localhost/token path kept when no accounts are configured.
- routes/auth_routes.py: /login page + /api/auth/login|logout|me and an admin-gated /api/auth/users.
- pages.py: main page redirects to /login when accounts require it.
- agent_ws.py: /ws/agent authenticates via the session cookie; only operators/admins may hold/take the control lock, viewers watch only; the holder label is now the username (fixes ambiguous "a browser window").
- launch_gently.py: initializes the account store and prints first-run admin credentials in the banner (GENTLY_NO_AUTH=1 disables auth).
- templates/login.html: clean on-brand sign-in page.

Chat UX:
- Activity indicator: instant "Working…/Thinking…" feedback with animated dots across the stream lifecycle; tool rows show spinner -> check.
- Professional restyle (Inter Tight / JetBrains Mono, role labels, status pill); brand cell/embryo favicon replaces the Gemini-like sparkle; dropped the "Microscope assistant" subtitle; header shows signed-in user + Sign out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The serve loop blocked on a bare asyncio.Event().wait(), which the Windows Proactor loop won't interrupt on Ctrl-C, leaving the server unstoppable. Install SIGINT/SIGTERM handlers (loop.add_signal_handler, falling back to signal.signal + call_soon_threadsafe on Windows) and poll a stop Event on a short interval so the interrupt surfaces and shutdown runs cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
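The shutdown pattern described above, sketched with asyncio. `serve_until_stopped` and its `poll_interval` parameter are hypothetical; the key points are the `NotImplementedError` fallback (raised by `add_signal_handler` on Windows event loops) and the bounded wait that keeps the loop waking up.

```python
import asyncio
import signal
from typing import Optional


async def serve_until_stopped(stop: Optional[asyncio.Event] = None, poll_interval: float = 0.5) -> None:
    """Block until SIGINT/SIGTERM (or `stop` is set), staying interruptible on Windows."""
    loop = asyncio.get_running_loop()
    if stop is None:
        stop = asyncio.Event()
    installed, previous = [], {}
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, stop.set)
            installed.append(sig)
        except NotImplementedError:
            # Windows event loops: use a plain signal handler that hands the
            # request back to the loop thread safely.
            previous[sig] = signal.signal(sig, lambda *_: loop.call_soon_threadsafe(stop.set))
    try:
        while not stop.is_set():
            try:
                # A short timeout wakes the loop regularly, so a pending
                # Ctrl-C gets a chance to run instead of waiting forever.
                await asyncio.wait_for(stop.wait(), timeout=poll_interval)
            except asyncio.TimeoutError:
                pass
    finally:
        # Restore whatever handlers were in place before we started.
        for sig in installed:
            loop.remove_signal_handler(sig)
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

The launcher's cleanup (stopping the agent and viz server) would run after this coroutine returns.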

Opening the dashboard no longer redirects to /login — viewing is open to everyone (the "watch like it is now" model). Login is an elevation to the control role, not a gate on the app.
- pages.py: drop the /login redirect on the main page.
- agent_ws.py: anonymous clients may connect to /ws/agent and *watch*; only authenticated operators/admins can hold/take the control lock (drive actions stay gated). No more close-on-unauthenticated.
- agent-chat.js: distinguish anonymous ("Viewing — sign in to control", with a Sign in button) from a viewer-role account ("view-only"); header button is Sign in / Sign out accordingly.

API model: observable (read) endpoints + watching the agent need no auth; only inputable (control) actions are gated via require_control / the control lock — auth is not attached to every endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Complete the inputable-action gating beyond the REST routes already covered in data.py:
- chat.py: the per-timepoint VLM follow-up (POST /api/perception/chat/...) now requires the control role — it spends API budget and writes traces, so anonymous viewers can't trigger it.
- websocket.py (/ws): marking actions (embryo_marked / marking_update / marking_done / marking_redetect) are gated to control-role clients via the session cookie; pure read/presence messages stay open so anyone can watch.

Deliberately NOT gated here: device-layer ingest (POST images/volumes — a machine trust domain, would break under account mode where localhost is no longer auto-control) and campaign mutations (their own mesh scope auth). These need a separate machine-token pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The conversation is now the same for every client of a session — operators and viewers, live and on reconnect/refresh.
- Broadcast: user messages and the agent's streamed reply (text/thinking/tool calls/choice requests) go to ALL connected /ws/agent clients via a raw-websocket registry, not just the driver. Observers watch live.
- History: a display transcript is accumulated server-side and persisted to <session>/chat_display.json (user/agent/tool turns, capped to 500). On connect each client is sent a "history" message and rebuilds the transcript, so refreshes and late joiners see the full conversation.
- Choice pickers are interactive only for the control holder; observers see them read-only and only the holder's choice_response resolves.
- Client: handles "history" (rebuild) and "user_message" (live echo with author); stops double-echoing the sender's own chat (it now arrives via the broadcast); slash commands still echo locally (not broadcast).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
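A hedged sketch of the server-side display transcript: a bounded deque mirrors the 500-turn cap and is rewritten to disk on every append. The class name, the turn fields and the whole-file rewrite strategy are assumptions, not the actual implementation.

```python
import json
from collections import deque
from pathlib import Path
from typing import Any, Deque, Dict

MAX_TURNS = 500  # cap from the commit message


class DisplayTranscript:
    """Display transcript shared by every /ws/agent client (sketch)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.turns: Deque[Dict[str, Any]] = deque(maxlen=MAX_TURNS)
        if path.exists():
            # Reload after a restart so late joiners still see the history.
            self.turns.extend(json.loads(path.read_text(encoding="utf-8")))

    def append(self, kind: str, text: str, author: str = "") -> None:
        # deque(maxlen=...) silently drops the oldest turn once the cap is hit.
        self.turns.append({"kind": kind, "text": text, "author": author})
        self.path.write_text(json.dumps(list(self.turns)), encoding="utf-8")

    def history_message(self) -> Dict[str, Any]:
        """What a newly connected client receives before live broadcasts."""
        return {"type": "history", "turns": list(self.turns)}
```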

… bridge, wake-router

Closes the gap where perception ran as a fire-and-forget tool the agent never saw. agent.perceiver is the same Perceiver the orchestrator drives, so all reads are direct and side-effect-free.

B1 (read-only):
- get_recent_perceptions tool returns live per-embryo stage / stability / arrest / trajectory + the perceiver's reasoning.
- A deterministic '## Perception (live)' section is injected into the system prompt (build_perception_snapshot), bypassing the AI context-summary cache so stage data is never stale.

B2 (bridge + unify):
- The perception path's DETECTOR_EVALUATED now mirrors into EmbryoState (latest_developmental_stage) on stage CHANGE only; role=test pseudo-stages, recheck-skips, and the 'no_object' sentinel are filtered out. This fixes the long-standing dead wiring (the agent subscribed to STAGE_DETECTED, which the perception loop never emits).
- get_stage_history / predict_hatching now read the live Perceiver (hatching time computed from gently_perception's own organism stage durations), falling back to the DevelopmentalTracker.

B3 (decision-moment wake-router, opt-in / default OFF):
- gently/app/wake_router.py wakes the agent on stage transitions + critical events (hatching / arrest / embryo-terminated / errors), coalesced and throttled (critical bypasses the throttle); deferred events are re-armed, not dropped. Enabled via the set_autonomy tool. Full autonomy on a wake; device limits still bound it.
- A new agent turn-lock serializes wake turns against user turns on the shared conversation history; run_wake_turn drives the normal streaming pipeline.

Review fixes (adversarial pass):
- bridge.stream_response now closes the agent generator in a finally, so the turn-lock always releases on cancel/error (was: stalled the next turn).
- wake-router evaluates its guards before draining _pending so co-pending critical events survive an in-flight turn.
- 'no_object' no longer mirrors as a developmental stage or triggers a wake; autonomous turns log when they auto-cancel an interactive picker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
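The wake-router rules above (coalesce, throttle, critical bypass, guards checked before draining, deferred events re-armed) can be modelled in a few lines. This is a hypothetical, synchronous sketch rather than gently/app/wake_router.py: event kinds are plain strings, the 300 s default throttle is invented, and the clock is injectable so it can be tested without waiting.

```python
import time
from typing import Callable, List, Optional

CRITICAL = {"hatching", "arrest", "embryo_terminated", "error"}


class WakeRouter:
    """Coalesce events into agent wakes (sketch; names are hypothetical)."""

    def __init__(self, min_interval_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._pending: List[str] = []
        self._last_wake: Optional[float] = None
        self.turn_in_flight = False

    def on_event(self, kind: str) -> Optional[List[str]]:
        self._pending.append(kind)
        return self.try_wake()

    def try_wake(self) -> Optional[List[str]]:
        """Return the coalesced batch to wake on, or None to defer."""
        # Guards are evaluated BEFORE draining, so deferred events stay pending.
        if not self._pending or self.turn_in_flight:
            return None
        critical = any(k in CRITICAL for k in self._pending)
        now = self.clock()
        throttled = self._last_wake is not None and now - self._last_wake < self.min_interval_s
        if throttled and not critical:
            return None
        batch, self._pending = self._pending, []
        self._last_wake = now
        self.turn_in_flight = True
        return batch

    def turn_finished(self) -> Optional[List[str]]:
        self.turn_in_flight = False
        return self.try_wake()  # deferred critical events re-fire promptly
```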

…efreshed prompt

Builds on the perception integration: gives the agent real live control over acquisition, makes its autonomous decisions visible and gated, fixes message interleaving, and updates the long-stale system prompt.

Live control (A/B):
- modify_timelapse_interval (whole run) + set_embryo_cadence (one embryo) change cadence on a running timelapse and correctly re-anchor next_due_at (closes the gap where no working live-interval tool existed). set_embryo_cadence reports a no-op instead of a misleading reschedule message.
- set_photodose_budget (caps cumulative exposure; resumes budget-paused embryos on a raise, only when they're back under the cap) + get_photodose_status.
- (snap-mode hardware plumbing deferred — needs real-hardware validation.)

Observability (C):
- Autonomous wake turns now stream to every web chat client and persist to the transcript, rendered distinctly: a 'Gently woke up — <trigger>' banner + a 'Gently · autonomous' bubble. bridge.register_display_broadcaster wires the previously-dead on_message_callback; run_wake_turn brackets the turn with autonomous_start/stream_end. Previously autonomous turns were invisible.

Interleaving (D):
- Typing while the agent is busy now QUEUES (with per-message remove + clear-all + auto-drain on idle) instead of cancelling; a separate Stop button replaces Send-as-Stop; the composer shows 'working' vs 'acting autonomously'. Slash commands no longer wedge the composer busy.

Autonomy modes (E):
- OFF / ASK / AUTO tri-state via set_autonomy, switchable mid-run. ASK proposes a change and waits for Approve/Modify/Skip in the chat (round-tripped through the wake choice channel, bounded by a timeout->Skip, lock released via aclose). Hybrid backstop: a few irreversible tools (set_laser_power, remove_embryo, stop_timelapse) can never run during an autonomous turn, enforced in the registry regardless of mode.

System prompt (F):
- Replaced the fictional cv_analyze 'CV subagent' block with an accurate Perception & Analysis section; added an Adapting-Acquisition (gentleness-first) + Autonomy (OFF/ASK/AUTO) section; removed the nonexistent enable_preset_detector reference and the stale interval tool name.

Review fixes: turn-lock released on disconnect-cancel (CancelledError handled as 'cancelled'); picker futures discarded on timeout/cancel; deferred critical wakes re-fire promptly after a turn; Escape-cancel + the autonomous history flag reset cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
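A minimal sketch of the permission check for autonomous turns, assuming a three-way outcome. The tool names in `IRREVERSIBLE` come from the commit; the function and its return values are hypothetical. The backstop is checked first so that no autonomy mode can unlock it.

```python
from enum import Enum


class Autonomy(Enum):
    OFF = "off"
    ASK = "ask"
    AUTO = "auto"


# Tool names from the commit; the mechanism below is a sketch.
IRREVERSIBLE = frozenset({"set_laser_power", "remove_embryo", "stop_timelapse"})


def autonomous_tool_decision(tool: str, mode: Autonomy) -> str:
    """Decide what an autonomous (wake) turn may do with a tool call.

    Returns "refuse", "propose" (ASK: wait for Approve/Modify/Skip) or "run".
    Human-directed turns do not go through this check at all.
    """
    if tool in IRREVERSIBLE:
        return "refuse"  # backstop: independent of the autonomy mode
    if mode is Autonomy.OFF:
        return "refuse"
    if mode is Autonomy.ASK:
        return "propose"
    return "run"
```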

…-dock)

Replaces the floating popup bubble with a professional VSCode-style side panel, per a UI/UX + biologist-cognition consult that converged on overlay-by-default with an opt-in pin-to-dock.
- App shell: header/navbar stay full-width; a new .app-shell flex row holds .app-main (content) + the chat <aside>. Overlay mode = absolute slide-over (transform, no reflow) so the live viewer stays steady; pin-to-dock makes it a real pushing column (min-width:0 lets canvases shrink).
- Killed the floating FAB → a labeled 'Agent' toggle in the header with a connection dot + an unseen-activity badge (wake/approval/notification while closed); Ctrl/Cmd+J toggles it.
- Docked grammar: flush to edge, no radius, directional edge shadow (overlay) / seam (dock), 220ms transform slide, left-edge drag handle to resize (clamped 320..min(560,45vw), persisted, double-click reset, debounced dock reflow).
- Fixed a latent bug: the Three.js 3D viewer had no resize handler at all — added a ResizeObserver + a 'gently:layout-changed' event so it follows the dock and window.
- ASK approvals pin to a sticky slot above the composer (never scroll away); autoscroll pins-to-bottom only when already there, else a '↓ N new' pill.
- Light/dark shadow + seam theme vars.

Review fixes: resize handle uses pointer capture + pointercancel + primary-button guard; the 'N new' pill counts items not stream chunks; Ctrl/Cmd+J ignores auto-repeat and won't fire while composing; opening the panel re-pins to latest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replaces the auto-popping startup wizard with a proper Home landing page (the new default tab).
- Home tab: a scrollable landing with a 'Start / continue an experiment' button + a thin status line, and three at-a-glance cards — recent sessions (with Resume), recent plans (with progress chips), and a recent-images strip — all fed by existing endpoints (/api/sessions, /api/campaigns, /api/snapshots, /api/images/{uid}/png). New HomeApp module (mirrors ReviewApp/CampaignsApp); self-inits on load since it's the default tab.
- Wizard: no longer auto-pops in the chat on connect — gated behind server.wizard_autorun (default off). 'Start / continue an experiment' opens the agent panel and runs /wizard on demand; the briefing/resolution path is unchanged (wizard_ran still derives from wizard.needed).
- Wiring: TABS.HOME; Home is the default-active tab (navbar + panel + state.tab); switchTab lazy-inits HomeApp; #home in the hash-route whitelist; AgentChat now exposes runCommand() so Home can trigger /wizard.
- Reuses .panel/.empty-state + theme vars; namespaced .home-* styles, responsive grid, scroll on an inner wrapper (the panel is overflow:hidden).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ding

_build_campaign_tree calls get_subcampaigns (reads every campaign.yaml) and get_plan_status -> get_plan_items(include_children) -> _get_campaign_tree_ids (re-scans every campaign.yaml) per node, so listing/opening campaigns re-parsed the same YAML files O(N^2)+ times per request. The new Home tab made /api/campaigns fire on every page load, compounding it.

Add a parse cache to _read_yaml keyed by (mtime, size): repeated reads of the same file return a deepcopy of the cached parse instead of re-opening + re-parsing. Auto-invalidated when a file's mtime/size changes (incl. external writes) and explicitly on _write_yaml. Every return is a deepcopy, so callers that mutate raw plan-item lists (update/delete) can't corrupt the cache.

Collapses the tree build from O(N^2) YAML parses to O(N) parses + cheap stat/deepcopy. Verified: deepcopy isolation, write- and mtime-invalidation, missing-file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
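The (mtime, size)-keyed parse cache can be sketched as below. To keep it runnable with the standard library, `json.loads` stands in for the YAML parser; the function names, the nanosecond mtime and returning `None` for a missing file are assumptions.

```python
import copy
import json
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

# Stand-in parser type; the real _read_yaml parses YAML.
Parser = Callable[[str], Any]

_cache: Dict[str, Tuple[Tuple[int, int], Any]] = {}


def read_cached(path: Path, parse: Parser = json.loads) -> Any:
    """Parse `path` once per (mtime, size) and hand out deep copies."""
    try:
        st = path.stat()
    except FileNotFoundError:
        _cache.pop(str(path), None)
        return None
    key = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(str(path))
    if hit is None or hit[0] != key:  # new file or changed on disk
        hit = (key, parse(path.read_text(encoding="utf-8")))
        _cache[str(path)] = hit
    # Deep copy so callers that mutate the result cannot corrupt the cache.
    return copy.deepcopy(hit[1])


def write_invalidate(path: Path, text: str) -> None:
    """Write, then drop the cache entry explicitly (mtime alone can be coarse)."""
    path.write_text(text, encoding="utf-8")
    _cache.pop(str(path), None)
```

The explicit invalidation on write matters because two writes within the filesystem's timestamp resolution can leave the same mtime and, if the size is unchanged, the same key.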

The home "recent images" card previously read /api/snapshots, which is the in-memory ImageStore for the *current* session only -- so a freshly opened UI showed nothing until the live session captured volumes. Pull from the FileStore on disk instead so it reflects imagery from previous sessions.

New routes (gently/ui/web/routes/sessions.py):
- GET /api/home/recent-images -- latest projection per embryo across the most-recent sessions. Cheap by construction: session IDs from folder names (no session.yaml parse), embryo IDs from directory names (no embryo.yaml parse), timepoints from a filename glob (no pixel decode), and the walk stops as soon as `limit` images are collected. limit/sessions are clamped on both ends so a crafted query cannot turn this unauthenticated read into a full-disk scan.
- GET /api/sessions/{id}/projection -- serves any saved session's JPEG projection, with a component-wise path-traversal guard (the resolved file must be a child of the session dir; not str.startswith, which a sibling like `<dir>_evil` would slip through).

Cheap FileStore helpers (gently/core/file_store.py):
- recent_session_ids(limit) -- newest-first by folder date prefix, no YAML.
- list_embryo_ids(session) -- IDs from directory names, no embryo.yaml read.

home.js loadImages() now fetches the aggregator and builds encoded thumbnail URLs, with an in-flight + 15s TTL guard so re-entering the Home tab does not re-walk the disk on every visit.

Verified with a synthetic on-disk store (helpers, aggregation ordering + short-circuit + clamps, and the traversal-guard predicate including the sibling-prefix case). Findings from an adversarial review (missing upper-bound clamp, no early short-circuit, per-embryo YAML parse, redundant fetches) are all addressed here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
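The component-wise traversal guard, sketched with `pathlib`. `resolve_inside` is a hypothetical helper; `Path.is_relative_to` (Python 3.9+) compares path components, which is what rejects the `<dir>_evil` sibling that a `str.startswith` check would let through.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(base: Path, *parts: str) -> Optional[Path]:
    """Return base/parts if it resolves strictly inside base, else None."""
    root = base.resolve()
    # resolve() collapses ".." and symlinks; an absolute part replaces root
    # entirely and is then rejected by the component check below.
    candidate = root.joinpath(*parts).resolve()
    if candidate != root and candidate.is_relative_to(root):
        return candidate
    return None
```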

The /login page was a dead end: once a viewer landed there (via the chat window's "Sign in" button, a bookmark, or a redirect) the only way forward was valid credentials. Viewing is already open to everyone at / (index serves the SPA in view mode; signing in is an *elevation* to control, not a gate), so the login page should offer the same choice. Add a clearly-secondary "Continue without signing in ->" action beneath the Sign in button that drops straight into view-only mode, with a one-line note that you can sign in any time to take control. The subtitle now frames both paths. Only rendered when accounts are configured (otherwise /login already redirects to /). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The aggregator scanned only the newest 6 sessions, but a rig accrues many no-capture/aborted sessions at the head (on this store the 12 newest had zero embryos while older ones held thousands of projections), so the home "recent images" card came up empty even though previous sessions had imagery. Treat the session count as a *scan budget* rather than a hard window: walk most-recent sessions, skip empty ones (one iterdir each, nearly free), and stop as soon as `limit` images are collected. Default budget raised 6 -> 200 (clamped <= 500) so the walk reaches the older image-bearing sessions. Verified against the live store: scan=6 returned 0 images, scan=200 returns 8 across the two most-recent sessions that actually have projections. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
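The scan-budget walk can be sketched independently of the FileStore. Here sessions are IDs ordered newest first and `list_projections` is a hypothetical callable returning the cheap per-session image list; the 500 budget cap and the 200 default come from the commit, while `MAX_LIMIT = 48` is an invented bound for illustration.

```python
from typing import Callable, Iterable, List, Tuple

MAX_SCAN_BUDGET = 500  # upper clamp from the commit message
MAX_LIMIT = 48         # hypothetical upper clamp for `limit`


def recent_images(
    session_ids: Iterable[str],                    # newest first
    list_projections: Callable[[str], List[str]],  # cheap listing, no YAML/pixels
    limit: int = 8,
    scan_budget: int = 200,
) -> List[Tuple[str, str]]:
    """Walk newest sessions, skip empty ones, stop once `limit` images are found."""
    limit = max(1, min(limit, MAX_LIMIT))
    scan_budget = max(1, min(scan_budget, MAX_SCAN_BUDGET))
    found: List[Tuple[str, str]] = []
    for scanned, sid in enumerate(session_ids):
        if scanned >= scan_budget:
            break
        for image in list_projections(sid):  # an empty session costs one listing
            found.append((sid, image))
            if len(found) >= limit:
                return found  # short-circuit: no further disk walking
    return found
```

With a window of 6 the many empty head sessions exhaust the scan; treating the count as a budget lets the same call reach older sessions while the clamp still bounds the worst case.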

The agent panel showed reasoning, tool calls, approvals and pickers in a 384px column with a hard max of min(560, 45vw) -- so even dragging the handle fully, content was capped at 560px and wrapped badly. Widen the defaults: default 384 -> 460px, ceiling min(560,45vw) -> min(760, 60vw) (~half the viewport). Min stays 320px; double-click still snaps to the default; persisted widths are untouched. Default factored into CHAT_DEFAULT_W and mirrored in the CSS fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

0.21 is tagged (v0.21.0) at the 0.21-dev tip. This branch carries the web-first epoch on top of it -- napari retired, web UI + self-managed auth and roles, agent<->perception integration, observable/permissioned autonomy (wake-router), the Home landing tab, and the docked sliding chat panel -- a minor-version's worth of work, so it opens the 0.22 line. pyproject.toml and gently/__init__.py -> 0.22.0.dev0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Quick Start was stale for the 0.22 web-first line: it told first-timers to install Node and `npm run build` the Ink TUI (retired this epoch), used `pip install -r requirements.txt`, and documented nothing about the web UI or the admin bootstrap. Rewrite for reality:
- Prereqs: drop Node/TUI; Python 3.10+, ANTHROPIC_API_KEY, optional GENTLY_STORAGE_PATH.
- Setup: `pip install -e .`.
- Launch: device layer + launch_gently opens the browser; document --no-browser, --offline, and the banner URL (default http://localhost:8080).
- New "First sign-in (accounts)" section: viewing is open, login elevates to control; first run prints a one-time admin password to the console (never logged); how to add users (POST /api/auth/users), recover (delete users.yaml), and GENTLY_NO_AUTH=1 to disable auth.
- Fix architecture note: the core store is the file-based FileStore, not GentlyStore (SQLite).
- Status line 0.11.0 -> 0.22.0.dev0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pskeshu · 2026-05-30T19:38:24Z

First-run setup & admin password — reviewed, docs added (`f33847d`)

Reviewed the first-timer instructions against the actual code. The README Quick Start was stale for this web-first line — it told newcomers to install Node and npm run build the Ink TUI (retired in this PR), used pip install -r requirements.txt, and said nothing about the web UI or the admin bootstrap. Fixed in f33847dd (new "First sign-in (accounts)" section, web-first launch steps, file-based-store note).

How the admin password works on first run

launch_gently.py → AccountStore.bootstrap_admin_if_empty():

Unless GENTLY_NO_AUTH=1, on startup, if users.yaml has no users, Gently creates a single admin with a random secrets.token_urlsafe(12) password and prints it once in the console banner:
```
First-run admin account created — sign in at the URL above:
    username: admin
    password: <random>
```
The password is console-only — it is never written to the log file (only a PBKDF2-HMAC-SHA256 hash, 200k iterations, lives in <GENTLY_STORAGE_PATH>/auth/users.yaml). The cookie-signing key is auth/secret.key, created on first run.
Add more accounts (roles viewer / operator / admin) via the admin-only POST /api/auth/users.
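For reference, PBKDF2-HMAC-SHA256 hashing with 200k iterations and a first-run random password look roughly like this with the standard library. The `algorithm$iterations$salt$hash` storage format and the function names are assumptions; only the hash would be persisted to users.yaml, never the plaintext.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # iteration count stated above


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical storage format: algorithm$iterations$salt_hex$hash_hex
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    try:
        algo, iters, salt_hex, hash_hex = stored.split("$")
        salt, expected, n = bytes.fromhex(salt_hex), bytes.fromhex(hash_hex), int(iters)
    except ValueError:
        return False
    if algo != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, n)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


# First-run bootstrap: a random password shown once, only the hash kept.
initial = secrets.token_urlsafe(12)
record = {"username": "admin", "role": "admin", "password_hash": hash_password(initial)}
```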

How to run (first-timer)

pip install -e .
python start_device_layer.py      # device layer (separate process)
python launch_gently.py           # agent + web UI -> http://localhost:8080

Add --offline (no hardware), --no-browser, or GENTLY_NO_AUTH=1 (disable accounts) as needed.

Gap worth a follow-up

There's no built-in password reset. If the one-time password is missed, the only recovery is deleting auth/users.yaml and restarting (re-bootstraps a fresh admin, clearing all accounts). A --reset-admin / --set-password launcher flag would be a friendlier path — and ties into the auth review items already flagged in the description (fixed admin username, no login rate-limiting, no token revocation). Happy to add that in a follow-up commit if we want it in 0.22.

First-timers increasingly reach for uv, and the project is a standard PEP 621 pyproject so uv works against it with no extra config. Offer both paths in Setup instead of assuming pip/venv: `python -m venv` + `pip install -e .`, or `uv venv` + `uv pip install -e .` (lockfile-free — there's no uv.lock to maintain yet). Add a Launch note so uv users know to either activate .venv or prefix commands with `uv run`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 10-minute offline guide had the same stale steps as the README: it told readers to `pip install -r requirements.txt` and `npm run build` the terminal UI that this epoch retires. Bring it in line with the web-first reality and mirror the README's dual environment paths (venv+pip or uv), drop the Node prerequisite, and add the uv-run / Windows `set` variants to the launch step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pskeshu · 2026-05-30T19:41:48Z

Dual `venv` + `uv` setup instructions (`406c87d`, `d6c0082`)

Follow-up to the first-run docs: support both environment managers so newcomers can use whichever they have.

406c87d — README Setup now offers both paths: `python -m venv .venv` + `pip install -e .`, or `uv venv` + `uv pip install -e .`. Added a Launch note so uv users know to either activate .venv or prefix commands with `uv run`.
d6c0082 — the 10-minute offline guide (docs/guides/try-offline.md) carried the same stale steps as the README (`pip install -r requirements.txt` + `npm run build` for the retired TUI); brought it to web-first and mirrored the dual venv/uv paths.

Why lockfile-free uv: there's no uv.lock / [tool.uv] in the repo and it's a standard PEP 621 pyproject.toml, so `uv venv` + `uv pip install -e .` works as a drop-in faster pip with nothing extra to maintain. If we later want reproducible, pinned environments, a follow-up could add `uv lock` + `uv sync` (which commits a uv.lock) — glad to do that if the team wants it in 0.22.

Fills the docs/EVAL.md TODO referenced by gently/eval/__init__.py. Documents the capture/replay substrate that already shipped (EventCapture / EventReplay / ShadowRunner / DecisionLog) and a grounded, incremental design for exercising the *real* wake-router + agent reasoning offline by replaying recorded sessions on a controllable clock. Covers the central wiring gap (replay currently targets a fresh EventBus the agent never subscribes to), four compared approaches (event-stream replay, Perceiver stub, full timelapse re-feed, shadow scoring), honest fidelity limits (recorded perception != new perception, LLM nondeterminism, wake-router coalesce/throttle vs replay clock, wall-clock reads), and a step-by-step build order. Referenced from the PR's "Where we'd especially value help" section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ans viewer)

The Quick Start got users installed and signed in but stopped at "open the URL" — it never led them into actually using the app. Add a walkthrough that takes a first user all the way through the core loop:
- open the agent chat (header Agent toggle / Ctrl+J, or Home's Start button + the /wizard setup),
- enter plan mode with /plan (agent as scientific collaborator, no hardware),
- describe an experiment in plain language -> the agent drafts a campaign of typed plan items with concrete specs,
- inspect it in the Plans tab (plan viewer): campaign card -> plan document, item statuses/specs + inspector, doc/board/graph/timeline views, versions.

Notes that plan mode works fully offline (--offline), so reviewers can try the talk -> plan -> inspect loop without a microscope — which is exactly the hands-on interface feedback the PR asks for.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

subindevs · 2026-06-01T08:10:03Z

Bug: `_autonomous_active` set before lock acquired — blocks human hardware commands

gently/app/agent.py:1014

`run_wake_turn` sets `self._autonomous_active = True` before the generator is iterated, but `_turn_lock` is not acquired until inside `handle_message_stream` (line 950). If a user turn is already holding the lock at that moment, the registry backstop at `registry.py:435` sees `_autonomous_active=True` while the user turn is still executing, so it silently refuses `stop_timelapse`, `remove_embryo`, or `set_laser_power` on a human-directed command.

Failure scenario: the operator sends a message that calls `stop_timelapse`; the WakeRouter fires concurrently, sets the flag, then blocks waiting for the lock. The user turn's `stop_timelapse` hits the backstop and is refused with "irreversible action cannot run autonomously", and no error is shown to the operator.

Fix: move the flag into `handle_message_stream` behind an `autonomous: bool = False` parameter. Set it after `lock.acquire()`, and clear it in the same `finally` before `lock.release()`. We've opened PR #33 with the patch.
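The ordering problem and the proposed fix can be reproduced in a few lines of asyncio. This is a hypothetical model, not the code in `gently/app/agent.py` or PR #33. `Agent`, `handle_turn`, `run_wake_turn_buggy`, `call_tool`, and the `IRREVERSIBLE` set are stand-ins, and the 10 ms sleep stands in for model latency. The stand-in backstop refuses irreversible tools whenever `_autonomous_active` is set, mirroring the registry check described in the report.

```python
import asyncio
from typing import List

# Tools the stand-in backstop refuses during autonomous turns (names from the report).
IRREVERSIBLE = {"stop_timelapse", "remove_embryo", "set_laser_power"}


class Agent:
    """Hypothetical stand-in for the agent's turn handling."""

    def __init__(self) -> None:
        self._turn_lock = asyncio.Lock()
        self._autonomous_active = False
        self.log: List[str] = []

    def call_tool(self, name: str) -> None:
        # Backstop: it only sees the flag, not which turn is actually calling.
        if name in IRREVERSIBLE and self._autonomous_active:
            self.log.append(f"refused {name}")
        else:
            self.log.append(f"ran {name}")

    async def handle_turn(self, tool: str, autonomous: bool = False) -> None:
        """Fixed ordering: the flag is set only while this turn owns the lock."""
        async with self._turn_lock:
            if autonomous:
                self._autonomous_active = True
            try:
                await asyncio.sleep(0.01)  # simulated model latency
                self.call_tool(tool)
            finally:
                if autonomous:
                    self._autonomous_active = False  # cleared before the lock is released

    async def run_wake_turn_buggy(self, tool: str) -> None:
        """Reported ordering: the flag is set before waiting for the lock."""
        self._autonomous_active = True
        try:
            async with self._turn_lock:
                await asyncio.sleep(0.01)
                self.call_tool(tool)
        finally:
            self._autonomous_active = False


async def scenario(fixed: bool) -> List[str]:
    agent = Agent()
    user_turn = asyncio.create_task(agent.handle_turn("stop_timelapse"))
    await asyncio.sleep(0)  # let the user turn take the lock first
    if fixed:
        wake_turn = agent.handle_turn("stop_timelapse", autonomous=True)
    else:
        wake_turn = agent.run_wake_turn_buggy("stop_timelapse")
    await asyncio.gather(user_turn, wake_turn)
    return agent.log


print(asyncio.run(scenario(fixed=False)))  # ['refused stop_timelapse', 'refused stop_timelapse']
print(asyncio.run(scenario(fixed=True)))   # ['ran stop_timelapse', 'refused stop_timelapse']
```

The first run shows the reported failure: the operator's `stop_timelapse` is refused because the waiting wake turn has already raised the flag. With the flag scoped to lock ownership, the human turn runs and the backstop still blocks the autonomous one.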

Switch environment setup to uv and add offline/UI-only launch

- Adopt uv for env + deps; declare gently-perception (sibling repo) via [tool.uv.sources] and add python-dotenv. Refresh README setup/launch docs.
- Migrate requirements*.txt into pyproject: drop the redundant requirements.txt, move device accessories (bleak/pyserial/paho-mqtt) to a [device] extra, and document the optional CUDA-torch install (kept opt-in, not a forced default).
- Auto-load a project-root .env on startup; OS-aware ANTHROPIC_API_KEY message.
- Add --no-api UI-only mode and an immediate startup log line before imports.
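Here is a minimal sketch of the launcher behaviour this commit describes: an immediate log line before heavy imports, `--offline` and `--no-api` flags, loading a project-root `.env` with python-dotenv, and an OS-aware hint when `ANTHROPIC_API_KEY` is missing. The function names (`parse_args`, `load_project_env`, `api_key_problem`, `main`) and the exact message wording are assumptions for illustration, not the actual `launch_gently.py`.

```python
import argparse
import os
import sys
from pathlib import Path
from typing import List, Mapping, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="launch_gently.py")
    parser.add_argument("--offline", action="store_true", help="run without a microscope attached")
    parser.add_argument("--no-api", action="store_true", help="UI-only mode: skip the Anthropic API")
    return parser.parse_args(argv)


def load_project_env(project_root: Path) -> bool:
    """Load <project_root>/.env if present; variables already in the environment win."""
    env_file = project_root / ".env"
    if not env_file.is_file():
        return False
    try:
        from dotenv import load_dotenv  # python-dotenv
    except ImportError:
        return False
    load_dotenv(dotenv_path=env_file, override=False)
    return True


def api_key_problem(no_api: bool, env: Mapping[str, str], platform: str = sys.platform) -> Optional[str]:
    """Return a user-facing message if the key is needed but missing, else None."""
    if no_api or env.get("ANTHROPIC_API_KEY"):
        return None
    if platform.startswith("win"):
        how = '$env:ANTHROPIC_API_KEY = "<your key>"   # PowerShell'
    else:
        how = 'export ANTHROPIC_API_KEY="<your key>"   # bash/zsh'
    return ("ANTHROPIC_API_KEY is not set. Add it to .env in the project root, "
            "or set it in your shell:\n  " + how + "\nOr start with --no-api for UI-only mode.")


def main(argv: Optional[List[str]] = None, project_root: Optional[Path] = None) -> int:
    print("gently: starting launcher...", flush=True)  # printed before any slow imports
    args = parse_args(argv)
    load_project_env(project_root or Path.cwd())
    problem = api_key_problem(args.no_api, os.environ)
    if problem:
        print(problem, file=sys.stderr)
        return 1
    # Heavy imports (web app, agent, perception) would be deferred until here.
    return 0
```

Passing `override=False` keeps a key that is already exported in the shell from being silently replaced by a stale `.env` value. A real entry point would call `sys.exit(main())` under `if __name__ == "__main__":`.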

pskeshu and others added 30 commits May 28, 2026 14:09

Bump version to 0.21.0.dev0

cbf4140

Opens the v0.21 development cycle on this branch. Targets per the KANBAN roadmap: cross-session resume, sacrificial vocab alias, campaign template loader (Path B), LDM Phase 1 MVP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge 0.21-dev into worktree-elegant-purring-russell (integrate web-chat/control-lock/launcher work)

14a62c3

pskeshu and others added 11 commits May 29, 2026 17:42

pskeshu and others added 2 commits May 30, 2026 15:41

pskeshu and others added 2 commits May 30, 2026 15:55

schneidermc marked this pull request as draft May 31, 2026 00:09

Merge main (fix-name hotfix, PR #34) into 0.22-dev

2e36add

schneidermc changed the base branch from 0.21-dev to main June 1, 2026 16:13

schneidermc and others added 4 commits June 1, 2026 14:52

Relicense and update author list

ce8b647

fix(web): use Starlette 1.x TemplateResponse(request, name, ...) signature

574bc80

Merge pull request #35 from gently-project/relicense

82ed966

Relicense and update author list

Merge pull request #36 from gently-project/fix/web-templateresponse

c33d033

Fix 500 error on every page under Starlette 1.x

schneidermc force-pushed the 0.22-dev branch from 6f2d092 to c33d033 June 1, 2026 20:07

schneidermc and others added 2 commits June 1, 2026 17:11

Merge pull request #37 from gently-project/feature/env-setup

37ca689

Switch environment setup to uv and add offline/UI-only launch

schneidermc closed this Jun 3, 2026

schneidermc deleted the 0.22-dev branch June 3, 2026 18:31


Gently 0.22: web-first operator surface, self-managed auth, and observable permissioned autonomy #22
pskeshu wants to merge 76 commits into main from 0.22-dev

pskeshu commented May 30, 2026 (edited)

pskeshu commented May 30, 2026

pskeshu commented May 30, 2026

subindevs commented Jun 1, 2026

3 participants

pskeshu commented May 30, 2026 (edited)

What we'd love your feedback on

Where we'd especially value help

Where this is headed

What we wanted

What got made

Web-first migration & napari retirement

Auth, roles & control gating

Agent↔perception integration & permissioned autonomy

Storage, session resume & performance

UI surfaces: Home, chat panel, login, imaging

What needs review

Try it

Notes

pskeshu commented May 30, 2026

First-run setup & admin password — reviewed, docs added (f33847d)

How the admin password works on first run

How to run (first-timer)

Gap worth a follow-up

pskeshu commented May 30, 2026

Dual venv + uv setup instructions (406c87d, d6c0082)

subindevs commented Jun 1, 2026

Bug: _autonomous_active set before lock acquired — blocks human hardware commands
