UX v2 stack: shared lab notebook (incr 1 & 2) + imaging Run button by pskeshu · Pull Request #58 · gently-project/gently

pskeshu · 2026-06-16T22:49:36Z

Consolidates the full UX-v2 feature set on top of #43 — supersedes #53 (which had only the notebook). Branch integration/ux2-all is also the branch used for end-to-end testing, so what's reviewed here is exactly what runs.

Shared lab notebook (Increments 1 & 2) — concept trace #52

Model + store: unified Note (3 kinds: Observation/Finding/Question; author/status/confidence/scope/links/artifacts as orthogonal fields) + file-backed NotebookStore (atomic YAML, rebuildable reverse-indexes, scope queries, append-only link/supersede)
Producer bridge: apply_updates mirrors observations & learnings into the notebook; FileContextStore.notebook
Read API: /api/notebook/notes (filter by kind/author/status/scope/limit), /notes/{id}, /threads
Notebook tab (Library) + Agent's-View live edge (Home)
Ask the notebook: select_notes + forced-tool grounded synthesis (cited, "not in notebook" valid, no self-rated confidence) + POST /api/notebook/ask + the Ask UI box — verified live against Opus 4.8

Imaging

Run button on actionable imaging plan items → routes through the agent (execute_plan_item) to apply the item's spec and start

Dep

Includes the opencv-python-headless declaration (also going to development via deps: declare opencv-python-headless (cv2) as a runtime dependency #55) so detection deps are present.

Tests

~39 notebook tests (tests/test_notebook_*.py); design doc + plans under docs/superpowers/.

Design doc: docs/superpowers/specs/2026-06-16-shared-lab-notebook-design.md

🤖 Generated with Claude Code

The header pill, the home landing line, and the agent dock each computed connection state from their own signal at their own time. home.js read state.connected exactly once at tab init, before the /ws handshake, and never corrected it, so the landing showed "Offline — start the agent to connect" while the header pill showed "Online" and state.connected was true.

Add a single sticky ConnectionStatus store (status-store.js) holding three distinct signals: gentlyConnected (/ws), microscopeConnected (/api/device-status poll), and agentConnected (/ws/agent). The store replays its current snapshot to every new subscriber, so a late subscriber can never miss the initial state (the root of the bug). All three surfaces now read from / write to this store:

- websocket.js onopen/onclose -> setGently (via updateGentlyStatus)
- app.js fetchDeviceStatus -> setMicroscope; header renders via subscriber
- home.js updateStatus reads the store and re-renders on every change
- agent-chat.js setConn -> setAgent

Verified live: after reload the home line and header pill agree (no more "Offline while Online"); no console errors.

Bug gently-project#3 (idle event-count inflation) needs no code change: the high-frequency telemetry (DEVICE_STATE_UPDATE/BOTTOM_CAMERA_FRAME) is already excluded from the events table + count at websocket.js, and idle measurement showed the count is calm and dominated by LOG_RECORD.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
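A minimal sketch of the replay-on-subscribe idea behind the sticky store, written in Python for illustration only (the real store is status-store.js). The class and method names here are assumptions, not the actual frontend API.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    # The three independent signals named in the commit message.
    gently_connected: bool = False
    microscope_connected: bool = False
    agent_connected: bool = False


class ConnectionStatus:
    """Sticky store: a new subscriber immediately receives the current snapshot."""

    def __init__(self) -> None:
        self._snapshot = Snapshot()
        self._subscribers: list[Callable[[Snapshot], None]] = []

    def subscribe(self, callback: Callable[[Snapshot], None]) -> None:
        self._subscribers.append(callback)
        # Replay: a late subscriber can never miss the initial state.
        callback(self._snapshot)

    def _update(self, **changes: bool) -> None:
        new = replace(self._snapshot, **changes)
        if new == self._snapshot:
            return  # nothing changed, do not notify
        self._snapshot = new
        for callback in list(self._subscribers):
            callback(new)

    def set_gently(self, connected: bool) -> None:
        self._update(gently_connected=connected)

    def set_microscope(self, connected: bool) -> None:
        self._update(microscope_connected=connected)

    def set_agent(self, connected: bool) -> None:
        self._update(agent_connected=connected)
```

A surface that subscribes after the /ws handshake still sees `gently_connected=True` on its first callback, which is the case the one-shot read in home.js got wrong.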

Adds UISettings.ux_v2 (env GENTLY_UX_V2, default off) and threads it into the index.html template context from pages.py. This is the coexistence switch for the agent-first UX: the v2 markup/JS will mount only under this flag, so the v1 dashboard stays the default and prod is unaffected while the migration soaks behind the flag. No behaviour change yet (flag off by default; nothing reads it client-side until the Phase 1 dual-render lands). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The agent's structured asks (choice_request / choice_response over /ws/agent) already ARE the one-payload protocol. This makes the SAME ask render both in the chat transcript and prominently on a new main-stage surface (#ask-stage), behind GENTLY_UX_V2: the foundation for the agent-first paradigm. Frontend only; no wire-protocol or backend change.

The double-answer and turn-wedge concerns are handled without touching the server:

- agent-chat.js renderChoice is factored into a pure buildAskCard(data, {reqId, isWake, hasControl, onPick}) reused by both surfaces; exported alongside answerChoice + a hasControl getter.
- A module-level answeredAsks Set keyed by request_id makes answering idempotent across both surfaces (only ONE choice_response is ever sent), so the existing holder-gate + _choice_futures.pop on the server stay correct and never see a duplicate.
- The CLEAR signal fires off the CHOICE lifecycle: answerChoice emits ASK_CLEARED{request_id} the instant a response is sent (NOT stream_end, which lands after the answer for in-turn asks and never for a cancelled turn). Both surfaces clear on it; '*' clears all on cancel/error/socket-drop.
- Read-only when !hasControl on both surfaces (observers can't answer), matching the server's holder gate. There is no dismiss-without-answer path, so asend can't wedge.
- Adds the free-text "Something else…" escape the web cards lacked (the bridge routes unknown selections to LLM resolution).

ask-stage.js (new) renders the current ask into #ask-stage via AgentChat.buildAskCard and clears on ASK_CLEARED; no-ops unless #ask-stage is present (flag off → v1 untouched).

Verified: node --check on all touched JS, Jinja parse on index.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Replaces the flat 8-tab bar with a calm grouped left rail (Now / Library / System) and adds a session-context strip at the top of the main area: the structural transformation toward the prototype's shell. All scoped under body.ux-v2; v1 markup/CSS untouched (no consolidation of the known duplicate .tab rulesets; that is deferred to the final cleanup phase).

- shell.js (new): wires each rail item to switchTab(tabId). It ROUTES through the single init chokepoint and never reimplements tab activation, so every tab's lazy-init side-effect still fires. Keeps the rail's active state in sync via a new TAB_CHANGED event; populates the strip's status/embryo count from the Phase 0 ConnectionStatus store. Wires the rail's "Talk to Gently" to the existing AgentChat dock. No-ops unless body.ux-v2 is present.
- app.js: switchTab now emits TAB_CHANGED(tabName). This is additive; v1 has no listener, so no behaviour change.
- index.html: the rail (first child of the flex-row .app-shell) + the strip (top of .app-main) + shell.css/shell.js includes, all under {% if ux_v2 %}.
- shell.css (new): rail + strip styling and a subtle unfold animation, every rule scoped under body.ux-v2.

Deferred to keep this phase low-risk: History-API routing and the session_changed in-place re-hydration (current hash routing + reload still work).

Verified: node --check on all touched JS, Jinja parse on index.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…nce)

Flips plan mode from ASK-FIRST to INFERENCE-FIRST: the agent arrives with a draft instead of interrogating the researcher. Per Keshu's call, the genotype -> imaging-channel inference is done by the MODEL (it reads the reporters and knows fluorophore spectra), NOT a hardcoded heuristic table, which would only cover a fraction of real fluorophores/dyes and force needless "asks".

- prompt.py: the stance now says infer what you can (channels from the strain genotype via your own fluorophore knowledge, organism defaults, lab/campaign context), record each inferred value's source + confidence in the spec's provenance, state a wavelength only when confident (else mark low-confidence and confirm via ask_user_choice; never guess a number), and ask ONLY for genuine gaps / low-confidence / consequential choices.
- model.py: ImagingSpec gains a `provenance` map (field -> {source, confidence}). It's a valid dataclass field, so it flows end-to-end with no extra plumbing: the model passes it in create_plan_item(spec=...), the store rebuilds it via ImagingSpec(**valid-fields), and it round-trips through serialization.

Backend only; needs a server restart to load the new prompt. The actual inference behaviour is validated live in plan mode (see handoff).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
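To make the "valid dataclass field, no extra plumbing" point concrete, here is a hedged sketch. Only `provenance`, `genotype`, `reporter`, and `laser_wavelength_nm` are names taken from this PR; the reduced field set, the helper `spec_from_dict`, and the sample values are assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional


@dataclass
class ImagingSpec:
    # Assumed subset of the real spec; the actual class has more fields.
    genotype: Optional[str] = None
    reporter: Optional[str] = None
    laser_wavelength_nm: Optional[int] = None
    # field name -> {"source": ..., "confidence": ...}
    provenance: dict[str, dict[str, str]] = field(default_factory=dict)


def spec_from_dict(data: dict[str, Any]) -> ImagingSpec:
    """Rebuild a spec from stored data, keeping only known dataclass fields."""
    valid = {f.name for f in fields(ImagingSpec)}
    return ImagingSpec(**{k: v for k, v in data.items() if k in valid})


stored = {
    "reporter": "mCherry",
    "laser_wavelength_nm": 561,
    "provenance": {
        "laser_wavelength_nm": {"source": "inferred", "confidence": "medium"}
    },
    "unknown_key": "dropped on rebuild",
}
spec = spec_from_dict(stored)
```

Because `provenance` is just another field, `asdict(spec)` serializes it alongside the values it annotates, and unknown keys are filtered out on the way back in.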

…in the UI

Makes Phase 3a's inference visible. renderSpec now shows the channel (laser_wavelength_nm) and reporter/genotype rows, which it omitted before, and tags each value with its provenance ("561 nm · inferred · medium") read from spec.provenance, so the researcher can see what was inferred vs. cited and what to confirm.

- agent-chat.js renderSpec: keyed rows (label, value, fieldKey) with a small source/confidence tag per row when spec.provenance carries that field; adds Genotype/Reporter/Channel rows.
- bridge.py: the spec payload builder now includes genotype/reporter/laser_wavelength_nm and the provenance map (was a curated subset that dropped the channel), so the UI has the data to render.
- ask-stage.css: styling for the .ac-spec-src provenance tag.

Frontend + a contained backend payload enrichment; node --check + bridge import verified.

Note: this surfaces provenance wherever the spec panel is shown and in the plan document (which already serializes provenance); threading it through every spec-emission path (e.g. the apply_plan_acquisition_spec stash) and a full plan_confirm ask with inline edit/confirm is the remaining 3b polish.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…d, live)

Renders the agent's expectations (beliefs), watchpoints (attention), and open questions (uncertainty) as a calm panel on the landing, updating live and resolvable by the control holder.

The "store has no event bus" blocker is solved via the EXISTING global bus rather than dependency injection (matches agent.py's `emit` usage, so no __init__/launch_gently changes):

- core/event_bus.py: new EventType.CONTEXT_UPDATED.
- file_store.py: FileContextStore._notify_context_change() emits it (lazy import, best-effort) from add/resolve of expectations, watchpoints, and questions. The server already broadcasts ALL bus events to /ws (subscribe_async("*")), and websocket.js re-emits them on ClientEventBus, so the surface refreshes live with zero new transport. Verified: the emit fires on the bus (unit check).
- routes/context.py (new, registered): GET /api/context (read the 3 lenses, defensive on cold start) + POST .../{id}/resolve for questions/watchpoints/expectations, each gated by Depends(require_control) (data.py pattern, NOT the mesh-scoped campaigns auth) so viewers can't mutate the agent's mind. Read side reuses campaigns._serialize.
- context-surface.js (new, ux_v2 only): fetches /api/context, renders the three lenses, re-fetches on CONTEXT_UPDATED + AGENT_CONTROL, and lets the holder answer a question (inline input, no native prompt), resolve a watchpoint, or confirm an expectation.
- shell.css: scoped styling.
- index.html: panel mounted at the top of the home landing under {% if ux_v2 %}.

Backend needs a server restart to load; live push + render is validated in-app. Proactive #ask-stage cards from watchpoint creation are the remaining Phase-4 polish (noted).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The context surface hid itself entirely when the agent had no expectations/ watchpoints/questions — so on a fresh session it was invisible and read as "missing". Render a calm "nothing yet" empty-state instead, so the surface is discoverable before the agent has formed any beliefs. Static JS/CSS only — hard-reload to pick it up, no restart. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

When the viz server can't bind its port, the error now tells you how to free it (fuser -k <port>/tcp, or lsof -ti | xargs kill) instead of just "close it first" — uses self.port so it's always the right port. (Standalone DX fix.) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… data

The Experiment view's job is to show the live experimental TACTIC patterns (cadence: base/fast/burst/cooldown + reactive-monitoring rules). It was falling back to a ~130-line STUB_STRATEGY (and a "mockup · stubbed data" badge) whenever there was no active experiment or the fetch wasn't ready, i.e. production could render fake tactics.

- Removed the STUB_STRATEGY const entirely.
- loadStrategy() now returns null on non-OK / error (no stub fallback).
- render(null) shows a calm empty state ("No active experiment — the imaging tactics will appear here once a run is live"), never fabricated data.
- Removed both "mockup · stubbed data" badges; the header just shows "live".

Affects v1 and v2 (the Experiment tab isn't flag-gated). Removing fake data is correct for both; it only changes the no-active-experiment case (stub → empty state), and real live runs render as before. node --check clean.

NOT done (deliberately): the large carve of a new per-embryo renderer out of the 4,556-line embryos.js. The existing ExperimentOverview already renders the tactic patterns from /api/experiments/current/strategy, and the detailed contents are yours to define. The reconcileWithServerState/clearAllState contract in embryos.js is left untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

v2 is now the default UI — no env var needed. The v1 dashboard stays reachable as a fallback via GENTLY_UX_V2=0 (and its markup is NOT deleted yet). This is the reversible "flip → soak" step; the irreversible v1 markup/CSS deletion is deferred until v2 has run as the default and is confirmed good. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Add the entry landing the prototype sketched — agent orb, time-aware greeting, and choice cards (Plan / quick look / free-text escape), behind GENTLY_UX_V2 and receding into the workspace. Crucially, the plan dialogue renders IN the landing, not the chat REPL: 'Plan an experiment' switches to an in-place plan-wizard screen, enters /plan, and renders the agent's ask_user_choice questions as button cards there (reusing AgentChat.buildAskCard), with the plan assembling from each pick. agent-chat.js: runCommand is now connection-aware (connect + queue/flush on open) so the page drives the agent without opening the chat panel. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Self-contained landing prototype (ux-prototype/landing.html) and the 8-phase migration plan it was built from. A sketch space, kept separate from the live frontend. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Convert hatching.py and the verifier's four challenger strategies from JSON-in-prose / startswith-scraping with silent defaults to forced tool_choice — the verdict arrives as a validated dict on the tool_use block. Deletes the regex/parse layer; downstream vote-tally/consensus is untouched. Also drop self-rated confidence from these schemas (a heuristics-era artifact — the boolean/categorical judgment is the signal); the ensemble's derived agreement ratio stays. docs/HEURISTICS-AUDIT.md ranks the remaining candidates and the keep-deterministic boundary. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
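A hedged sketch of the forced-tool pattern described above. With the Anthropic Messages API, a `tool_choice` of type `"tool"` makes the reply carry a `tool_use` content block whose `input` is already a dict, so there is no prose to parse. The tool name `record_hatching_verdict`, its schema, and both helper functions are hypothetical; content blocks are shown as plain dicts here, whereas the SDK returns objects with the same fields.

```python
from typing import Any

# Hypothetical tool definition; the real schemas are not shown in this PR.
VERDICT_TOOL: dict[str, Any] = {
    "name": "record_hatching_verdict",
    "description": "Record whether the embryo has hatched.",
    "input_schema": {
        "type": "object",
        "properties": {"hatched": {"type": "boolean"}},
        "required": ["hatched"],
    },
}


def forced_tool_kwargs() -> dict[str, Any]:
    """Request kwargs that force the model to answer through the tool."""
    return {
        "tools": [VERDICT_TOOL],
        "tool_choice": {"type": "tool", "name": VERDICT_TOOL["name"]},
    }


def extract_verdict(content: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the forced tool's input dict from a response's content blocks."""
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == VERDICT_TOOL["name"]:
            return block["input"]
    # Fail loudly instead of falling back to a silent default.
    raise ValueError("no tool_use block for the forced tool")
```

Note the schema carries only the boolean judgment and no self-rated confidence field, in line with the change described above.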

…ight bind uvicorn binds with SO_REUSEADDR; the preflight check did not, making it stricter than the server it guards — a just-exited instance leaves client sockets in TIME_WAIT that fail a bare bind() even though uvicorn would bind fine, so quick restarts hit 'port in use' repeatedly. Set SO_REUSEADDR on the preflight so it fails only on a genuine live listener. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
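A minimal sketch of a preflight probe that sets SO_REUSEADDR before binding, so it is no stricter than a server that binds with the same option. The function name `port_is_free` and the default host are assumptions, not the project's actual preflight code.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a server using SO_REUSEADDR could bind this port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # Match the server's socket option so leftover TIME_WAIT sockets
        # from a just-exited instance do not cause a false "port in use".
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False  # a live listener (or another real conflict)
        return True
```

This describes Linux behaviour, where a live listener on the same address still makes the bind fail. Windows gives SO_REUSEADDR different semantics, so the probe would need a different approach there.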

On Linux the Windows default GENTLY_STORAGE_PATH (D:\Gently3) gets created literally as ./D:/ under the repo, full of logs/sessions. Ignore it so it stops cluttering status and can't be committed by accident. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Polish + flow fixes for the agent-first landing and in-page plan wizard.

- Kill the welcome→plan "lift-up" lurch: top-anchor both screens to one shared offset so the orb no longer teleports ~140px on swap (now ~1px), with a single coordinated cross-fade keyframe.
- Fix broken dark mode: define the tokens landing.css relied on but the theme never set (--bg, --text-secondary, --accent-soft, --accent-green-soft), scoped to body.ux-v2 with a light override; route the page background, drift glow, and accent-keyed shadows through real per-theme tokens.
- A11y + polish: visible :focus-visible rings, animated tool-card reveal (grid-rows), feed scrolls internally with anchored header/footer, single-column mobile without a nested-scroll trap, consistent type scale + 4px spacing, ~40px touch targets, aria-expanded on disclosures, expanded prefers-reduced-motion coverage.
- Entry flow: under ux_v2 the landing owns session entry, so suppress the legacy connect-time resolution picker server-side (it duplicated and contradicted the landing's Plan/Standalone choice). Guard the design kickoff so it fires once per session (no Back/forward pile-up).
- "Plan an experiment" now offers continue-vs-fresh when an active campaign exists: continue it (default) or start a brand-new campaign from scratch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

agent-chat.js mirrors the agent stream onto a new AGENT_ACTIVITY event (turn/thinking/text/tool_start/tool_result/turn_end/error) so the plan wizard can render the agent's work as collapsible tool cards instead of leaving it in the chat. Replaces the inline-only mdToHtml with a block-aware, escape-first GFM renderer (headings, pipe tables, lists, fenced code, links — XSS-safe and streaming-safe) and exports it; agent-chat.css styles the ac-md-* output for both the chat and the wizard feed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…allback settings.py tiers: main→claude-fable-5, perception+medium→claude-opus-4-8, fast→claude-sonnet-4-6, plus a refusal_fallback (claude-opus-4-8). Strip the params the new models reject: thinking budget_tokens → output_config.effort (conversation.py, sam_detection.py); drop the obsolete interleaved-thinking beta header (agent.py). conversation.py: a main-tier 400 (e.g. Fable 5 under <30-day org data retention) OR a stop_reason='refusal' transparently retries the turn on Opus 4.8 in both the streaming and non-streaming paths; get_tool_call guards empty refusal content; dopaminergic detector guarded. chat.py model centralized to settings. Quiet log noise: the per-response diagnostic WARNING→DEBUG (it fired on every tool-use turn), and the benign send-after-close websocket WARNING→DEBUG. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…tooling

Rebasing feature/ux-v2 onto development (now carrying gently-project#47's ruff/format gate) left the model-migration and detector-rewrite files in their pre-gently-project#47 formatting. Run ruff --fix + ruff format and fix the violations that aren't auto-fixable:

- model.py: provenance annotation used unimported `Dict` (F821, would NameError at import) -> `dict[str, dict[str, str]]` (PEP 585).
- sam_detection._detect_with_sam: returned undefined `image_8bit` (F821) -> `image_rgb`, the 8-bit RGB image computed at the top.
- agent.py / sam_detection.py: moved the `logger = ...` assignment below the import block to clear E402 (import execution order unchanged).
- verifier.py / conversation.py: wrapped long log/summary f-strings and tool-schema descriptions; reflowed prompt prose (content preserved) to satisfy E501.

ruff check . and ruff format --check . both pass (lint.yml CI gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fable 5 was declining benign planning turns (stop_reason="refusal"), which tripped the refusal fallback on essentially every turn — adding a full extra round-trip of latency per message. Point MODEL_MAIN at Opus 4.8 so the common path is a single call; refusal_fallback is now inert (the guard skips it when fallback == main). Set MODEL_MAIN=claude-fable-5 to switch back once Fable 5 stops refusing. Prune the now-stale Fable-5 notes from the tier docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Stream a bounded result_full alongside the 140-char result_summary so the web UI's expandable tool card can show what a tool actually returned, not just the one-line preview. Frontend (landing.js/css) renders the expandable card; the thinking indicator's label is wrapped in a span so it can be updated live. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Live digital-twin of the addressable imaging volume: the acquisition cuboid + light-sheet plane, driven by a new SCAN_GEOMETRY_UPDATE backend signal (emitted from acquire_volume, bootstrapped via /api/devices/scan_geometry) and live DEVICE_STATE_UPDATE positions. Sits as the Map / Details / 3D view switcher inside the Devices tab (not a top-level nav tab). Includes a demo driver for offline development. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Flow/IA audit (not visual) from a live click-audit with the agent on, plus code cross-check. Prioritized findings + fixes: P0 loading-state legibility (stream thinking summary — set thinking display:summarized + handle thinking_delta), first-character truncation in the plan feed, control/auth wall hidden in chat, double ask-mount, ASK_CLEARED never emitted, no path back to welcome, non-stateful routing, resume hard-reload, and the workspace-IA placement of the 3D optical-space view. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The streaming path collected the entire turn before yielding anything, so the plan wizard sat on a static "working…" spinner for the whole turn (~90s). And the streamed call requested no thinking, so there was no reasoning to show even if it had streamed.

- conversation.py: rewrite call_claude_stream to stream live. A worker thread drains the SDK stream and pushes events onto an asyncio queue as they arrive; the coroutine yields text/thinking deltas in real time. Enable adaptive thinking with display="summarized" (+ effort=medium) and emit thinking_delta as {"type":"thinking"} chunks. Full assistant content (incl. thinking blocks) is still replayed from final_message, so the tool loop stays valid. Retry / 400-fallback / refusal-fallback preserved (clean while nothing's been yielded).
- agent-chat.js: forward the thinking text on the 'thinking' activity.
- landing.js: render streamed reasoning as a dim block in the plan feed and add an elapsed-time counter to the thinking indicator so a long think reads as progress, not a hang.

Verified live (Opus 4.8): reasoning + prose stream into the plan feed during the turn with a ticking timer; no console errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
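The worker-thread-to-asyncio-queue hand-off can be sketched as below. This is an illustration under assumptions, not the actual call_claude_stream: `stream_live` and `make_stream` are hypothetical names, and any blocking iterable of events stands in for the SDK stream.

```python
import asyncio
import threading
from typing import Any, AsyncIterator, Callable, Iterable

_DONE = object()  # sentinel marking the end of the stream


async def stream_live(
    make_stream: Callable[[], Iterable[dict[str, Any]]],
) -> AsyncIterator[dict[str, Any]]:
    """Yield events from a blocking stream as they arrive, not at turn end."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        # Runs in a thread; asyncio.Queue is not thread-safe, so every put
        # is scheduled onto the event loop with call_soon_threadsafe.
        try:
            for event in make_stream():
                loop.call_soon_threadsafe(queue.put_nowait, event)
        except Exception as exc:  # forward failures to the consumer
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

A consumer would iterate with `async for chunk in stream_live(...)` and forward each `{"type": "thinking"}` or text chunk to the UI as soon as it is produced.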

… strings

The model often serializes nested tool args (spec, references) as JSON strings instead of objects. create_plan_item stored the raw string, and read-back via _dict_to_plan_item did spec_data.items() on a str → AttributeError, leaving a malformed, unreadable plan item persisted.

- planning.py: _coerce_plan_args() parses string spec/references (and int-casts estimated_days) in both create_plan_item and update_plan_item before storing.
- file_store.py: _dict_to_plan_item tolerates spec/references persisted as JSON strings (parse on read; fall back cleanly on garbage), so existing bad items load instead of crashing.

Verified: string spec/refs hydrate to ImagingSpec/list; malformed spec → None (no crash).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
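A hedged sketch of the coercion step, assuming tool arguments arrive as a plain dict. The function name `coerce_plan_args` and the choice to map unparseable values to None are assumptions for illustration; the real `_coerce_plan_args()` may differ in detail.

```python
import json
from typing import Any


def coerce_plan_args(args: dict[str, Any]) -> dict[str, Any]:
    """Parse JSON-string spec/references and int-cast estimated_days."""
    out = dict(args)  # do not mutate the caller's dict
    for key in ("spec", "references"):
        value = out.get(key)
        if isinstance(value, str):
            try:
                out[key] = json.loads(value)
            except json.JSONDecodeError:
                out[key] = None  # assumed fallback for garbage input
    if out.get("estimated_days") is not None:
        try:
            out["estimated_days"] = int(out["estimated_days"])
        except (TypeError, ValueError):
            out["estimated_days"] = None
    return out


raw = {
    "spec": '{"laser_wavelength_nm": 561}',
    "references": '["doi:example"]',
    "estimated_days": "3",
}
clean = coerce_plan_args(raw)
```

After coercion `clean["spec"]` is a dict, so a later `.items()` call no longer hits a string.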

…rompt The plan-mode prompt specified what to design in detail but nothing about how to write to the user, so (with Opus 4.8's stronger narration) the agent produced dense questions, paragraph-long ask_user_choice options, and over-explained prose — cognitively heavy for a working biologist. Add an explicit communication-style section: lead with the ask/finding, short questions + short options, plain words over process jargon, one-clause rationale (full reasoning goes to provenance/references), one idea per message. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A turn's tool calls were always awaited one-by-one, so several independent lookups (e.g. search_strains for dlg-1 then ajm-1) ran serially even when the model could issue them together. Add a concurrency fast-path: when EVERY tool in a turn is non-hardware (requires_microscope=False) and non-interactive (not ask_user_choice), fire their tool_start events, asyncio.gather the executions, then emit results. Any microscope action or ask_user_choice in the batch falls back to the existing serial path, so hardware is never raced and interactive prompts/stateful ordering are preserved. Also nudge the plan-mode prompt to batch independent lookups into one turn so the model actually produces parallelizable tool calls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
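The fast-path decision can be sketched as follows. Only `requires_microscope`, `ask_user_choice`, and the tool_start/tool_result event names come from the commit message; `ToolCall`, `execute_turn`, and the `emit` callback signature are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    name: str
    run: Callable[[], Awaitable[Any]]
    requires_microscope: bool = False


def can_run_concurrently(calls: list[ToolCall]) -> bool:
    """True only if EVERY call is non-hardware and non-interactive."""
    return all(
        not call.requires_microscope and call.name != "ask_user_choice"
        for call in calls
    )


async def execute_turn(
    calls: list[ToolCall], emit: Callable[[str, str], None]
) -> list[Any]:
    if len(calls) > 1 and can_run_concurrently(calls):
        # Fast path: announce every start, run together, then emit results.
        for call in calls:
            emit("tool_start", call.name)
        results = await asyncio.gather(*(call.run() for call in calls))
        for call in calls:
            emit("tool_result", call.name)
        return list(results)
    # Serial path: hardware is never raced and prompt ordering is preserved.
    results = []
    for call in calls:
        emit("tool_start", call.name)
        results.append(await call.run())
        emit("tool_result", call.name)
    return results
```

`asyncio.gather` returns results in the order the awaitables were passed, so results stay aligned with their calls even when the lookups finish out of order.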

phase_number arrived as "1" (stringified, like spec/references), so get_nth_subcampaign did `1 <= "1"` → TypeError. Coerce phase_number/phase_order to int in the create tool, and make get_nth_subcampaign tolerant of a numeric string. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…otebook

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Thread rail + kind filter + note cards (color-coded by kind, with author, status, and scope chips), consuming /api/notebook. Wired into the v2 rail and the legacy tab bar, lazy-init via switchTab, live-refresh on CONTEXT_UPDATED. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ook tab

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ounded synthesis

Question field -> POST /api/notebook/ask, rendering the grounded answer with a coverage badge, cited 'Why' points (note-id chips), and 'Try next' steps. Asks within the selected thread scope. Verified live against Opus 4.8.

Adds a context-placed '▶ Run this imaging item' button to the plan-item inspector (imaging + planned only). Routes through the agent (AgentChat.runCommand) to execute_plan_item, which applies that item's ImagingSpec and starts the timelapse — agent-first, so it stays in the loop and can confirm/adjust. First slice of the multi-entry-point imaging triggers (workspace 'embryos mounted' CTA + resolution-picker continuation to follow).

…button) Test branch combining all three feature branches for end-to-end testing on the production machine. Not for merge — review happens on the individual PRs.

cv2 is imported unguarded by sam_detection, the device layer, analysis/steps, and video_maker, but was never declared — so a clean install breaks detection/ analysis with 'No module named cv2'. headless avoids the libGL.so requirement of full opencv-python on headless servers/agents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…es LED

Two fixes from live testing on integration/ux2-all:

1. get_stage_position() in the client parsed the read_stage_plan event with a stale key list (["XY:31", "xy_stage", "stage"]) that never matched the device's actual key ("XYStage:XY:31"/"xy_stage_position"), so every read raised "Failed to read stage position" (also broke view_image, which reads the stage). Align the key list with the one the device layer's own handle_detect_embryos already uses.
2. The bottom camera no longer drives the LED at all. trigger() ignores the persistent use_led flag and captures under room light only, so no caller (manual marking, detection, live preview) can flash the LED. capture_for_marking no longer requests use_led=True, and handle_detect_embryos turns the room light on before capturing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Regenerate uv.lock for the opencv-python-headless runtime dep declared in pyproject (6eac3e6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When an agent tool call passed embryo_ids as "embryo_1,embryo_2,..." (a string) instead of a JSON list, start() iterated the string character by character and reported every letter as a missing embryo ("Embryos not found: ['e','m','b','r','y','o',...]"). Coerce a string into a list by splitting on commas before the membership check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
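The failure mode and the fix are easy to reproduce in isolation. This is a minimal sketch; the helper name `normalize_embryo_ids` is an assumption, not the name used in start().

```python
from typing import Iterable, Union


def normalize_embryo_ids(embryo_ids: Union[str, Iterable[str]]) -> list[str]:
    """Accept either a list of ids or one comma-separated string."""
    if isinstance(embryo_ids, str):
        # Split on commas and drop empty pieces and stray whitespace.
        return [part.strip() for part in embryo_ids.split(",") if part.strip()]
    return list(embryo_ids)


# The bug: iterating a string walks it character by character.
broken = [ch for ch in "embryo_1,embryo_2"][:6]

# The fix: coerce before the membership check.
fixed = normalize_embryo_ids("embryo_1, embryo_2")
```

Here `broken` holds the single letters that showed up in the "Embryos not found" message, while `fixed` holds the two intended ids.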

The web canvas → server path fabricated a zero-padded id (f"embryo_{number:03d}") whenever a marker arrived without an embryo_id, producing ids like "embryo_002". Everything else — detection_tools, the orchestrator, all tool examples — uses the unpadded f"embryo_{n}" form, so the padded ids never matched the stored "embryo_2" and surfaced as "Embryos not found" in start_adaptive_timelapse. Emit unpadded ids here too. The legacy SQLite store (core/database.py) still uses :03d but is retired/read-only; the state.py lookup shim that tries both formats stays as a safety net for any already-stored padded ids. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
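A sketch of what a "try both formats" lookup shim could look like, assuming ids follow the `embryo_<n>` pattern. The function `candidate_ids` is hypothetical and is not the actual state.py code.

```python
import re


def candidate_ids(embryo_id: str) -> list[str]:
    """Return the unpadded id first, then the legacy zero-padded form."""
    match = re.fullmatch(r"embryo_(\d+)", embryo_id)
    if match is None:
        return [embryo_id]  # unknown shape: look it up as given
    number = int(match.group(1))
    candidates = [f"embryo_{number}", f"embryo_{number:03d}"]
    # Both forms coincide for numbers >= 100, so drop the duplicate
    # while preserving order.
    return list(dict.fromkeys(candidates))
```

A lookup would try each candidate in order, so a stored "embryo_2" is found whether the caller passes "embryo_2" or the padded "embryo_002".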

ruff check . and ruff format --check . both clean.

…ux2-all # Conflicts: # .gitignore

…tebook (connection C) The agent's first notebook *write* tool (memory_tools had only recall_*). When the user says 'note that…', the agent tidies the phrasing and saves a human-authored Note tagged to the current session (+ embryos/strains), then emits CONTEXT_UPDATED so the Notebook tab and Agent's-view live edge refresh. Reliable (human content), no perception guesswork; the agentic session-summariser is deferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… (connection A, live-UI half) Plan-item/campaign writes (create/update/complete item, session-campaign link, progress) now emit PLAN_UPDATED; the server already broadcasts it to /ws, and campaigns.js subscribes + re-fetches the open campaign (debounced, preserving the selected item). Fixes 'agent changed the plan but the UI didn't move'. The execute_plan_item link-ordering fix (overlaps the production branch) is separate.

…connection A, link-fix) execute_plan_item linked before the session existed (agent.session_id was None) and swallowed errors → running from a spec never linked. Now: start first, then link (re-read session_id), append via link_plan_item_session, surface failures. Same item-level link added to attach_session_to_plan. PlanItem gains session_ids (an item can run many sessions: re-runs/multi-sitting); session_id kept as the latest for back-compat. All link writes emit PLAN_UPDATED (live UI).

…ble render Connection B: the run-mode awareness summary now carries the investigation goal, phase, and what's next (incl. decision-point flag) — not just the active item's spec sheet — so the agent runs as a scientist inside the experiment. Laser power was always in the brief/inspector; it just wasn't *set* in plans (connection D will let you fill it). Also fixes renderSpecTable showing 'provenance [object Object]' — nested objects are now skipped.

Closes the laser-power loop. The inspector was read-only and only surfaced fields that were already set, so a TBD value like laser power had no way in.

- PATCH /api/campaigns/{cid}/items/{item_id}: edits item fields and merges partial imaging-spec updates (send one field, keep the rest; empty string clears). Routes through update_plan_item, which fires PLAN_UPDATED, so the edit shows live (connection A).
- campaigns.js: ✎ Edit toggles the spec into a form listing every fillable field, empty ones included and flagged; PATCHes changed fields on Save, then re-fetches. Mid-edit guard so a live refresh doesn't clobber unsaved input.
- tests: PATCH route (edit, spec-merge, clear, PLAN_UPDATED, 400/404).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
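The partial-merge rule (send one field, keep the rest; empty string clears) can be sketched as a pure function. The name `merge_spec`, the field `laser_power_pct`, and the choice to represent "cleared" as None are assumptions for illustration; the route's real handling may differ.

```python
from typing import Any


def merge_spec(current: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Merge a partial spec update into the stored spec."""
    merged = dict(current)  # untouched fields are kept as they are
    for key, value in patch.items():
        # An empty string means "clear this field"; shown here as None.
        merged[key] = None if value == "" else value
    return merged


stored = {"laser_wavelength_nm": 561, "reporter": "mCherry"}
updated = merge_spec(stored, {"laser_power_pct": 5, "reporter": ""})
```

The PATCH body only needs the fields the researcher changed, which is what lets the Edit form fill a single TBD value without resending the whole spec.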

pskeshu and others added 30 commits June 16, 2026 04:18

Add UX v2 landing screenshot for the PR

0457975

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

pskeshu and others added 19 commits June 16, 2026 16:39

feat(notebook): apply_updates mirrors observations & learnings into n…

9224b0f

…otebook

plan: notebook read API

03ea7c6

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(notebook): read API — GET /api/notebook/notes + /notes/{id}

7d51e20

feat(notebook): read API — GET /api/notebook/threads with counts

1a0121d

plan: notebook live edge (Agent's view recent notes)

1c87896

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(notebook): limit param on GET /api/notebook/notes

9b28fa8

feat(notebook): Agent's-view live edge — recent notes section → Noteb…

b643972

…ook tab

plan: ask the notebook (increment 2 backend)

814e322

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(notebook): ask backend — select_notes retrieval + forced-tool gr…

8928084

…ounded synthesis

feat(notebook): POST /api/notebook/ask — grounded notebook Q&A

13b4dd2

feat(notebook): 'Ask the notebook' box on the Notebook tab

c7be82a

Question field -> POST /api/notebook/ask, rendering the grounded answer with a coverage badge, cited 'Why' points (note-id chips), and 'Try next' steps. Asks within the selected thread scope. Verified live against Opus 4.8.

integration: ux-v2 + memory-model (notebook) + imaging-triggers (run …

71ac0ea

…button) Test branch combining all three feature branches for end-to-end testing on the production machine. Not for merge — review happens on the individual PRs.

chore(deps): lock opencv-python-headless (uv.lock sync)

9abcf6e

Regenerate uv.lock for the opencv-python-headless runtime dep declared in pyproject (6eac3e6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pskeshu mentioned this pull request Jun 16, 2026

Memory model: the active, shared lab notebook (Increments 1 & 2) #53

Closed

lint: wrap long lines + ruff-format notebook modules (E501)

68a7833

ruff check . and ruff format --check . both clean.

pskeshu changed the base branch from feature/ux-v2 to development June 16, 2026 22:57

pskeshu mentioned this pull request Jun 16, 2026

UX v2: agent-first entry paradigm, shared-visibility surface, and model migration #43

Closed

5 tasks

pskeshu and others added 7 commits June 17, 2026 10:29

Merge remote-tracking branch 'upstream/development' into integration/…

31ba4b1

…ux2-all # Conflicts: # .gitignore

lint: ruff-format file_store.py (fixes CI format check on integration)

a3b649c

pskeshu wants to merge 71 commits into gently-project:development from pskeshu:integration/ux2-all
