Open
71 commits
4530ab1
UX v2 Phase 0: unify connection status across the three indicators
pskeshu Jun 11, 2026
4571a58
UX v2 Phase 1 (scaffold): add GENTLY_UX_V2 feature flag
pskeshu Jun 11, 2026
17e66cb
UX v2 Phase 1: dual-render agent asks (chat transcript + main stage)
pskeshu Jun 11, 2026
52503e7
UX v2 Phase 2: grouped rail nav + session-context strip (behind flag)
pskeshu Jun 11, 2026
288f97b
UX v2 Phase 3a: inference-first plan mode (model-driven, with provena…
pskeshu Jun 11, 2026
e947506
UX v2 Phase 3b: surface inferred imaging spec + per-field provenance …
pskeshu Jun 11, 2026
2387225
UX v2 Phase 4: co-editable shared-visibility surface (the agent's min…
pskeshu Jun 11, 2026
e8ce2bf
UX v2 Phase 4 follow-up: show an empty-state for the agent's-view panel
pskeshu Jun 12, 2026
c98a33f
Put the free-the-port command in the "port in use" error
pskeshu Jun 12, 2026
927900a
UX v2 Phase 5: Experiment view shows real tactics only — drop stubbed…
pskeshu Jun 12, 2026
f69bc4e
UX v2 Phase 6 (step 1): flip GENTLY_UX_V2 default ON, keep v1 fallback
pskeshu Jun 12, 2026
e4de3a1
UX v2: agent-first landing → in-page plan wizard (flag-gated)
pskeshu Jun 12, 2026
37bc534
Add UX v2 entry-paradigm prototype + migration plan (design reference)
pskeshu Jun 12, 2026
6c6bc70
Detectors: forced-tool structured output for hatching + verifier
pskeshu Jun 12, 2026
4910d06
Fix recurring port-8080 false positive: SO_REUSEADDR on the viz prefl…
pskeshu Jun 12, 2026
5ee9e32
chore: gitignore the stray D:/ storage dir
pskeshu Jun 12, 2026
a8fd3e5
UX v2 landing: fix welcome→plan jump, dark mode, and the entry flow
pskeshu Jun 12, 2026
1cefe2f
UX v2: agent-activity events + GFM markdown rendering
pskeshu Jun 12, 2026
64f9338
Models: migrate to Fable 5 / Opus 4.8 / Sonnet 4.6 with refusal+400 f…
pskeshu Jun 12, 2026
0457975
Add UX v2 landing screenshot for the PR
pskeshu Jun 12, 2026
b526b23
lint: conform UX v2 + model-migration code to #47 ruff tooling
pskeshu Jun 14, 2026
ec64daf
Models: revert main tier from Fable 5 to Opus 4.8
pskeshu Jun 14, 2026
d5dac56
UX v2: expandable tool cards show full (bounded) tool results
pskeshu Jun 15, 2026
a26ee99
WIP: 3D optical-space view in the Devices tab
pskeshu Jun 15, 2026
91ec6d2
docs: UX v2 interaction-flow / IA audit
pskeshu Jun 16, 2026
9edb766
UX v2 P0: live-stream the agent turn + show reasoning during the wait
pskeshu Jun 16, 2026
f209874
Fix create/update_plan_item crash when spec/references passed as JSON…
pskeshu Jun 16, 2026
144d8dd
UX v2: add a concision/communication-style section to the plan-mode p…
pskeshu Jun 16, 2026
d061e9d
UX v2: run independent read-only tool calls concurrently
pskeshu Jun 16, 2026
a779901
Fix create_plan_item crash when phase_number is a string
pskeshu Jun 16, 2026
06741eb
Restrict tool concurrency to read-only tools; nudge batched plan-item…
pskeshu Jun 16, 2026
faee181
UX v2: plan-done state + restructured THE PLAN panel
pskeshu Jun 16, 2026
f6f2ba3
UX v2: drop wrap-up reasoning litter from the plan feed
pskeshu Jun 16, 2026
b8df610
UX v2: export-plan button replaces the end-of-plan prose upsell
pskeshu Jun 16, 2026
dedc418
Design doc: active shared lab notebook (memory model)
pskeshu Jun 16, 2026
e89161f
plan: notebook foundation (Note model + NotebookStore) — increment 1 …
pskeshu Jun 16, 2026
60cd40a
feat(notebook): unified Note model + dict serialization
pskeshu Jun 16, 2026
d15607d
feat(notebook): NotebookStore write_note/get_note with atomic YAML
pskeshu Jun 16, 2026
5225811
feat(notebook): rebuildable reverse-indexes by strain/embryo/thread
pskeshu Jun 16, 2026
9ba23c9
feat(notebook): query_notes by kind/author/status/scope
pskeshu Jun 16, 2026
458810a
feat(notebook): link_notes + supersede_note (append-only history)
pskeshu Jun 16, 2026
fed6821
plan: notebook producer bridge (apply_updates -> notebook)
pskeshu Jun 16, 2026
9ca5593
feat(notebook): Observation/Learning -> Note converters
pskeshu Jun 16, 2026
19c2ff4
feat(notebook): FileContextStore.notebook lazy property
pskeshu Jun 16, 2026
9224b0f
feat(notebook): apply_updates mirrors observations & learnings into n…
pskeshu Jun 16, 2026
03ea7c6
plan: notebook read API
pskeshu Jun 16, 2026
7d51e20
feat(notebook): read API — GET /api/notebook/notes + /notes/{id}
pskeshu Jun 16, 2026
1a0121d
feat(notebook): read API — GET /api/notebook/threads with counts
pskeshu Jun 16, 2026
0760578
feat(notebook): Notebook tab (read view) in LIBRARY rail
pskeshu Jun 16, 2026
1c87896
plan: notebook live edge (Agent's view recent notes)
pskeshu Jun 16, 2026
9b28fa8
feat(notebook): limit param on GET /api/notebook/notes
pskeshu Jun 16, 2026
b643972
feat(notebook): Agent's-view live edge — recent notes section → Noteb…
pskeshu Jun 16, 2026
814e322
plan: ask the notebook (increment 2 backend)
pskeshu Jun 16, 2026
8928084
feat(notebook): ask backend — select_notes retrieval + forced-tool gr…
pskeshu Jun 16, 2026
13b4dd2
feat(notebook): POST /api/notebook/ask — grounded notebook Q&A
pskeshu Jun 16, 2026
c7be82a
feat(notebook): 'Ask the notebook' box on the Notebook tab
pskeshu Jun 16, 2026
ecd5239
feat(imaging): Run button on actionable imaging plan items
pskeshu Jun 16, 2026
71ac0ea
integration: ux-v2 + memory-model (notebook) + imaging-triggers (run …
pskeshu Jun 16, 2026
6eac3e6
deps: declare opencv-python-headless (cv2) as a runtime dependency
pskeshu Jun 16, 2026
0d01229
fix(dispim): correct stage-position read keys; bottom camera never us…
pskeshu Jun 16, 2026
9abcf6e
chore(deps): lock opencv-python-headless (uv.lock sync)
pskeshu Jun 16, 2026
33ce5b0
fix(timelapse): accept comma-separated embryo_ids string in start()
pskeshu Jun 16, 2026
abb11fb
fix(web): emit unpadded embryo ids to match the live convention
pskeshu Jun 16, 2026
68a7833
lint: wrap long lines + ruff-format notebook modules (E501)
pskeshu Jun 16, 2026
31ba4b1
Merge remote-tracking branch 'upstream/development' into integration/…
pskeshu Jun 17, 2026
d2d61b8
feat(notebook): record_note tool — agent writes human notes to the no…
pskeshu Jun 17, 2026
83b825f
feat(plans): live plan UI — PLAN_UPDATED event + campaigns.js refresh…
pskeshu Jun 17, 2026
e1f698e
fix(plans): reliable session↔item linking, one item ↔ many sessions (…
pskeshu Jun 17, 2026
a3b649c
lint: ruff-format file_store.py (fixes CI format check on integration)
pskeshu Jun 17, 2026
0d1cdeb
feat(plans): plan narrative in agent context + fix provenance spec-ta…
pskeshu Jun 17, 2026
bb54ab6
Connection D: editable/fillable imaging specs in the plan inspector
pskeshu Jun 17, 2026
5 changes: 3 additions & 2 deletions .gitignore
@@ -146,5 +146,6 @@ electron/
gently/ui/tui/node_modules/
gently/ui/tui/dist/

-# Runtime storage accidentally created on Linux when GENTLY_STORAGE_PATH="D:/" resolves literally
-D:/
+# Stray local storage: GENTLY_STORAGE_PATH default (D:\Gently3) resolves
+# literally to ./D:/ under the repo on Linux. Not data we track.
+/D:/
112 changes: 112 additions & 0 deletions docs/HEURISTICS-AUDIT.md
@@ -0,0 +1,112 @@
# Heuristics audit — where to use the model (as a typed-output function) instead

Codebase sweep (5 parallel scanners + synthesis) for heuristics that **fake
judgment** an LLM would do better — in the spirit of the genotype→channel
refactor (drop the lookup table, let the model infer, keep a typed provenance
record + confirm-when-unsure). The flip side — logic that **must stay
deterministic** (safety, math, calibration, transport) — is listed at the end so
we don't mistakenly LLM-ify it.

The unifying move for every candidate: **LLM with a typed structured-output
schema + provenance + a confirm/UNCERTAIN escape**, never free-text-then-parse.
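
A minimal sketch of that move, assuming the model returns its answer as a structured tool-call payload (a plain `dict` here). The names `TypedJudgment`, `parse_judgment`, and the `value` field are hypothetical and only illustrate the shape: a constrained enum, a provenance field, and an explicit UNCERTAIN escape that off-schema output degrades to rather than a silent default.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNCERTAIN = "uncertain"  # explicit escape: the model may decline to commit


@dataclass(frozen=True)
class TypedJudgment:
    """Hypothetical result record: answer + provenance + confidence."""

    value: str
    confidence: Confidence
    basis: str  # provenance: what the inference rests on
    reasoning: str

    @property
    def needs_confirmation(self) -> bool:
        # Confirm-when-unsure: low or uncertain answers go back to the user.
        return self.confidence in (Confidence.LOW, Confidence.UNCERTAIN)


def parse_judgment(payload: dict, allowed_values: set[str]) -> TypedJudgment:
    """Validate a structured payload; anything off-schema becomes UNCERTAIN."""
    try:
        confidence = Confidence(payload["confidence"])
        value = payload["value"]
    except (KeyError, ValueError):
        return TypedJudgment("", Confidence.UNCERTAIN, "invalid payload", "")
    if not isinstance(value, str) or value not in allowed_values:
        return TypedJudgment("", Confidence.UNCERTAIN, "off-schema value", "")
    return TypedJudgment(
        value, confidence, payload.get("basis", ""), payload.get("reasoning", "")
    )
```

The point of the design is that no code path turns malformed model output into a confident answer: the only way to get `needs_confirmation == False` is a payload that passes both the enum and the allowed-value check.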

## Model candidates (ranked)

### High value

1. **Hatching / time-to-stage prediction** — `organisms/celegans/developmental_tracker.py`
*(the closest twin of genotype→channel; medium effort)*
Three hardcoded 20 °C lookup tables (`STAGE_TIMING_20C`, `TIME_TO_HATCHING`,
`TIMING_VARIABILITY`) plus magic `{HIGH:1.0, MEDIUM:1.5, LOW:2.0}` uncertainty
fudge factors. The tables structurally **can't use the rig's actual temperature**
(we run a TEC), the strain, or the embryo's observed progression rate. Let the
model produce a calibrated, explained interval; **keep the literature table as a
deterministic sanity bracket** and flag when the estimate falls outside it.
→ `{ predicted_minutes_to_hatching, low, high, basis, assumptions{temperature_c,strain,used_observed_rate}, confidence, reasoning }`

2. **Citation → PubMed query** — `harness/plan_mode/tools/research.py` (`_search_pmid`)
A regex that only handles "Surname et al YEAR …" + six hand-rolled
query-relaxation strategies + a stopword/word-position ladder that drops
load-bearing nouns. The model parses the sloppy citation and proposes relaxed
queries; **code keeps the deterministic esearch call and never fabricates a PMID.**
→ `{ author_last, year, journal, topic_keywords[], organism, pubmed_query, alt_queries[], confidence }`

3. **Lab-history retrieval** — `harness/plan_mode/tools/lab_context.py`, `harness/memory/interface.py`
Semantic recall faked by substring-OR over query tokens (matches "we"/"before",
misses every paraphrase). Feed the model the candidate records and have it
**rank/select from provided ids only** (no fabrication). Read-only, no
acquisition risk.
→ `{ matches:[{kind,id,summary,relevance,why_relevant}], answer }`

4. **Stage-label parse via 22-entry synonym dict** — `developmental_tracker.py` (`_parse_stage_name`)
*(small effort, pure robustness win)* The Vision call already classifies; the
brittleness is a plain-text `STAGE:/CONFIDENCE:` block scraped line-by-line, with
off-vocabulary phrasings silently collapsing to `UNKNOWN` (which kills the
downstream hatching prediction). Constrained-enum structured output deletes the
parser + synonym table.
→ `{ stage: enum(...), confidence: enum(high|medium|low), is_transitional, reasoning }`
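
For candidate 1, the "deterministic sanity bracket" could look roughly like the sketch below. It assumes the model's typed output has already been parsed into the fields named above; `HatchingEstimate`, `BracketCheck`, and `check_against_bracket` are hypothetical names, and the bracket bounds are passed in by the caller (in practice they would come from the existing literature table, whose values are not reproduced here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HatchingEstimate:
    """Subset of the typed model output sketched for candidate 1."""

    predicted_minutes_to_hatching: float
    low: float
    high: float
    confidence: str


@dataclass(frozen=True)
class BracketCheck:
    estimate: HatchingEstimate
    within_bracket: bool
    note: str


def check_against_bracket(
    est: HatchingEstimate, bracket_min: float, bracket_max: float
) -> BracketCheck:
    """Deterministic check: flag, never silently correct, the model estimate."""
    point = est.predicted_minutes_to_hatching
    # The model's own interval must contain its point estimate.
    if not (est.low <= point <= est.high):
        return BracketCheck(est, False, "interval does not contain the point estimate")
    # The point estimate must fall inside the literature-derived bracket.
    if point < bracket_min or point > bracket_max:
        return BracketCheck(
            est, False, f"estimate outside bracket [{bracket_min}, {bracket_max}] min"
        )
    return BracketCheck(est, True, "ok")
```

Usage, with placeholder numbers that are not taken from `TIME_TO_HATCHING`: `check_against_bracket(HatchingEstimate(150, 120, 180, "medium"), 100, 200).within_bracket` is `True`, while an estimate of 300 minutes against the same bracket is returned with `within_bracket=False` and a note for the UI to surface.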

### Medium value (mostly small — fix the output contract, not the judgment)

5. **Calibration Vision calls** — `hardware/dispim/claude_client.py`
Four Vision calls return positional free text recovered by `'yes' in first_line`
/ `re.search(r'\d+')` / first-valid-letter, with silent defaults (so "no, this is
not yes…" reads as *yes*). Typed output deletes the parse + silent-default layer.

6. **ML architecture ranking** — `ml/architectures.py` (`get_suitable_architectures`)
Hard feasibility gates (VRAM / dataset) are correct **and stay**; the `+2/+1/+1`
point-score ranking that follows discards the per-arch prose. Let the model rank
the *pre-filtered feasible set* (ids constrained to that set).

7. **Training label normalization** — `ml/data_loader.py` (`build_labels_from_store`)
Class space built by exact-string identity over free-text human annotations —
"1.5-fold" and "1.5 fold" become different classes. Model normalizes to the
canonical staging vocabulary, flags novel/ambiguous ones.
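
The "ids constrained to that set" rule (candidate 6, and the same idea as "rank/select from provided ids only" in candidate 3) can be enforced in code after the model call. A minimal sketch, assuming the model returns an ordered list of ids; `constrain_ranking` and the architecture ids in the usage note are hypothetical.

```python
def constrain_ranking(model_ranked_ids: list[str], feasible_ids: list[str]) -> list[str]:
    """Keep the model's order, but only over ids the hard gates already allowed."""
    feasible = set(feasible_ids)
    seen: set[str] = set()
    ranked: list[str] = []
    for arch_id in model_ranked_ids:
        # Drop fabricated ids and duplicates; the model cannot add candidates.
        if arch_id in feasible and arch_id not in seen:
            ranked.append(arch_id)
            seen.add(arch_id)
    # Feasible ids the model omitted are appended in their original order,
    # so the model cannot remove a candidate either.
    ranked.extend(i for i in feasible_ids if i not in seen)
    return ranked
```

For example, `constrain_ranking(["resnet", "made_up", "unet", "resnet"], ["unet", "resnet", "vit"])` returns `["resnet", "unet", "vit"]`: the invented id is dropped, the duplicate is ignored, and the omitted feasible id is kept at the end.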

### Lower value

8. **"Plan has a control?"** — `plan_mode/tools/validation.py` — substring scan of a
6-word keyword set, where the question is really a scientific judgment over the
whole plan. It only raises a non-blocking warning → safe for the model.
9. **CGC HTML scraping** — `research.py` (`_cgc_search`) — positional multi-group
regex over fetched HTML; structured extraction the model does better (HTTP GET
stays code; **mark strain names low-confidence to avoid sending someone to order
a hallucinated strain**).

### Cross-cutting batch (small each): typed output for the detector/verifier cluster
`harness/detection/verifier.py`, `app/detectors/hatching.py`,
`app/detectors/dopaminergic_signal.py`, `hardware/dispim/sam_detection.py` — all
already make the right model call but reconstruct the verdict via
`startswith`/regex-JSON-scraping with silent defaults. A batch move to native
structured output **strictly reduces parse-induced false negatives** without
touching the deterministic vote-tally/consensus/enum-dispatch downstream.
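
To make the parse-induced failure concrete, the sketch below contrasts a substring scrape with a silent default (similar in spirit to the `'yes' in first_line` check noted for the calibration calls) against a constrained-enum read of a structured payload. The function names and the `verdict` field are hypothetical, not the detectors' actual code.

```python
from enum import Enum


class Verdict(str, Enum):
    YES = "yes"
    NO = "no"
    UNCERTAIN = "uncertain"


def scrape_verdict(free_text: str) -> bool:
    """Brittle pattern: substring test on the first line, silent default."""
    lines = free_text.strip().splitlines()
    first_line = lines[0].lower() if lines else ""
    return "yes" in first_line


def typed_verdict(payload: dict) -> Verdict:
    """Typed pattern: the value must be one of the enum members, else UNCERTAIN."""
    try:
        return Verdict(payload["verdict"])
    except (KeyError, ValueError):
        return Verdict.UNCERTAIN
```

`scrape_verdict("No, this is not yes-worthy.")` returns `True`, which is the wrong answer; `typed_verdict({"verdict": "no"})` returns `Verdict.NO`, and a missing or off-vocabulary value becomes `Verdict.UNCERTAIN` instead of a guess. The downstream vote tally can then decide how to treat UNCERTAIN explicitly.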

**Reference implementations already in the repo (imitate, don't change):**
`dopaminergic_signal`'s perceiver→classifier rubric (typed enums, UNCERTAIN
escape, conservative-on-tie) and onboarding's `_extract_with_llm` (typed
extraction, degrade-to-verbatim fallback).

## Keep deterministic (do NOT LLM-ify)
Safety, math, calibration, and transport — where a hallucinated value is unsafe
or breaks reproducibility:
- Laser-power safety limits + wavelength→MM-property map (`hardware/dispim/devices/optical.py`)
- SPIM trigger-timing arithmetic, piezo–galvo calibration, MM framing (`dispim/config.py`)
- Calibration prior EMA + R²≥0.75 slope-lock gate (`dispim/calibration.py`)
- SwitchBot GATT byte commands / status decoding (`hardware/switchbot.py`)
- Temperature setpoint bound [0,99.9] °C + stabilization I/O (`hardware/temperature.py`)
- Autofocus signal-processing, curve fitting, adaptive-sweep stop rules (`analysis/core.py`, `analysis/focus.py`)
- Classical-CV ROI detection + pixel→stage coordinate transforms (`detection.py`, `sam_detection.py` geometry)
- Timelapse rule dispatch + `confirm_timepoints` debounce + monotonic power ramp (`app/orchestration/timelapse.py`)
- Volume→b64 dark/flat calibration + fixed brightness scaling (`dopaminergic_signal._volume_to_b64` — deliberately non-adaptive)
- Wake-router debounce/throttle/stage-transition gate (`app/wake_router.py`)
- Plan hardware limits, detector-preset membership, dependency-cycle DFS, stage-order normalization (`plan_mode/tools/validation.py`)
- Ensemble vote tally + 0.70 quorum / unanimity consensus (`detection/verifier.py`)
- ML metric/aggregation math: confusion matrix, F1, federated averaging (`ml/evaluation.py`, `federated.py`)
- Core imaging geometry (max-projection, crop bounds, Euler rotations) + UI event reduction/routing/security (`core/imaging.py`, `ui/web/*`)
- Device-state SSE watchdog/staleness timers (`app/device_state_monitor.py`)
- Reference-type dispatch (PMID/DOI/URL by canonical syntax), `os.path.isfile` checks (`research.py`)
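
As an illustration of why the last item belongs on this list, dispatch by canonical syntax is a few anchored regexes with no judgment involved. The sketch below is an assumption about what such a dispatcher could look like, not the code in `research.py`; the patterns (digits-only PMID, `10.`-prefixed DOI, `http(s)` URL) and the names `RefType` and `classify_reference` are hypothetical.

```python
import re
from enum import Enum


class RefType(str, Enum):
    PMID = "pmid"
    DOI = "doi"
    URL = "url"
    UNKNOWN = "unknown"


# Assumed canonical forms; each pattern must match the whole string.
_URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
_DOI_RE = re.compile(r"(?:doi:\s*)?10\.\d{4,9}/\S+", re.IGNORECASE)
_PMID_RE = re.compile(r"(?:PMID:?\s*)?\d{1,8}", re.IGNORECASE)


def classify_reference(ref: str) -> RefType:
    """Deterministic dispatch: same input always yields the same type."""
    ref = ref.strip()
    # URLs are checked first so a DOI embedded in a link is treated as a URL.
    if _URL_RE.fullmatch(ref):
        return RefType.URL
    if _DOI_RE.fullmatch(ref):
        return RefType.DOI
    if _PMID_RE.fullmatch(ref):
        return RefType.PMID
    return RefType.UNKNOWN
```

A sloppy citation such as `"Smith et al 2020"` falls through to `RefType.UNKNOWN`, which is where the model-assisted citation parsing from candidate 2 would take over.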

## Note
`gap_assessment.conversation_weight` (the 0.25/0.1/0.05 readiness scalar) is now
largely **vestigial** — it only returns 'heavy' (lab onboarding) or 'none' — so
it's not worth an API call. Left off the candidate list.