Shape plan mode around microscope context hierarchy #32

Open
ceej640 wants to merge 4 commits into gently-project:development from ceej640:ceej/planning-context-hierarchy

@ceej640

ceej640 commented Jun 1, 2026

Collaborator

Context

This PR takes in the context from pskeshu's PR #23 reply and the Kesavan/Nordenfelt smart-microscopy framework (Journal of Microscopy 2026, doi: 10.1111/jmi.70063). It maps the technical/experimental/theoretical/conceptual hierarchy into Gently plan mode so the planning layer can become the biologist-facing interaction layer over DiSPIM, C. elegans embryos, and later other experiment modalities.

This is intentionally one PR for iteration, per the request in the PR #23 thread.

Changes

  • Added a first-class PlanContext on plan items with technical, experimental, theoretical, conceptual, sample_entity, operator_context, constraints, and success_question.
  • Persisted plan_context through the SQLite context store, file-backed context store, templates, and plan restoration paths.
  • Updated plan-mode tools so create_plan_item and update_plan_item accept context hierarchy data and render it in plan review/export output.
  • Updated the plan-mode prompt with the paper's hierarchy and the Ryan/Brie DiSPIM embryo workflow: bottom overview XY finding, F/head-axis alignment, calibration before timelapse, and F-drive/glass-slide safety context.
  • Added validation warnings for imaging items missing hierarchy layers and for DiSPIM embryo timelapse plans that do not state calibration/F-drive focus-safety assumptions.
  • Added docs describing the mapping from the smart-microscopy framework into Gently's planning implementation.
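
To make the shape of the new context concrete, here is a minimal sketch of how a PlanContext with these fields and a missing-layer warning might look. The field names come from the list above; the field types, default values, the `to_dict`/`from_dict` helpers, and the `missing_layers` and `validate_imaging_item` functions are assumptions for illustration, not the code in this PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Any

# The four hierarchy layers named in the PR description.
HIERARCHY_LAYERS = ("technical", "experimental", "theoretical", "conceptual")


@dataclass
class PlanContext:
    """Hypothetical stand-in for the PlanContext attached to plan items."""

    technical: str = ""
    experimental: str = ""
    theoretical: str = ""
    conceptual: str = ""
    sample_entity: str = ""
    operator_context: str = ""
    constraints: list[str] = field(default_factory=list)
    success_question: str = ""

    def to_dict(self) -> dict[str, Any]:
        # Plain dict form, suitable for storing as JSON in a context store.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "PlanContext":
        # Ignore unknown keys so older or newer stored records still load.
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)


def missing_layers(ctx: PlanContext) -> list[str]:
    """Return the hierarchy layers that are empty or whitespace-only."""
    return [name for name in HIERARCHY_LAYERS if not getattr(ctx, name).strip()]


def validate_imaging_item(title: str, ctx: PlanContext) -> list[str]:
    """Return warnings (not errors) for an imaging item with missing layers."""
    missing = missing_layers(ctx)
    if not missing:
        return []
    return [f"{title}: missing hierarchy layers: {', '.join(missing)}"]
```

In this sketch the validator only warns, so an incomplete context never blocks plan creation; whether the real validation behaves the same way should be checked against tests/test_plan_context_validation.py.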

Verification

  • python -m py_compile gently/harness/memory/model.py gently/harness/memory/_plans.py gently/harness/memory/store.py gently/harness/memory/file_store.py gently/harness/plan_mode/prompt.py gently/harness/plan_mode/tools/planning.py gently/harness/plan_mode/tools/validation.py tests/test_context_store.py tests/test_plan_context_validation.py
  • python -m pytest tests/test_context_store.py tests/test_plan_context_validation.py -q -p no:cacheprovider
  • git diff --check

@pskeshu

pskeshu commented Jun 1, 2026

Collaborator

The latest comments from #23 apply here. We need a way to streamline plan generation itself; it currently takes a long time to make a plan. The tool calls might not be organized effectively to create the campaign - phase - task structure and the classes of tasks (bench, imaging, genetics, etc.). There is a question of structure that constrains versus structure that enriches the experience for a biologist looking for discovery. At the same time, we have microscope support, so we need to maximize that supported integration. We need a design that takes care of all of this and is aesthetic.

@ceej640

ceej640 commented Jun 1, 2026

Collaborator Author

I posted the empirical setup/preflight status in the #23 thread: #23 (comment)

That run could not complete a true plan-quality benchmark because this environment lacks ANTHROPIC_API_KEY, lacks the installed gently_perception dependency, and did not expose the requested browser-MCP control surface. It did get far enough to find and fix the web route rendering issue now pushed in this PR.

Your point here is consistent with what the preflight suggests: the next design iteration should focus on streamlining plan generation itself, not just adding more fields. The campaign/phase/task structure should probably be created in fewer, more deliberate operations, with enough structure to support microscope integration while still feeling like a discovery-oriented planning surface for the biologist.

@ceej640

ceej640 commented Jun 1, 2026

Collaborator Author

I replied on #23 with the empirical setup status. Relevant to this PR: the benchmark preflight found a web UI rendering blocker, fixed here in commit aa79f0f. The true plan-generation benchmark could not complete because the local environment lacks ANTHROPIC_API_KEY and gently_perception, and browser MCP/Playwright were not available; /plan over WebSocket does work.

I agree with your design concern. The next iteration should probably focus less on adding more fields and more on streamlining the generation operation itself: one coherent campaign/phase/task construction path, task classes that enrich rather than constrain, and microscope-support context kept explicit. #32 gives the context substrate; the next PR should likely address the planner workflow/UX around that substrate.

@ceej640

ceej640 commented Jun 1, 2026

Collaborator Author

Clarifying my earlier wording because I mixed two separate things.

The missing ANTHROPIC_API_KEY, missing gently_perception, and lack of browser MCP/Playwright block only the empirical benchmark: I cannot honestly measure real plan-synthesis latency, visible browser UX, or generated-plan quality in this environment yet.

Those blockers do not prevent a follow-up implementation pass on #32. The remaining implementation/design work is still real and can start from code review and tests:

  • streamline campaign -> phase -> task generation
  • reduce tool-call chatter around creating and linking plan items
  • make task classes/categories more intentional
  • keep microscope-support context explicit where it affects safety or feasibility
  • improve the planner workflow/UX on top of the context substrate added here

So the clean split is:

  1. #32 currently provides the context substrate and includes the Starlette route fix needed for local UI startup.
  2. A follow-up code pass can improve the planner generation workflow without waiting for the benchmark.
  3. The benchmark is still needed afterward to validate runtime latency, visible UI/UX, and generated-plan quality against the real model/browser path.

I am going to implement what I can from the workflow side now: reduce the number of separate tool calls needed to create a structured campaign/phase/task plan, while leaving the empirical benchmark as a separate blocked validation step.

@ceej640

ceej640 commented Jun 1, 2026

Collaborator Author

Follow-up implemented from the clarified #32 thread in commit 301bffb.

What changed:

  • Added create_structured_plan, a plan-mode tool that creates the root campaign, phase subcampaigns, typed tasks, local dependency links, specs, references, estimated durations, and plan_context in one operation.
  • The tool accepts task type or task_class, so the campaign -> phase -> task structure can be generated with explicit classes such as imaging, bench, genetics, analysis, and decision_point.
  • It supports local phase keys and task keys, so dependencies can be expressed as depends_on: ["calibrate"] inside the single outline instead of requiring later link_plan_items calls.
  • It renders the resulting plan for review by default, using the same plan rendering path as propose_plan.
  • Updated the plan-mode prompt to prefer this operation when the initial structure is known, while keeping the lower-level tools for incremental edits.
  • Added docs and a focused test covering phases, items, dependencies, task class aliasing, and plan_context preservation.
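
As a rough illustration of the single-outline idea described above, the sketch below builds a campaign, phases, typed tasks, and dependency links from one nested outline, resolving local task keys to IDs in a second pass. The outline schema, the `build_structured_plan` function, and the returned dict layout are all hypothetical; only the task-class names, the `type`/`task_class` aliasing, and the `depends_on: ["calibrate"]` form come from this comment.

```python
from itertools import count

# Task classes named in this comment.
TASK_CLASSES = {"imaging", "bench", "genetics", "analysis", "decision_point"}

# Hypothetical outline: one campaign, two phases, three tasks, two dependencies.
OUTLINE = {
    "title": "Embryo timelapse",
    "phases": [
        {"key": "setup", "tasks": [
            {"key": "find_embryos", "type": "imaging"},
            {"key": "calibrate", "task_class": "imaging",
             "depends_on": ["find_embryos"]},
        ]},
        {"key": "acquisition", "tasks": [
            {"key": "timelapse", "task_class": "imaging",
             "depends_on": ["calibrate"],
             "plan_context": {"technical": "DiSPIM"}},
        ]},
    ],
}


def build_structured_plan(outline: dict) -> dict:
    """Create campaign, phases, items, and links from one outline."""
    ids = count(1)
    campaign = {"id": next(ids), "title": outline["title"]}
    phases, items, key_to_id = [], [], {}

    # First pass: assign IDs so depends_on may reference any task key,
    # including tasks in other phases.
    for phase in outline["phases"]:
        phase_id = next(ids)
        phases.append({"id": phase_id, "key": phase["key"],
                       "parent_id": campaign["id"]})
        for task in phase["tasks"]:
            # Accept either "task_class" or its alias "type".
            task_class = task.get("task_class") or task.get("type")
            if task_class not in TASK_CLASSES:
                raise ValueError(f"unknown task class: {task_class!r}")
            if task["key"] in key_to_id:
                raise ValueError(f"duplicate task key: {task['key']!r}")
            item_id = next(ids)
            key_to_id[task["key"]] = item_id
            items.append({"id": item_id, "key": task["key"],
                          "phase_id": phase_id, "task_class": task_class,
                          "plan_context": task.get("plan_context", {})})

    # Second pass: resolve local keys into dependency links.
    links = []
    for phase in outline["phases"]:
        for task in phase["tasks"]:
            for dep_key in task.get("depends_on", []):
                if dep_key not in key_to_id:
                    raise ValueError(f"unknown dependency key: {dep_key!r}")
                links.append({"item_id": key_to_id[task["key"]],
                              "depends_on_id": key_to_id[dep_key]})

    return {"campaign": campaign, "phases": phases,
            "items": items, "links": links}
```

The two-pass layout is one way to let a task depend on a key that is defined later in the outline; the actual tool may resolve keys differently.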

Why this addresses the implementation side of your comment:

  • It reduces tool-call chatter around campaign/phase/task construction.
  • It makes the task-class structure more deliberate at creation time.
  • It keeps microscope-support context explicit through plan_context instead of hiding it in chat prose.

What remains unvalidated:

  • I still cannot report real runtime latency, browser UI/UX, or model-generated plan quality until the empirical benchmark blockers are resolved. This commit is a workflow/tooling improvement that can be tested offline; the runtime benchmark is still a separate validation step.

Verification:

  • python -B -c "...compile planning.py and prompt.py..."
  • pytest tests/test_structured_plan_tool.py tests/test_context_store.py tests/test_plan_context_validation.py -q -p no:cacheprovider
  • git diff --check

@ceej640

ceej640 commented Jun 1, 2026

Collaborator Author

Follow-up implemented for the proposed benchmark path in commit 9f9ebe1.

What changed:

  • Added benchmarks.structured_plan_replay, an offline replay benchmark for create_structured_plan.
  • Added python -m benchmarks.runner structured-plan as the runner entry point.
  • The replay uses a fresh local ContextStore, calls create_structured_plan once, and verifies the expected root campaign, two phases, three plan items, and two dependencies.
  • Added docs and a focused test for the replay benchmark.
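
For readers who want the gist of the replay check, here is a small sketch of a harness that calls a plan-building tool once and compares the resulting structure with the expected counts listed above. The `replay_once` function, the result layout, and the `stub_tool` stand-in are hypothetical; the real benchmark uses a fresh local ContextStore and the actual create_structured_plan tool, neither of which is reproduced here.

```python
import time
from typing import Callable

# Expected structure from the replay description above.
EXPECTED = {"campaigns": 1, "phases": 2, "items": 3, "dependencies": 2}

# Hypothetical outline used only for this sketch.
OUTLINE = {
    "title": "Embryo timelapse",
    "phases": [
        {"key": "setup", "tasks": [
            {"key": "find_embryos"},
            {"key": "calibrate", "depends_on": ["find_embryos"]},
        ]},
        {"key": "acquisition", "tasks": [
            {"key": "timelapse", "depends_on": ["calibrate"]},
        ]},
    ],
}


def stub_tool(outline: dict) -> dict:
    """Stand-in for the real tool: flattens the outline without storing it."""
    tasks = [t for phase in outline["phases"] for t in phase["tasks"]]
    links = [(t["key"], dep) for t in tasks for dep in t.get("depends_on", [])]
    return {"campaign": outline["title"], "phases": outline["phases"],
            "items": tasks, "links": links}


def replay_once(tool: Callable[[dict], dict], outline: dict,
                expected: dict = EXPECTED) -> dict:
    """Call the tool exactly once and report structural mismatches."""
    start = time.perf_counter()
    plan = tool(outline)
    elapsed = time.perf_counter() - start

    observed = {
        "campaigns": 1 if plan.get("campaign") else 0,
        "phases": len(plan.get("phases", [])),
        "items": len(plan.get("items", [])),
        "dependencies": len(plan.get("links", [])),
    }
    # Map each mismatched key to (expected, observed) for easy reporting.
    mismatches = {k: (expected[k], observed[k])
                  for k in expected if observed[k] != expected[k]}
    return {"ok": not mismatches, "elapsed_s": elapsed,
            "observed": observed, "mismatches": mismatches}


if __name__ == "__main__":
    result = replay_once(stub_tool, OUTLINE)
    print("ok" if result["ok"] else f"mismatches: {result['mismatches']}")
```

The elapsed time recorded here only covers the offline tool call, so it says nothing about model latency.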

This is deliberately not a claim about real model latency, browser UX, or generated-plan quality. It is an offline regression check for the streamlined structured-plan tool path while the empirical benchmark remains blocked by missing model/browser/runtime dependencies in this environment.

Verification:

  • pytest tests/test_structured_plan_tool.py tests/test_structured_plan_replay.py -q -p no:cacheprovider
  • python -m benchmarks.runner structured-plan
  • python -m benchmarks.structured_plan_replay
  • non-writing compile check for the benchmark modules
  • git diff --check
