Motivation
The accuracy gap on fold stages (1.5fold: 59%, 2fold: 70%) is the primary bottleneck for autonomous microscopy decisions. Max-intensity projections collapse depth information — overlapping body segments fuse into one bright band, destroying the key discriminative feature: how many times the body folds back on itself.
When a biologist can't tell the stage from a projection, they scroll through z-slices and count body segments at the most informative plane:
- 1.5fold: 1-2 body segment profiles visible at the midplane
- 2fold: 2 clearly separated profiles
- pretzel: 3+ profiles, crossing over each other
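The counting heuristic above amounts to a lookup from profile count to candidate stages. The following is a minimal sketch, assuming the count arrives as a plain integer; the function name `candidate_stages` is hypothetical. A count of 2 is treated as ambiguous between 1.5fold and 2fold, because the "clearly separated" qualifier cannot be expressed by an integer alone.

```python
def candidate_stages(profile_count: int) -> set[str]:
    """Map a body-segment profile count to the fold stages it is consistent with.

    Ranges follow the list above: 1.5fold shows 1-2 profiles, 2fold shows 2,
    pretzel shows 3 or more. Hypothetical helper, not part of any existing API.
    """
    if profile_count <= 0:
        return set()                  # no profiles: no fold stage supported
    if profile_count == 1:
        return {"1.5fold"}
    if profile_count == 2:
        return {"1.5fold", "2fold"}   # ambiguous without a separation cue
    return {"pretzel"}
```

Returning a set rather than a single label keeps the ambiguity at a count of 2 explicit, so whatever consumes the result can decide how to break the tie.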
Design
A subagent is dispatched alongside or after the primary perception call and receives 2-3 selected z-slices:
- Midplane (z ≈ Z/2): maximum number of body segments visible simultaneously
- Max-entropy slice: computationally selected as the slice with the most structural information
The subagent prompt asks specifically to count discrete body segment profiles — a much more constrained task than full stage classification.
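The selection of the two slices listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the z-stack is a list of 2D planes of 8-bit intensities, and "structural information" is approximated by the Shannon entropy of each plane's intensity histogram. The names `slice_entropy` and `select_slices`, the bin count, and the pure-Python representation are all hypothetical; a real pipeline would more likely operate on array data.

```python
import math
from collections import Counter

def slice_entropy(plane, bins=32, max_value=255):
    """Shannon entropy (in bits) of the intensity histogram of one 2D slice."""
    counts = Counter(
        min(int(v * bins / (max_value + 1)), bins - 1)  # histogram bin index
        for row in plane
        for v in row
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_slices(stack):
    """Return sorted, de-duplicated z indices: the midplane and the max-entropy slice."""
    mid = len(stack) // 2  # z ≈ Z/2
    best = max(range(len(stack)), key=lambda z: slice_entropy(stack[z]))
    return sorted({mid, best})
```

If the max-entropy slice happens to coincide with the midplane, only one index is returned, so the caller should be prepared for fewer slices than expected.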
Open problem: triggering the subagent
We dropped VLM self-reported confidence (uncalibrated noise). Stability (consecutive same-stage count) only helps after multiple calls. Measuring uncertainty of a single perception call is unsolved in this system. Options to explore:
- Always run the subagent during fold stages (expensive but simple)
- Run when the primary call's stage disagrees with the temporal prediction
- Run when the stage changes (transition detection)
- Develop a separate uncertainty estimator
This trigger heuristic is itself a research question worth experimenting with.
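To make the first three options comparable in experiments, they can be expressed as interchangeable policies behind one predicate. This is a sketch only: `TriggerPolicy`, `PerceptionContext`, and `should_run_subagent` are hypothetical names, the stage labels are assumed to be plain strings, and the separate uncertainty estimator is left out because nothing about its form is specified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FOLD_STAGES = {"1.5fold", "2fold", "pretzel"}

class TriggerPolicy(Enum):
    ALWAYS_IN_FOLD = "always_in_fold"
    TEMPORAL_DISAGREEMENT = "temporal_disagreement"
    STAGE_CHANGE = "stage_change"

@dataclass
class PerceptionContext:
    predicted_stage: str                 # stage from the primary perception call
    previous_stage: Optional[str]        # stage at the previous timepoint, if any
    temporal_prediction: Optional[str]   # stage expected from timing, if available

def should_run_subagent(ctx: PerceptionContext, policy: TriggerPolicy) -> bool:
    """Decide whether to dispatch the z-slice subagent under the given policy."""
    if policy is TriggerPolicy.ALWAYS_IN_FOLD:
        return ctx.predicted_stage in FOLD_STAGES
    if policy is TriggerPolicy.TEMPORAL_DISAGREEMENT:
        return (ctx.temporal_prediction is not None
                and ctx.predicted_stage != ctx.temporal_prediction)
    if policy is TriggerPolicy.STAGE_CHANGE:
        return (ctx.previous_stage is not None
                and ctx.predicted_stage != ctx.previous_stage)
    raise ValueError(f"unknown policy: {policy}")
```

Keeping each policy as a separate branch lets a benchmark run swap policies without touching the dispatch code, which suits a heuristic that is still under investigation.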
Benchmark
Measure accuracy lift on fold stages (1.5fold, 2fold, pretzel) with and without the subagent, using the existing benchmark infrastructure.
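The lift measurement can be sketched as a per-stage accuracy difference between two runs. The record format here, an iterable of `(true_stage, predicted_stage)` pairs, and the function names are assumptions for illustration; the existing benchmark infrastructure may expose results differently.

```python
from collections import defaultdict

FOLD_STAGES = ("1.5fold", "2fold", "pretzel")

def per_stage_accuracy(records):
    """Accuracy per true stage from (true_stage, predicted_stage) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_stage, predicted_stage in records:
        total[true_stage] += 1
        correct[true_stage] += int(true_stage == predicted_stage)
    return {stage: correct[stage] / total[stage] for stage in total}

def accuracy_lift(baseline, with_subagent, stages=FOLD_STAGES):
    """Per-stage accuracy difference (with subagent minus baseline).

    Stages missing from either run are skipped rather than reported as zero.
    """
    base = per_stage_accuracy(baseline)
    new = per_stage_accuracy(with_subagent)
    return {s: new[s] - base[s] for s in stages if s in base and s in new}
```

Reporting the lift per stage rather than as one pooled number keeps a gain on one fold stage from masking a regression on another.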