Agent-sim predicate invention, partial-observability simulators, and reconstruction-guard refactor by yichao-liang · Pull Request #35 · BasisResearch/predicators

yichao-liang · 2026-05-31T16:42:43Z

Summary

Merges the accumulated sim-learning work into master, continuing the
periodic-merge pattern of #20 / #30 / #34. Large branch (172 commits, 83
files, +7766/-942); the major themes are below and the per-change
rationale lives in the commit history.

Highlights

Agent-driven simulation learning & predicate invention

New agent_sim_predicate_invention and agent_sim_recurrent_predicate_invention approaches.
code_sim_learning/ synthesis + synthesis_validation.py; geometric-gate guidance made binding in synthesis prompts; model-learning prompts made domain-general.
agent_sdk/ tooling: tool registry, versioned snapshots, phase-aware sandbox prompt, one-tool-per-line surface logging.

Partial observability / latent hidden state

pybullet_boil made partially observable, with a PO ground-truth simulator (boil/gt_simulator_po.py) and its factory + config block.
Latent and privileged hidden-state blocks added to State and threaded through predicate-quality eval, refinement, and a recurrent latent-threaded simulator-fitting loop.

Bilevel refinement & forward validation

agent_bilevel_max_refine_retries setting; reseed bilevel refinement before re-querying the LLM in _solve.
Surface forward-validation failures in synthesis plan refinement; log final-state details and sync fitted params to _ParamsView.

PyBullet robustness & reconstruction guard

Trust authoritative joint positions in robot reset_state; retry transient PyBullet shared-memory errors.
Replaced the per-env _strict_set_state_reconstruction flag with universal warn/raise magnitude thresholds on PyBulletEnv (1e-3 warn, 2.0 raise) so impossible states (e.g. held=-10000) still abort while benign IK noise and symbolic-vs-physical placement gaps (e.g. pybullet_fan) only warn — no per-env opt-in.

Tooling / CI

--parallel mode and sys.path self-bootstrap in local launch scripts; counter-first log filenames; assorted autoformat / mypy / pylint cleanups.

Testing

This session's reconstruction-guard change was verified against a broad
pybullet-env test sweep (blocks, cover, domino, boil, fan, switch,
oracle-approach, option-model, skill-factories, sim-learning) with zero
spurious raises, plus the 4 repo checks (pytest / mypy --platform linux
/ pylint / yapf+isort) on the changed files. The remaining work is
covered by its own commits and tests; CI runs the full suite.

…inement Extract the duplicated backtracking loop from run_low_level_search (SeSamE) and _refine_sketch (agent bilevel) into a single run_backtracking_refinement function in planning.py. Both callers now delegate to it with their own sample_fn and validate_fn callbacks, eliminating ~80 lines of duplicated loop/backtracking logic.
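
The shared loop described above can be pictured roughly as follows. This is a minimal sketch, not the code in planning.py: the function name, argument order, and return convention are assumptions made for illustration. Only the ideas of a per-step try budget plus caller-supplied sample_fn and validate_fn callbacks come from the commit message.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # per-step candidate type (e.g. a grounded option)


def run_backtracking_refinement_sketch(
    init_state: S,
    max_tries: Sequence[int],
    sample_fn: Callable[[int, S], A],
    validate_fn: Callable[[int, S, A], Optional[S]],
) -> Optional[List[A]]:
    """Hypothetical signature; the real helper may differ.

    sample_fn(step_idx, state) proposes a candidate for one step.
    validate_fn(step_idx, state, candidate) returns the next state, or
    None when the candidate is rejected.
    Returns the accepted plan, or None once step 0 runs out of tries.
    """
    num_steps = len(max_tries)
    tries = [0] * num_steps
    states: List[S] = [init_state]  # states[i] is the state before step i
    plan: List[A] = []
    idx = 0
    while idx < num_steps:
        if tries[idx] >= max_tries[idx]:
            # Budget exhausted at this step: give up or backtrack one step.
            if idx == 0:
                return None
            tries[idx] = 0
            idx -= 1
            plan.pop()
            states.pop()
            continue
        tries[idx] += 1
        candidate = sample_fn(idx, states[idx])
        next_state = validate_fn(idx, states[idx], candidate)
        if next_state is None:
            continue  # rejected: resample at the same step
        plan.append(candidate)
        states.append(next_state)
        idx += 1
    return plan
```

With a budget of one try per step and a sample_fn that returns fixed, pre-grounded candidates, the same loop degenerates into a plain forward check with no resampling.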

Move the _current_observation assignment into _reset_state so callers don't need to remember the two-step pattern. Clarify the relationship between _current_observation (backing field) and _current_state (typed read accessor) in docstrings and comments.

- Remove dead/commented-out code and stale self-question comments
- Add _VIRTUAL_OBJECT_TYPES constant to replace hardcoded type-name skip lists in _set_state and _get_state
- Move env-specific _get_robot_state_dict branches to subclass overrides in pybullet_cover and pybullet_blocks
- Extract _get_camera_matrices helper to deduplicate render methods
- Extract _get_object_state_dict from _get_state for per-object logic
- Move create_pybullet_block/sphere to pybullet_helpers/objects.py
- Merge _create_task_specific_objects into _set_domain_specific_state
- Rename: _reset_state -> _set_state, _reset_custom_env_state -> _set_domain_specific_state, _extract_feature -> _get_domain_specific_feature
- Add docstrings explaining where each method is called from

Reorganize methods into labeled sections (Setup, Public API, Core Loop, State Write/Read, Grasp Management, Action Helpers, Rendering, Utilities) so related functions are adjacent. Update module docstring to document the main public API and state synchronization methods.

Add _step_base() and _domain_specific_step() to PyBulletEnv base class. step() now calls _step_base (robot control, physics, grasp) then _domain_specific_step (water filling, heating, etc.), gated by _skip_domain_specific_dynamics flag for kinematics-only mode. Migrate all 15 domain envs to override _domain_specific_step() instead of step(). Envs with pre-step logic (coffee, switch, blocks, cover) still override step() for the pre-step part only.
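
A minimal sketch of the step() split described above, using stand-in classes rather than the real PyBulletEnv. The class names, the constructor argument name, and the string trace are illustrative assumptions. Only the _step_base / _domain_specific_step ordering and the skip flag come from the commit message.

```python
from typing import List


class PyBulletEnvSketch:
    """Hypothetical stand-in for the base env; not the real class."""

    def __init__(self, skip_domain_specific_dynamics: bool = False) -> None:
        self._skip_domain_specific_dynamics = skip_domain_specific_dynamics
        self.trace: List[str] = []

    def step(self, action: float) -> List[str]:
        # Shared part: robot control, physics, grasp handling.
        self._step_base(action)
        # Domain part: skipped entirely in kinematics-only mode.
        if not self._skip_domain_specific_dynamics:
            self._domain_specific_step()
        return self.get_observation()

    def _step_base(self, action: float) -> None:
        self.trace.append(f"base({action})")

    def _domain_specific_step(self) -> None:
        """Optional override; the default does nothing."""

    def get_observation(self) -> List[str]:
        return list(self.trace)


class BoilEnvSketch(PyBulletEnvSketch):
    """Domain env overrides only the domain-specific hook."""

    def _domain_specific_step(self) -> None:
        self.trace.append("water_filling_and_heating")
```

Under this layout a domain env never needs to re-implement the shared control and physics path, and a kinematics-only instance is obtained by construction rather than by subclassing.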

…ging Both AgentSessionMixin and AgentExplorer had near-identical wrappers that ran session.query() synchronously via nest_asyncio or asyncio.run. Move that logic into a module-level run_query_sync helper in session_manager and have both callers delegate to it.
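
One way such a synchronous wrapper could look is sketched below. It is not the helper in session_manager: the function name and the fake_query coroutine are placeholders, and where the commit message mentions nest_asyncio this sketch deliberately swaps in a worker thread so that it only depends on the standard library.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Coroutine


def run_query_sync_sketch(
        make_coro: Callable[[], Coroutine[Any, Any, Any]]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread: asyncio.run is safe.
        return asyncio.run(make_coro())
    # A loop is already running in this thread. The source uses
    # nest_asyncio for this case; this sketch instead runs the coroutine
    # on a fresh loop in a worker thread and blocks until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(make_coro())).result()


async def fake_query(prompt: str) -> str:
    """Placeholder for an agent session query."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"
```

Passing a factory rather than a coroutine object keeps the coroutine from being created before the helper has decided which path to take.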

Distinguishes the grounded-plan explorer from upcoming bilevel variants. AgentExplorer -> AgentPlanExplorer, get_name() 'agent' -> 'agent_plan', file moved to agent_plan_explorer.py, and all callers / docstrings / YAML config examples updated accordingly.

The explorer asks a Claude agent for a plan sketch, refines it against the approach's current (possibly learned) option model, and rolls the refined plan out in the real env. When the mental model disagrees with reality (e.g. the sketch expects JugFilled after a Wait but the mental model's process dynamics can't produce it), the explorer truncates the plan at the deepest unsatisfiable subgoal (inclusive) so the real-env rollout ends exactly where the disagreement occurs, maximising signal per experiment.

Key pieces:
- predicators/agent_sdk/bilevel_sketch.py: extracted the sketch build / parse / refine helpers from AgentBilevelApproach as module-level functions so both the approach (solve path) and the new explorer (exploration path) can share them. refine_sketch gains truncate_on_subgoal_fail: the on_step_fail callback snapshots the deepest subgoal failure seen during backtracking, and on exhaustion the captured prefix is returned as the experiment plan.
- predicators/explorers/agent_bilevel_explorer.py: new explorer. Reads option_model from tool_context (synced by the approach), builds the sketch prompt via bilevel_sketch, runs refine_sketch with check_subgoals=True, check_final_goal=False, truncate_on_subgoal_fail=True, and wraps the result in an option_plan_to_policy that converts OptionExecutionFailure into RequestActPolicyFailure so the episode cleanly terminates at the point of real-env divergence. Stashes the sketch subgoals/options on ToolContext for downstream diffing by the learning approach.
- predicators/approaches/agent_bilevel_approach.py: shim methods over bilevel_sketch; behaviour unchanged.
- predicators/approaches/agent_planner_approach.py: _create_explorer dispatches both "agent_plan" and "agent_bilevel" through the agent factory path and forwards CFG.explorer as the name.
- predicators/explorers/__init__.py: factory branch merged for the two agent-session-backed explorers.
- predicators/agent_sdk/tools.py: ToolContext gains last_sketch_subgoals / last_sketch_options fields, populated by the explorer and marked TODO for the learning approach to consume.
- tests/explorers/test_agent_bilevel_explorer.py: happy-path, fallback, wait-memory-injection, and deepest-subgoal-failure truncation tests.
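
The deepest-failure snapshot can be illustrated with a small tracker. This is a sketch under assumptions: the class name, the callback arguments, and the option names in the usage note are invented for illustration, and the real on_step_fail callback may receive different data.

```python
from typing import List, Sequence


class DeepestFailureTracker:
    """Remembers the attempt that failed at the deepest plan step."""

    def __init__(self) -> None:
        self.deepest_idx = -1
        self.prefix: List[str] = []

    def on_step_fail(self, step_idx: int,
                     attempted_plan: Sequence[str]) -> None:
        # Only a strictly deeper failure replaces the stored snapshot.
        if step_idx > self.deepest_idx:
            self.deepest_idx = step_idx
            # Inclusive truncation: keep the failing step itself so the
            # real-env rollout ends where the disagreement occurs.
            self.prefix = list(attempted_plan[:step_idx + 1])
```

For example, if backtracking reports failures at steps 1, 3, and then 2, the tracker ends up holding the first four options of the attempt that reached step 3, which would be the experiment plan returned on exhaustion.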

- New setting agent_bilevel_explorer_max_samples_per_step (default 50), separate from the solve-path budget, so the explorer's backtracking cost is independently tunable.
- Log the actual experiment plan (option names, objects, params) after refinement so the explorer's output is visible alongside the existing sketch/truncation log lines.
- Test config updated to set both budgets explicitly.

AgentSimLearningApproach extends AgentBilevelApproach to learn process dynamics online. Each cycle: the agent synthesizes parameterized process rules via Claude (using run_python / evaluate_simulator / test_simulator MCP tools), parameters are fitted via emcee MCMC, and the learned dynamics are composed with a kinematics-only PyBullet oracle into a combined option model for plan refinement.

Key pieces:
- predicators/approaches/agent_sim_learning_approach.py: the approach. Initialises with a kinematics-only option model (so AgentBilevelExplorer sees disagreements at process-dynamic subgoals like JugFilled/Boiled), and replaces it with the kin+learned model after each successful synthesis cycle.
- predicators/agent_sdk/tools.py: create_synthesis_tools() builds the three MCP tools the synthesis agent uses; extra_mcp_tools field and get_allowed_tool_list(extra_names=) plumbing lets the approach inject them into the session.
- predicators/code_sim_learning/: ParamSpec, fit_params (emcee MCMC), compute_mse, LearnedSimulator.
- predicators/ground_truth_models/boil/gt_simulator.py: ground-truth process-dynamics simulator for the boil environment.
- tests/: approach and param-fitting tests.

- agents.yaml: comment out agent_bilevel preset, add agent_sim_learning with explorer=agent_bilevel and skip_test_until_last_ite_or_early_stopping.
- common.yaml: disable failure/test video recording, set num_online_learning_cycles=1 for faster iteration.

Simulation primitives (code_sim_learning/utils.py):
- apply_rules(state, rules, params) → ProcessUpdate
- merge_updates(base_state, updates, process_features) → State
- simulate_step(state, action, base_env, rules, params, features) → State

These replace _build_fitted_step_fn, merge_process_updates, _sim_fn_from_rules, and the body of _build_combined_simulator.

GT simulator factory (ground_truth_models):
- GroundTruthSimulatorFactory ABC + get_gt_simulator(env_name) discovery, following the existing get_gt_options / get_gt_nsrts pattern.
- PyBulletBoilGroundTruthSimulatorFactory registered in boil/.
- Replaces the hardcoded _load_oracle_simulator in the approach.

Oracle ablation flags (settings.py):
- agent_sim_learn_oracle_sim_program: load GT rules, skip synthesis.
- agent_sim_learn_oracle_sim_params: use GT param values, skip MCMC.

Also: kin_env → base_env rename throughout, redundant self._types assignment removed, process_features computed once in __init__.

- yapf + isort autoformatting applied to all touched files.
- pylint: fix logging-not-lazy in agent_bilevel_explorer, add broad-except and reimported disables in agent_sim_learning_approach.
- mypy: fix base/env variable name collision, add type: ignore on lambda inference, add return type annotations to GT factory methods.

When sequential simulate calls differ only in process features (as in the combined kinematic+learned simulator), reapplying joint positions and tearing down/recreating grasp constraints causes visible arm jitter. Compare robot poses first and skip the kinematic reset path when they already match.

Factor simulator synthesis into a shared _learn_simulator helper so that both learn_from_offline_dataset and learn_from_interaction_results can trigger it on their respective trajectory sources. Also create a separate headless env for parameter fitting so MCMC's thousands of _set_state calls don't thrash the GUI env during training.

When _set_state is called with a PyBulletState whose simulator_state is a rich dict carrying joint_positions, those joints could only have come from a previous _get_state call on the same robot, so they are authoritative. Previously reset_state always ran an EE-pose roundtrip check that could spuriously fail on Euler->Quat float noise at the 1e-2 tolerance, discard the joints, and fall back to IK — which dropped information not encoded in (x, y, z, tilt, wrist) and surfaced as ~1e-2 rad wrist/roll drift across refinement/execution rollouts. Add a trust_joints flag, default False to preserve the guardrail for plain-State hint callers, and set it True in _set_state when the rich dict is present.

Both scripts/local/launch.py and scripts/local/launch_simp.py now:
* Insert the project root into sys.path themselves, so callers no longer need to prefix invocations with PYTHONPATH=.
* Accept --parallel to launch each experiment in its own macOS Terminal window concurrently. Each window writes a temp .command script that cd's to the repo root, exports PYTHONHASHSEED=0, runs the command, and pauses on `read` so you can inspect the final state before closing.
* Build the run command with sys.executable instead of bare `python` so the new Terminal's fresh shell doesn't fall back to a different conda env (the user's default was activating base in the new window, which lacks the project's deps).

launch.py also tees output to its logfile in parallel mode so the new window shows progress live while the logfile is still written. The wrong-import-position pylint warning is silenced once with a module-level disable since there's no other valid place for the post-sys.path-insert cluster_utils import. Docstrings expanded to document the flags and behavior; launch_simp stays minimal and points at launch.py for the featureful variant.
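
The sys.path bootstrap and the sys.executable command building might look roughly like this. It is an illustrative sketch only: both function names, the levels_up default, and the script path and flags in the usage note are assumptions, not the contents of launch.py.

```python
import sys
from pathlib import Path
from typing import List


def bootstrap_sys_path(script_file: str, levels_up: int = 2) -> str:
    """Put the repo root on sys.path, given a script under scripts/local/.

    parents[2] of <root>/scripts/local/launch.py is <root>.
    """
    root = str(Path(script_file).resolve().parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def build_run_command(main_script: str, flags: List[str]) -> List[str]:
    """Use the interpreter running this launcher, not a bare `python`.

    A freshly opened shell may resolve `python` to a different
    environment; sys.executable pins the one that has the project deps.
    """
    return [sys.executable, main_script, *flags]
```

A launcher could call bootstrap_sys_path(__file__) before its project imports and then pass the list from build_run_command("some_main.py", ["--some_flag", "0"]) to whatever writes the per-window command script.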

Extracts forward validation into bilevel_sketch.validate_plan_forward so both AgentBilevelApproach and the synthesis evaluate_plan_refinement tool share it. The tool now runs forward validation after refinement passes and reports both verdicts, with per-step subgoal-divergence logging when a sketch is provided. Updates the synthesis prompt to explain that refinement-pass + forward-validation-fail almost always means a learned threshold is more permissive than the env's effective behavior.

Wrap refinement in an inner reseed loop: a sketch that refines but fails forward validation is a continuous-params problem, not a wrong skeleton, so resample params with fresh seeds before re-querying the agent (which rarely changes the skeleton yet always costs an LLM call). Seeds are flattened across both loops so each (sketch, refine) pair is unique.
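
The nested loop with flattened seeds can be sketched as below. The function name and callback signatures are hypothetical, and the choice to request a new sketch as soon as refinement itself fails is an assumption of this sketch. The commit message only states that a refine-pass plus forward-validation-fail triggers reseeding.

```python
from typing import Callable, List, Optional


def solve_with_reseeding(
    query_sketch: Callable[[], str],
    refine: Callable[[str, int], Optional[List[str]]],
    validate_forward: Callable[[List[str]], bool],
    base_seed: int,
    max_sketch_queries: int,
    max_refine_retries: int,
) -> Optional[List[str]]:
    """Reseed refinement on one sketch before paying for another query."""
    attempts_per_sketch = max_refine_retries + 1
    for query_idx in range(max_sketch_queries):
        sketch = query_sketch()  # the expensive LLM call
        for retry_idx in range(attempts_per_sketch):
            # Flattened index: unique for every (sketch, refine) pair.
            seed = base_seed + query_idx * attempts_per_sketch + retry_idx
            plan = refine(sketch, seed)
            if plan is None:
                # Assumption: a sketch that cannot be refined at all is
                # treated as a skeleton problem, so ask for a new one.
                break
            if validate_forward(plan):
                return plan
            # Refined but failed forward validation: resample params.
    return None
```

With two sketch queries and two retries per sketch, the seeds visited are base_seed through base_seed + 5 with no repeats, so no attempt reuses the same random draw.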

CI runs mypy under its Linux platform, where the `sys.platform != "darwin"` guard in scripts/local/launch.py and launch_simp.py makes the rest of each helper dead code (unreachable). Disable warn_unreachable per-module for those two scripts so CI's static-type-checking passes; they still type-check otherwise, and the check passes on macOS too.

latent holds the agent's inferred belief about hidden state; privileged holds the environment's true hidden state that the observation omits. Both are excluded from __hash__/allclose and deep-copied by State/PyBulletState/VLMState/StateWithCache copy(). Predicate.holds and GroundAtom.holds auto-route latent into classifiers that opt in via a latent kwarg.
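
A toy illustration of the hashing and copying rules just described, using a stand-in dataclass rather than the real State. The class name and the flat features tuple are assumptions. Only the exclusion of latent and privileged from __hash__/allclose and the deep copy in copy() are taken from the commit message.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(eq=False)
class StateSketch:
    """Hypothetical stand-in for State with two hidden-state blocks."""
    features: Tuple[float, ...]
    latent: Optional[Dict[str, Any]] = None      # agent's inferred belief
    privileged: Optional[Dict[str, Any]] = None  # env's true hidden state

    def __hash__(self) -> int:
        # Hidden-state blocks do not contribute to the hash.
        return hash(self.features)

    def allclose(self, other: "StateSketch", atol: float = 1e-6) -> bool:
        # Only observable features are compared.
        return len(self.features) == len(other.features) and all(
            abs(a - b) <= atol
            for a, b in zip(self.features, other.features))

    def copy(self) -> "StateSketch":
        # Deep-copy both blocks so a copy never aliases the original.
        return StateSketch(self.features, copy.deepcopy(self.latent),
                           copy.deepcopy(self.privileged))
```

Keeping the blocks out of the hash and comparison means two states with identical observations but different beliefs still collide in caches keyed on State, which appears to be the intended behaviour here.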

compute_sse_recurrent and fit_params_recurrent thread a per-trajectory latent block across steps; apply_rules_with_latent dispatches 5-arg recurrent rules rule(state, latent, history, updates, params) alongside legacy 3-arg rules; init_latent and read_latent_init build the initial block from a LATENT_INIT export.
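
Dispatch by rule arity could be done with inspect.signature, as in the sketch below. The helper names, the dict-based state and latent, the in-place updates convention, and the legacy argument order rule(state, updates, params) are all assumptions. Only the 5-arg order rule(state, latent, history, updates, params) is given in the commit message.

```python
import inspect
from typing import Any, Callable, Dict, List, Sequence


def has_latent_signature(rule: Callable[..., None]) -> bool:
    """Treat a rule with exactly five parameters as recurrent."""
    return len(inspect.signature(rule).parameters) == 5


def apply_rules_with_latent_sketch(
    state: Dict[str, float],
    latent: Dict[str, float],
    history: List[Any],
    rules: Sequence[Callable[..., None]],
    params: Dict[str, float],
) -> Dict[str, float]:
    updates: Dict[str, float] = {}
    for rule in rules:
        if has_latent_signature(rule):
            rule(state, latent, history, updates, params)
        else:
            # Assumed legacy order; the real 3-arg order may differ.
            rule(state, updates, params)
    return updates


def fill_rule(state, updates, params):
    """Legacy rule: reads only the observable state."""
    updates["water"] = state["water"] + params["fill_rate"]


def heat_rule(state, latent, history, updates, params):
    """Recurrent rule: carries hidden heat across steps in latent."""
    latent["heat"] += params["heat_rate"]
    updates["bubbling"] = min(1.0, latent["heat"])
```

Because the latent block is mutated in place, calling the function once per step on the same latent dict threads the hidden heat through a trajectory while the legacy rule stays unaware of it.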

evaluate_predicate_quality materialises per-step latent for each trajectory via approach.materialise_latent so latent-aware predicates score against a real block. Add the _attach_initial_latent hook in the bilevel approach to seed task.init.latent before refinement; default is a no-op.

Add the cross-cutting CFG.partially_observable flag. In PO mode the jug type drops heat_level so the agent never sees the latent's name; heat is kept internally (state.privileged plus the jug.heat_level sim attribute), WaterBoiled reads the derived observable bubbling_level, and the heating/state-reset paths route off the observable array. Fully-observable mode is unchanged.

Partial-observability variant of agent_sim_predicate_invention: synthesized rules carry a latent block across steps and may declare LATENT_INIT, read from the simulator file. The parent loader now execs that file once and returns its namespace, so LATENT_INIT loads without a second exec; also guards the oracle-sim-program path as incompatible with partial observability.

gt_simulator_po.py is the answer-key for the heat-hidden boil env: it carries the hidden per-jug heat in a recurrent latent block and surfaces it only as the observable bubbling_level (the env's monotone ramp), never touching the heat_level feature that is absent in PO mode. Gates are hard (no soft thresholds) since the recurrent fit is gradient-free. Both boil GT-simulator factories now gate get_env_names on CFG.partially_observable, so get_gt_simulator dispatches to exactly one module per run: the PO simulator under partial observability, the fully-observable gt_simulator.py otherwise.

…roach The latent mechanism is orthogonal to predicate invention, so it moves from AgentSimRecurrentPredicateInventionApproach down into the base AgentSimLearningApproach, auto-activated by rule signature (has_latent_rules). Fully-observable simulators (3-arg rules) take the existing non-latent paths unchanged; partially-observable ones (5-arg rules) thread a latent block through fitting, the combined simulator, and the oracle-param SSE diagnostic. This lets the base approach (which keeps all ground-truth predicates, no invention) load and solve with the PO GT simulator: the oracle-program path no longer asserts against partial observability. The recurrent predicate-invention approach slims to just its synthesis prompt, inheriting every latent mechanic from the base.

agent_po_gt_sim runs the base agent_sim_learning approach (keeps all ground-truth predicates) with the PO GT simulator loaded as the oracle program and oracle params, on the heat-hidden boil env. A fixed plan sketch and zero online cycles mean no LLM is queried, so it is a fast, deterministic end-to-end check. The LLM-driven agent_predicate_invention block is commented out so the launcher targets only this test.

boil/__init__.py imported only the fully-observable simulator factory, so get_gt_simulator (which discovers GroundTruthSimulatorFactory subclasses via get_all_subclasses) never saw PyBulletBoilPOGroundTruthSimulatorFactory and raised NotImplementedError for pybullet_boil under partially_observable. Import the PO factory and add it to __all__ so the PO oracle simulator is discoverable.
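
The failure mode follows from how subclass discovery works: a class that is never imported is never defined, so it cannot appear among the base class's subclasses. The sketch below reproduces the pattern with stand-in names. CFG here is a plain namespace rather than the project's settings object, and all class and function names are hypothetical.

```python
import abc
from types import SimpleNamespace
from typing import List, Set

CFG = SimpleNamespace(partially_observable=False)  # stand-in for settings


class SimulatorFactorySketch(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def get_env_names(cls) -> Set[str]:
        """Env names this factory serves under the current config."""


def get_all_subclasses(cls: type) -> List[type]:
    found: List[type] = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(get_all_subclasses(sub))
    return found


def get_gt_simulator_sketch(env_name: str) -> type:
    # Only classes that have been defined (i.e. imported) are visible.
    for factory in get_all_subclasses(SimulatorFactorySketch):
        if env_name in factory.get_env_names():
            return factory
    raise NotImplementedError(f"no simulator factory for {env_name}")


class BoilFactorySketch(SimulatorFactorySketch):
    @classmethod
    def get_env_names(cls) -> Set[str]:
        return set() if CFG.partially_observable else {"pybullet_boil"}


class BoilPOFactorySketch(SimulatorFactorySketch):
    @classmethod
    def get_env_names(cls) -> Set[str]:
        return {"pybullet_boil"} if CFG.partially_observable else set()
```

Because each factory gates its env names on the same flag, exactly one of the two matches per run. If the module defining BoilPOFactorySketch were never imported, the partially observable lookup would fall through to NotImplementedError, which is the symptom the commit fixes.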

The strict raise on a reconstruction mismatch was gated on whether an env overrode _get_state() -- a leaky proxy for 'has an exact state<->sim mapping'. An env may override _get_state() for a non-kinematic reason (e.g. boil attaching a hidden-heat privileged block) without making its robot reconstruction any less lossy than the base env's, which spuriously promoted benign ~0.02 rad IK round-trip noise into a fatal ValueError. Replace the proxy with an explicit _strict_set_state_reconstruction ClassVar defaulting to False (warn). pybullet_blocks, whose State<->sim mapping is exact, opts into True. Behavior is unchanged for every existing env (blocks raises as before; all others warn as before).

- training.py: blank line after a nested import block (isort 5.10.1).
- structs.py: suppress arguments-differ on DerivedPredicate.holds and ConceptPredicate.holds, which intentionally keep the legacy 3-arg signature (base Predicate.holds gained a latent param); they already suppress the mypy override error.
- pybullet_boil.py: h != h -> np.isnan(h) (comparison-with-itself) and iterate init_dict via .items() (consider-using-dict-items).

The _set_state reconstruction guard used a per-env boolean (_strict_set_state_reconstruction) to decide whether a State<->sim round-trip mismatch should raise or merely warn. That required each env to assert "my mapping is exact", which is brittle: pybullet_fan, for instance, stores fan positions symbolically and places the bodies by side, so a valid State legitimately round-trips with ~0.35 m of benign position disagreement -- not an angle, so it wasn't covered by the existing IK-noise rationale either. Replace the flag with two universal magnitude thresholds on PyBulletEnv: warn above _reconstruction_warn_atol (1e-3, unchanged behavior) and raise above _reconstruction_raise_atol (2.0). Benign reconstruction error is workspace-scale at most (~0.8 m worst case by fan geometry, well under 2.0), while an impossible or corrupt requested feature (e.g. held=-10000, off by 1e4) is far above it -- so only the latter aborts, for every env, with no per-env opt-in. pybullet_blocks drops the flag and uses the base defaults; its held=-10000 reset test still raises as before.
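
A minimal sketch of the two-threshold guard, assuming states are compared as flat feature dictionaries. The function name, the dictionary representation, and the feature names in the usage note are illustrative. Only the 1e-3 warn and 2.0 raise thresholds come from the commit message.

```python
import logging
from typing import Dict


def check_reconstruction(
    requested: Dict[str, float],
    reconstructed: Dict[str, float],
    warn_atol: float = 1e-3,
    raise_atol: float = 2.0,
) -> float:
    """Return the largest per-feature mismatch, warning or raising by size."""
    max_err = max(
        (abs(requested[k] - reconstructed[k]) for k in requested),
        default=0.0)
    if max_err > raise_atol:
        # Far beyond workspace scale: the requested state is impossible
        # or corrupt, so abort.
        raise ValueError(
            f"state reconstruction off by {max_err:.3g} (> {raise_atol})")
    if max_err > warn_atol:
        # Benign noise or a symbolic-vs-physical placement gap.
        logging.warning("state reconstruction off by %.3g", max_err)
    return max_err
```

Under these thresholds a 0.35 m placement gap only logs a warning, while a feature requested as -10000 that reads back as 0 is off by 1e4 and raises ValueError, with no per-env flag involved.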

yichao-liang added 30 commits April 1, 2026 17:43

Refactor _validate_plan_forward to use option model directly

ee3fe4f

Delegate option execution to option_model.get_next_state_and_num_actions instead of duplicating its termination logic (stuck detection, Wait atom-change checks) and directly accessing its simulator.

Simplify _validate_plan_forward to use run_backtracking_refinement

e7eaf05

Replace 60 lines of manual option-model execution with a call to run_backtracking_refinement using max_tries=[1] and a sample_fn that returns the pre-grounded options. Remove unused Any import.

Add CFG option to load plan sketch from file instead of LLM

57ef4b8

Adds agent_bilevel_plan_sketch_file setting that, when set to a file path, loads the plan sketch directly from that file, bypassing the foundation model query. Includes test data files and a unit test.

Remove redundant conditions from Place action in boil_plan_sketch

d0ac199

Scale target joint value based on switch_joint_scale in PyBulletBoilEnv

0cafcd8

Refactor _terminal in option model to deduplicate wait-termination logic

3808337

Extract repeated wait-termination check into _check_wait_termination helper and unify the three _terminal branches into a single definition with config checks inside the function body.

Refactor terminal state logging in _OracleOptionModel to simplify con…

3624d01

…dition checks

Format docstring in get_observation method for improved readability

80c8110

Update PyBulletEnv module docstring for step() refactoring

f86c0ea

Document the step_base → domain_specific_step → get_observation flow, _skip_domain_specific_dynamics flag, and _domain_specific_step as an optional override.

Add skip_process_dynamics constructor param to PyBulletEnv

9cddb03

Replace direct access to private _skip_domain_specific_dynamics attribute with a public constructor parameter, so callers declare kinematics-only mode at creation time instead of mutating internal state after construction.

Refactor main function: extract and modularize setup logic for clarit…

87bbe1c

…y and maintainability

Move AgentSessionMixin into agent_sdk package

4076abd

The mixin is pure agent-session plumbing (session creation, lifecycle, explorer factory) and has no approach-specific logic, so it belongs next to session_manager.py, tools.py, and the sandbox managers rather than in approaches/.

Update test setup to use test tasks for boil environment and refine t…

8ff80a4

…est description

Refactor combined model in GT simulator

54002dd

Fix expected-atoms check to support DerivedPredicates

cb405d9

Use utils.abstract to evaluate expected atoms in low-level search so that DerivedPredicates — which require a Set[GroundAtom] rather than a State — are handled correctly alongside regular predicates.

yichao-liang added 29 commits May 17, 2026 16:47

Apply autoformat fixes across pybullet helpers and agent SDK files

8d9b72e

Drop unused INSPECTION_TOOL_NAMES import

1138d49

The import was added in 020697d but never referenced; pylint flagged it as unused-import (W0611), which fails the lint CI check.

Bump interaction-request step cap and run 5 seeds from 0

352aff2

Raises max_num_steps_interaction_request 300→500 to give longer continuous rollouts headroom under forward validation, and switches the sweep to seeds 0–4 to surface regressions across more starts.

Apply autoformat and split long line in forward validator

3fd741f

yapf/isort reflow on bilevel_sketch.py + test_agent_bilevel_approach.py, plus splitting the subgoal-divergence log site to keep the option-string formatter under the 80-col line limit pylint enforces.

Add 'paper/' directory to .gitignore

8c30703

Add agent_bilevel_max_refine_retries setting

3803aa4

Budget for reseeding continuous refinement on the same plan sketch before paying for a fresh LLM skeleton query; consumed by _solve.

Merge branch 'master' into sim-learning

364c9ce

Comment out unused code in the main simulation function for clarity

b3dc952

Remove unused simulate_step helper from code_sim_learning utils

7e44a42

Merge branch 'master' into sim-learning

f99069c

Deep-copy latent state in combined_simulate to prevent mutation of ca…

a649de5

…ller's state

Remove duplicate iter_feature_residuals from master-merge resolution

d412245

The master merge kept both sides of the conflict in code_sim_learning/utils.py, leaving two byte-identical definitions of iter_feature_residuals and tripping mypy's no-redef check. Drop the second copy.

yichao-liang merged commit daf7b36 into master May 31, 2026
14 checks passed

yichao-liang merged 172 commits into master from sim-learning
