Skip to content

v0.1.5: stop leaking "Plan is VALID" on syntax-only calls#1

Merged
omereliy merged 4 commits into
mainfrom
fix/syntax-only-plan-verdict-leak
May 19, 2026
Merged

v0.1.5: stop leaking "Plan is VALID" on syntax-only calls#1
omereliy merged 4 commits into
mainfrom
fix/syntax-only-plan-verdict-leak

Conversation

@omereliy
Copy link
Copy Markdown
Collaborator

Summary

  • format_plain_text emitted "All goals satisfied. Plan is VALID." whenever is_valid=True, including validate_syntax(domain[, problem]) calls where no plan was executed and no goal was checked.
  • Root cause: the "Goal Check" block in pyval/report_formatter.py ran unconditionally on is_valid=True.
  • Fix: gate it on "execution" in result.phases. For the syntax-only success case, emit a single accurate line: "All syntax and consistency checks passed. No plan was executed.".
  • Version bump to 0.1.5 so downstream consumers can pin and drop their workaround (pddl-copilot's pddl-validator 2.2.0 currently string-strips the leaked verdict; that patch becomes dead code once pinned to >=0.1.5).

Why upstream and not downstream

The bug affected every consumer of result.report() — library API, CLI, the MCP plugin — not just pddl-copilot. A downstream substring-strip is brittle (silently breaks on rewording) and only helps one consumer.

Test plan

  • python3 -m pytest tests/ -v — 58 passed, 8 skipped (56 prior + 2 new)
  • python3 -m pytest tests/ -v -k numeric — 13 passed, 2 skipped
  • CLI smoke: python3 -m pyval.cli <domain> and python3 -m pyval.cli <domain> <problem> on a clean blocksworld pair both print the new success line (previously printed the misleading verdict)
  • python3 -c "import pyval; print(pyval.__version__)"0.1.5

New tests

  • test_plain_text_syntax_only_success_domain_and_problem
  • test_plain_text_syntax_only_success_domain_only

Both assert: no "Plan is VALID", no "Plan is INVALID", no "Goal Check", and the new success line present.

Follow-up (separate PR, pddl-copilot repo)

  1. plugins/pddl-validator/requirements.txtpddl-pyvalidator>=0.1.5
  2. plugins/pddl-validator/server/validator_server.py — delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the plan_executed strip branch
  3. Plugin CHANGELOG → 2.2.1

omereliy added 4 commits May 19, 2026 10:31
format_plain_text emitted "All goals satisfied. Plan is VALID." whenever
is_valid=True, including the validate_syntax(domain[, problem]) path
where no plan was executed and no goal was checked. Gate the Goal Check
block on "execution" in result.phases and, for the syntax-only success
case, emit a single accurate line instead. Downstream consumers
(pddl-copilot pddl-validator 2.2.0) can drop their string-strip
workaround once pinned to >=0.1.5.
Extends test_plain_text_syntax_error to lock in the contract that a
SYNTAX_ERROR result never emits "Plan is VALID/INVALID" or "Goal Check"
— no plan was executed, no goals were checked. Cheap regression guard
against anyone later removing the gate added in 0.1.5.
- README, CLAUDE.md, and four .claude/skills/*.md files referenced
  VALIDATOR_SPEC.md (deleted in 83618e6); point them at CLAUDE.md,
  pyval/models.py, and the inline templates in pyval/diagnostics.py
  / pyval/report_formatter.py instead.
- CLAUDE.md: update "Output Modes" to note that syntax-only validation
  omits the "Plan is VALID/INVALID" verdict (v0.1.5 behavior), fix
  stale tests/domains/ layout claim.
- CHANGELOG: add "Removed" entry under v0.1.5 for VALIDATOR_SPEC.md
  so the deletion is recorded alongside the syntax-only fix.
@omereliy omereliy merged commit f362607 into main May 19, 2026
3 checks passed
@omereliy omereliy deleted the fix/syntax-only-plan-verdict-leak branch May 19, 2026 10:33
omereliy added a commit to SPL-BGU/pddl-copilot that referenced this pull request May 19, 2026
… fix

pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix:
the report formatter's Goal Check block now only runs when "execution"
is in result.phases, so the misleading "Plan is VALID" line no longer
appears on validate_syntax() success.

Plugin-side changes:
- requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5
- validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS,
  _strip_plan_verdict_lines, and the conditional strip call. ~12 lines
  removed; the upstream fix carries the load.
- Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut —
  no transient release that ships the in-tree workaround).
- CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator
  0.1.5 as the fix mechanism.

The 5 regression tests added in the prior commit assert observable
behavior ("Plan is VALID" absent from domain-only/domain+problem
reports; retained for empty-plan-init-satisfies-goal). They pass
against either implementation, so the test suite validates the
transition without coupling to the mechanism — 21/21 green with the
workaround removed.
omereliy added a commit to SPL-BGU/pddl-copilot that referenced this pull request May 19, 2026
…olver env errors, schema cleanups (#50)

* tool interface audit fixes: gate validator plan-VALID leak, surface solver env errors, schema cleanups

Consolidated PR addressing the May-2026 tool interface audit. Headline bug
fixes plus a batch of additive widenings and response-shape cleanups across
all three MCP plugins. See `docs/breaking-changes.md` for the full migration
log.

Critical bugs:
- pddl-validator: `validate_pddl_syntax` no longer leaks pyvalidator's
  "Plan is VALID/INVALID" line in `report` when no plan was executed.
  Empty plan with init satisfying the goal correctly retains the line via
  the full plan-execution path.
- pddl-solver: `classic_planner` / `numeric_planner` now surface
  INTERNAL_ERROR / UNSUPPORTED_PROBLEM / INTERMEDIATE as `{"error": True,
  "message": ...}`. Java-missing returns a clear message. Only UNSOLVABLE_*
  remains as no-plan-found.

Additive widenings (non-breaking):
- list[str] plan accepted on validate_pddl_syntax, get_state_transition,
  get_trajectory
- normalize_pddl accepts file paths; adds `errors` field separate from
  `warnings` (kept for compat)
- parser parameter is Literal["pddl-plus-parser", "unified-planning"]

Response-shape fixes:
- inspect_domain.types drops implicit "object": null root
- :requirements preserves source order (no more sorted())
- get_applicable_actions sorts lexicographically before truncation
- closed-world semantics + would_add/would_delete diagnostic note in
  docstrings

Versions: pddl-solver 2.1.1→2.2.0, pddl-validator 2.1.1→2.2.0,
pddl-parser 1.4.0→1.5.0, marketplace 1.2.0→1.3.0.

Verified locally (validator 16/16, parser 39/39, solver 11/12 — sole
failure is Java-missing on this dev box, which is exactly what the
fix now surfaces). Experiments-repo validator agent reports OVERALL: SAFE
across solver error shape, report text, list-plan widening, parser changes,
and pinned verbose=False.

* address PR-50 review: tighten Java detection, add regression tests, fix pddl-plus requirements order

Reviewer feedback (PR #50):

Issue 1 — Java-runtime detection in solver_server.py was too loose
("java" in log.lower() AND "not found" in log.lower() matched both
substrings anywhere). Reduced to the specific macOS stub message
"Unable to locate a Java Runtime"; other env failures fall through
to the generic "Planner failed with status X" with full log attached.

Issue 2 — added inline regression tests covering every audit fix:
- pddl-validator/tests/verify.py (+5 tests):
  * "Plan is VALID" / "Plan is INVALID" absent from report on domain-only
    and domain+problem calls (observable — passes under either the in-
    plugin strip or pyvalidator >=0.1.5 upstream fix)
  * empty plan [] with init satisfying goal → valid=True, "Plan is VALID"
    is legitimately RETAINED (plan-execution path)
  * list[str] plan input accepted by both validate_pddl_syntax and
    get_state_transition
- pddl-parser/tests/verify.py (+4 tests):
  * normalize_pddl accepts file path argument
  * normalize_pddl populates `errors` (separate from `warnings`) on parse
    failure
  * get_applicable_actions returns lex-sorted output
  * inspect_domain :requirements preserves source declaration order
    (tests both backends)
- pddl-solver/tests/verify.py: restructured numeric_planner test to be
  Java-aware via subprocess `java -version` check (shutil.which is
  unreliable on macOS — stub binary returns truthy without a working JRE).
  When Java is missing, asserts the error-shape regression (was bug #4
  in the audit: empty plan + misleading "Planner ran" note).

Issue 3 — folded into Issue 2 via the requirements-order test.
Test surfaced that pddl-plus-parser internally stores requirements as a
set (loses source order). Fix: backend_pddl_plus.inspect_domain now
regex-extracts from source file, same approach as backend_up. Test
passes for both backends.

Issues 4, 5, 6 dropped per triage:
- (4) anchored regex on strip: workaround being retired against
  pyvalidator 0.1.5 in a follow-up PR — hardening doomed code is waste.
- (5) one-consumer helper in normalize_pddl: violates
  .claude/rules/simplification.md "one consumer rule".
- (6) semver semantics on Literal change: already documented in
  docs/breaking-changes.md as breaking-schema-but-minor-bump.

Verification:
- pddl-validator: 21/21 (was 16/16)
- pddl-parser:    43/43 (was 39/39)
- pddl-solver:    12/12 (was 11/12 — the new Java-aware path also fixes
  the pre-existing Java-less-box failure)

* retire Plan-VALID strip workaround against pyvalidator 0.1.5 upstream fix

pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix:
the report formatter's Goal Check block now only runs when "execution"
is in result.phases, so the misleading "Plan is VALID" line no longer
appears on validate_syntax() success.

Plugin-side changes:
- requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5
- validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS,
  _strip_plan_verdict_lines, and the conditional strip call. ~12 lines
  removed; the upstream fix carries the load.
- Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut —
  no transient release that ships the in-tree workaround).
- CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator
  0.1.5 as the fix mechanism.

The 5 regression tests added in the prior commit assert observable
behavior ("Plan is VALID" absent from domain-only/domain+problem
reports; retained for empty-plan-init-satisfies-goal). They pass
against either implementation, so the test suite validates the
transition without coupling to the mechanism — 21/21 green with the
workaround removed.

* address PR-50 review #2: deliver cross-backend determinism, defensive fixes

Reviewer's main finding was that the "sorted before truncation,
deterministic across backends" claim wasn't actually delivered: my
post-hoc sort() in parser_server.py only ordered the displayed subset;
each backend had already truncated using its own iteration order (source
order for pddl-plus, UP's internal order for unified-planning), so
max_results=50 on a 500-action domain returned different 50s.

Issue 1 — make the determinism claim true:
- backend_pddl_plus.get_applicable_actions: sort schemas by name; sort
  each per-parameter object list by name. itertools.product then yields
  grounded actions in lex order over the output string, so the
  early-exit at max_results captures the lex-smallest N applicable
  actions deterministically and identically to the UP backend.
- backend_up.get_applicable_actions: symmetric — sort up_problem.actions
  by name; sort per-parameter object lists by name.
- The server-side sorted() remains as defense-in-depth (cheap, ensures
  the displayed list is monotone even if a backend changes).
- New regression test (parser verify): assert pddl-plus and UP return
  the same applicable_actions list; assert the truncated subset is the
  lex-smallest prefix and matches across backends.

Issue 4 — strip ;-to-EOL comments inside (:requirements ...) before
splitting in both backends. A `(:requirements :strips ; note\n :typing)`
no longer parses as [':strips', ';', 'note', ':typing'].

Issue 5 — wrap os.path.isfile in normalize_pddl with try/except OSError.
A long non-PDDL string that slipped past the inline-content sniff would
hit ENAMETOOLONG (PATH_MAX=1024 on macOS) and propagate an exception
instead of returning the structured error shape the tool promises.

Issue 3 — expand the Java-detection comment in solver_server.py to
explain why the match is intentionally narrow (macOS stub binary) and
that Linux/Windows JVM-missing errors fall through to the generic
"Planner failed with status X" with the log attached. No behavior
change.

Verification:
- pddl-parser: 44/44 (was 43/43 — new cross-backend determinism test)
- pddl-validator: 21/21 (unchanged)
- pddl-solver: 12/12 (unchanged)

Issue 2 (PR description 2.2.0 → 2.2.1 typo) addressed separately via
gh pr edit.

* tool schema audit: fix Optional types and Literal enum

Several MCP tool params declared as bare types with `= None` defaults
generated internally-inconsistent JSON schemas (e.g., `type: "string"`
with `default: null`), which strict validators on the Ollama tool-calling
path reject and which LLMs are likely to skip. Verified by dumping
inputSchema directly from FastMCP.

- save_plan (solver): 5 params → Optional[str] / Optional[float]
- validate_pddl_syntax.problem (validator) → Optional[str]
- normalize_pddl.output_format (parser) → Literal["pddl","json"]
  so the schema emits an enum instead of a bare string
- ollama_mcp_bridge: drop stale Docker reference in no-tools error

All re-dumped schemas now show anyOf [type, null] for optionals and an
enum for output_format. 77 plugin tests pass (12 solver + 21 validator
+ 44 parser).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant