v0.1.5: stop leaking "Plan is VALID" on syntax-only calls#1
Merged
Conversation
format_plain_text emitted "All goals satisfied. Plan is VALID." whenever is_valid=True, including the validate_syntax(domain[, problem]) path where no plan was executed and no goal was checked. Gate the Goal Check block on "execution" in result.phases and, for the syntax-only success case, emit a single accurate line instead. Downstream consumers (pddl-copilot pddl-validator 2.2.0) can drop their string-strip workaround once pinned to >=0.1.5.
Extends test_plain_text_syntax_error to lock in the contract that a SYNTAX_ERROR result never emits "Plan is VALID/INVALID" or "Goal Check" — no plan was executed, no goals were checked. Cheap regression guard against anyone later removing the gate added in 0.1.5.
- README, CLAUDE.md, and four .claude/skills/*.md files referenced VALIDATOR_SPEC.md (deleted in 83618e6); point them at CLAUDE.md, pyval/models.py, and the inline templates in pyval/diagnostics.py / pyval/report_formatter.py instead. - CLAUDE.md: update "Output Modes" to note that syntax-only validation omits the "Plan is VALID/INVALID" verdict (v0.1.5 behavior), fix stale tests/domains/ layout claim. - CHANGELOG: add "Removed" entry under v0.1.5 for VALIDATOR_SPEC.md so the deletion is recorded alongside the syntax-only fix.
omereliy
added a commit
to SPL-BGU/pddl-copilot
that referenced
this pull request
May 19, 2026
… fix pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix: the report formatter's Goal Check block now only runs when "execution" is in result.phases, so the misleading "Plan is VALID" line no longer appears on validate_syntax() success. Plugin-side changes: - requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5 - validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the conditional strip call. ~12 lines removed; the upstream fix carries the load. - Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut — no transient release that ships the in-tree workaround). - CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator 0.1.5 as the fix mechanism. The 5 regression tests added in the prior commit assert observable behavior ("Plan is VALID" absent from domain-only/domain+problem reports; retained for empty-plan-init-satisfies-goal). They pass against either implementation, so the test suite validates the transition without coupling to the mechanism — 21/21 green with the workaround removed.
7 tasks
omereliy
added a commit
to SPL-BGU/pddl-copilot
that referenced
this pull request
May 19, 2026
…olver env errors, schema cleanups (#50) * tool interface audit fixes: gate validator plan-VALID leak, surface solver env errors, schema cleanups Consolidated PR addressing the May-2026 tool interface audit. Headline bug fixes plus a batch of additive widenings and response-shape cleanups across all three MCP plugins. See `docs/breaking-changes.md` for the full migration log. Critical bugs: - pddl-validator: `validate_pddl_syntax` no longer leaks pyvalidator's "Plan is VALID/INVALID" line in `report` when no plan was executed. Empty plan with init satisfying the goal correctly retains the line via the full plan-execution path. - pddl-solver: `classic_planner` / `numeric_planner` now surface INTERNAL_ERROR / UNSUPPORTED_PROBLEM / INTERMEDIATE as `{"error": True, "message": ...}`. Java-missing returns a clear message. Only UNSOLVABLE_* remains as no-plan-found. Additive widenings (non-breaking): - list[str] plan accepted on validate_pddl_syntax, get_state_transition, get_trajectory - normalize_pddl accepts file paths; adds `errors` field separate from `warnings` (kept for compat) - parser parameter is Literal["pddl-plus-parser", "unified-planning"] Response-shape fixes: - inspect_domain.types drops implicit "object": null root - :requirements preserves source order (no more sorted()) - get_applicable_actions sorts lexicographically before truncation - closed-world semantics + would_add/would_delete diagnostic note in docstrings Versions: pddl-solver 2.1.1→2.2.0, pddl-validator 2.1.1→2.2.0, pddl-parser 1.4.0→1.5.0, marketplace 1.2.0→1.3.0. Verified locally (validator 16/16, parser 39/39, solver 11/12 — sole failure is Java-missing on this dev box, which is exactly what the fix now surfaces). Experiments-repo validator agent reports OVERALL: SAFE across solver error shape, report text, list-plan widening, parser changes, and pinned verbose=False. * address PR-50 review: tighten Java detection, add regression tests, fix pddl-plus requirements order Reviewer feedback (PR #50): Issue 1 — Java-runtime detection in solver_server.py was too loose ("java" in log.lower() AND "not found" in log.lower() matched both substrings anywhere). Reduced to the specific macOS stub message "Unable to locate a Java Runtime"; other env failures fall through to the generic "Planner failed with status X" with full log attached. Issue 2 — added inline regression tests covering every audit fix: - pddl-validator/tests/verify.py (+5 tests): * "Plan is VALID" / "Plan is INVALID" absent from report on domain-only and domain+problem calls (observable — passes under either the in- plugin strip or pyvalidator >=0.1.5 upstream fix) * empty plan [] with init satisfying goal → valid=True, "Plan is VALID" is legitimately RETAINED (plan-execution path) * list[str] plan input accepted by both validate_pddl_syntax and get_state_transition - pddl-parser/tests/verify.py (+4 tests): * normalize_pddl accepts file path argument * normalize_pddl populates `errors` (separate from `warnings`) on parse failure * get_applicable_actions returns lex-sorted output * inspect_domain :requirements preserves source declaration order (tests both backends) - pddl-solver/tests/verify.py: restructured numeric_planner test to be Java-aware via subprocess `java -version` check (shutil.which is unreliable on macOS — stub binary returns truthy without a working JRE). When Java is missing, asserts the error-shape regression (was bug #4 in the audit: empty plan + misleading "Planner ran" note). Issue 3 — folded into Issue 2 via the requirements-order test. Test surfaced that pddl-plus-parser internally stores requirements as a set (loses source order). Fix: backend_pddl_plus.inspect_domain now regex-extracts from source file, same approach as backend_up. Test passes for both backends. Issues 4, 5, 6 dropped per triage: - (4) anchored regex on strip: workaround being retired against pyvalidator 0.1.5 in a follow-up PR — hardening doomed code is waste. - (5) one-consumer helper in normalize_pddl: violates .claude/rules/simplification.md "one consumer rule". - (6) semver semantics on Literal change: already documented in docs/breaking-changes.md as breaking-schema-but-minor-bump. Verification: - pddl-validator: 21/21 (was 16/16) - pddl-parser: 43/43 (was 39/39) - pddl-solver: 12/12 (was 11/12 — the new Java-aware path also fixes the pre-existing Java-less-box failure) * retire Plan-VALID strip workaround against pyvalidator 0.1.5 upstream fix pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix: the report formatter's Goal Check block now only runs when "execution" is in result.phases, so the misleading "Plan is VALID" line no longer appears on validate_syntax() success. Plugin-side changes: - requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5 - validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the conditional strip call. ~12 lines removed; the upstream fix carries the load. - Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut — no transient release that ships the in-tree workaround). - CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator 0.1.5 as the fix mechanism. The 5 regression tests added in the prior commit assert observable behavior ("Plan is VALID" absent from domain-only/domain+problem reports; retained for empty-plan-init-satisfies-goal). They pass against either implementation, so the test suite validates the transition without coupling to the mechanism — 21/21 green with the workaround removed. * address PR-50 review #2: deliver cross-backend determinism, defensive fixes Reviewer's main finding was that the "sorted before truncation, deterministic across backends" claim wasn't actually delivered: my post-hoc sort() in parser_server.py only ordered the displayed subset; each backend had already truncated using its own iteration order (source order for pddl-plus, UP's internal order for unified-planning), so max_results=50 on a 500-action domain returned different 50s. Issue 1 — make the determinism claim true: - backend_pddl_plus.get_applicable_actions: sort schemas by name; sort each per-parameter object list by name. itertools.product then yields grounded actions in lex order over the output string, so the early-exit at max_results captures the lex-smallest N applicable actions deterministically and identically to the UP backend. - backend_up.get_applicable_actions: symmetric — sort up_problem.actions by name; sort per-parameter object lists by name. - The server-side sorted() remains as defense-in-depth (cheap, ensures the displayed list is monotone even if a backend changes). - New regression test (parser verify): assert pddl-plus and UP return the same applicable_actions list; assert the truncated subset is the lex-smallest prefix and matches across backends. Issue 4 — strip ;-to-EOL comments inside (:requirements ...) before splitting in both backends. A `(:requirements :strips ; note\n :typing)` no longer parses as [':strips', ';', 'note', ':typing']. Issue 5 — wrap os.path.isfile in normalize_pddl with try/except OSError. A long non-PDDL string that slipped past the inline-content sniff would hit ENAMETOOLONG (PATH_MAX=1024 on macOS) and propagate an exception instead of returning the structured error shape the tool promises. Issue 3 — expand the Java-detection comment in solver_server.py to explain why the match is intentionally narrow (macOS stub binary) and that Linux/Windows JVM-missing errors fall through to the generic "Planner failed with status X" with the log attached. No behavior change. Verification: - pddl-parser: 44/44 (was 43/43 — new cross-backend determinism test) - pddl-validator: 21/21 (unchanged) - pddl-solver: 12/12 (unchanged) Issue 2 (PR description 2.2.0 → 2.2.1 typo) addressed separately via gh pr edit. * tool schema audit: fix Optional types and Literal enum Several MCP tool params declared as bare types with `= None` defaults generated internally-inconsistent JSON schemas (e.g., `type: "string"` with `default: null`), which strict validators on the Ollama tool-calling path reject and which LLMs are likely to skip. Verified by dumping inputSchema directly from FastMCP. - save_plan (solver): 5 params → Optional[str] / Optional[float] - validate_pddl_syntax.problem (validator) → Optional[str] - normalize_pddl.output_format (parser) → Literal["pddl","json"] so the schema emits an enum instead of a bare string - ollama_mcp_bridge: drop stale Docker reference in no-tools error All re-dumped schemas now show anyOf [type, null] for optionals and an enum for output_format. 77 plugin tests pass (12 solver + 21 validator + 44 parser).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
format_plain_textemitted"All goals satisfied. Plan is VALID."wheneveris_valid=True, includingvalidate_syntax(domain[, problem])calls where no plan was executed and no goal was checked.pyval/report_formatter.pyran unconditionally onis_valid=True."execution" in result.phases. For the syntax-only success case, emit a single accurate line:"All syntax and consistency checks passed. No plan was executed.".0.1.5so downstream consumers can pin and drop their workaround (pddl-copilot'spddl-validator2.2.0 currently string-strips the leaked verdict; that patch becomes dead code once pinned to>=0.1.5).Why upstream and not downstream
The bug affected every consumer of
result.report()— library API, CLI, the MCP plugin — not just pddl-copilot. A downstream substring-strip is brittle (silently breaks on rewording) and only helps one consumer.Test plan
python3 -m pytest tests/ -v— 58 passed, 8 skipped (56 prior + 2 new)python3 -m pytest tests/ -v -k numeric— 13 passed, 2 skippedpython3 -m pyval.cli <domain>andpython3 -m pyval.cli <domain> <problem>on a clean blocksworld pair both print the new success line (previously printed the misleading verdict)python3 -c "import pyval; print(pyval.__version__)"→0.1.5New tests
test_plain_text_syntax_only_success_domain_and_problemtest_plain_text_syntax_only_success_domain_onlyBoth assert: no
"Plan is VALID", no"Plan is INVALID", no"Goal Check", and the new success line present.Follow-up (separate PR, pddl-copilot repo)
plugins/pddl-validator/requirements.txt→pddl-pyvalidator>=0.1.5plugins/pddl-validator/server/validator_server.py— delete_PLAN_VERDICT_LINE_MARKERS,_strip_plan_verdict_lines, and theplan_executedstrip branch