v0.1.5: stop leaking "Plan is VALID" on syntax-only calls by omereliy · Pull Request #1 · SPL-BGU/pyvalidator

omereliy · 2026-05-19T07:31:22Z

Summary

format_plain_text emitted "All goals satisfied. Plan is VALID." whenever is_valid=True, including validate_syntax(domain[, problem]) calls where no plan was executed and no goal was checked.
Root cause: the "Goal Check" block in pyval/report_formatter.py ran unconditionally on is_valid=True.
Fix: gate it on "execution" in result.phases. For the syntax-only success case, emit a single accurate line: "All syntax and consistency checks passed. No plan was executed.".
Version bump to 0.1.5 so downstream consumers can pin and drop their workaround (pddl-copilot's pddl-validator 2.2.0 currently string-strips the leaked verdict; that patch becomes dead code once pinned to >=0.1.5).

Why upstream and not downstream

The bug affected every consumer of result.report() — library API, CLI, the MCP plugin — not just pddl-copilot. A downstream substring-strip is brittle (silently breaks on rewording) and only helps one consumer.

Test plan

python3 -m pytest tests/ -v — 58 passed, 8 skipped (56 prior + 2 new)
python3 -m pytest tests/ -v -k numeric — 13 passed, 2 skipped
CLI smoke: python3 -m pyval.cli <domain> and python3 -m pyval.cli <domain> <problem> on a clean blocksworld pair both print the new success line (previously printed the misleading verdict)
python3 -c "import pyval; print(pyval.__version__)" → 0.1.5

New tests

test_plain_text_syntax_only_success_domain_and_problem
test_plain_text_syntax_only_success_domain_only

Both assert: no "Plan is VALID", no "Plan is INVALID", no "Goal Check", and the new success line present.

Follow-up (separate PR, pddl-copilot repo)

plugins/pddl-validator/requirements.txt → pddl-pyvalidator>=0.1.5
plugins/pddl-validator/server/validator_server.py — delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the plan_executed strip branch
Plugin CHANGELOG → 2.2.1

format_plain_text emitted "All goals satisfied. Plan is VALID." whenever is_valid=True, including the validate_syntax(domain[, problem]) path where no plan was executed and no goal was checked. Gate the Goal Check block on "execution" in result.phases and, for the syntax-only success case, emit a single accurate line instead. Downstream consumers (pddl-copilot pddl-validator 2.2.0) can drop their string-strip workaround once pinned to >=0.1.5.

Extends test_plain_text_syntax_error to lock in the contract that a SYNTAX_ERROR result never emits "Plan is VALID/INVALID" or "Goal Check" — no plan was executed, no goals were checked. Cheap regression guard against anyone later removing the gate added in 0.1.5.

- README, CLAUDE.md, and four .claude/skills/*.md files referenced VALIDATOR_SPEC.md (deleted in 83618e6); point them at CLAUDE.md, pyval/models.py, and the inline templates in pyval/diagnostics.py / pyval/report_formatter.py instead. - CLAUDE.md: update "Output Modes" to note that syntax-only validation omits the "Plan is VALID/INVALID" verdict (v0.1.5 behavior), fix stale tests/domains/ layout claim. - CHANGELOG: add "Removed" entry under v0.1.5 for VALIDATOR_SPEC.md so the deletion is recorded alongside the syntax-only fix.

… fix pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix: the report formatter's Goal Check block now only runs when "execution" is in result.phases, so the misleading "Plan is VALID" line no longer appears on validate_syntax() success. Plugin-side changes: - requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5 - validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the conditional strip call. ~12 lines removed; the upstream fix carries the load. - Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut — no transient release that ships the in-tree workaround). - CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator 0.1.5 as the fix mechanism. The 5 regression tests added in the prior commit assert observable behavior ("Plan is VALID" absent from domain-only/domain+problem reports; retained for empty-plan-init-satisfies-goal). They pass against either implementation, so the test suite validates the transition without coupling to the mechanism — 21/21 green with the workaround removed.

…olver env errors, schema cleanups (#50) * tool interface audit fixes: gate validator plan-VALID leak, surface solver env errors, schema cleanups Consolidated PR addressing the May-2026 tool interface audit. Headline bug fixes plus a batch of additive widenings and response-shape cleanups across all three MCP plugins. See `docs/breaking-changes.md` for the full migration log. Critical bugs: - pddl-validator: `validate_pddl_syntax` no longer leaks pyvalidator's "Plan is VALID/INVALID" line in `report` when no plan was executed. Empty plan with init satisfying the goal correctly retains the line via the full plan-execution path. - pddl-solver: `classic_planner` / `numeric_planner` now surface INTERNAL_ERROR / UNSUPPORTED_PROBLEM / INTERMEDIATE as `{"error": True, "message": ...}`. Java-missing returns a clear message. Only UNSOLVABLE_* remains as no-plan-found. Additive widenings (non-breaking): - list[str] plan accepted on validate_pddl_syntax, get_state_transition, get_trajectory - normalize_pddl accepts file paths; adds `errors` field separate from `warnings` (kept for compat) - parser parameter is Literal["pddl-plus-parser", "unified-planning"] Response-shape fixes: - inspect_domain.types drops implicit "object": null root - :requirements preserves source order (no more sorted()) - get_applicable_actions sorts lexicographically before truncation - closed-world semantics + would_add/would_delete diagnostic note in docstrings Versions: pddl-solver 2.1.1→2.2.0, pddl-validator 2.1.1→2.2.0, pddl-parser 1.4.0→1.5.0, marketplace 1.2.0→1.3.0. Verified locally (validator 16/16, parser 39/39, solver 11/12 — sole failure is Java-missing on this dev box, which is exactly what the fix now surfaces). Experiments-repo validator agent reports OVERALL: SAFE across solver error shape, report text, list-plan widening, parser changes, and pinned verbose=False. * address PR-50 review: tighten Java detection, add regression tests, fix pddl-plus requirements order Reviewer feedback (PR #50): Issue 1 — Java-runtime detection in solver_server.py was too loose ("java" in log.lower() AND "not found" in log.lower() matched both substrings anywhere). Reduced to the specific macOS stub message "Unable to locate a Java Runtime"; other env failures fall through to the generic "Planner failed with status X" with full log attached. Issue 2 — added inline regression tests covering every audit fix: - pddl-validator/tests/verify.py (+5 tests): * "Plan is VALID" / "Plan is INVALID" absent from report on domain-only and domain+problem calls (observable — passes under either the in- plugin strip or pyvalidator >=0.1.5 upstream fix) * empty plan [] with init satisfying goal → valid=True, "Plan is VALID" is legitimately RETAINED (plan-execution path) * list[str] plan input accepted by both validate_pddl_syntax and get_state_transition - pddl-parser/tests/verify.py (+4 tests): * normalize_pddl accepts file path argument * normalize_pddl populates `errors` (separate from `warnings`) on parse failure * get_applicable_actions returns lex-sorted output * inspect_domain :requirements preserves source declaration order (tests both backends) - pddl-solver/tests/verify.py: restructured numeric_planner test to be Java-aware via subprocess `java -version` check (shutil.which is unreliable on macOS — stub binary returns truthy without a working JRE). When Java is missing, asserts the error-shape regression (was bug #4 in the audit: empty plan + misleading "Planner ran" note). Issue 3 — folded into Issue 2 via the requirements-order test. Test surfaced that pddl-plus-parser internally stores requirements as a set (loses source order). Fix: backend_pddl_plus.inspect_domain now regex-extracts from source file, same approach as backend_up. Test passes for both backends. Issues 4, 5, 6 dropped per triage: - (4) anchored regex on strip: workaround being retired against pyvalidator 0.1.5 in a follow-up PR — hardening doomed code is waste. - (5) one-consumer helper in normalize_pddl: violates .claude/rules/simplification.md "one consumer rule". - (6) semver semantics on Literal change: already documented in docs/breaking-changes.md as breaking-schema-but-minor-bump. Verification: - pddl-validator: 21/21 (was 16/16) - pddl-parser: 43/43 (was 39/39) - pddl-solver: 12/12 (was 11/12 — the new Java-aware path also fixes the pre-existing Java-less-box failure) * retire Plan-VALID strip workaround against pyvalidator 0.1.5 upstream fix pyvalidator 0.1.5 (SPL-BGU/pyvalidator#1) lands the upstream gating fix: the report formatter's Goal Check block now only runs when "execution" is in result.phases, so the misleading "Plan is VALID" line no longer appears on validate_syntax() success. Plugin-side changes: - requirements.txt: pddl-pyvalidator>=0.1.4 → >=0.1.5 - validator_server.py: delete _PLAN_VERDICT_LINE_MARKERS, _strip_plan_verdict_lines, and the conditional strip call. ~12 lines removed; the upstream fix carries the load. - Version bump 2.2.0 → 2.2.1 (consolidated release; 2.2.0 is never cut — no transient release that ships the in-tree workaround). - CHANGELOG + docs/breaking-changes.md updated to credit pyvalidator 0.1.5 as the fix mechanism. The 5 regression tests added in the prior commit assert observable behavior ("Plan is VALID" absent from domain-only/domain+problem reports; retained for empty-plan-init-satisfies-goal). They pass against either implementation, so the test suite validates the transition without coupling to the mechanism — 21/21 green with the workaround removed. * address PR-50 review #2: deliver cross-backend determinism, defensive fixes Reviewer's main finding was that the "sorted before truncation, deterministic across backends" claim wasn't actually delivered: my post-hoc sort() in parser_server.py only ordered the displayed subset; each backend had already truncated using its own iteration order (source order for pddl-plus, UP's internal order for unified-planning), so max_results=50 on a 500-action domain returned different 50s. Issue 1 — make the determinism claim true: - backend_pddl_plus.get_applicable_actions: sort schemas by name; sort each per-parameter object list by name. itertools.product then yields grounded actions in lex order over the output string, so the early-exit at max_results captures the lex-smallest N applicable actions deterministically and identically to the UP backend. - backend_up.get_applicable_actions: symmetric — sort up_problem.actions by name; sort per-parameter object lists by name. - The server-side sorted() remains as defense-in-depth (cheap, ensures the displayed list is monotone even if a backend changes). - New regression test (parser verify): assert pddl-plus and UP return the same applicable_actions list; assert the truncated subset is the lex-smallest prefix and matches across backends. Issue 4 — strip ;-to-EOL comments inside (:requirements ...) before splitting in both backends. A `(:requirements :strips ; note\n :typing)` no longer parses as [':strips', ';', 'note', ':typing']. Issue 5 — wrap os.path.isfile in normalize_pddl with try/except OSError. A long non-PDDL string that slipped past the inline-content sniff would hit ENAMETOOLONG (PATH_MAX=1024 on macOS) and propagate an exception instead of returning the structured error shape the tool promises. Issue 3 — expand the Java-detection comment in solver_server.py to explain why the match is intentionally narrow (macOS stub binary) and that Linux/Windows JVM-missing errors fall through to the generic "Planner failed with status X" with the log attached. No behavior change. Verification: - pddl-parser: 44/44 (was 43/43 — new cross-backend determinism test) - pddl-validator: 21/21 (unchanged) - pddl-solver: 12/12 (unchanged) Issue 2 (PR description 2.2.0 → 2.2.1 typo) addressed separately via gh pr edit. * tool schema audit: fix Optional types and Literal enum Several MCP tool params declared as bare types with `= None` defaults generated internally-inconsistent JSON schemas (e.g., `type: "string"` with `default: null`), which strict validators on the Ollama tool-calling path reject and which LLMs are likely to skip. Verified by dumping inputSchema directly from FastMCP. - save_plan (solver): 5 params → Optional[str] / Optional[float] - validate_pddl_syntax.problem (validator) → Optional[str] - normalize_pddl.output_format (parser) → Literal["pddl","json"] so the schema emits an enum instead of a bare string - ollama_mcp_bridge: drop stale Docker reference in no-tools error All re-dumped schemas now show anyOf [type, null] for optionals and an enum for output_format. 77 plugin tests pass (12 solver + 21 validator + 44 parser).

omereliy added 4 commits May 19, 2026 10:31

removed redundent doc

83618e6

omereliy merged commit f362607 into main May 19, 2026
3 checks passed

omereliy deleted the fix/syntax-only-plan-verdict-leak branch May 19, 2026 10:33

omereliy mentioned this pull request May 19, 2026

tool interface audit fixes: gate validator plan-VALID leak, surface solver env errors, schema cleanups SPL-BGU/pddl-copilot#50

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.1.5: stop leaking "Plan is VALID" on syntax-only calls#1

v0.1.5: stop leaking "Plan is VALID" on syntax-only calls#1
omereliy merged 4 commits into
mainfrom
fix/syntax-only-plan-verdict-leak

omereliy commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

omereliy commented May 19, 2026

Summary

Why upstream and not downstream

Test plan

New tests

Follow-up (separate PR, pddl-copilot repo)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant