fix(output): correct kiosk negative-case rendering and OWA discipline #68
The Gate 1 Technical evidence details panel rendered "Matched: <system> -> <component> -> <disposition>" unconditionally. After PR #68 surfaced asserted-but-not-in-regulated-union nodes for negative cases, the same "Matched:" prefix labelled the asserted path on the kiosk fixture even though the gate-answer text three lines above explicitly said no triggering capability is asserted. A reader expanding the disclosure saw a contradiction.
Conditional prefix:
- bindings non-empty -> "Matched:"
- disp_outside_union True -> "Asserted (not matched in regulated union):"
- both empty -> "No asserted disposition path:"
The component / disposition tokens themselves are unchanged (they were already graph-bound from get_asserted_dispositions); only the prefix changes.
Locus: 03_TECHNICAL_CORE/scripts/run_pipeline.py Gate 1 details block
Verification: kiosk shows "Asserted (not matched in regulated union):"; sentinel/creditscorer/decoy/blanknode show "Matched:" (no regression on positive cases).
Closes PR #68 adversarial audit M-A1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
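
The three-way prefix selection described in this commit can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the code in run_pipeline.py: the function name `gate1_prefix` and both parameter names are hypothetical, and only the three prefix strings are taken from the commit message.

```python
from typing import Sequence


def gate1_prefix(bindings: Sequence, asserted_outside_union: Sequence) -> str:
    """Pick the Gate 1 evidence prefix from the match outcome.

    `bindings` stands in for the regulated-union query rows and
    `asserted_outside_union` for rows that are asserted in the graph but
    not matched in the regulated union (both names are assumptions).
    """
    if bindings:
        # A regulated-union binding exists: the path really was matched.
        return "Matched:"
    if asserted_outside_union:
        # Something is asserted, but it did not match the regulated union.
        return "Asserted (not matched in regulated union):"
    # Nothing asserted at all for this gate.
    return "No asserted disposition path:"
```

Checking the regulated-union bindings first keeps positive cases on the "Matched:" prefix regardless of what else is asserted, which is the no-regression behaviour the verification note describes.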
The Plain Language box in the negative-case Executive Summary used the closed-world verb "found in the provided structured description" (run_pipeline.py:797-801), inconsistent with the OWA-bounded language the Gate 1 / Gate 2 cards adopted in PR #68. The plain-language box appears above the gate cards in document order, so a reader saw the closed-world phrasing first.
Replaces "found in the provided structured description" with "is asserted in the loaded graph for this system. Under the Open World Assumption, this is not a closed-world denial of what the system can do; it is the absence of the commitments required to entail Annex III applicability." This matches the gate-card vocabulary and the CLAUDE.md "Forbidden prose patterns" rule on closed-world absence claims.
Locus: 03_TECHNICAL_CORE/scripts/run_pipeline.py write_html_view's negative-branch plain_english_summary
Verification: kiosk HTML's plain-language box now reads OWA-bounded; positive cases unchanged (this branch only fires when no category is applicable).
Closes PR #68 adversarial audit M-A5 / M-A8 and QA AGENT B R2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HTML view's audit-table badges were updated in PR #68 to render the three negative-control fields (latent_risk_flag, obligation_link, regulatory_alignment) as gray neutral on non-applicable runs. summary.json and certificate.txt continued to emit binary "FAIL" for the same fields, producing a same-document contradiction: the kiosk HTML embeds summary.json in a <details> Raw Pipeline Outputs block, so a reader expanding the disclosure saw the visible audit table show gray "NOT LINKED" / "NOT ALIGNED" while the embedded JSON said "FAIL" for the same data points.
New _status_label() helper mirrors _status_badge() but emits enum labels rather than HTML classes. Three states:
- val is None -> NOT_RUN
- val is True -> present_label (PRESENT / DETECTED / PASS)
- val is False:
  - if not_applicable_label provided -> that label (NOT_APPLICABLE)
  - else -> absent_label (NOT_PRESENT / FAIL)
Per output_manifest_v2.yaml enum mapping:
- latent_risk_flag (manifest line 124): [present, not_present, not_run]
- obligation_check (line 201): [pass, fail, not_run]
- regulatory_alignment_check (line 208): [pass, fail, not_run]
Schema bumps:
- summary.json 1.3 -> 1.4 (adds regulatory_alignment field, applies ternary _status_label to entailment / latent_risk / obligation / regulatory_alignment)
- certificate.txt: adds REGULATORY ALIGNMENT line, applies _status_label to LATENT RISK and OBLIGATION
LATENT_RISK uses domain labels DETECTED / NOT_DETECTED: NOT_DETECTED is a substantive answer for a non-high-risk system, not a "not applicable" outcome. Obligation and regulatory_alignment use NOT_APPLICABLE on non_applicable_run so the False outcome reads as out-of-scope rather than as a real audit failure.
Locus: 03_TECHNICAL_CORE/scripts/run_pipeline.py
- new _status_label() helper near _pf()
- certificate stdout block (LATENT RISK / OBLIGATION / REGULATORY ALIGNMENT)
- certificate file block (cert_lines.append)
- summary.json field emission
Verification:
- Kiosk negative case: summary.json shows latent_risk=NOT_DETECTED, obligation=NOT_APPLICABLE, regulatory_alignment=NOT_APPLICABLE, entailment=NOT_PRESENT. Certificate shows the same. HTML embedded JSON now matches the visible audit table.
- Sentinel positive case: summary.json shows latent_risk=DETECTED, obligation=PASS, regulatory_alignment=PASS, entailment=PRESENT. Certificate shows the same.
- Decoy / Blanknode (positive cases with audit defects): regulatory_alignment=FAIL (real audit fail, not_applicable_label is None for applicable runs).
Tests: test_scenarios.py PASS, test_gate_removal.py PASS, test_kiosk_html_no_false_concretization.py PASS, test_output_provenance.py 1 failure (unchanged baseline).
Closes PR #68 adversarial audit H-A2 and QA AGENT A R1 / RR4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… negative cases
Pre-fix evidence.json was an empty list [] for negative-control runs (e.g.
the kiosk fixture). The HTML view embeds evidence.json in a <details> block,
so a reader expanding "Raw Pipeline Outputs" inside the kiosk HTML saw the
visible Determination Path strip naming Kiosk_VeriComp_Module ->
Kiosk_VeriComp_Disposition -> Biometric Verification Capability while the
embedded evidence.json showed [] for the same data points.
Restructures evidence.json from a bare list of regulated-capability bindings
into a schema-versioned object with three lists:
- regulated_capability_bindings (the prior list, unchanged shape)
- asserted_dispositions_outside_regulated_union (populated when bindings
is empty AND get_asserted_dispositions returned rows; empty otherwise)
- asserted_prescribed_processes_outside_regulated_union (populated when
no regulated process is bound AND get_asserted_prescribed_processes
returned rows; empty otherwise)
The asserted-* lists are emitted ONLY when the regulated-union queries
returned zero rows for that respective gate; on positive cases they remain
empty (no duplication of the regulated bindings).
Schema bump: evidence.json 1.0 (implicit list shape) -> 1.4 (object,
paired with summary.json 1.4).
Read order: get_asserted_dispositions / get_asserted_prescribed_processes /
get_system_comment now run before evidence.json so the same data backs both
evidence.json and write_html_view (the prior reads downstream of write_html_view
were redundant).
Locus: 03_TECHNICAL_CORE/scripts/run_pipeline.py evidence.json emission
block; the duplicate downstream read block is removed.
Verification:
- Kiosk negative case: regulated_capability_bindings=[],
asserted_dispositions_outside_regulated_union=[1 row naming
Kiosk_VeriComp_Disposition typed Biometric Verification Capability],
asserted_prescribed_processes_outside_regulated_union=[1 row naming
Kiosk_VerificationProcess_Token typed Biometric Verification Process].
HTML embedded evidence.json now matches the visible Determination Path.
- Sentinel positive case: regulated_capability_bindings=[1 row],
both asserted-* lists empty (correct — only emitted when regulated
union returned zero rows).
- Decoy positive case (via owl:equivalentClass closure): same as Sentinel.
- Blanknode positive case (anonymous disposition): same as Sentinel.
Tests: test_scenarios.py PASS, test_gate_removal.py PASS,
test_kiosk_html_no_false_concretization.py PASS, test_output_provenance.py
1 failure (unchanged baseline).
Closes PR #68 adversarial audit H-A1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
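
The restructured evidence.json object can be sketched roughly as below. This is a hedged illustration, not the emitter in run_pipeline.py: the function name `build_evidence`, the `schema_version` key, the `regulated_process_bound` flag, and the row shapes are all assumptions; only the three list names and the emit-only-on-zero-regulated-rows rule come from the commit message.

```python
import json
from typing import Any, Dict, List


def build_evidence(
    bindings: List[Dict[str, Any]],
    asserted_dispositions: List[Dict[str, Any]],
    asserted_processes: List[Dict[str, Any]],
    regulated_process_bound: bool,
) -> Dict[str, Any]:
    """Assemble the schema-versioned evidence object (field layout assumed)."""
    return {
        "schema_version": "1.4",  # key name is an assumption
        "regulated_capability_bindings": list(bindings),
        # Populated only when the regulated-union query for this gate was empty.
        "asserted_dispositions_outside_regulated_union": (
            list(asserted_dispositions) if not bindings else []
        ),
        "asserted_prescribed_processes_outside_regulated_union": (
            list(asserted_processes) if not regulated_process_bound else []
        ),
    }


# Kiosk-style negative run: no regulated bindings, one asserted row per gate.
kiosk = build_evidence(
    bindings=[],
    asserted_dispositions=[{"node": "Kiosk_VeriComp_Disposition"}],
    asserted_processes=[{"node": "Kiosk_VerificationProcess_Token"}],
    regulated_process_bound=False,
)
print(json.dumps(kiosk, indent=2))
```

Gating each asserted-* list on its own regulated query keeps positive cases free of duplicated rows while still giving negative cases the same data the visible Determination Path shows.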
Resolves the Wave 2 adversarial doc-drift finding: docs/versions.md still referenced packet schema 1.3 for the combined certificate-output row when PR #68 Wave 2 already bumped summary.json 1.3 -> 1.4 (added regulatory_alignment; ternary _status_label for entailment / latent_risk / obligation / regulatory_alignment) and evidence.json bare-list -> 1.4 object.
Split the row into three independent rows so each artifact's current schema is visible. determination_packet.json stays at 1.3.
Closes Wave 2 adversarial H-W2-2 / Wave 3 W3-1.
summary.json schema 1.4 (PR #68 Wave 2 commit 1c8340c) changed the entailment field enum from binary {PASS, FAIL} to ternary {PRESENT, NOT_PRESENT, NOT_RUN} per output_manifest_v2.yaml line 124. The MCP arco_run_pipeline tool was still reading entailment == "PASS" and so computed classification_layer = FAIL for all 6 positive-case fixtures even though their classification was correct.
Fix: read entailment in ("PASS", "PRESENT") so the wrapper accepts both the schema 1.3 and 1.4 shapes.
Verified across all 7 fixtures: positive cases now return classification_layer=PASS; kiosk negative returns classification_layer=PASS via the non_applicable_run path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
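
The dual-shape read in this fix can be sketched as below. This is a minimal sketch, assuming summary.json is loaded as a dict: the function name `classification_layer` and the `non_applicable_run` key are hypothetical stand-ins for whatever mcp/arco_mcp.py actually uses; only the accepted entailment values come from the commit message.

```python
from typing import Any, Dict

# Accept both the schema 1.3 value ("PASS") and the schema 1.4 value ("PRESENT").
ACCEPTED_ENTAILMENT = ("PASS", "PRESENT")


def classification_layer(summary: Dict[str, Any]) -> str:
    """Derive the classification-layer verdict from a summary dict (assumed shape)."""
    if summary.get("entailment") in ACCEPTED_ENTAILMENT:
        return "PASS"
    # Negative-control runs pass through a separate branch (key name assumed).
    if summary.get("non_applicable_run"):
        return "PASS"
    return "FAIL"


print(classification_layer({"entailment": "PRESENT"}))
print(classification_layer({"entailment": "NOT_PRESENT", "non_applicable_run": True}))
```

A membership test tolerates both schema shapes during a transition; comparing against a single value would be the stricter alternative once only one schema is in circulation.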
…ing surface
PR #68 Wave 3 commit f9d8b87 surfaced the system-level rdfs:comment into the positive-case Provider Obligations panel. FlagTest_Bio and FlagTest_Credit carried test-spec metadata ("Positive test: classified as X AND has Y; both must fire simultaneously") on the :System instance, which then leaked into the deployer-facing HTML. CreditScorer carried fixture-authored classification re-statements ("Entailed as ...; NOT entailed as ...") which duplicated the reasoner output.
Fix: rewrite the system-instance rdfs:comment on each of the three :System instances so the prose describes only what the fixture's own TTL asserts (hardware component bearing a typed disposition, intended-use specification prescribing typed processes, use-scenario designating the affected role, provider-submitted exception artifacts). Test-spec scaffolding stays in the ontology-level header comment (top of ARCO_instances_flag_tests.ttl), which is not surfaced into the Provider Obligations panel.
Files:
- ARCO_instances_flag_tests.ttl:53 (FlagTest_Bio system comment)
- ARCO_instances_flag_tests.ttl:117 (FlagTest_Credit system comment)
- ARCO_instances_creditscoring.ttl:57 (CreditScorer: trim trailing classification re-statement)
Verified: test_scenarios.py all 7 fixtures pass; HTML Provider Obligations panel for each fixture now reads as deployer-appropriate prose without test-spec leak or classification duplication.
Closes counter-adversarial H-W3-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h review dates
PR #68 Wave 3 commit ef9a858 created docs/versions.md with summary.json and evidence.json rows updated to schema 1.4, but the packet_schema_version row (row 114) carried a stale reference to pipeline_output_v2.py. That file does not exist; output_manifest_v2.yaml:31 declares it as a future target. The actual constant lives at run_pipeline.py:2504.
Fix: update the "Where pinned" cell of row 114 to cite run_pipeline.py:2504 as the current location and note the future target. Refresh the Last-reviewed header on docs/versions.md and LIMITATIONS.md from 2026-05-11 to 2026-05-14 to reflect the Wave 2/3/4 output-shape work landing in this PR.
Closes counter-adversarial H-W3-2 and N-2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… commitments, strict ternary entailment enum
Output-layer accuracy pass over run_pipeline.py and mcp/arco_mcp.py.
Same-document consistency: certificate.txt, summary.json, evidence.json, determination_packet.json, and the HTML view now report the same epistemic state on every fixture. The kiosk negative case is the load-bearing case (asserted Gate-1/Gate-2 evidence exists but does not match the regulated union); the previous emission silently dropped it from packet and HTML while keeping it in certificate and evidence ledger.
Changes:
- HTML view (write_html_view, run_pipeline.py): kiosk negative-case Provider Obligations panel surfaces the system rdfs:comment when the fixture authored a regulatory framing note; positives render the 8-card obligations grid alone. Executive summary box uses OWA-bounded prose for the non-entailment kiosk path. Audit-table badges render ternary PRESENT / NOT_PRESENT / NOT_RUN labels matching the manifest enum. Gate 1 and Gate 2 evidence prefixes mirror their match outcome.
- Certificate (certificate.txt): EVIDENCE PATH on negative runs carries the asserted-but-outside-regulated-union disposition and process evidence. Adds REGULATORY ALIGNMENT line. Latent-risk and obligation rows use the ternary enum.
- summary.json (schema 1.4): adds regulatory_alignment field; applies ternary _status_label to entailment (PRESENT / NOT_PRESENT / NOT_RUN), latent_risk (DETECTED / NOT_DETECTED / NOT_RUN), and obligation / regulatory_alignment (PASS / FAIL / NOT_APPLICABLE / NOT_RUN). Mirrors output_manifest_v2.yaml lines 124, 201, 208.
- evidence.json (schema 1.4): bare-list -> object restructure with regulated_capability_bindings, asserted_dispositions_outside_regulated_union, and asserted_prescribed_processes_outside_regulated_union. Negative cases now carry the asserted Gate-1 / Gate-2 evidence the regulated-union bindings did not match.
- determination_packet.json (schema 1.4): mirrors the asserted_*_outside_regulated_union fields so packet, certificate, and evidence ledger communicate the same state on negative runs.
- MCP (mcp/arco_mcp.py): read entailment field strictly against the manifest ternary enum (== "PRESENT"). No dual-shape bridge. All 7 fixtures resolve to classification_layer=PASS: positives match PRESENT, kiosk negative reaches PASS through the non_applicable_run branch on shacl_pass + no_category.
- New SPARQL queries (graph-backed sources for the output values above):
  - reasoning/select_asserted_component_disposition.sparql
  - reasoning/select_asserted_prescribed_process.sparql
  - reasoning/select_system_comment.sparql
- LIMITATIONS.md: Last-reviewed header refreshed to 2026-05-14.
Cross-fixture backtest (all 7 :System instances):
| system                          | entail      | layer | obligation     | reg_align      |
| Sentinel_ID_System              | PRESENT     | PASS  | PASS           | PASS           |
| CreditScorer_001                | PRESENT     | PASS  | PASS           | PASS           |
| VerificationKiosk_001           | NOT_PRESENT | PASS  | NOT_APPLICABLE | NOT_APPLICABLE |
| DecoySystem_001                 | PRESENT     | PASS  | PASS           | FAIL           |
| GhostSystem_001                 | PRESENT     | PASS  | PASS           | FAIL           |
| FlagTest_Biometric_w_Derogation | PRESENT     | PASS  | PASS           | FAIL           |
| FlagTest_Credit_w_FraudProcess  | PRESENT     | PASS  | PASS           | FAIL           |
Classification result identical pre/post on every fixture.
Regression tests:
- test_scenarios.py PASS
- test_gate_removal.py PASS
- test_kiosk_html_no_false_concretization.py PASS
- test_output_provenance.py baseline (1 failure, unchanged: L4 derogation qualifier; queued, not in PR scope)
Closes OPEN_PROBLEMS L4.4-L4.6 (output-side label drift).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New documentation registry pinning every load-bearing version touched by the output-layer accuracy work in this PR:
- summary.json schema 1.4 (ternary status enums per output_manifest_v2.yaml lines 124 / 201 / 208)
- evidence.json schema 1.4 (object form with regulated_capability_bindings, asserted_dispositions_outside_regulated_union, asserted_prescribed_processes_outside_regulated_union)
- determination_packet.json schema 1.4 (mirrors evidence's asserted-data fields for same-document consistency)
- certificate template / language spec / per-category profile placeholders for future bumps
- pinned dependency versions (Python, rdflib, pyshacl, owlrl, ROBOT, HermiT) and CI runtime pins, with their authoritative source paths
Each row names: field name, current version, where pinned, advance trigger.
Pin path for determination_packet.json points at run_pipeline.py:2504 (the current location); the future relocation target (pipeline_output_v2.py) is noted via an output_manifest_v2.yaml:31 cross-reference so the row tracks the actual state, not a wished-for state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
30be60d to f30fe2b
Fixes
Output-layer accuracy pass over run_pipeline.py and mcp/arco_mcp.py. No changes to classification semantics, ontology, SHACL shapes, or audit-ASK queries. Same-document consistency restored: certificate.txt, summary.json, evidence.json, determination_packet.json, and the HTML view now report the same epistemic state on every fixture. The kiosk negative case (asserted Gate-1 / Gate-2 evidence exists but does not match the regulated union) is the load-bearing case: the pre-fix emission silently dropped it from the packet and HTML while preserving it in the certificate and evidence ledger.
| file | change |
| run_pipeline.py write_html_view | Kiosk negative-case Provider Obligations panel surfaces the system rdfs:comment when the fixture authored a regulatory framing note; positives render the 8-card obligations grid alone. Executive summary uses OWA-bounded prose on non-entailment runs. Audit-table badges render ternary PRESENT / NOT_PRESENT / NOT_RUN per manifest. Gate 1 / Gate 2 evidence prefixes mirror their match outcome. |
| run_pipeline.py certificate block | EVIDENCE PATH on negative runs carries the asserted-but-outside-regulated-union disposition and process evidence. Adds REGULATORY ALIGNMENT line. Latent-risk and obligation rows use the ternary enum. |
| run_pipeline.py summary.json (1.4) | Adds regulatory_alignment; applies _status_label ternary to entailment (PRESENT / NOT_PRESENT / NOT_RUN), latent_risk (DETECTED / NOT_DETECTED / NOT_RUN), obligation / regulatory_alignment (PASS / FAIL / NOT_APPLICABLE / NOT_RUN). Mirrors output_manifest_v2.yaml lines 124, 201, 208. |
| run_pipeline.py evidence.json (1.4) | Bare-list -> object restructure with regulated_capability_bindings, asserted_dispositions_outside_regulated_union, asserted_prescribed_processes_outside_regulated_union. Negative cases now carry the asserted Gate-1 / Gate-2 evidence the regulated-union bindings did not match. |
| run_pipeline.py determination_packet.json (1.4) | Mirrors the asserted_*_outside_regulated_union fields so packet, certificate, and evidence ledger communicate the same state on negative runs. |
| mcp/arco_mcp.py | Reads entailment strictly against the manifest ternary enum (== "PRESENT"). No dual-shape bridge. All 7 fixtures resolve to classification_layer=PASS via the strict path: positives match PRESENT, kiosk negative reaches PASS through the non_applicable_run branch on shacl_pass + no_category. |
| LIMITATIONS.md | Last-reviewed header refreshed to 2026-05-14. |
| docs/versions.md | Version registry rows for the schema 1.4 artifacts. |
| reasoning/select_asserted_*.sparql (new) | Graph-backed sources for the asserted disposition and prescribed-process values. |
| reasoning/select_system_comment.sparql (new) | Graph-backed source for the system rdfs:comment surface. |
Schema / axioms / queries
- summary.json 1.3 -> 1.4 (adds regulatory_alignment; ternary _status_label for entailment / latent_risk / obligation / regulatory_alignment).
- evidence.json bare-list -> 1.4 object (regulated-union bindings + asserted-outside-union dispositions and processes).
- determination_packet.json 1.3 -> 1.4 (adds the asserted-outside-union mirror fields).
- certificate.txt: adds REGULATORY ALIGNMENT line; LATENT RISK and OBLIGATION use ternary _status_label; negative-case TRIGGERING CAPABILITY / EVIDENCE PATH surface asserted-but-outside-regulated-union data.
- New SPARQL queries under reasoning/. TTL fixtures unchanged.
Tests
- test_gate_removal.py: PASS
- test_scenarios.py: PASS (all 7 fixture/system combinations)
- test_kiosk_html_no_false_concretization.py: PASS
- test_output_provenance.py: 1 failure, unchanged baseline ("VERIFIED (ENTAILED, Article 6(3) derogation not evaluated)"; OPEN_PROBLEMS.md L4.3 polarity remainder)
Per-fixture backtest (7 fixtures x 5 channels)
regulatory_alignment: FAIL on Decoy / Ghost / FlagTests reflects the pre-existing fixture distribution per LIMITATIONS.md §2.1 (the :AnnexIII_Condition_1a cco:prescribes triple lives only in ARCO_instances_sentinel.ttl); not introduced or addressed here.
Determination_packet vs evidence.json consistency check (7 fixtures):
asserted_dispositions_outside_regulated_union and asserted_prescribed_processes_outside_regulated_union match between the two emitters on every fixture (kiosk: 1 disposition + 1 process; six positives: empty + empty).
Deferred
- OPEN_PROBLEMS.md L4.3 polarity remainder: "VERIFIED (ENTAILED, Article 6(3) derogation not evaluated)" Python-literal scope qualifier (baseline test_output_provenance.py failure).
- OPEN_PROBLEMS.md L3.2: Gate 3 SELECT not constrained to :NaturalPersonRole.
- evidence.json blank-node IRI determinism for the Ghost fixture.
- :AnnexIII_Condition_1a cco:prescribes triple distribution across non-Sentinel fixtures (LIMITATIONS §2.1 known gap).
Revert
git revert f30fe2b 2f23495 reverts both commits.