diff --git a/.claude/agents/docs-test-subject.md b/.claude/agents/docs-test-subject.md new file mode 100644 index 0000000000000..e056c3c29f5da --- /dev/null +++ b/.claude/agents/docs-test-subject.md @@ -0,0 +1,22 @@ +--- +name: docs-test-subject +description: Documentation-only test subject for the HTML API doc-improvement experiment. Implements a PHP function using only the two provided documentation files. Tool access is restricted to Read and Grep by design — do not widen it. +tools: Read, Grep +--- + +You are a test subject in a documentation-quality experiment. You implement +a single PHP function using the WordPress HTML API. + +Hard rules: + +- Your ONLY information sources are the documentation files whose absolute + paths are given in your task prompt. Read or search them as much as you + like. +- You must not attempt to access any other file, directory, or resource. +- You never execute code; you reason from documentation alone. +- Do not invent methods, constants, or behaviors that the documentation + does not describe. If the documentation seems incomplete, choose the + best-supported approach it does describe. + +Your final message is your deliverable and must follow the output format +specified in your task prompt exactly. diff --git a/GOAL.md b/GOAL.md new file mode 100644 index 0000000000000..30851ddc71bfc --- /dev/null +++ b/GOAL.md @@ -0,0 +1,136 @@ +# HTML API Documentation Improvement Goal + + +I, the user, approve using the local Codex judge runner for round this round and all subsequent rounds, including sending judge-visible materials to the model provider. + + +Improve the rendered documentation usability for `WP_HTML_Tag_Processor` and +`WP_HTML_Processor`, measured by how well weaker models complete real HTML API +tasks using only the staged rendered documentation. + +The only source documentation hypothesis edits are docblock changes in: + +- `src/wp-includes/html-api/class-wp-html-tag-processor.php` +- `src/wp-includes/html-api/class-wp-html-processor.php` + +Do not change PHP behavior. Infrastructure/tooling changes are allowed only +when needed to keep the experiment valid, and must be tracked separately from +documentation hypothesis edits. + +The primary deliverable is improved source documentation in the two HTML API +docblock files. Tooling, handoff files, audits, manifests, and result hygiene +are support work only; they are not progress on the goal unless they unblock +the next documentation-measurement or documentation-edit step. + +## Authoritative State + +`GOAL.md` defines the stable objective and guardrails. It must not be treated +as the current phase record. + +At the start of every run, determine the active phase and next action from: + +- `doc-experiment/PLAN.md` - experiment contract +- `doc-experiment/PROTOCOL.md` - operational runbook +- `doc-experiment/NEXT-HYPOTHESES.md` - current hypothesis backlog +- `doc-experiment/LOG.md` - latest experiment narrative +- `doc-experiment/results/round-*` - persisted measurements +- `git status` - unresolved local drift + +If these sources conflict, pause scoring and reconcile the experiment state +before continuing. + +## Start-of-Run Checklist + +Before making edits or running a score: + +1. Inspect the worktree and preserve existing user changes. +2. Identify the latest completed trusted round and its score. +3. Identify the current round mode using the modes defined in `PROTOCOL.md` + and the state in `LOG.md`, results, and the worktree. +4. Identify the current model policy, subject tier, judge tier, and whether the + subject tier has a no-edit baseline. +5. Check whether source docs, tooling, corpus, or results changed since the last + trusted score. +6. Determine the next action implied by the plan: calibration, probe, scratch + A/B, normal scoring, checkpoint, source promotion, revert, or stop. +7. Record any mismatch before trusting new scores. +8. Classify the next action as one of: + - `documentation-edit` + - `measurement` + - `result-ingestion` + - `state-reconciliation` + - `external-action-required` + If the next action is `external-action-required`, do not substitute + unrelated tooling work for it. + +## Operating Rules + +### Progress Priority + +- Prefer actions in this order: + 1. Run or ingest the measurement required by the active phase. + 2. Analyze trusted measurements and choose a documentation hypothesis. + 3. Edit source docblocks for one evidence-backed hypothesis. + 4. Stage, score, aggregate, log, and commit that hypothesis. + 5. Fix tooling only when a specific observed or imminent failure would make + the above steps invalid or non-retryable. +- Do not perform opportunistic infrastructure hardening merely because the + required scoring or documentation action is unavailable. +- A tooling change must name the experiment-validity failure it prevents and + must be followed by a re-audit of the actual next documentation/measurement + action. + +- Test subjects may read only the staged markdown docs and task prompt. +- Never expose `reference.php`, `tests.json`, source files, logs, plans, or + hypothesis docs to test-subject agents. +- Use one primary subject tier per scored round. Do not mix model tiers into a + main round score. +- Use the judge/model policy from `PLAN.md` and `PROTOCOL.md`; if runner tooling + disagrees with that policy, fix or explicitly record the mismatch before + comparing scores. +- Held-out tasks are checkpoint/regression sentinels only and must never drive + documentation edits. +- Compare scores only across comparable rounds: same corpus, same round mode, + same primary subject tier, same judge policy, and compatible tooling. +- Scratch rendered-doc variants must stay out of source docblocks until they win + by evidence. +- Promote only general API documentation improvements, not task-shaped answers. +- When trials, judges, or probes repeatedly reveal surprising API behavior, + recurring hallucinated methods, or missing API affordances, record the pattern + in a consistent backlog location for later consideration. Use + `doc-experiment/NEXT-HYPOTHESES.md` for documentation hypotheses, and keep + future API/design observations distinct from immediate docblock edits. +- Keep `@since` tags intact and do not fabricate changelog entries. +- After every source docblock edit, run the docs-only guard, stage docs, run the + appropriate scored flow, aggregate results, update `LOG.md`, and commit one + source hypothesis at a time. +- Commit experiment results separately from source documentation hypotheses + where practical. +- Stop or pause according to `PLAN.md`/`PROTOCOL.md`, especially when signal is + exhausted, failures are generic model noise, or the experiment state is + inconsistent. + +### External Runner Gate + +- If the active next action is to launch trials or judges in an external + Workflow runner and that runner is not available in the current session: + 1. Generate or verify the exact handoff payload once. + 2. Report the command/files needed for the external runner. + 3. Stop work and ask for one of: + - external runner output to ingest, + - explicit authorization to use an alternative runner, + - explicit authorization to bypass the measurement gate. +- Do not continue with additional tooling, corpus, or documentation edits while + waiting for that external action unless the user explicitly asks for them. +- Do not mark the documentation goal as making substantive progress from + handoff preparation alone. + +## Promotion Standard + +A source documentation edit is justified only when local evidence shows a +specific documentation usability failure: missing contract, misleading wording, +poor placement, low discoverability, or excessive rendered-doc noise. + +Evidence may come from scored train rounds, no-edit baselines, citation-only +discoverability probes, judge analyses, or paired scratch-doc A/B tests. +Held-out-only evidence is not sufficient. diff --git a/doc-experiment/LOG.md b/doc-experiment/LOG.md new file mode 100644 index 0000000000000..85e2c6b44e287 --- /dev/null +++ b/doc-experiment/LOG.md @@ -0,0 +1,1851 @@ +# Experiment log + +Hypothesis → outcome narrative, one entry per round. Newest first. + +## Round 61 — citation probes find facts discoverable + +`round-61` staged current rendered source docs under `discoverability-probe` +with subjects `gpt-5.4-mini` / `low` / `priority`. No source docblocks, +scratch docs, corpus fixtures, or scoring harness behavior changed. + +Four citation-only probes ran against the method-local contracts surfaced by +rounds 58-60: + +- `next-tag-boundary-detector`: 3/3 subjects answered that plain `next_tag()` + is not the subtree-boundary traversal and cited `next_token()`, + `get_current_depth()`, and `next_tag()` headings. +- `bounded-region-completion-scope`: 3/3 subjects answered that the docs do + not require draining to EOF after a bounded region scan just to find unrelated + trailing malformed input. Subjects also noted that a more explicit + trailing-suffix sentence is missing. +- `breadcrumbs-ancestor-check`: 3/3 subjects answered that breadcrumbs include + the current node and that breadcrumb queries are DOM path/sub-path checks, not + arbitrary ancestor-set checks. Subjects inferred slicing off the current node, + but noted the docs do not show that exact ancestor-only idiom. +- `html-processor-factory-lifecycle`: 3/3 subjects answered that callers should + use `create_fragment()` or `create_full_parser()` and should not instantiate + `WP_HTML_Processor` directly. The docs do not spell out a runtime consequence + beyond the do-not-use constructor warning. +- `html-processor-attribute-value-contract`: 3/3 subjects found the + `get_attribute()` return cases (`null`, `true`, `''`, and decoded strings), + but subjects also said the docs do not explicitly name the predicate for a + usable non-empty URL string. This supports a future attribute-value contrast + card only if a train task repeats the confusion; held-out N02 alone must not + drive it. + +Interpretation: do not promote a traversal or factory source edit from these +signals. The traversal and factory facts are discoverable when weak subjects +are asked directly, and two transfer-oriented traversal A/B variants already +lost. The attribute-value probe found one missing named idiom, but current +train evidence is only near-miss level, so this is backlog rather than a source +promotion gate. + +Full-round reanalysis after these probes found no remaining non-held-out, +non-noise train pattern strong enough to justify a source docblock edit. + +Next action: keep the selected subject policy at `gpt-5.4-mini` / `low` / +`priority` and pause under the protocol's signal-exhaustion rule rather than +adding speculative prose. Resume only if the corpus changes, a future trusted +train round repeats one of the backlogged patterns, or the experiment owner +explicitly asks to test a new hypothesis despite the weak signal. + +## Round 60 — bounded-loop scratch A/B also loses + +`round-60` was a second scratch-only HTML Processor rendered-doc variant for +the same traversal subset as round 58/59: +`N03-first-list-count`, `T07-nested-lists`, and `T08-table-extract`. It used +`shadow-doc-a/b`, subjects `gpt-5.4-mini` / `low` / `priority`, and judge +`gpt-5.5` / `xhigh` / `priority`. Source docblocks were unchanged. + +Variant: replace the failed closer contrast with a full generic bounded-loop +recipe. The loop checked `get_current_depth() < $container_depth` immediately +after advancing and before token-type, closer, and direct-child filters. It +also added regional completion wording: do not drain to EOF solely to reject +trailing malformed input unless the caller requires whole-document +completeness. + +Numeric result: variant lost, **90.18 vs the round-58 control 97.35**. N03 +improved only 93.66 -> 94.26: two trials followed the intended `next_token()` +bounded-loop shape, but one still used plain `next_tag()` and over-scanned past +the list into trailing malformed input. T07 fell 99.40 -> 98.60. T08 collapsed +99.00 -> 77.68 because one trial misapplied the repeated-region state-machine +guidance and manufactured empty cells by flushing a null child accumulator and +pre-flushing on sibling openers instead of trusting the processor's virtual +closers. + +Interpretation: do not promote. Two adjacent traversal A/B variants have now +failed to beat the control. The N03 issue is real but the tested generic +recipes are not an improvement as rendered; they add enough state-machine +surface area to hurt T08. Treat the T08 hierarchical-state notes as +variant-induced evidence only, not a source-edit driver by themselves. The +remaining repeated signals are method-local discoverability questions: +whether subjects can cite that plain `next_tag()` is not a subtree-boundary +detector, whether completion checks are scoped to the promised bounded region, +and whether breadcrumbs should be sliced before ancestor checks. + +Next action: commit round-60 results separately, then prepare a +`discoverability-probe` round on current source docs with subjects +`gpt-5.4-mini` / `low` / `priority`. Probe the method-local contracts above +with citation-only questions before any further traversal source edit or +scratch A/B. Keep `seek()` unknown-bookmark behavior as a separate candidate +unless it repeats outside the losing round-59 sample. + +## Rounds 58/59 — depth-boundary closer-card scratch A/B loses + +`round-58` was the control rendered-doc round and `round-59` was a +scratch-only HTML Processor rendered-doc variant for +`N03-first-list-count`, `T07-nested-lists`, and `T08-table-extract`. Both used +`shadow-doc-a/b`, subjects `gpt-5.4-mini` / `low` / `priority`, and judge +`gpt-5.5` / `xhigh` / `priority`. Source docblocks were unchanged. + +Variant: add compact class-level and method-local contrast wording stating +that depth-boundary scans must visit the boundary token: use `next_token()`, or +use `next_tag( array( 'tag_closers' => 'visit' ) )` for tag-only scans because +plain `next_tag()` skips closers. + +Numeric result: variant lost, **90.74 vs 97.35**. N03 fell 93.66 -> 76.51, +T07 fell 99.40 -> 96.30, and T08 rose 99.00 -> 99.40. The N03 target pattern +improved in one trial, which used `tag_closers => 'visit'` and scored 100, but +another trial still skipped the boundary because it checked `is_tag_closer()` +before checking whether depth had dropped below the recorded list depth. A +third N03 trial introduced a separate bookmark misuse, calling `seek()` for a +bookmark that was never set and then reparsing. + +Interpretation: do not promote the closer-card wording. The failure mode is +more precise than the tested wording: weaker subjects need a full generic +bounded-subtree loop where the first operation after advancing is +`get_current_depth() < $container_depth` break, followed only then by token +type, closer, tag-name, and direct-child predicates. Judges also identified two +separate candidates: `seek()` should make the set-bookmark precondition and +unknown-name behavior explicit, and clean-scan checks should be scoped to the +caller's promised region rather than automatically treating malformed trailing +markup after a closed target subtree as invalid. + +Next action: commit round-59 results separately, then do not edit source from +this losing variant. If continuing the traversal hypothesis, run a new +scratch-rendered A/B with subjects `gpt-5.4-mini` / `low` / `priority`, judge +`gpt-5.5` / `xhigh` / `priority`, the same traversal subset, a complete +bounded-loop recipe, and separate regional completion wording; keep the +bookmark `seek()` precondition as its own method-local diagnostic or small +source candidate only after evidence confirms it is not one-off sampling noise. + +## Round 57 — checkpoint after serialization fallback source edit + +**All 97.90 / train 97.95 / held-out 97.73 / core 97.66** under +`checkpoint`, with subjects `gpt-5.4-mini` / `low` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This checkpoint scored the current source +docs after the round-56 source confirmation. + +Operational note: two audit-only tooling commits landed between round 56 and +this checkpoint to stop the process from blocking on a log-requested checkpoint +or on expected prepared-round result artifacts. They changed +`doc-experiment/tools/audit-state.py` only. Source docs, corpus, staging, +subject runner, judge runner, harness, and aggregation policy were unchanged. + +Outcome: keep the round-56 source edit. The train split moved 99.61 -> 97.95 +versus round 56, below the 2-point revert threshold, and no train task +regressed across all trials. The target serialization tasks stayed stable: +T09 remained 99.40 and T12 moved 99.30 -> 98.80. Held-out is sentinel-only; +N02 scored 93.31 because two trials treated a valueless `src` as usable, but +held-out evidence must not drive source edits. + +The largest train dip was T06 at 80.00, caused by one trial with a PHP array-key +typo; the judge explicitly said this was not an HTML API misconception. The +strongest train documentation signal is N03 at 94.56: one trial used plain +`next_tag()` plus `get_current_depth()` as though it could detect a subtree +boundary, but plain `next_tag()` skips closers by default and can over-scan into +later incomplete or unsupported markup. Judges pointed to a missing contrast: +depth-boundary logic only works on a stream that visits the boundary token, +such as `next_token()` or `next_tag( array( 'tag_closers' => 'visit' ) )`. + +Decision: do not revert. Do not edit source directly from held-out N02 or from +the T06 generic PHP typo. Treat the N03 train failure as the next diagnostic +candidate. + +Next action: commit round-57 results separately, then run a focused +`shadow-doc-a/b` diagnostic with `gpt-5.4-mini` / `low` / `priority` on N03 and +nearby traversal controls, testing a compact generic contrast card for +depth-boundary scans: use `next_token()` or visit closers when the loop relies +on `get_current_depth()` to leave a subtree; plain `next_tag()` skips closers. + +## Round 56 — serialization fallback source edit confirmed + +**Train 99.61 / core 99.55** under `scored-train`, with subjects +`gpt-5.4-mini` / `low` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `1107adb72d`, which promoted the winning +rounds-54/55 serialization rewrite fallback card into the +`WP_HTML_Processor` source docs. + +Outcome: keep. All 45 train trials passed all hidden cases. Compared with the +comparable weak-tier no-edit baseline, round 53, train moved 99.51 -> 99.61 and +core moved 99.43 -> 99.55. The target serialization concept moved 98.85 -> +99.35; T09-mark-keyword moved 99.10 -> 99.40; and T12-unwrap-spans moved +98.60 -> 99.30. No task crossed the revert threshold and no previously passing +task regressed across all trials. + +The source wording transferred the core recipe: candidates used +`get_modifiable_text()` for decoded inspection and `serialize_token()` for +emitting rewritten tokens. The residual near-miss is narrower than the promoted +hypothesis: T09 and T12 candidates still sometimes used +`normalize( $html ) ?? $html` as an explicit parser-error fallback after a +rewrite loop. Judges accepted this for the tested inputs but flagged that raw +input is not a normalized fallback and that `normalize( $html )` abandons +emitted rewrites. + +Decision: keep the source edit. Treat the remaining fallback issue as a future +diagnostic, not an immediate source edit, because the current weak tier is still +functionally saturated and the source hypothesis just scored stable. + +Next action: commit round-56 results separately, then run a checkpoint with the +same primary subject tier, `gpt-5.4-mini` / `low` / `priority`, and the same +judge tier, `gpt-5.5` / `xhigh` / `priority`, before promoting another source +docblock edit. + +## Rounds 54/55 — serialization rewrite fallback scratch A/B wins + +`round-54` was the control rendered-doc round and `round-55` was a +scratch-only HTML Processor rendered-doc variant for +`T09-mark-keyword`, `T12-unwrap-spans`, and the normalization control +`N04-normalize-or-placeholder`. Both used `shadow-doc-a/b`, subjects +`gpt-5.4-mini` / `low` / `priority`, and judge `gpt-5.5` / `xhigh` / +`priority`. Source docblocks were unchanged. + +Variant: add a compact string-returning rewrite checklist near the +class-level `serialize_token()` recipe and a method-local wrapper example. +The key distinctions are: use `get_modifiable_text()` for decoded inspection, +not for hand-escaped output; use `serialize_token()` to emit the current token; +the accumulated `$output` is the rewrite; and `normalize( $html )` or raw input +discard wrappers, skipped tokens, replacements, and other emitted changes. + +Numeric result: variant won, **99.53 vs 98.87**. Serialization rose 98.30 -> +99.55. T09 improved 98.50 -> 99.60, and T12 improved 98.10 -> 99.50. N04 +moved 100.00 -> 99.50 because one variant trial used the lower-level +`create_fragment()` + `serialize()` path rather than the direct `normalize()` +helper, but all N04 hidden cases still passed. + +Transfer result: the variant eliminated the control's worst T09 pattern: +decoded `get_modifiable_text()` plus `htmlspecialchars()` as a substitute for +token serialization. It also reduced T12 fallback-policy penalties. The +remaining near-miss is narrower: subjects may still use `normalize( $html )` +or raw input as an explicit abandonment fallback after a parser error. + +Interpretation: promotable as an adapted source hypothesis. Keep it generic +and compact. Promote the class-level checklist and method-local wrapper / +anti-pattern examples, but avoid suggesting one universal fallback policy for +all string-returning rewrites. + +Next action: commit rounds 54/55 results, then edit +`src/wp-includes/html-api/class-wp-html-processor.php` to promote one adapted +serialization rewrite fallback recipe. Run the docs-only guard, stage docs, and +score the source hypothesis with `gpt-5.4-mini` / `low` / `priority`. + +## Round 53 — mini/low calibration exhausts weak-tier ladder + +**Train 99.51 / core 99.43** under `weak-tier-calibration`, with subjects +`gpt-5.4-mini` / `low` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This was the final no-edit calibration rung defined in +`PROTOCOL.md`. + +Outcome: the weakest configured subject tier is still functionally saturated. +All 45 subject trials passed all hidden cases. The round score was essentially +flat with round 52, 99.53 -> 99.51. Concept means: classes 100.00, traversal +99.62, normalization 99.60, attributes 99.57, text 99.50, and serialization +98.85. + +The most repeated weaker-tier signal is not a hidden-test failure but an +adherence pattern around normalized rewrite fallback. T12-unwrap-spans scored +98.60 and T09-mark-keyword scored 99.10; candidates again used raw input or +`normalize( $html )` as generic recovery after a `serialize_token()` rewrite +loop, which discards accumulated insertions/removals/replacements. T05/T06/N06 +read-only extraction remained strong but still showed smaller caller-policy +near-misses. + +Decision: treat `gpt-5.4-mini` / `low` as the selected weak diagnostic tier +because the ladder is exhausted, even though it remains saturated. Do not +promote source docs directly from the calibration. The next evidence-building +step should be a scratch rendered-doc A/B, not a source edit. + +Next action: commit round-53 results separately, then run a focused +`shadow-doc-a/b` diagnostic at `gpt-5.4-mini` / `low` on the serialization +rewrite tasks, testing a compact generic recipe/card in the HTML Processor +class docs for string-returning `serialize_token()` rewrites and explicit +fallback policy. + +## Round 52 — mini/high weak-tier calibration still saturated + +**Train 99.53 / core 99.46** under `weak-tier-calibration`, with subjects +`gpt-5.4-mini` / `high` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This was a no-edit calibration on the current source docs, +staged after the audit tool was taught to follow the weak-tier subject +ladder. The tooling change affected preflight next-action selection only; the +rendered docs, source docblocks, corpus, runners, and judge policy were +unchanged. + +Outcome: still saturated. All 45 subject trials passed all hidden cases. The +round score fell only slightly from round 51, 99.65 -> 99.53. Concept means: +classes 100.00, text 99.73, attributes 99.73, normalization 99.50, +traversal 99.52, and serialization 98.75. + +The clearest adherence signal moved from read-only text extraction toward +string-returning normalized rewrites. T09-mark-keyword scored 98.60 and +T12-unwrap-spans scored 98.90 because candidates still used raw input or +`normalize( $html )` as generic fallbacks after a `serialize_token()` rewrite +loop, which discards the accumulated rewrite. Text extraction stayed strong: +T05 was 99.60, T06 was 99.60, and N06 was 99.20. + +Decision: record round 52 as the no-edit baseline for `gpt-5.4-mini` / +`high`, but do not promote source docs from another saturated calibration. +Per the subject ladder in `PROTOCOL.md`, step down one final rung before +choosing a primary weak tier for scratch A/B or source-hypothesis work. + +Next action: commit round-52 results separately, then prepare and run a +`weak-tier-calibration` round on current docs using `gpt-5.4-mini` / `low` / +`priority`. + +## Round 51 — weak-tier calibration still saturated + +**Train 99.65 / core 99.59** under `weak-tier-calibration`, with subjects +`gpt-5.4` / `low` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This was a no-edit calibration on the current source docs after +round 50, run because the experiment owner asked to move to a weaker testing +tier before promoting another documentation hypothesis. + +Outcome: still too saturated to be the main source-edit driver. All 45 +subject trials passed all hidden cases. The weakest task scores were +T06-collect-links at 98.50, T05-text-excerpt at 99.00, +T07-nested-lists at 99.20, T08-table-extract at 99.30, +T09-mark-keyword at 99.40, and N06-extract-toc at 99.50. Concept means were +attributes/classes/normalization 100.00, serialization 99.70, traversal 99.56, +and text 99.17. + +The useful signal remains adherence-only: T05/N06 still show occasional +fail-closed handling of already visited read-only text after +`paused_at_incomplete_token()` or `get_last_error()`, T06 still varies on +read-only completion policy, and T09 still shows occasional uncertainty about +normalized rewrite fallback. None of this justifies a new source docblock edit +before a less saturated tier is calibrated. + +Decision: record round 51 as a no-edit calibration baseline for +`gpt-5.4` / `low`, but do not use it to promote source documentation. Per the +subject ladder in `PROTOCOL.md`, step down one more rung. + +Next action: commit round-51 results separately, then prepare and run a +`weak-tier-calibration` round on current docs using `gpt-5.4-mini` / `high` / +`priority`. + +## Round 50 — checkpoint before weaker-tier calibration + +**All 99.08 / train 99.65 / held-out 96.93 / core 98.97** under +`checkpoint`, with subjects `gpt-5.4` / `medium` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This scored the current source docs after +the round-47 text-policy source edit and after the rounds-48/49 read-only +completion-policy scratch A/B. Source docblocks were unchanged since +`29a148a4f7`. + +Outcome: stable enough not to revert. Compared with the previous checkpoint, +round 46, train rose 99.63 -> 99.65 while held-out fell 98.33 -> 96.93. The +held-out movement is below the 2-point revert threshold and is not an +all-trial task regression. The drop is concentrated in N02 trial 3, which +passed 6/9 after interpreting `array( 'FIGURE', 'IMG' )` breadcrumbs as +arbitrary-depth containment rather than a contiguous breadcrumb path. This is +held-out-only sentinel evidence and must not drive a source edit. + +The train tasks tied to the read-only completion-policy candidate stayed +strong: T05 was 99.90, T06 was 98.40, T08 was 99.30, and N06 was 100.00. +This keeps the round-49 scratch variant viable, but the current primary tier +is saturated enough that another immediate source promotion would have weak +signal. + +Decision: do not revert. Do not promote another source docblock edit yet. +Per experiment-owner direction, move to a weaker subject tier and run a +no-edit calibration before using that tier to drive source edits. + +Next action: commit round-50 results separately, then prepare and run a +`weak-tier-calibration` round on current docs using the next subject tier in +`PROTOCOL.md`, `gpt-5.4` / `low` / `priority`. + +## Rounds 48/49 — read-only completion-policy scratch A/B wins + +`round-48` was the control rendered-doc round and `round-49` was a +scratch-only HTML Processor rendered-doc variant for four train tasks: +`T05-text-excerpt`, `T06-collect-links`, `T08-table-extract`, and +`N06-extract-toc`. Both used `shadow-doc-a/b`, subjects `gpt-5.4` / +`medium` / `priority`, and judge `gpt-5.5` / `xhigh` / `priority`. Source +docblocks were unchanged. + +Variant: add one compact read-only completion-policy rule of thumb under the +class-level DOM-style text recipe. It separates best-effort extraction from +complete-source validation and from mutation, normalization, or token-rewrite +output. The key contract is that `paused_at_incomplete_token()` and +`get_last_error()` report scan status; they do not retroactively invalidate +tokens already visited. + +Numeric result: variant won, **99.65 vs 99.03** on the paired subset. All 24 +subject trials passed all hidden cases. T05 improved 98.30 -> 100.00, T08 +improved 99.00 -> 99.80, and N06 improved 99.40 -> 100.00. T06 dipped 99.40 +-> 98.80 because one variant trial still cleared read-only results on +`get_last_error()`. + +Transfer result: the variant removed several over-strict completion-policy +near-misses. Control N06 trial 2 rejected accumulated headings after +`paused_at_incomplete_token()`, while variant N06 was 100/100/100 adherence. +Control T05 trials 1 and 2 used a risky Tag Processor fallback after an HTML +Processor abort; variant T05 used the HTML Processor pattern directly in all +trials. T06 shows the remaining weakness: a compact policy note helps but +does not fully prevent all fail-closed read-only collectors. + +Interpretation: promotable after a checkpoint gate, but adapt carefully. The +source edit should keep the small rule-of-thumb shape and avoid implying that +all read-only extractors must keep partial results. It should state the +choice as caller contract: best-effort extraction may return accumulated +visited-token data, while complete-source validation and mutations/rewrites +should fail closed when required. + +Next action: commit rounds 48/49 results separately, then run the required +checkpoint/regression sentinel before promoting another source docblock edit. +If held-out remains stable, promote an adapted read-only completion-policy +note as one source hypothesis. + +## Round 47 — text-policy decision table source edit confirmed + +**Train 99.55 / core 99.48** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `29a148a4f7`, which promoted the winning +rounds-44/45 text-policy decision table into the `WP_HTML_Processor` source +docs. + +Outcome: keep. All 45 subject trials passed all hidden cases. Compared with +the previous comparable scored-train round, round 43, train rose 98.18 -> +99.55. That comparison includes round 43's known generic T05 PHP bug, so the +more useful read is that round 47 is back in the high-signal band and below +round 36 only by judge noise, 99.65 -> 99.55. There is no revert signal and +no all-trial task regression. + +Target tasks stayed strong: T03 was 100.00, T05 was 98.00, T06 was 99.40, +T08 was 99.40, and N06 was 99.20. Judges credited the promoted table and +method-local reminders for the key transfer: candidates consistently used +ordinary `#text` tokens for DOM-style heading, table-cell, link, and article +text, and treated SCRIPT/STYLE/TITLE/TEXTAREA opener-carried text as opt-in +data rather than ordinary subtree text. + +Residual signal: read-only completion policy is still not crisp enough. In +T05, T06, T08, and N06, judges repeatedly saw candidates erase already +collected read-only results when `paused_at_incomplete_token()` was true, even +though the new source docs say this is caller policy. This is a real train +near-miss, but the source docs already contain the basic fact, so do not +promote another source wording change directly. Test a scratch variant that +makes the read-only best-effort vs complete-source-validation decision more +concrete. + +Next action: commit round-47 results separately, then run a focused scratch +A/B for read-only completion policy on the affected train tasks before any +additional source promotion. + +## Round 46 — checkpoint clears text-policy promotion gate + +**All 99.36 / train 99.63 / held-out 98.33 / core 99.28** under +`checkpoint`, with subjects `gpt-5.4` / `medium` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This scored the current source docs after +the round-43 serialization fallback source edit and before promoting the +rounds-44/45 text-policy decision-table scratch variant. + +Outcome: stable enough to continue. All 57 subject trials passed all hidden +cases. Compared with the previous checkpoint, round 42, train rose 99.54 -> +99.63 while held-out was effectively flat, 98.38 -> 98.33. The held-out +movement is below the revert threshold and is not an all-trial functional +regression. Held-out judge gaps remain regression-sentinel data only and must +not drive the next edit. + +The train tasks tied to the text-policy candidate stayed strong: T03 was +100.00, T05 was 98.80, T06 was 99.50, T08 was 98.60, and N06 was 98.60. The +checkpoint also repeated the same useful T05 near-miss from train evidence: +visited parser artifacts are not necessarily emitted normalized content, so +conditional subtree emission should test the serialized token string when the +contract depends on emitted output. + +Decision: checkpoint gate is clear. Promote one adapted source docblock +hypothesis for the text-policy decision table: ordinary DOM-style text reads +visited `#text` tokens by default; special-element opener text is an explicit +opt-in with different decoding/raw-text semantics; and read-only partial-scan +fallback remains caller policy rather than a blanket reject-or-keep rule. + +Next action: commit round-46 results separately, then edit the +`WP_HTML_Processor` source docs for the text-policy hypothesis, run the +docs-only guard, stage docs, and score the source edit as the next normal +source round. + +## Rounds 44/45 — text-policy decision table scratch A/B wins + +`round-44` was the control rendered-doc round and `round-45` was a +scratch-only HTML Processor rendered-doc variant for five train tasks: +`T03-first-h1-text`, `T05-text-excerpt`, `T06-collect-links`, +`T08-table-extract`, and `N06-extract-toc`. Both used `shadow-doc-a/b`, +subjects `gpt-5.4` / `medium` / `priority`, and judge `gpt-5.5` / +`xhigh` / `priority`. Source docblocks were unchanged. + +Variant: add a compact "where text lives / extraction policy" table near the +class-level DOM-style text recipe, plus short method-local reminders in +`next_token()` and `get_modifiable_text()`: ordinary DOM-style text reads only +visited `#text` tokens; special-element opener text is explicit opt-in for +that element's own contents; TITLE/TEXTAREA are decoded while SCRIPT/STYLE are +raw; and read-only extraction policy for partial scans is separate from +mutation, normalization, and token-rewrite fail-closed policy. + +Numeric result: variant won, **99.56 vs 98.94** on the paired subset. All 30 +subject trials passed all hidden cases. T03 improved 99.10 -> 100.00, T05 +98.90 -> 99.90, T08 98.60 -> 99.50, and N06 98.70 -> 99.50. T06 dipped only +99.40 -> 98.90, still with all trials passing all hidden cases. + +Transfer result: the variant eliminated the main special-element over-inclusion +pattern in the paired tasks. Control T03 trial 3, T08 trials 1 and 3, and N06 +trial 2 still treated special-element opener text as ordinary subtree text. +Variant T03, T08, and N06 trials all used ordinary `#text`-only extraction for +those tasks. The remaining weak spot is read-only partial-scan policy: T06 +variant trial 2 still returned an empty result on `paused_at_incomplete_token()` +even though all hidden cases passed. + +Interpretation: promotable after the checkpoint gate, but adapt carefully. The +source edit should keep the compact decision-table shape and the method-local +opt-in reminder. It should not over-expand the prose or imply that all +read-only extractors should keep partial results; the contract remains caller +policy. + +Next action: commit rounds 44/45 results separately, then run the required +checkpoint/regression sentinel before promoting another source docblock edit. +If held-out is stable, promote an adapted text-policy decision table as one +source hypothesis. + +## Round 43 — serialization fallback source edit scored neutral + +**Train 98.18 / core 97.89** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `27c764f6f0`, which promoted the round-41 +fallback-policy card into source docs around the HTML Processor class recipe, +`create_fragment()`, `normalize()`, and `serialize_token()`. + +Outcome: keep under the revert rule, but treat as neutral rather than a clean +win. Compared with the primary scored-train comparator, round 36, train fell +99.65 -> 98.18. The drop is below the 2-point revert threshold and is not an +all-trial task regression. It is concentrated in one unrelated T05-text-excerpt +trial that passed 2/10 because the candidate treated `preg_match_all()` as a +boolean/single-match API and skipped multi-codepoint text chunks. The judge +explicitly called this a PHP bug, not an HTML API documentation failure. + +Target serialization tasks remained stable but did not show a decisive win: +N04-normalize-or-placeholder stayed 100.00, T12-unwrap-spans rose 99.70 -> +99.80, and T09-mark-keyword fell 99.30 -> 99.10. All target hidden cases +passed. The remaining near-miss is still raw-input fallback after parser +abort: T09 candidates returned the original HTML even though the source docs +now state that raw input is not normalized rewritten output. The edit improved +local correctness of the docs, but the transfer problem is not fully solved. + +Decision: keep `27c764f6f0`; do not revert. Do not spend another immediate +source edit on fallback-policy wording without fresh diagnostic evidence. + +Next action: commit round-43 results separately, then analyze trusted judge +notes for the next diagnostic. The strongest current signals are still text +policy/read-only extraction and UTF-8 decoded-text measurement, but the T05 +functional failure alone is generic model noise and should not drive a source +edit by itself. + +## Round 42 — checkpoint clears fallback-policy promotion gate + +**All 99.29 / train 99.54 / held-out 98.38 / core 99.21** under +`checkpoint`, with subjects `gpt-5.4` / `medium` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This scored the current source docs after +the round-36 depth/direct-child source edit and before promoting the winning +round-41 serialization fallback-policy scratch card. + +Outcome: stable enough to continue. All 57 subject trials passed all hidden +cases. Compared with the previous checkpoint, round 35, train rose 99.50 -> +99.54 while held-out fell 99.38 -> 98.38. The held-out decline is below the +2-point revert threshold and is not an all-trial functional regression: +N01-remove-external-class stayed 100.00, N02-collect-figure-images was 98.90, +H04-remove-empty-paragraphs was 98.20, and N05-document-title fell to 96.40 +from one adherence-only trial. Held-out judge gaps remain regression-sentinel +data only and must not drive the next edit. + +The train tasks tied to the fallback-policy candidate stayed strong: +N04-normalize-or-placeholder was 100.00, T12-unwrap-spans was 98.80, and +T09-mark-keyword was 99.80. Round-42 judges still noted the same generic gap: +after a token-by-token `serialize_token()` rewrite, `normalize( $html )` on +the original input or returning raw input discards the accumulated rewrite and +is only a caller-chosen fallback, not normalized rewritten output. + +Decision: checkpoint gate is clear. Promote one adapted source docblock +hypothesis for serialization fallback policy, making the anti-pattern more +explicit than the round-41 scratch wording. + +Next action: commit round-42 results separately, then edit the +`WP_HTML_Processor` source docs for the fallback-policy hypothesis, run the +docs-only guard, stage docs, and score the source edit as the next normal +source round. + +## Rounds 40/41 — serialization fallback scratch A/B wins + +`round-40` was the control rendered-doc round and `round-41` was a +scratch-only HTML Processor rendered-doc variant for three train tasks: +`T09-mark-keyword`, `T12-unwrap-spans`, and +`N04-normalize-or-placeholder`. Both used `shadow-doc-a/b`, subjects +`gpt-5.4` / `medium` / `priority`, and judge `gpt-5.5` / `xhigh` / +`priority`. Source docblocks were unchanged. + +Variant: add method-local fallback-policy guidance around +`WP_HTML_Processor::create_fragment()`, `normalize()`, and +`serialize_token()`: factory `null` means no processor was created; later +`get_last_error()` is an unsupported-parser abort; the accumulated +`serialize_token()` output is the rewrite; `normalize( $html )` on the +original input discards emitted rewrite changes; raw original input is not +normalized output; and `paused_at_incomplete_token()` is a separate +complete-input policy check. + +Numeric result: variant won, **99.83 vs 99.57** on the paired subset. All +18 subject trials passed all hidden cases. N04 stayed perfect at 100.00. +T12 improved 98.90 -> 100.00, with all variant trials using an explicit +empty-string fallback instead of raw input or `normalize( $html )` after the +rewrite loop. T09 fell slightly, 99.80 -> 99.50, because one variant trial +still used `normalize( $html )` as an error fallback. + +Interpretation: promotable after the checkpoint gate, but adapt carefully. +The source edit should keep the winning method-local fallback-policy shape, +but should make the anti-pattern more explicit than the scratch wording: +after a `serialize_token()` rewrite loop, `normalize( $html )` and raw input +both abandon the accumulated rewrite; choose a caller-defined failure signal +instead. + +Next action: run a checkpoint/regression sentinel on the current source docs +before promoting another source docblock edit. If held-out remains stable, +promote an adapted fallback-policy card as one source hypothesis and score it +normally. + +## Round 39 — serialization fallback citation probe passes + +`round-39` was a `discoverability-probe` against the current rendered docs, +with subjects `gpt-5.4` / `medium` / `priority`. The question asked how a +token-by-token `serialize_token()` rewriter should distinguish +`create_fragment()` returning `null`, later `get_last_error()`, trailing +incomplete input via `paused_at_incomplete_token()`, post-rewrite +`normalize( $html )` / `serialize()` calls, and raw-input fallback when the +caller promises normalized output. + +Outcome: 3/3 subjects answered correctly with local citations. They found +that factory `null` is construction-time failure while non-null +`get_last_error()` is a later parser abort; `paused_at_incomplete_token()` is +a separate complete-input policy check after scanning; the accumulated +`serialize_token()` string is the rewrite; calling `normalize( $html )` on +the original input discards emitted changes; `serialize()` returns `null` +after scanning has started; and raw original input is not documented as a +normalized-output fallback. + +Interpretation: the facts are discoverable when directly requested. The +remaining problem is transfer into implementation tasks, where round-36 and +round-37/38 candidates still improvised raw-input or `normalize( $html )` +fallbacks after a rewrite loop. + +Next action: test a scratch-only method-local fallback-policy card around +`serialize_token()` / `create_fragment()` / `normalize()` on +`T09-mark-keyword`, `T12-unwrap-spans`, and `N04-normalize-or-placeholder`. +Do not source-edit from this probe alone. + +## Rounds 37/38 — method-local text policy scratch A/B loses + +`round-37` was the control rendered-doc round and `round-38` was a +scratch-only HTML Processor rendered-doc variant for five train tasks: +`T03-first-h1-text`, `T05-text-excerpt`, `N06-extract-toc`, +`T08-table-extract`, and `T09-mark-keyword`. Both used `shadow-doc-a/b`, +subjects `gpt-5.4` / `medium` / `priority`, and judge `gpt-5.5` / +`xhigh` / `priority`. Source docblocks were unchanged. + +Variant: change the method-local `WP_HTML_Processor::next_token()` special +elements paragraph from "important exception" framing to explicit +caller-policy framing, and add a method-local `get_modifiable_text()` warning +that the method is not a predicate for ordinary text. The intended target was +the recurring over-inclusion of SCRIPT/STYLE/TEXTAREA/TITLE opener-carried +text in ordinary subtree extraction. + +Numeric result: variant lost, **98.72 vs 99.18** on the paired subset. All +30 subject trials passed all hidden cases, so the loss is adherence-only. +T03 was flat at 98.80, but T05 fell 99.60 -> 98.60, N06 fell 98.90 -> +98.80, T08 fell 98.70 -> 98.30, and T09 fell 99.90 -> 99.10. The variant did +not eliminate the target pattern: variant T03 still had one trial including +special-element opener text, and variant T08 still had two such trials. + +Interpretation: do not promote this wording. The method-local text-policy +direction is not dead, but this particular phrasing adds noise and can pull +models into broader fallback or special-element reasoning without fixing the +transfer problem. Keep the existing source docs unchanged. + +Next action: run the separate normalized-output / `serialize_token()` +fallback diagnostic as a citation-only probe before any source edit. Round-36 +and round-37/38 judges repeatedly show candidates improvising raw-input or +`normalize( $html )` fallbacks after token-by-token rewrites, but that +hypothesis has not had a fresh focused probe after the round-36 source state. + +## Round 36 — depth-bounded traversal source edit confirmed + +**Train 99.65 / core 99.59** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored the source promotion of the round-34 class-level +HTML Processor recipe for subtree membership and direct-child opener checks. +The prepared round was at `4a39f7802c`, with the documentation hypothesis in +`6548356f1f`. + +Outcome: confirmed. All 45 subject trials passed all hidden cases. Compared +with the primary same-mode scored-train baseline, round 32, the round is +essentially tied: 99.67 -> 99.65, well clear of the revert threshold. The +targeted traversal tasks held or improved: N03-first-list-count stayed +perfect at 100.00, T07-nested-lists rose 99.30 -> 100.00, and +T08-table-extract rose 97.60 -> 98.50. N06-extract-toc was 99.00, down only +0.4 from round 32 and still all hidden cases passed. + +Secondary context: compared with the immediate pre-promotion checkpoint's +train split, round 35 train 99.50 -> round 36 train 99.65. This is useful +local context but not the primary comparator because round 35 was +`checkpoint` mode and included held-out tasks. + +Decision: keep the traversal recipe source edit. It is general API +documentation and the scored source round does not show a regression. The +remaining judge signal is separate: special-element opener text can still be +over-included in ordinary subtree text, `serialize_token()` rewriters still +vary in fallback policy, and examples that call the inherited +`paused_at_incomplete_token()` from HTML Processor workflows could be made +more explicit. + +Next action: commit round-36 results separately from the source hypothesis, +then analyze trusted round-36 judge notes against the backlog. Do not add more +traversal/depth source prose unless a new measurement exposes a distinct +failure. + +## Round 35 — checkpoint clears depth-card promotion gate + +**All 99.47 / train 99.50 / held-out 99.38 / core 99.41** under +`checkpoint`, with subjects `gpt-5.4` / `medium` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This scored the current source docs after +the round-32 `next_tag()` source edit and before promoting the round-34 +scratch traversal card. + +Outcome: stable. All 57 subject trials passed all hidden cases. Compared with +the previous checkpoint, round 24, all-score rose 99.35 -> 99.47, train rose +99.41 -> 99.50, and held-out rose 99.12 -> 99.38. Held-out scores were +N01-remove-external-class 100.00, N02-collect-figure-images 99.80, +N05-document-title 98.80, and H04-remove-empty-paragraphs 98.90. There is no +held-out functional regression and no reason to revert the current source +docs. + +The checkpoint also confirms the round-32 cursor edit held in the broader +sentinel: N03-first-list-count was 100.00 and T07-nested-lists was 99.30. The +lowest train task remains T08-table-extract at 98.10, with the same residual +text-policy issue: subjects sometimes over-include SCRIPT/STYLE/TEXTAREA/TITLE +opener-carried modifiable text when ordinary `#text` extraction was intended. +That is separate from the depth/direct-child traversal card. + +Decision: the held-out gate is clear. + +Next action: promote an adapted, concise version of the round-34 +depth-bounded traversal/direct-child card into the `WP_HTML_Processor` class +documentation as one source hypothesis, then run the docs-only guard, stage +docs, and score it as the next normal source round. + +## Rounds 33/34 — depth-bounded traversal scratch A/B wins + +`round-33` was the control rendered-doc round and `round-34` was a +scratch-only HTML Processor rendered-doc variant for four train tasks: +`N03-first-list-count`, `N06-extract-toc`, `T06-collect-links`, and +`T08-table-extract`. Both used `shadow-doc-a/b`, subjects `gpt-5.4` / +`medium` / `priority`, and judge `gpt-5.5` / `xhigh` / `priority`. Source +docblocks were unchanged. + +Variant: add a compact class-level card after the existing "scan a region +before editing its opener" recipe explaining depth-bounded subtree membership +and direct-child opener tests: record the container opener depth; later tokens +remain inside while depth is `>=` that value; direct child element openers +require `get_token_type() === '#tag'`, `! is_tag_closer()`, and +`get_current_depth() === $container_depth + 1`; child closers report parent +depth and must not be counted; repeated regions should generally use one +`next_token()` loop with explicit state rather than nested token loops. + +Numeric result: variant won, **99.08 vs 97.34** on the paired subset. +Traversal improved from 96.62 to 99.00. N03 moved from 94.46 to 100.00: the +control had one 9/11 trial that treated a depth drop plus null +`get_last_error()` as a complete scan and missed +`paused_at_incomplete_token()`, while all variant N03 trials passed 11/11 +with 100 adherence. T08 moved from 96.50 to 98.00. N06 was flat/slightly up +at 99.00, and T06 dipped only 0.2 to 99.30. All variant hidden tests passed. + +Interpretation: promotable as a source hypothesis after the held-out cadence +is satisfied. The edit is generic API documentation rather than a task-shaped +answer, and it directly addresses repeated judge gaps around subtree +membership, direct-child detection, and one-cursor traversal. Caveat: it does +not solve the separate text-policy issue. Variant judges still saw +special-element opener text over-inclusion in N06 and T08, so that remains a +separate method-local/text-policy hypothesis. + +Next action: run a checkpoint/regression sentinel on the current source docs +before promoting another source docblock edit. If held-out remains stable, +promote an adapted, concise version of the depth-bounded traversal card into +the `WP_HTML_Processor` class documentation and score it as one source +hypothesis. + +## Round 32 — HTML Processor next_tag() cursor source edit confirmed + +**Train 99.67 / core 99.62** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `19a49c1479`, which promoted the winning +round-31 scratch method-local card into `WP_HTML_Processor::next_tag()`: +searches are cursor-relative, a failed search does not rewind, `tag_name` is +one string or null rather than a list of alternatives, and first-of-several +tag searches should use one forward scan plus `get_tag()` branching unless +the caller intentionally bookmarks/seeks or creates a new processor. + +Outcome: confirmed. The round improved from the comparable round-29 +scored-train baseline 98.31 to 99.67, well clear of the revert threshold. +All 45 subject trials passed all hidden cases. The target failure recovered: +T07-nested-lists moved from 81.13 to 99.30, and all three T07 trials used a +single forward scan rather than sequential filtered searches. N03 stayed +perfect at 100.00. + +Residual signal is adherence-only. The lowest task was T08-table-extract at +97.60, with judges again pointing at generic traversal/depth traces, +virtual-closer and incomplete-token policy, and ordinary-text versus +special-element opt-in wording. T03 and N06 passed all hidden cases but still +showed occasional special-element text over-inclusion in explanations or +implementations. T09 and T12 were strong, but judges still noted inconsistent +fallback policy for token-serialization helpers that promise normalized +output. + +Decision: keep `19a49c1479`. The suggested generic recipe direction remains +plausible, but should be tested by a discoverability probe or scratch +rendered-doc A/B before source promotion; do not directly add broad +class-level recipe prose from round-32 judge suggestions alone. If such a +diagnostic wins, check the held-out checkpoint cadence before promoting the +next source docblock edit. + +## Round 29 — ordinary subtree text policy source edit is mixed + +**Train 98.31 / core 98.05** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `95173a4486`, which promoted the winning +round-28 scratch direction into the HTML Processor class docs: ordinary +subtree text is `#text` tokens by default, special-element opener text is +explicit opt-in, and unguarded `get_modifiable_text()` is too broad. + +Outcome: mixed, keep under the revert rule but do not treat the hypothesis as +fully confirmed. The round dropped from the comparable round-23 scored-train +baseline 99.50 to 98.31, below the 2-point revert threshold. There was no +all-trials regression on a previously passing task, but T07-nested-lists had +one functional miss and fell to 81.13 because one subject ran separate +cursor-relative `next_tag()` scans for `UL` and then `OL`; the second scan +started at EOF and never revisited earlier `OL` elements. Judges attributed +that to missing HTML Processor `next_tag()` cursor/OR-query guidance, not to +the text-policy edit. + +Target text results were split. T03-first-h1-text improved to 99.40 and +T05-text-excerpt improved to 99.80. N06-extract-toc fell to 97.60: all three +subjects still included SCRIPT/STYLE/TEXTAREA/TITLE opener text in ordinary +heading text. The N06 judge identified the competing method-local +`next_token()` special-element paragraph as the stronger remaining source of +over-inclusion; the overview recipe now says opt-in, but the method section +can still read like a general instruction to include special-element opener +text whenever collecting element text. + +Decision: do not revert `95173a4486`; it stays below the protocol's revert +threshold and improved adjacent text tasks. Also do not add another broad +overview recipe for this same text policy. If continuing text-policy work, the +next diagnostic should be method-local and focused on the `next_token()` +special-element paragraph. The stronger immediate train failure is the +repeated `WP_HTML_Processor::next_tag()` cursor-relative / one-of-several-tags +gap exposed by T07 and previously seen in N03-style scans. + +Follow-up citation-only probe: `round-29-next-tag-cursor-or-search` asked +three subjects whether a `next_tag( 'UL' )` scan followed by a +`next_tag( 'OL' )` scan on the same processor rescans earlier tags, and how to +find the first of several tag names. All three answered correctly: the second +scan does not restart; a failed `next_tag()` leaves the cursor at the end; use +one forward scan and branch on `get_tag()` for alternatives; `tag_name` is a +single string or null. They mostly cited the Tag Processor "Finding tags" and +"Custom queries" sections plus the HTML Processor one-cursor `next_token()` +note. Interpretation: the facts are discoverable when asked directly, but +placement is weak for HTML Processor `next_tag()` task work. The next +documentation diagnostic can be a scratch method-local HTML Processor +`next_tag()` contrast card rather than another broad overview recipe. A +sidecar doc-location check confirmed there is no local HTML Processor +`next_tag()` warning and no HTML Processor first-of-several-tags idiom; the +only OR-style idiom found is in the Tag Processor "Custom queries" section. + +Follow-up scratch A/B: rounds 30/31 tested a method-local +`WP_HTML_Processor::next_tag()` card under `shadow-doc-a/b` on N03 and T07. +The card stated that searches are cursor-relative, false does not reset the +cursor, `tag_name` is one string or null, first-of-several tags should use one +forward `next_tag()` scan plus `get_tag()` branching, and intentional rescans +require a bookmark/seek or a new processor. Result: variant won cleanly, +99.80 versus 99.30. N03 stayed 100.00 in both rounds, while T07 improved from +98.60 to 99.60 and all variant T07 trials used a one-pass approach. This +supports promoting the method-local cursor/OR-search card as a source +hypothesis. + +## Rounds 27/28 — ordinary-text negative example scratch A/B + +`round-27` was a fresh control rendered-doc round and `round-28` was a +scratch-only HTML Processor rendered-doc variant for the same three train +tasks (`T03-first-h1-text`, `N06-extract-toc`, `T05-text-excerpt`). Both used +`shadow-doc-a/b`, subjects `gpt-5.4` / `medium` / `priority`, and judge +`gpt-5.5` / `xhigh` / `priority`. Source docblocks were unchanged. + +Variant: instead of the broad policy matrix from round 26, the scratch docs +added a default-first policy under the HTML Processor DOM-style text recipe: +ordinary subtree text is only reached `#text` tokens; special-element opener +text is available through `get_modifiable_text()` only when the caller +explicitly opts into those node types. The variant also included a negative +example intended to discourage treating all modifiable text as ordinary text. + +Numeric result: the variant improved the paired subset from **99.27** to +**99.50**. T03 moved from 99.60 to 100.00, N06 from 98.20 to 98.90, and T05 +from 100.00 to 99.60. All trials in both rounds passed all hidden tests. + +Interpretation: promotable after revising the scratch wording. The target +failure improved cleanly: in the control, T03 trials 2/3 and N06 trials 2/3 +included SCRIPT/STYLE/TEXTAREA/TITLE opener text in ordinary heading text; in +the variant, all three T03 implementations and all three N06 implementations +used `#text` only for ordinary heading/subtree text. T05 still included +TITLE/TEXTAREA and excluded SCRIPT/STYLE, so the stronger default rule did not +erase the explicit opt-in path needed by callers that ask for those elements. + +Caveat before source promotion: the scratch negative example used +`null !== $processor->get_modifiable_text()`, but `get_modifiable_text()` +returns a string and should not be taught as a presence test. Promote the +default-first/explicit-opt-in wording, plus a negative example based on +calling `get_modifiable_text()` from an unguarded token loop, but do not copy +the null-check code. + +Next action: commit these result artifacts, then promote the adapted generic +recipe to the `WP_HTML_Processor` class documentation and score it as one +source hypothesis. + +## Rounds 25/26 — read-only text policy matrix scratch A/B + +`round-25` was the control rendered docs and `round-26` was a scratch-only +HTML Processor rendered-doc variant adding a compact read-only text extraction +policy matrix near the class-level DOM-style text recipe. Both rounds used +`shadow-doc-a/b`, the same three train tasks (`T03-first-h1-text`, +`N06-extract-toc`, `T05-text-excerpt`), subjects `gpt-5.4` / `medium` / +`priority`, and judge `gpt-5.5` / `xhigh` / `priority`. Source docblocks were +unchanged. + +Numeric result: the variant improved the paired subset from **98.70** to +**99.17**. T05 moved from 99.40 to 100.00, T03 from 99.70 to 100.00, and N06 +from 97.00 to 97.50. All trials in both rounds passed all hidden tests. + +Interpretation: mixed, not promotable as written. The matrix helped the task +that explicitly wanted TITLE/TEXTAREA text while excluding SCRIPT/STYLE, but +it did not solve the target N06 over-inclusion pattern. More importantly, it +worsened the ordinary-heading-text signal in T03: control had two pure +`#text` implementations and one implementation that added special-element +opener text, while the variant had all three T03 subjects append SCRIPT, +STYLE, TEXTAREA, and TITLE opener text. Judges scored this as documented API +use because hidden cases did not cover special elements, but they still noted +that it was broader than the ordinary text-node extraction policy. + +Decision: do not promote this policy matrix to source docs. The next text +diagnostic, if pursued, should be a revised scratch-only variant that stresses +the default exclusion rule and a negative example: ordinary heading/subtree +text appends only `#text`; special-element opener text is available but is not +included unless the caller explicitly asks for those node types. Keep the +serialization/decoded-text reparse signal separate. + +## Round 24 — checkpoint after lexical-text boundary edit + +**All 99.35 / train 99.41 / held-out 99.12 / core 99.28** under +`checkpoint`, with subjects `gpt-5.4` / `medium` / `priority` and judge +`gpt-5.5` / `xhigh` / `priority`. This was the held-out regression sentinel +after the round-23 Tag Processor lexical-text boundary source edit. + +Outcome: stable. All 57 subject trials passed all hidden tests, including all +four held-out tasks. Held-out scores were H04 98.70, N01 100.00, N02 99.00, +and N05 98.80. There is no held-out functional regression and no reason to +revert the source edit. + +The target train signal held: T05-text-excerpt scored 99.80 in the checkpoint +with all three trials passing 10/10 and adherence 100/99/99. The Tag Processor +lexical-token example is no longer pulling subjects away from +`WP_HTML_Processor::create_fragment()` for parsed BODY-fragment text +extraction. + +Residual train signal: the lowest task was T09-mark-keyword at 98.10 because +one trial reparsed decoded `get_modifiable_text()` with +`WP_HTML_Processor::normalize()` instead of wrapping `serialize_token()`. +N06-extract-toc scored 98.30 because two trials over-included special-element +opener modifiable text in ordinary heading text. These are separate candidate +diagnostics: (1) decoded modifiable text is application text, not an HTML token +to reparse during serialization, and (2) ordinary subtree text is `#text` by +default, with special-element opener text as explicit caller opt-in. + +Next action: run a citation-only discoverability probe before any source edit. +Prefer probing the HTML Processor read-only text policy first because it spans +round-23 T03/N06/T05 and round-24 N06/N02 notes. Keep the +`serialize_token()`/decoded-text reparse issue as a separate follow-up probe or +scratch A/B candidate; do not merge the two hypotheses into one source edit. + +Follow-up citation-only probe: +`round-24-readonly-text-extraction-policy` asked three `gpt-5.4` / `medium` +subjects to explain ordinary read-only subtree text extraction, special +element opener text opt-in, and fallback policy after `get_last_error()` or +`paused_at_incomplete_token()`. All three answered the main boundary +correctly: ordinary subtree text uses only `#text`; callers should not call +`get_modifiable_text()` on every opening tag; SCRIPT/STYLE/TITLE/TEXTAREA +opener text is opt-in; and read-only fallback is caller policy rather than an +automatic discard of already collected text. Interpretation: the facts are +discoverable when directly requested. The remaining train near-misses are a +placement/transfer or signal-density problem, so the next diagnostic should be +a scratch rendered-doc A/B for a compact policy matrix before source +promotion. + +## Round 23 — Tag Processor lexical-text boundary confirmed + +**Train 99.50 / core 99.42** under `scored-train`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored commit `f7c83bfb6b`: a narrow Tag Processor class-doc +placement edit before the `next_token()` text example, labeling it as lexical +token processing and pointing parsed BODY-fragment text extraction to +`WP_HTML_Processor::create_fragment()` plus HTML Processor subtree text walks. + +Outcome: confirmed, with no functional regressions. All 45 subject trials +passed all hidden tests. Round score moved from the comparable round-22 +current-docs medium baseline 99.45 to 99.50 (+0.05), and core moved from +99.36 to 99.42 (+0.06). Concept means: attributes 99.87, classes 100.00, +normalization 100.00, serialization 99.15, text 99.07, traversal 99.48. + +The target task moved strongly: T05-text-excerpt improved from 96.70 to 99.20. +All three T05 trials now chose `WP_HTML_Processor::create_fragment()`, filtered +ordinary `#text`, and handled TITLE/TEXTAREA opener text intentionally. This +resolves the repeated round-20/21/22 failure where subjects copied the Tag +Processor lexical token walk as if it were the parsed fragment text-content +recipe. + +Residual signal is now different. T03 fell from 100.00 to 98.40 and N06 stayed +at 99.00 because some subjects over-included special-element opener modifiable +text in ordinary heading/subtree text. Judges also noted T05 trials 1 and 3 +used an all-or-nothing `get_last_error()` fallback for a read-only text walk, +discarding text collected before an unsupported parser abort. These are not +functional regressions in this round, but they sharpen the next text hypothesis: +ordinary subtree text means `#text` tokens by default; special-element +modifiable text and read-only abort fallback are explicit caller policies. + +Next action: commit the round-23 result artifacts, then run the required state +audit. Because a source edit just landed and the post-refresh train loop has +not run a held-out checkpoint recently, prefer a checkpoint/regression +sentinel before another source edit unless the audit/protocol state says +otherwise. + +## Round 22 — current-docs medium calibration restored + +**Train 99.45 / core 99.36** under `weak-tier-calibration`, with subjects +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This was a no-edit calibration on the current committed docs after +round 21, run because `audit-state.py` correctly reported that the current +source docs no longer had a current-docs no-edit baseline at the default +subject policy. + +Outcome: all 45 subject trials passed all hidden tests. Concept means: +attributes 100.00, classes 100.00, normalization 100.00, serialization 99.45, +text 98.43, traversal 99.50. The tier remains functionally saturated. + +The calibration confirms the main residual signal from round 21: +T05-text-excerpt again scored 96.70 with all three trials passing 10/10 but +adherence 90/88/89. Judges again identified the Tag Processor lexical token +text example as competing with the processor-selection guidance that parsed +BODY-fragment text content belongs on `WP_HTML_Processor::create_fragment()`. +This is now present at both `gpt-5.4` / `low` and `gpt-5.4` / `medium`. + +Follow-up citation-only probe: `round-22-tag-vs-html-text-boundary` asked +three `gpt-5.4` / `medium` subjects to choose between the Tag Processor +`next_token()` text example and `WP_HTML_Processor::create_fragment()` for +parsed BODY-fragment text-content extraction. All three chose +`create_fragment()`, cited the Tag Processor "Which processor should I use?", +"Tokens and finer-grained processing", and `get_modifiable_text()` sections, +and cited the HTML Processor DOM-style text recipe, `create_fragment()`, and +`next_token()` sections. Interpretation: the boundary facts are discoverable +when asked directly. The remaining failure mode is transfer/placement: task +agents enter through the Tag Processor text example and do not carry the +processor-choice contrast into implementation. + +Next action: a narrow Tag Processor source hypothesis is justified before +more broad recipe prose. Clarify that the Tag Processor `next_token()` text +example is lexical token processing, not parsed fragment text-content +extraction, and point callers needing BODY-fragment semantics, implied closing +behavior, tree order, or unsupported-markup policy to the HTML Processor. + +## Round 21 — generic HTML Processor recipes are mixed + +**Train 98.97 / core 98.81** under `scored-train`, with subjects +`gpt-5.4` / `low` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored the generic main-class recipe hypothesis from commit +`27077e06b1`: add HTML Processor class-level recipes for collecting +DOM-style text from a subtree and rewriting while serializing tokens, plus a +method-local `serialize_token()` completion-policy note. + +Outcome: keep for now under the protocol's revert rule, but this is not a +clean win. Round score moved from the round-20 low-effort no-edit calibration +99.43 to 98.97 (-0.46), below the 2-point revert threshold. All but one +subject trial passed all hidden tests; N03 trial 2 failed 10/11 because it +treated sequential filtered `next_tag( 'UL' )` then `next_tag( 'OL' )` calls +as alternate searches from the same cursor. Judges attributed that to missing +`WP_HTML_Processor::next_tag()` cursor/lookahead guidance, not to the recipe +edit. + +The target tasks were mixed: +- T09-mark-keyword improved slightly from 98.80 to 99.20. The new + `serialize_token()` policy avoided the exact probe failure where subjects + rejected all incomplete trailing syntax, but judges still saw inconsistent + fallback choices for factory failure and unsupported parser aborts. +- T05-text-excerpt fell from 96.70 to 94.40, with all three trials still + passing hidden tests but choosing `WP_HTML_Tag_Processor` for text + extraction. The new HTML Processor text recipe did not overcome the existing + Tag Processor lexical-token text example, which still looks like a ready + whole-fragment text-content recipe. +- N06 improved from 98.50 to 98.90, but two trials over-opted into special + element text while extracting heading text, reinforcing that + "modifiable text" is broader than ordinary parsed text. + +Interpretation: a broad HTML Processor recipe block is not enough. The next +evidence-backed source hypothesis should clarify the Tag Processor text-walk +example as lexical token processing and cross-reference the HTML Processor for +parsed BODY-fragment text, implied closing behavior, tree order, and +unsupported-markup policy. Separately, `WP_HTML_Processor::next_tag()` needs a +small cursor/lookahead warning and a first-of-several-tags idiom, but that is +a different hypothesis. + +## Round 20 — low-effort weak-tier calibration still saturated + +**Train 99.43 / core 99.34** under `weak-tier-calibration`, with subjects +`gpt-5.4` / `low` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This was a no-edit calibration round using the round-19 source +docs to test whether one step down the subject ladder gives a less saturated +measurement instrument. + +Outcome: the tier is still functionally saturated on the current train corpus. +All 45 subject trials passed all hidden tests. Concept means: attributes +100.00, classes 100.00, normalization 100.00, serialization 99.40, text +98.47, traversal 99.44. + +The round does produce useful adherence-only signal, especially for generic +main-class recipe candidates: +- T05-text-excerpt was the lowest task at 96.70, with all three trials passing + 10/10 but adherence 90/88/89. Judge notes point to scattered guidance for + DOM-style text extraction: use `WP_HTML_Processor`, filter ordinary text + with `get_token_type() === '#text'`, skip comments and attributes, and opt + into element-carried text only when wanted. +- N06-extract-toc scored 98.50. Trial 3 passed hidden cases but overused + `get_modifiable_text()` on non-closing named tokens; a judge probe showed it + would include comment text in a heading. This reinforces the same + "where text lives" / "DOM text versus modifiable text" gap. +- T09-mark-keyword scored 98.80. Trial 3 over-applied incomplete-input and + normalization fallback guidance after a token-rewrite loop, risking loss of + accumulated edits. This supports a clearer token-rewrite completion policy, + not a task-shaped example. + +Interpretation: `gpt-5.4` / `low` is not a meaningfully weaker measuring +instrument for functional failures, but it strengthens the case for a +scratch-tested generic recipe block in the class-level docs: text extraction +and token-rewrite recipes should teach broad API contracts rather than solve +specific corpus tasks. Per the subject ladder, the next measurement action is +a no-edit `gpt-5.4-mini` / `high` / `priority` calibration before using weaker +tier results to promote another source docblock hypothesis. + +Follow-up citation-only probe: a generic text/rewrite recipe probe at +`gpt-5.4` / `low` asked for (1) DOM-style text collection from a subtree and +(2) token-by-token rewrite completion policy when input may end incomplete or +unsupported. All three subjects found the DOM-style `#text` recipe and cited +the rendered docs correctly, but all three gave an over-conservative rewrite +policy: reject or fall back whenever `paused_at_incomplete_token()` is true. +That repeats the round-20 T09 near-miss where a rewrite loop risks discarding +already-emitted changes by re-normalizing the original HTML. The evidence +supports a narrow generic recipe/source hypothesis: token-by-token rewrites +should distinguish unsupported parser aborts from acceptable best-effort +omission of an incomplete trailing token, and should make the accumulated +output the rewrite. + +## Round 19 — generic region-scan recipe lands + +**Train 99.59 / core 99.53** against the current train corpus with subject +`gpt-5.4` / `medium` / `priority` and judge `gpt-5.5` / `xhigh` / +`priority`. This scored the round-18 N03 hypothesis as a source docblock edit: +add a class-level HTML Processor recipe for "scan a region before editing its +opener," plus compact method-local guard notes in `next_token()` and +`get_current_depth()`. + +Outcome: N03-first-list-count moved from 85.07 to 100.00. All three trials +passed 11/11 hidden cases and received 100 adherence. The candidates used the +documented pattern directly: bookmark the opener, walk the bounded region with +`next_token()` and `get_current_depth()`, reject incomplete or unsupported +scans with `paused_at_incomplete_token()` and `get_last_error()`, seek back, +mutate with `set_attribute()`, and read with `get_updated_html()`. + +All 45 subject trials passed all hidden tests. Concept means: attributes +100.00, classes 100.00, normalization 100.00, serialization 99.80, text +98.77, traversal 99.60. Small adherence-only movement on T05/T06/T08 remains +well under the revert threshold, and no previously passing task regressed +functionally. + +Round-19 judge residuals are now lower-signal polish: the stale +`next_token()` "do not use" since note, a direct-child predicate +(`get_current_depth() === $parent_depth + 1`), read-only extraction policy +for partial scans, and factory/serialization fallback clarity. The measured +N03 failure is resolved. + +Follow-up citation-only probe: a text-content recipe probe asked how to collect +an element's text, where SCRIPT/STYLE/TITLE/TEXTAREA contents appear, and what +not to append. All three `gpt-5.4` / `medium` subjects answered correctly and +cited `next_token()`, `get_current_depth()`, and `get_modifiable_text()`. +Interpretation: the text-location facts are discoverable when named directly; +do not promote another text recipe at this tier without weaker-tier or A/B +evidence that task code still fails by transfer rather than model judgment. + +## Round 18 — current-corpus weak-tier baseline scored + +**Train 98.73 / core 98.54** under the current corpus and current weak-tier +policy: subject `gpt-5.4` / `medium` / `priority`, judge `gpt-5.5` / +`xhigh` / `priority`, 15 train tasks × 3 trials. This is the first trusted +current-corpus no-edit baseline after the post-round-17 corpus refresh; round +17 remains historical and is not a comparable baseline for source edits. + +The baseline is nearly saturated but still has one strong train signal: +N03-first-list-count scored 85.07, with all three trials passing 9/11 and +failing only `incomplete-token-inside-list` and +`incomplete-comment-inside-list`. Judges agreed on the root cause: subjects +used the documented HTML Processor depth-bounded subtree pattern and trusted +virtual closers as proof that the bounded region was fully scanned. The docs +do not connect that pattern to `paused_at_incomplete_token()`: after truncated +syntax at the end of input, `WP_HTML_Processor` can still emit virtual closers +while `paused_at_incomplete_token()` remains true and `get_last_error()` stays +null. The next source hypothesis should be general, not task-shaped: document +that region scans which will drive mutations must treat a depth drop as a +structural boundary only, then separately check incomplete-token and parser +error state before trusting the scan. + +A focused citation-only probe against the same staged rendered docs asked +whether an HTML Processor virtual closer proves the source region was complete +when input may be truncated, and which methods to check. All three +`gpt-5.4` / `medium` probe subjects answered correctly and cited +`next_token()`, `paused_at_incomplete_token()`, `get_last_error()`, and +`get_unsupported_exception()`. Interpretation: the facts are discoverable when +the question names the issue, so the source hypothesis should be a short +placement/transfer edit near the subtree-walk and mutation examples, not a +large new concept section. + +Concept means: attributes 100.00, classes 100.00, normalization 100.00, +serialization 99.90, text 99.03, traversal 96.81. Secondary non-failing gaps +remain useful as low-risk polish candidates, especially factory null/failure +fallbacks, where text lives, special-element text lists, and clearer +get_updated_html vs serialize()/serialize_token() contracts, but they should +not displace the measured N03 failure unless diagnostic probes show higher +signal at a weaker tier. + +Prepared the required current-corpus weak-tier calibration round with no source +docblock edits: `round-metadata.json` records 15 train tasks and the staged +scratch directory `/tmp/html-api-docs-eval/round-18`. Scratch isolation +passed: only the two rendered docs and selected task prompts are exposed. +Local Codex CLI subject trials and judge verdicts are complete and ingested: +45/45 subject responses, hidden-test executions, 15/15 judge verdicts, and +subject-isolation attestation are persisted. + +Operational note: the first local judge-runner attempt failed before producing +verdicts because the local Codex structured-output validator now requires +`additionalProperties: false` on nested object schemas. The runner schema was +fixed in a separate tooling commit, then the full judge run was rerun and +validated before ingestion. + +Added a local Codex CLI trial runner to avoid deadlocking on the external +Workflow UI when it is unavailable. The runner writes the same trial-output +shape as the Workflow script, but records `subject_isolation.isolation_mode` +as `isolated-workdir`: each subject gets a private non-repo directory +containing only the two rendered docs, one task prompt, and the output schema; +the task and rendered docs are embedded directly in the subject prompt because +local `codex exec` does not expose the experiment's Read/Grep-only tools; +project rules and user config are ignored, the sandbox is read-only, and the +approval policy is `never`. Scores from this runner must be compared only with +rounds using the same isolation mode and `input_delivery: +prompt-embedded-docs`. +`audit-state.py` now prints the local runner command sequence for prepared +rounds waiting on trials, so autonomous continuations do not reinterpret that +state as an external-only Workflow gate. +Added the matching local Codex CLI judge runner for the next round-18 phase. +It uses the same judge model policy as the Workflow script, runs from the repo +root under a read-only sandbox, and writes the existing judge-output envelope +for `ingest-judges.py`. +`audit-state.py` now prints the local judge command sequence when a prepared +round is trial-complete, so the next autonomous continuation can move straight +to judging once the judge data-export approval is present. + +Added `validate-round.py` as an artifact lifecycle gate. It reports whether a +round is prepared, partially trialed, trial-complete, judged, or scored, and it +lists missing trial, judge, or summary files before a score can be trusted. + +Added `workflow-args.py` to emit trial and judge workflow JSON directly from +`round-metadata.json`, avoiding hand transcription of task IDs, scratch paths, +and model policy when the runner becomes available. + +Hardened trial and judge ingestion plus aggregation for metadata-backed rounds: +trial outputs must match the recorded task/trial matrix, judge outputs must +cover the recorded task set, and aggregation now refuses missing judges, +missing executions, or mismatched task directories instead of silently scoring +them. + +Round preparation now records SHA-256 hashes for every staged rendered doc and +task prompt. Round 18 metadata was backfilled with hashes for the staged +current-corpus baseline scratch files so the exact docs/prompts can be audited +without trusting the transient `/tmp` path alone. + +Round validation and workflow argument generation now verify the recorded +scratch hashes before a prepared round is trusted or handed to agents. This +closes the remaining transient-`/tmp` drift hole: if staged rendered docs or +task prompts change after preparation, validation fails before scoring. + +Round preparation now also records source-file fingerprints for the two HTML +API class files: raw source SHA-256 plus a comment/whitespace-stripped PHP +token-stream SHA-256 matching the docs-only guard invariant. Round 18 metadata +was backfilled with those fingerprints. This is infrastructure/results metadata +only; no source docblock or PHP behavior changed. + +Added `validate-workflow-output.py` and wired it into trial/judge ingestion. +Workflow output files are now checked against round metadata and structured +output shape before any candidate, execution, judge, or summary file is +written. + +Added `validate-corpus.py` so the corpus precondition is reproducible: active +reference implementations are run against their hidden tests before a fresh +baseline is trusted. Current result: 19 active references pass 151/151 cases; +N04 records expected unsupported-markup `wp_trigger_error()` events as +warnings, not output failures. + +Updated `audit-state.py` to detect a matching prepared current-corpus +calibration round and report its lifecycle. For round 18 it now distinguishes +"baseline missing" from "round prepared; launch trials next," while still +blocking scoring on local drift or invalid scratch artifacts. + +Added a `manifest` mode to `workflow-args.py`. The manifest preflights scratch +hashes and emits trial/judge workflow script paths, exact model-policy args, +and the ingest/validation command sequence for the external workflow runner. + +Tightened trial workflow preflight so metadata-backed ingestion rejects +incomplete subject responses before writing partial trial directories: every +trial output must include non-empty `code` and `explanation` strings plus +integer `confidence` 0-100. + +Made the trial launch isolation contract explicit in both the workflow script +and manifest: trusted scored trials require the `docs-test-subject` agent type +or an equivalent Read+Grep-only tool boundary. Prompt-only fallback must be +treated as diagnostic unless transcript isolation is recorded. + +The bundled trial workflow now passes `agent_type: docs-test-subject` on each +subject `agent()` call, instead of relying only on workflow metadata, prompt +text, and returned isolation attestation to describe the required boundary. + +Round 18 was restaged before launch after the tooling-only isolation commits. +The refreshed metadata now records git head `5d02b91636`; rendered-doc, task +prompt, source, and corpus file hashes stayed unchanged. + +The launch manifest now reports current checkout provenance separately from +round metadata provenance, plus SHA-256 hashes for the trial and judge workflow +scripts. This avoids treating metadata's staged content ref as the workflow +execution ref after tooling-only commits. + +Trial and judge ingestion now refuse to overwrite persisted artifacts. Existing +trial files, `subject-isolation.json`, `judge.json`, or `round-summary.json` +must be reconciled explicitly before a runner output can be ingested again. + +`workflow-args.py` now runs `validate-corpus.py` for the exact tasks selected +in the round metadata before emitting trial, judge, or manifest payloads, so +the launch handoff cannot skip reference-fixture validation accidentally. +The manifest's human-readable preflight command now mirrors that exact task +selection instead of using a train-split shortcut. +`workflow-args.py` can also write the emitted JSON with `--output`, so the +external runner handoff can persist exact launch payloads without manual +copy/paste. + +`validate-round.py` lifecycle counts now require valid artifacts. Malformed +trial files or judge verdicts no longer count toward `trials-complete` or +`judged` just because the files are present. +`persist-trials.py` now validates harness execution JSON before finalizing a +trial artifact directory, and removes the just-created trial directory if the +harness output is unusable. +That cleanup now applies to the entire current ingest attempt, preventing a +mid-batch harness failure from stranding earlier trial artifacts without a +matching isolation attestation. +`ingest-trials.py` now also writes the isolation attestation atomically and +removes the current attempt's trial directories if attestation persistence +fails. +Judge ingestion now similarly removes artifacts created by the current attempt +if judge writing, post-write validation, aggregation, or summary persistence +fails. + +Tightened judge workflow preflight and schema hints so malformed judge verdicts +cannot be persisted: trial notes, failure analysis, and doc-gap fields must be +non-empty strings, and hallucinated method entries must be strings. + +Round validation now verifies recorded HTML API source digests against their +recorded git ref, in addition to staged scratch hashes. This makes round 18's +metadata provenance check executable instead of merely documentary. + +Round validation now also content-checks trial artifacts before reporting a +round as trial-complete: candidate files must be non-empty PHP, responses must +carry explanation/confidence, and execution files must contain harness +pass/total/cases data. + +Round validation now also content-checks persisted judge artifacts before +reporting a round as judged or scored: `judge.json` files must contain exactly +the expected trial verdicts, integer adherence scores, string +hallucinated-method entries, non-empty notes, non-empty failure analysis, and +structured doc-gap fields. + +Trial ingestion now rejects subject `code` payloads that do not start with +`' sample, T06/T08 single cases, judge +adherence spread). No new actionable gap. + +Round 17 runs as a HOLD round — no doc edits — to measure pure +round-to-round variance and sharpen the noise floor against which +future deltas are judged. + +## Round 15 — Haiku, checkpoint: T05 cured; N05 one placement away + +**All-19 96.16 / train 97.59 / held-out 90.79 (flat vs 91.04 — N05's +single 0/7 trial swings the 4-task holdout mean ±10).** T05 back to +9/9×3 (construction-asymmetry note), T08 +15.5. N05's only failure +called create_full_parser() on the wrong class while otherwise +following the documented TITLE idiom — the asymmetry note exists but +not where that subject was reading. + +Round-16 hypothesis (committed): one-line asymmetry reminder inside +get_modifiable_text() on the Tag Processor (placement refinement of +the same train-licensed hypothesis). + +## Round 14 — Haiku, the construction-asymmetry gap crosses into train + +**Train 95.92 (−2.6).** The dip is dominated by one T05 trial (1/9) +that hallucinated WP_HTML_Tag_Processor::create_fragment() — the exact +failure held-out N05 has shown since round 12, which the protocol +correctly refused to act on until train evidence appeared. It now has. +Remaining wobbles are single-case sampling noise (N06 5/7, T06 7/8, +T09 7/8). + +Round-15 hypothesis (committed BEFORE this entry, after the trials but +ahead of judging): construction asymmetry stated on both classes — +new-only for the Tag Processor, factories only on the HTML Processor. +Round 15 is a held-out checkpoint; N05 should now benefit directly. + +## Round 13 — Haiku, first 100% functional sweep + +**Train 98.54; 45/45 trials passed 343/343 hidden cases — first fully +clean round of the campaign.** T08 +20.7 → 96.9 (implied-structure +rule), T06 +5.9 → 99.6. All remaining score variance is +adherence-judge prose assessment; judges' gap lists are now +second-order discoverability nits (the chooser is abstract; the +recipe lacks a measurement example). + +Round-14 hypothesis (committed): decoded-UTF-8/mb_substr measurement +note at the recipe's accumulation point (flagged twice by T05). + +## Round 12 — Haiku, checkpoint: held-out at new high + +**All-19 96.05 / train 97.39 / held-out 91.04 (new high; was 88.79 at +round 9, 87.38 at the round-2 baseline).** N05 +12.4 → 70.6: two +perfect trials at last (the walk-path RCDATA note generalized); its +remaining failure is a NEW, narrower gap — a trial hallucinated +WP_HTML_Tag_Processor::create_fragment() (the factory exists only on +the HTML Processor). That construction-asymmetry gap has only ever +been flagged from held-out, so no edit — monitoring for train +evidence. T08 had one 1/8 relapse (implied-TBODY depth surprises); +T06 trials add needless is_tag_closer() guards. + +Round-13 hypotheses (committed): the skip-default's consequence stated +affirmatively (no closer guard needed after plain next_tag()); implied +elements appear in walks (synthesized TBODY verified), anchor on +matched depth rather than absolute numbers. + +## Round 11 — Haiku, equality-case fix lands; asymptote territory + +**Train 98.28 (within noise of round-10's 98.70).** T03 +5.2 → 98.9 +(the stated-causally equality rule); T09 100.0; remaining misses are +single hidden cases (T06 ×2, T08 ×1). Judge findings are now +prose-bleed nits: a trial attributed remove_class's +attribute-dropping to add_class; the quoting caveat and the +byte-preservation rule live far apart. + +Round-12 hypotheses (committed): add_class add-only scope stated +contrastively; only-written-attributes-requoted co-located with +get_updated_html's contract. Round 12 is a held-out checkpoint. + +## Round 10 — Haiku, T08 perfect for the first time + +**Train 98.70 — new high.** T08 +10.0 → 96.8 with 8/8 in every trial +(RCDATA-on-the-walk-path + walk-to-EOF caveat completed the cursor +series begun in round 9). Failure-handling and classes at 100. The +only functional miss in the whole train set: one T03 trial (7/8) again +sampling the `>` bound; judges note the equality case (child closer +depth == ancestor opener depth) is shown numerically but never stated +as the REASON for `>=`. + +Round-11 hypotheses (committed): the equality case stated causally on +get_current_depth(); empty-region flush property added to the +closer-driven state-machine note. + +## Round 9 — Haiku, checkpoint: train 98.66 (high), shared-cursor fix lands + +**All-19 96.58 / train 98.66 (+1.0, new high) / held-out 88.79.** +T08 +8.7 → 86.8 with no sub-50% trials (one-cursor contract + +state-machine example); T10 +2.6; 17/19 tasks functionally perfect. +N05 (58.2) is the only weak task left anywhere: subjects now apply the +well-taught walk-for-#text recipe to TITLE, where it silently returns +'' (RCDATA has no #text children — verified). The exception lived only +in get_modifiable_text(), off the walk path. + +Round-10 hypothesis (committed): the RCDATA exception stated inside +next_token()'s walk guidance + the unguarded-walk-runs-to-EOF caveat +(train-licensed via T05 round-7 and T08 round-9 gaps). + +## Round 8 — Haiku, UTF-8 fix lands; T08 isolated as the last functional gap + +**Train 97.70 — new high.** T05 +14.0 → 99.3 (UTF-8/mb-encoding +statement); T07 at 100; T01 produced the experiment's first EMPTY +judge gap list (smoke task fully saturated). Only T08 weak (78.1, +traversal 91.3): failing trials nest collect-until-close loops which +double-advance the single shared cursor — the inner loop exits already +matched on the next region's boundary token and the outer loop's +next_token() skips it (second cell of each row dropped, rows lost). + +Round-9 hypotheses (committed): the one-cursor contract on +next_token() with a verified closer-driven single-pass state-machine +example (DT terms from a DL); the last-X bookmark idiom surfaced at +the top of the bookmarks narrative (T10). + +## Round 7 — Haiku, RCDATA + drain idioms land + +**Train 97.51 (statistically flat vs round-6 train 97.84; nothing near +the revert threshold).** N03 → 100 (drain idiom), failure-handling +concept 100, 13/15 tasks functionally perfect across all trials. +Remaining wobbles: one T05 trial 5/9 (sliced multibyte text without an +explicit mb encoding — docs never said output is UTF-8) and T08's +boundary confusion resurfacing in break-form code that the +continue-form-only `>=` warning misses. + +Round-8 hypotheses (committed): UTF-8 output statement + explicit +mb-encoding idiom on get_modifiable_text() in both classes; the +break-form boundary equivalence (break at `< depth`, never `<=`). + +## Round 6 — Haiku, checkpoint: held-out generalization confirmed + +**All-19 95.92 / train 97.84 (+3.1) / held-out 88.69** (vs 87.38 at the +round-2 baseline and 75.22 at round 3 — held-out now ABOVE baseline on +purely train-driven edits). T06 +24.5 and T08 +20.0 (chooser + +tree-awareness boundary landed); T04 holds at 98.7; H04 and N02 perfect. +N05 remains the only weak task (60.6): two trials still walked TITLE +looking for #text children. Its root cause is covered by a TRAIN gap +(T08 flagged that the HTML Processor's get_modifiable_text() override +documents neither decoding nor where RCDATA text lives) — so the fix is +train-driven, as the protocol requires. + +Round-7 hypotheses (committed): RCDATA/raw-text contents live on the +element token, with a verified full-parser TITLE example, plus the +decoding statement, on the HTML Processor override; the >= rule beside +the operator with the nested-closer/sibling-text note inline; the +drain-all-tokens idiom on paused_at_incomplete_token(); add_class() +return = enqueued-not-applied. + +## Round 5 — Haiku, template section lands; tree-awareness boundary surfaces + +**Train 94.77 (+0.6).** T04 +49.2 → 98.6: all trials used the new +'Building markup from a template' section; attributes concept 74.7 → +99.3. Offsetting single-trial collapses: T06 −26.4 (one trial tried +tree-aware work in the Tag Processor — whose docs never say it lacks +depth/breadcrumbs) and T08 −15.1 (breadcrumbs-on-closer confusion); +plus one T03 trial copied the next_token() example but guessed '>' +since the >= warning lived only in get_current_depth(). + +Round-6 hypotheses (committed): processor-chooser sections in both +class docblocks with the no-tree-awareness boundary stated; a real +description for get_updated_html() (was a verbatim copy of +__toString's); the >= warning inline in the next_token() example. +Backlog: breadcrumbs read on a closer token (last crumb is the parent, +not the closed element); empty elements still produce closers. + +## Round 4 — Haiku, serialization boundary + modifiable-text fixes + +**Train 94.18 (+3.5 vs round-3 train).** T07 +35.0 → 100 (the +serialize()-vs-get_updated_html() boundary cured the induced +regression — refine-not-revert vindicated). T08 +8.1, T06 +6.3, +T10 +2.5. T04 +4.3 but still 49.4: each failing trial absorbed exactly +ONE of the two template-building facts (placeholder text OR attribute +order) — they live in distant method docblocks. + +Round-5 hypotheses (committed): +1. 'Building markup from a template' overview section uniting + pre-seeded attribute order + placeholder text, verified link-card + example unlike any corpus task (T04). +2. next_tag() 'What this matches' contract: ASCII case-insensitive + names, comments/rawtext never match, truncated tails never matched + (T01/T03/T10 backlog). +3. get_attribute() returns decoded values; add_class() idempotency + with exact byte-for-byte duplicate check (probe caught and fixed a + wrong case-insensitivity claim before commit). +4. Why the subtree walk uses >= — deep-nesting rule, '>' failure mode + verified (T08). + +## Round 3 — Haiku, first edits under test on revised corpus (checkpoint) + +**All-19 87.41 / core 85.92 / train 90.66 (−1.9) / held-out 75.22.** +Mixed: round-3 edits helped their targets — T09 +8.6, T12 +2.2, N06 ++10.7 (support-claims rewrite), N04 at 100 — but the serialize_token() +idiom INDUCED a T07 regression (−33.7): two trials called serialize() +after add_class(), got null (scanning had begun), and fell back to the +unmodified input. Decision: refine, not revert, disclosed here — the +edit measurably helped its targets; the harm is one missing boundary +statement (get_updated_html() vs serialize()). T04 unchanged (45.1): +trials missed the placement note AND hit a new gap — calling +set_modifiable_text() on an empty FIGCAPTION is a silent no-op (no +#text token exists). Held-out N05 fell further (RCDATA text location; +still no edit — held-out must not drive edits, but the T04-driven +modifiable-text inventory edit covers the same general fact). + +Round-4 hypotheses (committed): +1. Serialization is not how you read edits — boundary stated on + serialize() and serialize_token(); get_updated_html() is the + post-edit read path (T07). +2. Which tokens carry modifiable text: container elements carry none, + empty elements cannot receive text, placeholder-template idiom, + check the return value (T04). +3. Bookmark same-name re-set MOVES the bookmark — the last-X idiom + (T10 adherence); also stated tag_closers default ('skip'). + +Train gap backlog (not yet acted on): tag-name query case-insensitivity; +comment/rawtext can't match next_tag(); add_class idempotency at the +method heading; get_attribute returns decoded values; get_namespace and +foreign-content naming; Tag-vs-HTML-Processor chooser note; multi-cell +subtree text-collection example; get_updated_html prominence in the +HTML Processor method index. + +## Round 2 — Haiku re-baseline on the revised corpus + +All 19 tasks × 3 Haiku trials against the round-1 docs. **All-19 91.47, +core 90.47, train 92.56, held-out 87.38.** Round-1 doc edits transfer +to Haiku: T03 and T06 (round-0's worst) are perfect. + +Per-concept means (the new labels paying off — the aggregate hides +these): attributes 72.2, full-document 78.0, namespace 85.9, +traversal 91.6, vs classes/failure-handling ~99. + +Diagnosed causes: +- T04 build-figure 44.3 (two 0/6 trials): output correct except src/alt + order — set_attribute() placement rules are undocumented (verified: + in-place update keeps position; new attributes insert after the tag + name sorted by NAME, not call order). +- N05 document-title (held-out) one 2/7 trial: subject walked TITLE + looking for #text children; RCDATA text lives on the tag token. No + doc edit made — held-out must not drive edits; noted for monitoring. +- T08 adherence 55-72: the false class-docblock claims (tables/foreign + content/head unsupported) still driving defensive fallback code. +- T09 adherence 52-76: serialize_token() purpose/idiom undocumented. + +Round-3 hypotheses (committed before round 3 trials): +1. set_attribute() placement rules + order-control idiom (also fixes + the judge-found get_next_tag() typo). +2. Correct class-level support claims with verified abort conditions + (foster parenting, advance-rewind formatting reconstruction) and how + aborts surface (get_last_error/get_unsupported_exception/null). +3. serialize_token() rewrite idiom with verified example. + +Operational note: first judge attempt hit the account session limit and +returned zero verdicts; retried clean after reset. Isolation: trial +transcripts spot-checked, zero external reads. + +## Corpus revision (after Jon's review) + +Per the review: stay task-first; train was saturated for Sonnet and +clustered on a few patterns. Changes: +- Added N01 (remove class), N02 (images inside figures), N03 (detect + truncated HTML), N04 (can-normalize failure handling), N05 (document + title via full parser), N06 (HTML img vs SVG image). All references + validated in the harness; N02/N05/N06 cross-checked against + Dom\HTMLDocument (including the image→img conversion and + img-breaks-out-of-svg parsing behaviors). +- Held-out is now N01/N02/N05/H04 (class manipulation, contextual + selection, full-document, advanced extraction). H01–H03 retired to + corpus-retired/. T01/T02 relabeled smoke. +- All tasks labeled (role, commonness, concept, processor); + aggregate-round.py now reports per-concept and per-split means. +Held-out history note: round-0 held-out (93.47) was measured on the OLD +held-out set; the new set's baseline comes from the Haiku re-baseline. + +## Round 1 — closer-depth semantics, next_token() rehab, decoded text + +Doc edits under test (commits 58140b2235, 2d763ed14f, 0b9366fe70): +closer-token depth rule on get_current_depth()/is_tag_closer(); rewrite +of WP_HTML_Processor::next_token() with the canonical subtree-walk +example; explicit decoded-text rule on get_modifiable_text(). + +**TRAIN 98.78 (+5.21 vs round-0 train 93.57).** 36/36 trials passed +100% of hidden cases — the first all-green functional sweep. +- T03 +13.95 → 100: all trials now use the documented `>=` depth guard + and several cite the new next_token() example and decoding rule + verbatim in their explanations. +- T06 +46.33 → 99.8: the two previously-empty-result trials are gone. +- No regression beyond judge noise (T07 −0.7, T08 −0.7; threshold 2.0). +All three hypotheses confirmed; nothing reverted. + +Residual signal for round 2 (adherence-only; functional is saturated +for Sonnet): +- T08 adherence stuck at 68–78: the misleading "tables unsupported" + bullet still causes defensive fallback code; "which class do I use" + guidance still missing. +- Judge-discovered doc bug: paused_at_incomplete_token() example calls + nonexistent `get_next_tag()` (should be `next_tag()`). +- next_tag() contract never states it matches only real tag openers + (comments/rawtext can't match); get_updated_html() description is a + copy of __toString()'s and never says it applies queued edits. + +Sonnet train score has now been ≥90 for two consecutive rounds — per +PLAN.md, switch the test model to Haiku and re-baseline before further +edits. Isolation: round-1 transcripts spot-checked, zero external +reads (same benign grep-on-scratch and draft-write-to-scratch pattern). + +## Round 0 — baseline + +Unmodified docs. All 16 tasks (12 train + 4 held-out) × 3 Sonnet trials, +to establish the train baseline and the held-out baseline for later +checkpoints. Isolation note: run from the session that created the +`docs-test-subject` agent type, so trials used a general agent with +prompt-level restriction; all 48 transcripts scanned — zero reads outside +the scratch dir (two benign Bash greps of the scratch markdown, one +solution draft written into scratch). + +**TRAIN 93.57 / HELD-OUT 93.47** (scores 0–100; 0.7·pass + 0.3·adherence). + +Weak spots and judge-diagnosed causes: +- T06 collect-links 53.5 (two trials 1/8) and T03 first-h1-text 86.1 + (all trials 7/8, same case) and H04 trial-3 1/7: all share one root + cause — nothing documents that a tag-closer token reports the PARENT's + depth (element already popped), and no doc shows the canonical + "walk a subtree until it closes" loop. Subjects guessed + `depth <= opener_depth` break conditions and exited subtrees early or + collected nothing. +- T08 table-extract 92.3 but adherence only 70–77: the "Supported + elements" bullet wrongly implies tables abort the HTML Processor, so + subjects bolted on needless fallbacks; also get_modifiable_text() + never states its output is entity-decoded (several subjects added a + redundant html_entity_decode pass, risking double-decode bugs). +- T12 unwrap-spans adherence 88: the next_token()/serialize_token() + selective-rewrite idiom is undocumented; subjects mixed it with + whole-string normalize() unsure which was right. + +Round-1 hypotheses (each its own commit): +1. Document closer-token depth semantics on get_current_depth() and + is_tag_closer(). +2. Add the canonical subtree-walk example (depth guard + breadcrumbs + alternative) to WP_HTML_Processor::next_token() and soften its + "use the Tag Processor instead" steer. +3. State that get_modifiable_text() returns decoded text (and + set_modifiable_text() encodes), with a one-line example. +Deferred to round 2 (adherence-only): serialize_token() rewrite idiom; +"which class do I use" guidance; fix the tables-unsupported bullet. diff --git a/doc-experiment/NEXT-HYPOTHESES.md b/doc-experiment/NEXT-HYPOTHESES.md new file mode 100644 index 0000000000000..cc8ba38366a40 --- /dev/null +++ b/doc-experiment/NEXT-HYPOTHESES.md @@ -0,0 +1,1015 @@ +# Next hypotheses and test strategy + +This document captures the next phase after round 17. The current train +score is high enough that another ordinary "add the latest judge gap" loop +has weak signal. The next tests should deliberately lower model capability, +increase the signal density of the rendered docs, and separate content gaps +from discoverability gaps. + +## Current read + +Latest update: rounds 58/59 and 60 tested two weak-tier traversal-boundary +scratch A/B variants against the round-58 control. Both lost: the compact +closer card scored 90.74 vs 97.35, and the full bounded-loop/regional +completion recipe scored 90.18 vs 97.35. Round 61 then ran citation-only +probes on current source docs for the remaining method-local contracts: +plain `next_tag()` is not a subtree-boundary detector, bounded-region +completion does not require EOF draining for unrelated suffix markup, +breadcrumbs include the current node and breadcrumb queries are DOM sub-paths, +and `WP_HTML_Processor` should be created through `create_fragment()` or +`create_full_parser()`. All probes passed 3/3 at `gpt-5.4-mini` / `low`. +A follow-up attribute-value probe also passed 3/3 for the +`get_attribute()` return cases (`null`, `true`, `''`, decoded strings), but +subjects noted that the docs do not explicitly name the +`is_string( $value ) && '' !== $value` style predicate for usable non-empty URL +strings. + +Do not promote either traversal variant, and do not promote a constructor or +breadcrumbs source edit from these probes alone. The facts are discoverable +when asked directly, and the transfer-oriented A/B variants lost. + +Next action: keep the selected subject policy at `gpt-5.4-mini` / `low` / +`priority` and pause under the signal-exhaustion rule instead of adding +speculative prose. +Full-round reanalysis found no remaining non-held-out, non-noise train pattern +strong enough to justify a source docblock edit. Keep the usable-attribute +predicate as backlog unless a train task repeats the confusion; held-out N02 +alone is not a source-edit driver. Resume only if the corpus changes, a future +trusted train round repeats one of the backlogged patterns, or the experiment +owner explicitly asks to test a new hypothesis despite the weak signal. + +Round 17 was a no-edit hold round on the previous active corpus and scored +98.93 on train. After that hold round, several active tasks were intentionally +replaced or tightened: N03, N04, N06, T07, T11, H04, plus smaller prompt or +reference updates. Those committed corpus changes reset comparability: round +17 remains a trusted historical score for the previous corpus, but it is not a +current-corpus baseline. + +Round 18 is the first trusted current-corpus no-edit baseline: +`gpt-5.4` / `medium` / `priority` subjects, `gpt-5.5` / `xhigh` / +`priority` judges, train score 98.73 / core 98.54. The current tier is close +to saturated, but it produced one concrete train failure with three-trial +agreement: N03-first-list-count scored 85.07 because all trials trusted +HTML Processor virtual closers after truncated syntax inside the scanned +region. This is usable source-edit evidence because it is a current-corpus +train failure, not held-out-only signal. + +The next valid action is either a focused source hypothesis for the N03 +incomplete-token subtree-guard gap, or another no-edit weak-tier calibration +one step down the subject ladder if the experiment owner wants a less +saturated measuring instrument before promotion. Do not compare round 18 +against round 17 except as historical context. + +A focused citation-only probe after round 18 asked the current subject tier +whether an HTML Processor virtual closer proves a truncated source region was +complete, and which methods to check. All three probes answered correctly and +cited the relevant rendered-doc headings. Round 19 promoted the resulting +placement/transfer edit as a generic class-level recipe plus compact +method-local guard notes. N03 moved from 85.07 to 100.00 with all three +trials at 11/11 and 100 adherence, so this hypothesis is confirmed. + +Round 20 calibrated the next subject setting, +`gpt-5.4` / `low` / `priority`, against the same current docs. It scored +99.43 train / 99.34 core with every hidden test passing, so this tier is still +too saturated to be the main source-edit driver. Its adherence-only signal +does support generic class-level recipe candidates, especially DOM-style text +collection and token-rewrite completion policy. The next protocol-consistent +action is a no-edit calibration one step lower, `gpt-5.4-mini` / `high` / +`priority`, or a scratch A/B for the generic recipe idea if the owner chooses +diagnostics over another ladder step. + +Round 21 scored a broad HTML Processor recipe edit. It did not cross the +revert threshold, but it was not a clean win: T09 improved slightly, while T05 +fell because all three subjects still chose the Tag Processor's lexical token +walk for a BODY-fragment text-content task. Treat the next text hypothesis as +processor-choice/discoverability work in the Tag Processor docs, not as more +HTML Processor recipe prose. + +Round 22 restored the current-docs no-edit calibration at +`gpt-5.4` / `medium` / `priority`. It scored 99.45 with all hidden tests +passing and reproduced the same T05 signal: all three T05 trials chose +`WP_HTML_Tag_Processor`, passed hidden tests, and lost adherence because the +Tag Processor token-walk example competed with the HTML Processor +text-content guidance. This makes the Tag Processor lexical-text boundary the +best next source hypothesis. + +A round-22 citation-only probe confirmed that this is placement/transfer +rather than a missing fact: all three `gpt-5.4` / `medium` subjects correctly +selected `WP_HTML_Processor::create_fragment()` for parsed BODY-fragment +text-content extraction when asked directly, and cited both the Tag Processor +lexical sections and the HTML Processor text recipe. Promote only a short +contrast near the Tag Processor text example, not another broad HTML Processor +recipe. + +Round 23 confirmed that source hypothesis. The narrow Tag Processor placement +edit moved T05 from 96.70 to 99.20, and all three subjects chose +`WP_HTML_Processor::create_fragment()` for the parsed BODY-fragment text task. +All hidden tests passed across the round, with train 99.50 / core 99.42. +Treat the lexical-text boundary as resolved for now. + +The next text signal is the extraction policy boundary inside the HTML +Processor docs: ordinary subtree text means `#text` tokens by default; +TITLE/TEXTAREA/SCRIPT/STYLE opener-token modifiable text is an explicit +caller opt-in; and read-only text walks need a caller policy for +`get_last_error()` or `paused_at_incomplete_token()` rather than automatically +discarding already collected text. Round-23 T03, N06, and T05 judge notes all +pointed at this shape. + +Round 24 checkpoint stayed stable after the Tag Processor source edit: +99.35 all / 99.41 train / 99.12 held-out, with every hidden test passing. +T05 held at 99.80, so the processor-choice fix generalized through the +checkpoint. The next diagnostic should be citation-only, not a direct source +edit: ask whether the rendered docs already distinguish ordinary `#text` +subtree extraction, special-element opener text as opt-in, and read-only +fallback policy after `get_last_error()` or `paused_at_incomplete_token()`. +Keep the T09/T12 serialization fallback and decoded-text reparse signal as a +separate hypothesis. + +The round-24 read-only text policy probe passed 3/3 at +`gpt-5.4` / `medium`: subjects found the ordinary `#text` rule, the +special-element opt-in rule, and the caller-policy distinction for read-only +fallbacks. Treat this as a placement/density problem before editing source. +The next diagnostic should be a scratch rendered-doc A/B that adds a compact +policy matrix near the HTML Processor text recipe and/or `next_token()`, then +tests whether task implementation stops over-including special-element opener +text. + +Round 25/26 tested that scratch policy matrix. It raised the three-task +paired subset from 98.70 to 99.17 and made T05 perfect, but it was not a +clean source-promotion win: T03 moved from one special-element over-inclusion +in the control to three in the variant, and N06 still over-included +special-element opener text inside heading text. Treat the matrix as mixed/no +promotion. If continuing this hypothesis, test a narrower scratch variant +with a negative example that makes the default exclusion rule dominant: +ordinary heading/subtree text reads only `#text`; SCRIPT/STYLE/TITLE/TEXTAREA +opener text is explicit opt-in, not automatically part of ordinary text. + +Round 27/28 tested that narrower scratch variant. It improved the paired +subset from 99.27 to 99.50, moved N06 from 98.20 to 98.90, and eliminated the +special-element over-inclusion pattern in both T03 and N06 while preserving +T05's explicit TITLE/TEXTAREA inclusion behavior. This is promotable as an +adapted source hypothesis: add default-first ordinary-text policy and +explicit opt-in wording near the HTML Processor text recipe. Do not copy the +scratch negative example's `null !== get_modifiable_text()` guard; teach +token-type/name guards instead because `get_modifiable_text()` returns a +string and is not a presence test. + +Round 29 promoted that adapted source edit. It is mixed: T03 and T05 improved, +but N06 still over-included special-element opener text in all three trials. +Judges identified the method-local `next_token()` special-element paragraph as +the remaining competing cue. Keep the source edit under the revert rule, but +do not spend more source budget on broad class-level text recipes. A further +text hypothesis should be method-local and scratch-tested against the +`next_token()` wording before promotion. + +Round 29 also exposed a stronger current train functional failure unrelated +to the text edit: T07 trial 2 ran one `next_tag()` scan for `UL`, then another +for `OL`, assuming the second scan restarted from the beginning. It did not; +`next_tag()` is cursor-relative. This same family appeared earlier in +N03-style sequential tag searches. Treat HTML Processor `next_tag()` cursor +semantics and first-of-several-tags idiom as a strong next source candidate. + +Rounds 30/31 confirmed that candidate in scratch rendered docs, and round 32 +confirmed it as a source edit. The method-local `WP_HTML_Processor::next_tag()` +card raised train from 98.31 to 99.67, recovered T07 from 81.13 to 99.30, and +kept N03 perfect. Treat the cursor/OR-search gap as resolved for now. + +The next diagnostic tested the user-suggested "generic recipes in the main +class documentation" direction as a compact depth-bounded traversal card. +Rounds 33/34 showed that this was promotable after a held-out checkpoint: +variant 99.08 vs control 97.34 on N03/N06/T06/T08, with N03 recovering from +94.46 to 100.00 and T08 improving from 96.50 to 98.00. The remaining +special-element over-inclusion signal did not disappear and should stay +separate. Round 35 supplied the checkpoint: all 99.47 / train 99.50 / +held-out 99.38, with all hidden cases passing and held-out above round 24. +Round 36 confirmed the source promotion: train 99.65 / core 99.59, all 45 +subject trials passed all hidden cases, N03 stayed 100.00, T07 rose to +100.00, and T08 rose to 98.50. Treat the depth/direct-child card as resolved +for now. Next action: analyze the remaining trusted judge notes and choose a +separate diagnostic; the strongest recurring candidates are the +special-element ordinary-text policy near `next_token()` / +`get_modifiable_text()` and normalized-output fallback policy for +`serialize_token()` rewriters. + +Rounds 37/38 tested a method-local text-policy scratch variant near +`next_token()` and `get_modifiable_text()`. It lost 98.72 vs 99.18 on the +paired subset and did not eliminate special-element opener over-inclusion. +Do not promote that wording. The next best action is the separate +normalized-output / `serialize_token()` fallback citation-only probe. + +Round 39 ran that citation-only probe. It passed 3/3: subjects found the +factory-null versus later parser-abort distinction, incomplete-token policy, +the accumulated `serialize_token()` output rule, and the warning that +`normalize( $html )` discards emitted rewrites. Treat this as evidence that +the facts are present and discoverable when directly asked. The next +diagnostic, if pursuing this hypothesis, should be scratch A/B transfer +testing on implementation tasks, not a source edit from the probe alone. + +Rounds 40/41 tested that transfer with a scratch-only fallback-policy card. +The variant won 99.83 vs 99.57 on T09/T12/N04, mainly by moving T12 to +100.00 while keeping N04 perfect. T09 dipped 99.80 -> 99.50 because one +variant trial still used `normalize( $html )` after the rewrite loop, so +source promotion should adapt rather than copy the scratch wording. Next +action: run a checkpoint before promoting another source docblock edit. + +Round 42 supplied that checkpoint: all 99.29 / train 99.54 / held-out 98.38, +with all 57 subject trials passing hidden cases. Held-out fell 1.0 from round +35, mostly one N05 adherence-only trial, but this is below the revert +threshold and not a source-edit driver. The promotion gate is clear. Next +action: promote one adapted source docblock hypothesis for serialization +fallback policy, emphasizing that after a `serialize_token()` rewrite loop the +accumulated string is the rewrite, while `normalize( $html )` on the original +input and raw-input return paths both abandon emitted changes unless the +caller deliberately chooses them as fallbacks. + +Round 43 scored that source promotion. It was neutral, not a clean win: train +fell 99.65 -> 98.18 versus the comparable scored-train source round, below the +2-point revert threshold and without an all-trial task regression. The drop +came from one T05 PHP `preg_match_all()` bug that the judge classified as not +HTML API misuse. Serialization targets stayed stable (N04 100.00, T12 99.80, +T09 99.10) but the raw-input fallback near-miss persisted. Keep the source +edit under the revert rule, but do not immediately add more fallback-policy +source prose without a fresh diagnostic. + +Rounds 44/45 revisited the text-policy transfer problem with a scratch-only +decision-table variant. The variant won 99.56 vs 98.94 on T03/T05/T06/T08/N06, +with all hidden cases passing. It eliminated the special-element opener-text +over-inclusion pattern in T03, T08, and N06, while T06 dipped only 0.5 from an +unchanged read-only partial-scan policy near-miss. Treat this as promotable +after the checkpoint gate: run a checkpoint before editing source, then promote +an adapted compact table / method-local opt-in reminder if held-out remains +stable. + +Round 46 supplied that checkpoint: all 99.36 / train 99.63 / held-out 98.33, +with all 57 subject trials passing hidden cases. Held-out was effectively flat +versus round 42 and did not show a functional regression. The promotion gate is +clear. Next action: promote one adapted source docblock hypothesis for the +text-policy decision table in `WP_HTML_Processor`, keeping the compact +decision-table shape and method-local opt-in reminder while preserving the +caller-policy framing for read-only partial scans. + +Round 47 confirmed that source promotion: train 99.55 / core 99.48, all 45 +train trials passed hidden cases, and the ordinary `#text` vs special-element +opener-text boundary held across T03/T05/T06/T08/N06. Keep the source edit. +The remaining train near-miss is narrower: read-only extractors still often +discard already visited tokens when `paused_at_incomplete_token()` is true. +Because the fact is already present but weakly transferred, the next valid +action is a scratch rendered-doc A/B, not a direct source edit. Test a compact +read-only completion-policy note/example against T05/T06/T08/N06, with the +decision framed as best-effort extraction versus complete-source validation. + +Rounds 48/49 tested that scratch variant. It won 99.65 vs 99.03 on the +paired T05/T06/T08/N06 subset with all hidden cases passing. T05 moved to +100.00, T08 to 99.80, and N06 to 100.00; T06 dipped to 98.80 because one +trial still failed closed on `get_last_error()`. Treat the note as promotable +after the checkpoint gate, but adapt rather than copy: keep it short, keep the +caller-contract framing, and do not imply that all read-only extraction should +keep partial results. + +Round 50 supplied the checkpoint: all 99.08 / train 99.65 / held-out 96.93. +The held-out decline is below the revert threshold, but N02 had one functional +holdout miss from treating a breadcrumbs query as arbitrary-depth containment. +Keep that as sentinel-only evidence; held-out must not drive the next edit. +Per owner direction, pause source promotion and move to weaker-tier testing. +Next action: run a no-edit `weak-tier-calibration` on current docs with the +next protocol subject tier, `gpt-5.4` / `low` / `priority`. + +Round 51 supplied that calibration: train 99.65 / core 99.59 with all 45 +subject trials passing hidden cases. This tier is still saturated enough that +the remaining signal is adherence-only, concentrated in read-only completion +policy for T05/T06/N06 and normalized rewrite fallback for T09. Record +`gpt-5.4` / `low` as a current-docs no-edit baseline, but do not promote a +source edit from it. The next protocol-consistent action is to step down to +`gpt-5.4-mini` / `high` / `priority` and run another no-edit +`weak-tier-calibration`. + +Round 52 supplied the `gpt-5.4-mini` / `high` calibration: train 99.53 / core +99.46, again with all 45 subject trials passing hidden cases. This tier is +also saturated. The strongest adherence-only signal is now serialization +fallback policy for string-returning `serialize_token()` rewrites: T09 scored +98.60 and T12 scored 98.90 because candidates still used raw input or +`normalize( $html )` as generic fallbacks after accumulating rewritten output. +Text extraction remained strong, with T05 and T06 at 99.60 and N06 at 99.20. +Do not promote source docs from this saturated calibration alone. The next +protocol-consistent action is to step down to `gpt-5.4-mini` / `low` / +`priority` and run one more no-edit `weak-tier-calibration`. + +Round 53 supplied the final `gpt-5.4-mini` / `low` calibration: train 99.51 / +core 99.43, with all 45 subject trials still passing hidden cases. The ladder +is exhausted and still saturated, so use `gpt-5.4-mini` / `low` as the +selected weak diagnostic tier rather than looking for another model. The +strongest repeated signal is serialization fallback policy for string-returning +`serialize_token()` rewrites: T12 scored 98.60 and T09 scored 99.10, again +because candidates used raw input or `normalize( $html )` as generic recovery +after accumulating rewrite output. Next action: run a focused scratch +`shadow-doc-a/b` diagnostic on T09/T12, and optionally N04 as a normalization +control, testing a compact generic class-level recipe/card for rewrite output +and explicit fallback policy. Do not edit source docs until that variant wins. + +Rounds 54/55 supplied that diagnostic. The scratch-only variant won 99.53 vs +98.87, raised serialization from 98.30 to 99.55, moved T09 from 98.50 to +99.60, and moved T12 from 98.10 to 99.50. N04 dipped from 100.00 to 99.50 +because one variant trial used `create_fragment()` + `serialize()` rather than +the direct `normalize()` helper, but all N04 hidden cases still passed. The +variant eliminated the worst control behavior of rebuilding a text token from +decoded `get_modifiable_text()` plus `htmlspecialchars()`, and improved the +fallback-policy transfer. Promote an adapted source edit in +`WP_HTML_Processor`: a compact class-level string-rewrite checklist plus a +method-local `serialize_token()` wrapper / anti-pattern example. Keep fallback +wording as caller policy; do not prescribe one universal return value. + +Round 56 confirmed that adapted source edit under `scored-train`: +train 99.61 / core 99.55 with subjects `gpt-5.4-mini` / `low` / `priority`. +All 45 subject trials passed hidden cases. Against the comparable weak-tier +no-edit baseline, round 53, train moved 99.51 -> 99.61, serialization moved +98.85 -> 99.35, T09 moved 99.10 -> 99.40, and T12 moved 98.60 -> 99.30. Keep +the source edit. The remaining serialization pattern is narrower: candidates +still sometimes choose `normalize( $html ) ?? $html` after a rewrite loop, +which can abandon emitted changes and return raw source bytes if normalization +fails. Record this as a future scratch-test candidate, not an immediate source +edit. Next action: run a checkpoint/regression sentinel with +`gpt-5.4-mini` / `low` / `priority` before any further source promotion. + +Round 57 supplied that checkpoint: all 97.90 / train 97.95 / held-out 97.73 / +core 97.66. Two audit-only tooling commits occurred between round 56 and this +checkpoint to keep next-action selection autonomous; they did not change source +docs, corpus, runners, harness, or aggregation. The source edit stays under the +revert rule: train fell 1.66 from round 56, below the 2-point threshold, and no +task regressed across all trials. T09 held at 99.40 and T12 moved 99.30 -> +98.80. Held-out N02 exposed the valueless-attribute `true`/`''` distinction +again, but it remains sentinel-only evidence. T06's low trial was a PHP array-key +typo, not an HTML API misconception. The strongest train documentation signal is +N03: one trial used plain `next_tag()` plus `get_current_depth()` as a bounded +subtree scan, forgetting that plain `next_tag()` skips closers and therefore may +miss the depth boundary. Next action: run a focused `shadow-doc-a/b` diagnostic +on N03 and nearby traversal controls, testing a compact contrast card that +states depth-boundary scans must use `next_token()` or +`next_tag( array( 'tag_closers' => 'visit' ) )`; plain `next_tag()` skips the +closing boundary. + +Historical round-17 judge gaps had mostly reduced to these shapes: + +- The fact exists, but is too far from the method heading readers enter + through. +- The docs describe a positive capability, but not the contrasting wrong + move. +- The docs are accurate, but a long surrounding section dilutes the line that + matters. +- The subject passed by delegating to the API, but could not explain the API's + boundary conditions. + +Treat future edits as precision edits. A one-line contrast in the right +method docblock is probably worth more than another long example. + +## Strong candidates + +These are the best next candidates after a local review plus three read-only +subagent passes. Treat them as hypotheses to test through no-edit baselines, +discoverability probes, or scratch-rendered A/B variants before promoting any +source docblock changes. + +### 0. Incomplete-token guard for HTML Processor region scans — confirmed in round 19 + +Core idea: connect the documented subtree-walk/depth-boundary pattern to the +existing incomplete-token API. A depth drop or virtual closer proves that the +HTML parser unwound the element stack; it does not prove the source region was +complete. After a forward scan that will drive a mutation or other trusted +result, callers should check both parser abort state and incomplete-token +state: + +- `get_last_error()` / `get_unsupported_exception()` for unsupported parser + states. +- `paused_at_incomplete_token()` for lexical truncation at the input tail. +- A bounded scan can visit virtual closers after truncation while + `paused_at_incomplete_token()` is true and `get_last_error()` is still null. + +Why this is strong: round 18's only functional train failure was exactly this +gap. All three N03 trials used the documented depth-bounded HTML Processor +walk, passed ordinary omitted-end-tag and malformed-list cases, and failed +only incomplete token/comment tails inside the scanned list. + +Round-19 result: source docs now include a generic "scan a region before +editing its opener" recipe in the HTML Processor class docs plus compact notes +near `next_token()` and `get_current_depth()`. N03 passed 11/11 in all three +trials with 100 adherence. Do not keep spending source-edit budget here unless +a weaker tier or future task exposes a new variant. + +Risk: low-medium. Keep it framed as a general scan-completion contract, not as +a list-counting recipe. Best placement is near +`WP_HTML_Processor::next_token()`, `get_current_depth()`, and the inherited +`paused_at_incomplete_token()` docs/cross-reference. + +### 1. Depth-boundary equivalence card — confirmed in round 36 + +Core idea: make the subtree-walk boundary mechanically hard to copy wrong. +Show both safe forms side by side near `WP_HTML_Processor::next_token()` and +`get_current_depth()`: + +- Continue form: walk while `get_current_depth() >= $anchor_depth`. +- Break form: break only when `get_current_depth() < $anchor_depth`. +- Wrong forms: `>` drops equal-depth content; `<=` exits too early in break + form. + +Why this is strong: round 17's only functional miss was still T08, and the +same off-by-one family has appeared across T03, T06, T08, N02, and H04-style +walks. This is the clearest remaining train signal. + +Round-33/34 scratch A/B result: the compact class-level traversal card won the +paired subset, 99.08 vs 97.34. It made subtree/direct-child checks more +mechanical without source edits: N03 went from one incomplete-token functional +miss in the control to 100.00 in the variant, T08 improved 96.50 to 98.00, +N06 was effectively flat/slightly up, and T06 had only a -0.2 adherence dip. +Round 35 checkpoint satisfied the held-out gate: all 99.47 / held-out 99.38, +with no hidden failures. + +Round-36 result: source promotion confirmed. Train scored 99.65 / core 99.59 +against round 32's same-mode 99.67 / core 99.62, with no functional misses. +The target traversal tasks held or improved: N03 100.00, T07 100.00, and T08 +98.50. Do not spend more source-edit budget on this depth/direct-child card +unless a future weaker tier or task exposes a distinct traversal failure. + +Risk: medium. Avoid a table-specific solution. The invariant should be +explained with generic "container and descendants" language, optionally backed +by a compact trace that stresses sibling/implicit structures. + +### 2. Factory lifecycle contract + +Core idea: clarify construction failure versus parse/serialization failure at +`WP_HTML_Processor::create_fragment()` and `create_full_parser()`. + +Contract to test: + +- These factories belong only to `WP_HTML_Processor`. +- `null` from construction means unsupported context/encoding, not malformed + body content. +- A non-null processor does not prove the document is fully supported. +- Unsupported markup surfaces later while walking, or through `serialize()`, + `normalize()`, `get_last_error()`, or `get_unsupported_exception()`. +- Callers promising normalized output should not return raw input as a fallback + when processing fails. +- Reference implementations should get extra credit for explicit incomplete + token and last-error handling where relevant: Tag Processor and HTML Processor + loops can stop at an incomplete tail, while HTML Processor walks can also + encounter unsupported parser states after construction. + +Why this is strong: repeated judge notes across N04, T09, T11, T12, and N05 +show invented null branches, wrong fallback choices, and cross-class factory +hallucinations. This is a broad API boundary, not a task-specific patch. + +Round-39 citation probe result: passed 3/3 at the current subject tier. +Subjects correctly distinguished factory `null` from later `get_last_error()`, +found `paused_at_incomplete_token()` as a separate complete-input policy +check, and identified `normalize( $html )` after a token rewrite as discarding +the accumulated changes. This is not source-edit evidence by itself. Use a +scratch A/B next to test whether a compact method-local fallback card improves +T09/T12/N04 transfer. + +Rounds 40/41 scratch A/B result: variant won 99.83 vs 99.57. T12 improved +98.90 -> 100.00 and N04 stayed 100.00; T09 dipped slightly because one +variant trial still normalized the original input in an error branch. This is +promotable after checkpoint, but adapt the wording to foreground the exact +anti-pattern: after a `serialize_token()` rewrite, `normalize( $html )` and +raw input both discard the accumulated rewrite and are not normalized +rewrites. + +Risk: low. + +### 2b. HTML Processor next_tag() cursor and OR-search contract — confirmed in round 32 + +Core idea: make `WP_HTML_Processor::next_tag()` cursor movement and +multi-name searches explicit near the method heading. + +Contract to test: + +- Each `next_tag()` search starts after the current cursor position. +- When `next_tag()` returns false, a later call with a different query will + not rescan earlier tags. +- To find the first of several tag names, do one forward walk and branch on + `get_tag()`, or use bookmarks/new processor instances when a true rescan is + required. +- `tag_name` is a single tag name, not an array of alternatives. + +Evidence: round 21 N03 had a sequential filtered-search failure, and round 29 +T07 repeated the same cursor misconception as a functional failure: a subject +scanned for `UL`, then scanned for `OL` on the same processor and missed +earlier nested `OL` elements because the cursor was already at EOF. Judges +noted that the Tag Processor overview has the cursor warning, but the HTML +Processor `next_tag()` method docs do not make it local enough. + +Probe result: `round-29-next-tag-cursor-or-search` passed 3/3. Directly asked +subjects found the cursor rule and OR-search idiom, but they cited Tag +Processor "Finding tags"/"Custom queries" and HTML Processor `next_token()` +one-cursor guidance rather than local HTML Processor `next_tag()` wording. +Treat this as a placement/transfer hypothesis. Next diagnostic: scratch +method-local `next_tag()` card near the HTML Processor method docs, then test +T07/N03-style tasks before source promotion. + +Sidecar doc-location check: the cursor movement rule is currently under +Tag Processor "Finding tags" / "When matching fails"; the only OR-style idiom +is under Tag Processor "Custom queries". The rendered HTML Processor +`next_tag()` method section has neither a local cursor warning nor an +HTML Processor first-of-several-tags idiom. + +Scratch A/B result: round 31's method-local `next_tag()` cursor card beat the +fresh round-30 control (99.80 vs 99.30) on N03/T07. N03 remained perfect and +T07 improved from 98.60 to 99.60, with all variant T07 trials using one +forward scan rather than sequential filtered searches. This justified a +source edit near `WP_HTML_Processor::next_tag()`. + +Round-32 result: source promotion confirmed. The full train score rose from +round 29's 98.31 to 99.67, all hidden tests passed, T07 recovered to 99.30, +and N03 stayed 100.00. Do not keep spending source-edit budget here unless a +future weaker tier or checkpoint exposes a new cursor variant. + +Risk: low-medium. Keep it generic and avoid a nested-list recipe; teach cursor +state and first-of-several-tags search. + +### 3. Where-text-lives matrix + +Core idea: add a compact token-model matrix near `get_token_type()` and +`get_modifiable_text()`. + +Rows to cover: + +- `#text` tokens: decoded text-node character data. +- Attribute values: retrieved through `get_attribute()`, never as `#text`. +- Comments: `#comment`, not `#text`. +- Raw-text/RCDATA elements such as `SCRIPT`, `STYLE`, `TITLE`, and `TEXTAREA`: + text rides on the element token, not on child `#text` tokens. +- Inline markup: one logical element's text may be split across multiple + `#text` tokens; accumulate. +- Tag Processor text walk versus HTML Processor tree-aware text walk. + +Why this is strong: many passing trials still show shallow explanations about +why comments, attributes, raw-text elements, and split text are excluded or +included. Weaker models are likely to expose this more sharply. + +Round-19 probe result: a direct citation-only text-content recipe probe passed +3/3 at the current `gpt-5.4` / `medium` tier. Subjects found the existing +depth-bounded `#text` accumulation recipe and the SCRIPT/STYLE/TITLE/TEXTAREA +element-token exception. Keep this as a weaker-tier or shadow-doc A/B +candidate, not the next immediate source edit at the current tier. + +Round-20 calibration result: `gpt-5.4` / `low` remained functionally +saturated, but gave repeated adherence-only evidence for this hypothesis. +T05 was the lowest task (96.70) with all trials passing hidden tests but +showing uncertainty about a general DOM-style text-extraction recipe. N06 had +a passed near-miss where a subject appended `get_modifiable_text()` from +comment-like tokens. If a weaker tier exposes the same pattern functionally, +or a scratch A/B shows improvement, promote this as a generic main-class +recipe/matrix rather than a task-shaped answer. + +Round-20 follow-up probe result: a direct generic recipe probe at +`gpt-5.4` / `low` found the DOM-style text recipe in all three trials, so the +text rows alone are still a placement/density hypothesis rather than a missing +fact. The same probe exposed a stronger rewrite-policy gap: all three trials +over-applied `paused_at_incomplete_token()` and recommended rejecting every +rewrite after incomplete trailing syntax, even when a best-effort normalized +rewrite of visited tokens would be acceptable. This supports promoting a +generic HTML Processor recipe that separates unsupported parser aborts from +caller policy for incomplete trailing tokens. + +Round-21 result: a broad HTML Processor class-level recipe plus +`serialize_token()` policy note was mixed. The rewrite portion improved +T09-mark-keyword slightly, but the text portion did not improve processor +choice in T05; all three subjects still selected `WP_HTML_Tag_Processor`. +Before adding more text recipes, clarify the Tag Processor text-walk example +as lexical token processing and point BODY-fragment text-content callers to +`WP_HTML_Processor::create_fragment()`. + +Round-23 result: the Tag Processor placement edit fixed the processor-choice +part of this hypothesis for T05. The remaining text evidence is narrower: +subjects can still over-include special element opener text in ordinary +heading/subtree extraction, and may reject all read-only text collected before +an unsupported parser abort. Promote a future source edit here only after a +checkpoint or focused probe confirms this is still the best next train signal. + +Round-24 checkpoint result: held-out stayed stable and T05 held at 99.80. +N06 still showed over-inclusion of special-element opener text in ordinary +heading text, and N02 repeated the read-only `get_last_error()` partial-result +policy concern. This is now ready for a citation-only probe focused on +read-only text extraction policy. + +Risk: medium-low if phrased as a token model instead of a task recipe. + +### 3b. Read-only text extraction policy + +Core idea: separate three caller policies that the docs currently place near +each other: + +- Ordinary subtree/DOM-style text: append only tokens where + `get_token_type() === '#text'`. +- Special element opener text (`SCRIPT`, `STYLE`, `TITLE`, `TEXTAREA`) is + modifiable text on the element token and must be an explicit opt-in. +- After a read-only extraction walk, `get_last_error()` or + `paused_at_incomplete_token()` tells the caller the walk stopped early or the + input was incomplete; it does not by itself define whether to return + already-collected best-effort text, an empty result, or a failure sentinel. + +Evidence: round-23 T03/N06 over-included special-element opener text in +ordinary heading/subtree extraction; round-23 T05 sometimes discarded collected +text after an unsupported parser abort. Round-24 repeated the N06 +over-inclusion pattern and N02 repeated the read-only partial-result policy +concern. All hidden tests still passed, so this needs a citation-only probe +before source promotion. + +Next diagnostic: ask subjects to cite the rendered docs for a read-only +fragment text extractor that collects ordinary subtree text, decides whether +to include TITLE/TEXTAREA/SCRIPT/STYLE opener text, and states a caller policy +for `get_last_error()` and `paused_at_incomplete_token()`. + +Probe result: passed 3/3. Directly asked subjects cited the existing +`Recipe: collect DOM-style text from a subtree`, `next_token()`, and Tag +Processor lexical-boundary sections, and correctly answered that ordinary text +uses `#text` only, special-element opener text is opt-in, and read-only +fallback is caller policy. Do not promote source prose yet; test whether a +scratch-only policy matrix improves transfer in task code. + +Scratch A/B result: mixed/no promotion. Round 26's policy matrix improved the +paired subset numerically versus round 25 (99.17 vs 98.70) and fixed T05 +adherence, but it also encouraged all three T03 subjects to include +SCRIPT/STYLE/TITLE/TEXTAREA opener text in ordinary heading text. N06 remained +the target near-miss, with all three variant candidates still over-including +special-element text. A promotable source edit needs sharper negative +placement: ordinary `#text` is the default; special-element opener text is +available for explicit caller contracts only. + +Follow-up scratch A/B result: round 28's default-first negative-example +variant beat the fresh round-27 control (99.50 vs 99.27). The target behavior +changed in the right direction: control T03/N06 still over-included +special-element opener text, while variant T03/N06 used ordinary `#text` only; +T05 still correctly opted into TITLE/TEXTAREA while excluding SCRIPT/STYLE. +Promote an adapted source edit now. Keep it generic and avoid the scratch +variant's misleading null-check negative example. + +Source result: round 29 was mixed. T03/T05 improved after promotion, but N06 +still over-included special-element opener text, with judges pointing at the +`next_token()` method-local special-element paragraph rather than the overview +recipe. If this hypothesis is revisited, use a scratch A/B that rewrites that +method-local paragraph to say "only if the caller's definition of text includes +special-element contents" and points back to the ordinary subtree-text recipe. + +Follow-up scratch A/B result: rounds 37/38 tested that method-local rewrite +plus a `get_modifiable_text()` warning. The variant lost 98.72 vs 99.18 and +did not remove the target over-inclusion pattern. Do not promote this wording; +any future text-policy attempt needs a different shape, likely a compact +decision table or a task-independent token-category matrix, and should not be +mixed with serialization fallback guidance. + +Risk: medium. Avoid replacing the processor-choice win with a task-shaped text +recipe. Phrase the edit, if promoted, as a token/policy matrix. + +### 3a. Tag Processor lexical-text boundary — confirmed in round 23 + +Core idea: the Tag Processor docs contain a useful `next_token()` text example +that is lexical, not parsed-tree textContent. Label it that way and +cross-reference the HTML Processor when the caller needs BODY-fragment +semantics, implied closing behavior, tree order, or unsupported-markup policy. + +Evidence: T05 in both round 20 and round 21 passed functionally but selected +`WP_HTML_Tag_Processor` in all three trials. Round-21's added HTML Processor +text recipe did not change this; judges identified the Tag Processor +"Tokens and finer-grained processing" example as the stronger entry point. +Round 22 reproduced the same T05 behavior at `gpt-5.4` / `medium`, so the +signal is no longer only low-effort noise. + +Round-22 probe result: direct citation-only questioning passed 3/3 at +`gpt-5.4` / `medium`. Subjects found the processor boundary when prompted, +so the source hypothesis should improve transfer at the Tag Processor example +itself rather than add more facts elsewhere. + +Round-23 result: confirmed. T05 improved from 96.70 to 99.20, and all three +subjects chose `WP_HTML_Processor::create_fragment()` for parsed fragment text +extraction. Do not keep spending source-edit budget here unless a future tier +or checkpoint exposes a new variant. + +Risk: low-medium. Avoid saying the Tag Processor cannot read text; it can read +lexical token text. The distinction is parsed fragment/DOM semantics versus +flat lexical scanning. + +### 4. Contract-card rendered-doc A/B + +Core idea: before source edits, generate scratch-rendered docs that insert +short "Use this / do not use this / common wrong move" cards under high-entry +method headings. + +High-value headings: + +- `create_fragment()` and `create_full_parser()` +- `next_tag()` +- `next_token()` +- `get_updated_html()` +- `serialize()` +- `serialize_token()` +- `get_breadcrumbs()` +- `get_namespace()` / `get_tag()` + +Why this is strong: recent gaps repeatedly say the fact exists but is too far +from where subjects enter the docs. A shadow variant tests discoverability +without committing source bloat. + +Risk: medium. Cards must teach boundaries, not current corpus answers. + +### 5. Signal-density pruning A/B + +Core idea: test whether fewer visible words produce better weaker-model +behavior. Do this only in scratch-rendered docs first. + +Candidate ablations: + +- Hide future-direction prose in the Tag Processor header. +- Hide HTML Processor roadmap bullets that imply current inner-text operations + are unsupported. +- Collapse duplicate special-element/modifiable-text lists into method-local + contracts. +- Collapse the class-level bookmark overview while keeping method-local + bookmark docs. +- Deduplicate normalization prose across `normalize()`, `serialize()`, and + `serialize_token()`, leaving decision contracts at each method heading. +- Move the template-building overview into method-local contracts for + `set_attribute()`, `set_modifiable_text()`, and `get_updated_html()`. + +Why this is strong: if the next weaker tier fails by retrieval dilution rather +than missing facts, pruning may outperform additive documentation. + +Risk: low as shadow-doc A/B; high if source pruning is promoted without broad +concept stability. + +### 6. Parsed identity and namespace contract + +Core idea: show that parsed element identity is not source spelling. Clarify +that `next_tag( 'IMG' )` uses the parser's element identity, while +`get_namespace()` distinguishes HTML/SVG/MathML when names overlap. + +Why this is strong: the pre-refresh N06 namespace task passed, but subjects +often added redundant or misunderstood namespace guards. The current corpus no +longer has an active namespace task, so treat this as historical/future-task +evidence until a current train task, probe, or A/B test revives it. + +Risk: medium. Use generic parsed-identity language and varied examples rather +than a task-shaped `img`-only recipe. + +### 7. Method-local small contracts + +These are lower-risk but probably smaller-signal than the candidates above: + +- `next_tag()`: opener-only by default; no `is_tag_closer()` guard unless + `tag_closers => 'visit'`. +- `get_breadcrumbs()`: final entry is the current node; slice it off for + strict ancestor checks. +- `get_attribute()`: use `is_string( $value ) && '' !== $value` when a real + string value is required; `null`, `true`, and `''` are distinct. +- `normalize()` / `serialize()`: attribute order is preserved, not sorted. +- `get_tag()`: returns `null` on non-tag tokens during `next_token()` walks. +- `paused_at_incomplete_token()`: lexical incomplete-token state, not unclosed + tree structure. + +Risk: low, but expected incremental score gain may be small unless weaker-tier +probes show these are findability failures. + +## Codex model policy + +Purpose: as the docs approach perfect scores, move test subjects to less +capable configurations so failures reveal documentation strength instead of +model strength. Keep judges strongest and stable; only weaken test subjects. +Use `priority` service tier for every Codex agent when it is available, because +latency variance is not part of the documentation experiment. + +As of 2026-06-12, official OpenAI docs list GPT-5.5 as the flagship model, +GPT-5.4 as the more affordable strong coding/professional model, and +GPT-5.4 mini/nano as smaller lower-latency lower-cost variants. The same +docs list reasoning efforts `none`, `low`, `medium`, `high`, and `xhigh`. +This session's visible subagent overrides expose `gpt-5.5`, `gpt-5.4`, and +`gpt-5.4-mini`. I did not find `gpt-5.3` in the current public model docs; use +it only if the workflow runner exposes it, and treat its position as empirical. + +Judges: + +- Always use `gpt-5.5` / `xhigh` / `priority` for judge agents when available. +- If unavailable, pause or explicitly record the downgrade. Do not silently + compare judge scores across different judge tiers. + +Recommended subject ladder, strongest to weakest: + +1. `gpt-5.4` / `medium` / `priority` +2. `gpt-5.4` / `low` / `priority` +3. `gpt-5.4-mini` / `high` / `priority` +4. `gpt-5.4-mini` / `low` / `priority` + +Do not assume base-model size and reasoning effort compose linearly. For +example, `gpt-5.4-mini/high` may beat or lose to `gpt-5.4/low` depending on +the task. Whenever stepping down across a model-family boundary, run a no-edit +rebaseline first. + +Default round policy: + +- If a subject tier scores 97+ train for two consecutive train rounds and a + checkpoint held-out split is stable, step down one rung. +- Use one primary subject tier per scored round so deltas remain comparable. +- Do not mix subject tiers into the main round score. +- On checkpoints or hold rounds, run a small cross-tier panel to watch for + regressions and calibrate the next rung. Treat this panel as diagnostic until + that tier has its own no-edit baseline. +- If a subject tier falls below roughly 70 with failures unrelated to doc + lookup, keep it as a stress test but do not drive source edits from it. +- At weaker tiers, consider five trials per task or paired A/B trials because + sampling variance will rise. + +Official references: + +- https://developers.openai.com/api/docs/models +- https://developers.openai.com/api/docs/guides/latest-model + +## New experiment types + +### 1. Shadow-doc ablation + +Question: does removing visible but low-value documentation improve results by +increasing signal density? + +Method: + +- Render current docs normally. +- Produce scratch-only ablation variants that delete or collapse selected + sections from the rendered markdown. Do not edit source docblocks for the + first pass. +- Run the same task/model/trial matrix against control docs and ablated docs. +- Promote an ablation to source-doc pruning only if it improves or preserves + scores and judge notes show less confusion. + +Candidate ablations: + +- Collapse long narrative sections that do not appear in successful subject + citations. +- Remove duplicate examples that teach the same path as a stronger nearby + example. +- Hide internal history, future-direction, and low-frequency caveat prose from + the rendered docs unless it affects a task contract. +- Replace long paragraphs with compact "Contract" bullets at method headings. + +Success metric: + +- Equal or better task score. +- Fewer hallucinated methods and fallback branches. +- Explanations cite closer, more local passages. +- No new held-out regression in concepts not targeted by the prune. + +### 2. Contrast cards + +Question: do "do this instead of that" patterns outperform neutral prose? + +Method: + +- Add small contrast blocks near the relevant method docs. +- Avoid task-shaped examples. Teach the decision boundary, not the current + corpus answer. + +High-value patterns: + +- Use `new WP_HTML_Tag_Processor( $html )` for flat lexical tag/class/attribute + edits. Do not call `create_fragment()` or `create_full_parser()` on the Tag + Processor. +- Use `WP_HTML_Processor::create_fragment()` for fragment tree traversal, + breadcrumbs, depth, normalized serialization, and implied nodes. Do not use + the Tag Processor when ancestry or namespace identity matters. +- Use `WP_HTML_Processor::create_full_parser()` for whole-document questions + such as the document `TITLE`. Do not treat the first source-order `` + as the document title. +- After queued edits such as `add_class()`, `remove_class()`, + `set_attribute()`, or `set_modifiable_text()`, use `get_updated_html()`. Do + not use `serialize()` or `normalize()` to read queued lexical edits. +- For selective normalized rewrites while walking every token, use + `serialize_token()`. Do not mix this with queued edits unless the docs + explicitly say that pattern is supported. +- During a plain `next_tag()` walk, do not add an `is_tag_closer()` guard unless + `tag_closers => 'visit'` was requested. +- For ancestor-only tests, slice `get_breadcrumbs()` before checking ancestors + because the last breadcrumb is the current node. +- For usable attribute values, prefer `is_string( $value ) && '' !== $value`. + Do not treat `null`, `true`, and `''` as interchangeable. +- For `#text` rewriting, act on `get_token_type() === '#text'`. Do not expect + attributes, comments, or raw-text element contents to appear as `#text`. +- For foreign content, trust parsed element identity. Do not assume source + spelling alone determines `get_tag()` or `get_namespace()`. + +### 3. Discoverability probes + +Question: are failures caused by missing facts or hard-to-find facts? + +Method: + +- Before a full round, run small read-only subject probes that ask for an answer + and a cited doc location, not code. +- Use weaker models and short time budgets. +- Score only whether the subject finds the right contract and cites a local + passage. + +Probe questions: + +- Which class owns `create_full_parser()`? +- Does `create_fragment()` return null for malformed body HTML, or only for + unsupported context/encoding? +- Does `next_tag()` visit closers by default? +- Does `get_breadcrumbs()` include the current node? +- Does `next_tag( 'IMG' )` match an SVG `<image>` element? +- Does `normalize()` sort attributes? +- What distinguishes `get_updated_html()`, `serialize()`, and + `serialize_token()`? + +If probes fail while the fact exists, prefer relocation or contrast. If probes +pass but task code fails, prefer examples or task/corpus changes. + +### 4. T08 traversal isolation + +Question: is the remaining table-extraction variance a documentation gap or a +state-machine reasoning limit? + +Method: + +- Create microtasks around adjacent regions, self-nesting regions, implied + nodes, and one-cursor walks. +- Keep them out of train until references and hidden tests are approved. +- Use them first as diagnostic probes, not score-driving tasks. + +Potential microtasks: + +- Collect text from adjacent `LI` elements with nested inline markup. +- Collect text from adjacent table cells where implied `TBODY` appears. +- Collect nested `BLOCKQUOTE` regions without losing sibling regions. +- Compare nested-loop and single-dispatch implementations and ask which is + safe. + +Expected useful edit if this confirms a doc gap: + +- A compact single-dispatch "region collector" recipe that names the invariant: + one cursor, one loop, explicit active region state, flush on matching closer. + +### 5. Method-heading contract pass + +Question: can we improve scores by moving existing facts to the exact method +headings where models enter? + +Candidate local contracts: + +- `WP_HTML_Processor::create_fragment()` and `::create_full_parser()`: + construction failure is context/encoding failure; parser support failures + surface later through walking, `serialize()`, `normalize()`, or + `get_last_error()`. +- `WP_HTML_Tag_Processor::next_tag()` and `WP_HTML_Processor::next_tag()`: + opener-only by default; tag-name matching is parsed-token matching, not raw + text matching. +- `WP_HTML_Processor::get_breadcrumbs()`: includes current node as final entry. +- `WP_HTML_Processor::get_tag()`: returns null on non-tag tokens during a + `next_token()` walk. +- `WP_HTML_Processor::normalize()` and `::serialize()`: attribute order is + preserved; attributes are not sorted. +- `WP_HTML_Tag_Processor::paused_at_incomplete_token()`: reports lexical + incomplete-token state, not unclosed tree structure. + +## Noise-removal policy + +Do not delete prose because it is long. Delete or collapse it only when it is +visible to test subjects and at least one of these is true: + +- It repeats a stronger nearby contract. +- It explains implementation history rather than caller behavior. +- It introduces a low-frequency caveat before the common path. +- It causes subjects to add defensive fallback code that the API contract does + not require. +- It has not been cited by successful trials or judges across multiple rounds. + +Run pruning as a shadow-doc ablation first. Source deletion should be a +confirmed hypothesis, not a style cleanup. + +## Proposed next sequence + +1. Run a no-edit current-corpus baseline/calibration with the first current + subject tier, `gpt-5.4` / `medium` / `priority`. Record any runner + mismatch, because this score replaces round 17 as the current-corpus + comparison point. +2. Continue weak-tier calibration down the subject ladder, one tier at a time, + until a tier lands in a useful signal band: not saturated, but still mostly + failing on doc/API reasoning rather than generic coding errors. +3. Run citation-only discoverability probes for the strong-candidate contracts. + If a fact exists but weak subjects cannot cite it locally, prefer relocation + or a contract card over more narrative prose. +4. Add a scratch-only rendered-doc variant tool or manual script that can + insert contract cards and remove named sections without editing source. +5. Run paired shadow-doc A/B tests for the depth-boundary card, factory + lifecycle card, where-text-lives matrix, and signal-density pruning. +6. Run a small cross-tier diagnostic panel on checkpoint or hold rounds to + confirm the improvement generalizes across subject capability. +7. Only then promote winning changes to docblocks, one hypothesis per commit, + with held-out still protected from driving edits. + +The main risk now is overfitting the train set or adding enough prose that the +right line becomes harder to find. The next phase should measure signal +density, not only factual completeness. + +## Future API/design observations + +Use this section for repeated patterns that look like surprising API behavior, +recurring hallucinated methods, or missing API affordances. These notes are not +documentation hypotheses by themselves. Keep them distinct from source +docblock edits until the project decides whether they represent API design +work, task-design drift, or documentation usability gaps. diff --git a/doc-experiment/PLAN.md b/doc-experiment/PLAN.md new file mode 100644 index 0000000000000..2a8323cc02bfd --- /dev/null +++ b/doc-experiment/PLAN.md @@ -0,0 +1,213 @@ +# HTML API Autonomous Documentation Improvement + +Improve the documentation of `WP_HTML_Tag_Processor` and `WP_HTML_Processor` +(docblocks in the two class files) by iteratively measuring how well weaker +models can complete real HTML API tasks using *only* the rendered +documentation, then editing the docs to fix observed failure modes. + +Current phase: after round 17 the original train score was saturated enough +that the primary work was no longer "run another full round, add the latest +gap." The corpus was then refreshed by replacing several active tasks, so +rounds through 17 are historical for the previous corpus and must not be used +as comparable baselines for new source edits. The next valid action is a +no-edit baseline/calibration on the current corpus and current model policy, +then use `doc-experiment/NEXT-HYPOTHESES.md` as the backlog for diagnostic +probes, scratch-rendered A/B variants, and source-edit hypotheses. + +## Pipeline (per round) + +1. Regenerate parsed-doc JSON (script lives in the phpdoc-parser checkout; + must be invoked by absolute path): + + ```sh + php /Users/jonsurrell/a8c/phpdoc-parser/generate-json-manually.php \ + -d src/wp-includes/html-api/class-wp-html-tag-processor.php \ + -o artifacts/html-tag-processor.json + php /Users/jonsurrell/a8c/phpdoc-parser/generate-json-manually.php \ + -d src/wp-includes/html-api/class-wp-html-processor.php \ + -o artifacts/html-processor.json + ``` + + (Harmless P2P_Autoload deprecation warnings are expected on stderr.) + +2. Render deterministic markdown from the JSON: + + ```sh + python3 doc-experiment/render-docs-markdown.py -i artifacts/html-tag-processor.json -o <scratch>/html-tag-processor.md + python3 doc-experiment/render-docs-markdown.py -i artifacts/html-processor.json -o <scratch>/html-processor.md + ``` + + The renderer fails loudly on unknown HTML tags (schema drift guard) and is + byte-deterministic. It excludes line numbers and `uses` arrays + (implementation leakage). + +3. Copy ONLY the two markdown files into a fresh scratch directory outside the + repo (e.g. `/tmp/html-api-docs-eval/round-NN/`). Test subagents are given + those two absolute paths and never learn the repo location. + +4. Run the train set with one primary subject tier per scored round. One fresh + subagent per task-trial, run in parallel. Test subagents get Read + Grep + only, the task prompt, and the two markdown paths. They MUST NOT access any + other information source or execute code. Their deliverable: PHP code + + explanation + self-reported confidence. Spot-check transcripts for + isolation violations each round. + +5. Execute every trial's code in the standalone harness against the task's + hidden test cases (deterministic pass/fail per case, recorded before + judging). + +6. Judge: one strongest-available judge per task sees the task spec, reference + implementation, hidden-test execution results for every trial, the + markdown docs the subagents saw, and full source access. It scores each + trial and writes a failure analysis: which doc gap or misleading passage + caused each failure. + +7. Analyze failures, form hypotheses, and choose the next action: + no-edit weak-tier calibration, citation-only discoverability probes, + scratch-rendered A/B variants, or source docblock edits. Source edits are + promoted only after diagnostic evidence, and then committed one hypothesis + per commit. + +## Current model policy + +Use `priority` service tier for every Codex agent when available. + +- Judges: always `gpt-5.5` / `xhigh` / `priority` when available. If this + is unavailable, pause or explicitly record the downgrade; do not silently + compare judge scores across judge tiers. +- Test subjects, strongest to weakest: + 1. `gpt-5.4` / `medium` / `priority` + 2. `gpt-5.4` / `low` / `priority` + 3. `gpt-5.4-mini` / `high` / `priority` + 4. `gpt-5.4-mini` / `low` / `priority` + +Use one primary subject tier per scored round. Do not mix tiers into the main +round score. Step down only after no-edit calibration shows the next tier is a +useful measuring instrument: not saturated, but still mostly failing on +documentation/API reasoning rather than generic coding errors. Cross-tier +panels are diagnostic only until each tier has its own no-edit baseline. + +## Post-round-17 diagnostic loop + +Before promoting more source docblock edits, prefer this sequence: + +1. Run no-edit weak-tier calibration across the subject ladder, one tier at a + time. +2. Run citation-only discoverability probes for the strong candidate contracts + in `NEXT-HYPOTHESES.md`. +3. Create scratch-rendered variants that insert contract cards, relocate + method-local facts, or remove noisy rendered sections without editing source. +4. Run paired shadow-doc A/B tests against the selected primary tier. +5. Promote only winning variants to source docblocks, one hypothesis per + commit, then run the docs-only guard and a normal scored round. + +Strong current candidates are: the depth-boundary equivalence card, factory +lifecycle contract, where-text-lives matrix, method-heading contract cards, +signal-density pruning, parsed identity/namespace contract, and smaller +method-local contracts. + +## Scoring + +- Per-trial: 70% functional correctness (fraction of hidden test cases + passed) + 30% API adherence rubric (no hallucinated methods, correct + processor choice, idiomatic handling of malformed HTML, no + `_doing_it_wrong` triggers). +- Task score = mean of all trials for that task, usually 3 unless a weaker + tier needs 5 to reduce variance; round score = mean over 15 train tasks. + Scale 0–100. +- Revert rule: revert a hypothesis commit if the next round's score drops + more than 2 points, or a previously passing task regresses across all + trials. Neutral edits that are qualitatively sound are kept. + +## Corpus + +Revised after Jon's round-1 review and refreshed again after round 17: +19 active tasks — 15 train + 4 held-out. Held-out tasks are scored only at +checkpoints (every 3rd round and at the end) and never drive doc edits — +they detect doc edits that game the train set. Because the post-round-17 +refresh replaced active tasks, pre-refresh scores are not comparable with +future current-corpus scores except as historical context. + +- Train core: T03–T12 plus N03 (first list direct-child count), N04 + (normalize with fallback), and N06 (heading table-of-contents extraction). + Current train concepts cover attributes, classes, normalization, + serialization, text, and traversal. +- Train smoke: T01, T02 — basic sanity checks, kept in the round score + but reviewed separately; they must not dominate coverage. +- Held-out: N01 (class removal), N02 (contextual selection with + breadcrumbs), N05 (full-document title via create_full_parser), + H04 (empty-paragraph normalized removal). +- Retired to corpus-retired/ (too close to train patterns to give + held-out anti-overfitting value): H01, H02, H03. + +Every active task carries labels in tests.json — role (core/smoke), commonness +(high/medium/low), concept (attributes, classes, text, traversal, +serialization, full-document, normalization), and intended +processor (tag/html/either). Rounds are reviewed per concept, not only by +aggregate score, so a high aggregate cannot hide an untaught concept. + +Sources of task patterns: dmsnell's gists (HTML serialization builder, +streaming html-grep, semantic truncation) adapted to the *current* API on +this branch — the gists use experimental methods that don't exist here — +plus common content workflows: class manipulation, contextual selection, +truncated-input detection, normalization failure, full-document parsing, +namespace distinction. Most tasks do not name which processor class to +use; choosing correctly is part of what the docs must teach. Every task +ships: prompt, function signature, reference implementation, hidden test +cases. All references must pass their hidden tests in the harness, and +extraction tasks are cross-checked against PHP's Dom\HTMLDocument oracle, +before they enter a round. + +Held-out must stay protected in the post-round-17 phase. Do not run every +agent tier against held-out every round. Regular scored rounds use the primary +tier on train. Checkpoint/final rounds may run the primary tier on train plus +held-out. Cross-tier panels should be train-only or diagnostic; if they include +held-out, treat held-out results as regression sentinels only, never edit +drivers. + +## Execution harness + +Standalone PHP CLI harness (no WordPress boot, no DB): requires the html-api +source files directly plus small shims — real `utf8.php`, copied +`wp_kses_uri_attributes()`, identity `__()`, recording `_doing_it_wrong()` +(its triggering is an adherence signal), minimal `esc_url()` that performs +HTML escaping but no protocol filtering or URL normalization. Candidate and +reference both run under the same harness so shim divergence cancels out. +Tasks are authored to avoid protocol-filtering-sensitive expectations. + +## Round flow & stopping + +- Round 0 scores the unmodified docs (baseline/control) after corpus + approval. +- Docs-only guard each round: PHP token stream with comments stripped must + be identical before/after edits; `php -l` passes; `@since` tags untouched; + no fabricated changelog entries. Free restructuring of docblock content is + otherwise allowed (file-, class-, property-, method-level, both files). +- Docs are free-form: optimized purely for scores, not for WP documentation + standards (upstreaming is a later, separate concern). +- Step down the subject ladder when the current primary tier is saturated for + two consecutive train rounds and checkpoint held-out is stable. Re-baseline + the new tier with no doc edits before using it to drive source changes. +- Stop or pause when the selected weak tier has two consecutive flat rounds, + when diagnostic A/B tests stop producing concept-level signal, or on Jon's + interrupt. + +## Repo layout + +- `doc-experiment/PLAN.md` — this contract; update it when the design + changes. +- `doc-experiment/NEXT-HYPOTHESES.md` — post-round-17 hypotheses, model + policy, diagnostic tests, and source-promotion criteria. +- `doc-experiment/render-docs-markdown.py` — JSON→markdown renderer. +- `doc-experiment/corpus/` — task specs, reference implementations, hidden + test cases (never exposed to test subagents). +- `doc-experiment/harness/` — standalone PHP execution harness. +- `doc-experiment/results/round-NN/` — scores, per-task judge analyses. +- `doc-experiment/LOG.md` — running hypothesis → outcome narrative. +- `artifacts/` — generated JSON (gitignored; regenerated every round). + +## Autonomy + +After corpus approval the loop runs autonomously round-to-round. After each +round a summary is posted (scores, deltas, hypotheses, commits) for +asynchronous review; held-out checkpoints every 3rd round gate continuation. diff --git a/doc-experiment/PROTOCOL.md b/doc-experiment/PROTOCOL.md new file mode 100644 index 0000000000000..d030340c73a57 --- /dev/null +++ b/doc-experiment/PROTOCOL.md @@ -0,0 +1,482 @@ +# Round protocol + +Operational runbook for one evaluation round. Keep in sync with PLAN.md. + +## 0. Choose round mode and model tier + +Start every run with the read-only state audit: + +```sh +python3 doc-experiment/tools/audit-state.py +``` + +If it reports local drift, corpus/result mismatch, source-doc changes since the +last trusted score, or missing current-corpus baseline, resolve that state +before trusting any new score. +When a matching current-corpus calibration round is already prepared, +`audit-state.py` reports its lifecycle and the next artifact action: launch +trials, complete trials, run judges, aggregate, or repair/restage. + +When corpus fixtures changed since the latest trusted score, verify active +reference implementations before staging or comparing a new round: + +```sh +python3 doc-experiment/tools/validate-corpus.py +``` + +This runs every active `reference.php` against its hidden `tests.json`. +Harness signal records such as unsupported-markup `wp_trigger_error()` events +are reported as warnings by default; use `--strict-signals` when those should +fail a focused audit. + +Use `priority` service tier for every Codex agent when available. + +Judges always use `gpt-5.5` / `xhigh` / `priority` when available. If this +is unavailable, pause or explicitly record the downgrade. + +Test subjects use one primary tier per scored round: + +1. `gpt-5.4` / `medium` / `priority` +2. `gpt-5.4` / `low` / `priority` +3. `gpt-5.4-mini` / `high` / `priority` +4. `gpt-5.4-mini` / `low` / `priority` + +Do not mix subject tiers into the main round score. Before a new tier drives +source edits, run a no-edit baseline for that tier. + +Pick exactly one round mode: + +- `scored-train`: primary tier on train tasks only; this is the normal edit + feedback loop. +- `checkpoint`: primary tier on train plus held-out; held-out is a regression + sentinel and never drives edits. +- `weak-tier-calibration`: current docs, no edits, one candidate tier at a + time; selects the next measuring instrument. +- `discoverability-probe`: citation-only questions against rendered docs; no + hidden tests and no source edits. +- `shadow-doc-a/b`: compare normal rendered docs against a scratch-only + rendered variant, such as contract cards or pruning. Source docblocks are not + edited until a variant wins and is promoted as its own hypothesis. + +If the active corpus has changed since the last trusted score, do not compare +against that older score and do not promote source docblock edits. First run a +no-edit baseline/calibration on the current corpus with the current subject and +judge model policy, then use that result as the current comparison point. The +start-of-run audit treats a baseline as current only when the scored artifacts +validate cleanly and the metadata matches the current task set, subject tier, +and judge tier. + +## 1. Stage + +```sh +python3 doc-experiment/tools/prepare-round.py <N> \ + --mode weak-tier-calibration +``` + +This regenerates the rendered docs, copies only the selected tasks' +`task.md` files into `/tmp/html-api-docs-eval/round-NN/tasks/`, and writes +`doc-experiment/results/round-NN/round-metadata.json` with the mode, selected +tasks, trial count, model policy, git head, scratch path, and HTML API source +file digests. It must not copy corpus directories, `reference.php`, or +`tests.json` into scratch. Use `--dry-run` first when reconciling task +selection. The preparation script runs `verify-scratch-isolation.py` before +writing metadata and records SHA-256 hashes for every staged doc and task +prompt. Source digests include both raw source bytes and a comment/whitespace +stripped PHP token-stream fingerprint matching the docs-only guard invariant. +Metadata also records SHA-256 digests for each selected task's `task.md`, +`reference.php`, and `tests.json`; these hidden corpus inputs must not drift +between preparation, execution, judging, and aggregation. +When the worktree is clean, the digest ref is the recorded `git_head`; when +local drift exists, it is `working-tree` and `git_status_short` records the +drift. + +`stage-round.sh <N>` remains the low-level docs-only staging command for +manual scratch variants and shadow-doc A/B setup. + +For a manually edited scratch variant, run: + +```sh +python3 doc-experiment/tools/verify-scratch-isolation.py <scratch> \ + --task-id T01-add-image-class +``` + +If docs were edited since the last round, first run the docs-only guard: + +```sh +php doc-experiment/tools/docs-only-guard.php +``` + +To inspect the source fingerprints recorded by prepared rounds: + +```sh +php doc-experiment/tools/source-digests.php +``` + +For `shadow-doc-a/b`, stage normal docs first, then copy the staged directory +to a variant scratch directory and apply rendered-markdown-only changes there. +Do not edit source docblocks for the variant. Record the variant name in the +result directory and judge prompts. + +## 2. Test-subagent prompt template + +One agent per task-trial; agent type `docs-test-subject` (Read+Grep only, +defined in `.claude/agents/`); use the selected primary subject tier from +section 0; 3 trials per task unless a weaker tier needs 5 trials to reduce +variance. Note: agent definitions register at session start — in a session +older than the definition, fall back to a general agent with the +prompt-level restrictions below and spot-check transcripts for isolation +violations. Substitute `{SCRATCH}` and `{TASK_MD}`: + +For trusted scored rounds, the preferred runner must enforce the +`docs-test-subject` tool boundary or an equivalent Read+Grep-only boundary. If +that Workflow runner is unavailable, use the local Codex CLI fallback: + +```sh +python3 doc-experiment/tools/run-codex-trials.py round-NN \ + --output doc-experiment/results/round-NN/codex-trials-output.json +python3 doc-experiment/tools/validate-workflow-output.py trials \ + doc-experiment/results/round-NN/codex-trials-output.json round-NN +python3 doc-experiment/tools/ingest-trials.py \ + doc-experiment/results/round-NN/codex-trials-output.json round-NN +python3 doc-experiment/tools/validate-round.py round-NN --require-trials-complete +``` + +The local fallback runs each subject from a private non-repo directory +containing only the two rendered docs, one task prompt, and the output schema, +then embeds the task and rendered docs directly in the subject prompt because +local `codex exec` does not expose the experiment's Read/Grep-only agent tools. +It ignores project rules and user config, uses a read-only sandbox, sets +approval policy `never`, and persists `subject_isolation.isolation_mode` as +`isolated-workdir` with `input_delivery: prompt-embedded-docs`. Scores from +this runner are comparable only with rounds using the same isolation mode and +runner policy. A prompt-only fallback without one of these persisted isolation +attestations remains diagnostic unless transcripts are inspected and the +isolation risk is explicitly recorded. + +````text +You are implementing a PHP function for WordPress using the HTML API. + +Your ONLY sources of information about the API are these two +documentation files: + +- {SCRATCH}/html-tag-processor.md +- {SCRATCH}/html-processor.md + +Strict rules: do not read any other file; do not run code; do not rely on +memory of WordPress source code — if the documentation contradicts your +memory, trust the documentation. Methods not documented in those files do +not exist. + +THE TASK: + +{TASK_MD} + +Respond with your final answer in exactly this structure (the code block +must contain a complete PHP file defining exactly the requested function): + +```php +<?php +// implementation +``` + +EXPLANATION: one short paragraph describing your approach and which +documented APIs you used. + +CONFIDENCE: an integer 0-100 — your confidence the implementation passes +a strict behavioral test suite. +```` + +When orchestrating via the Workflow tool, prefer `schema` structured +output with fields `code` (string), `explanation` (string), `confidence` +(integer 0-100) instead of free-text parsing. + +Trusted trials must also persist runner isolation evidence. The trials workflow +returns an object with a `result` array and a `subject_isolation` attestation: + +```json +{ + "subject_isolation": { + "enforced": true, + "agent_type": "docs-test-subject", + "allowed_tools": ["Read", "Grep"], + "notes": "Runner enforced the docs-test-subject tool boundary." + }, + "result": [] +} +``` + +If a Workflow runner uses an equivalent agent type, `agent_type` may differ, +but `allowed_tools` must still be exactly `Read` and `Grep`, and +`equivalent_boundary_notes` must explain the equivalent enforced boundary. If +the local Codex CLI fallback is used, `allowed_tools` is replaced by the +`isolated-workdir` fields validated by `validate-workflow-output.py` and +`validate-round.py`. +`ingest-trials.py` persists this as `subject-isolation.json`; `validate-round.py` +rejects trial artifacts that lack it. If the workflow runner saves returned +values under a top-level `result` key, `validate-workflow-output.py` and +`ingest-trials.py` also accept `{ "result": { "subject_isolation": ..., "result": [...] } }`. + +For the bundled workflow script, generate the task list and model policy from +the round metadata: + +```sh +python3 doc-experiment/tools/workflow-args.py trials round-NN +``` + +The bundled trials workflow passes `agent_type: docs-test-subject` to each +subject `agent()` call. This command verifies the staged scratch directory, +recorded file hashes, selected corpus references, and round preflight before +emitting agent-launch arguments. If `/tmp` was cleaned, a staged file changed, +selected corpus inputs drifted, or a selected reference no longer passes its +hidden tests, restage or reconcile the round rather than launching subjects +against mismatched docs or fixtures. +The escape hatch is named `--skip-round-check` because it bypasses all staged +round artifact and selected-corpus checks, not only scratch isolation; use it +only for diagnostics. +To emit both trial and judge workflow inputs plus the ingest/validation command +sequence as a single handoff object, run: + +```sh +python3 doc-experiment/tools/workflow-args.py manifest round-NN +``` + +The manifest includes launch provenance: current git head/status, the prepared +round metadata git head/status, and SHA-256 hashes for the bundled trial and +judge workflow scripts. Persist the manifest or equivalent values with the +external runner handoff so tooling-only commits can be distinguished from the +staged rendered-doc/corpus state. Its preflight commands validate exactly the +selected tasks recorded in round metadata; they are intentionally not split +shortcuts such as `--split train`. +Use `--output <path>` to write the emitted trials, judges, or manifest JSON to +a handoff file while still printing it to stdout. + +For `discoverability-probe`, replace the implementation prompt with a +question-answer prompt requiring: answer, cited markdown file/heading, and +one-sentence rationale. Do not execute code or expose hidden tests. + +If the Workflow runner is unavailable, use the local Codex CLI probe fallback: + +```sh +python3 doc-experiment/tools/run-codex-probes.py round-NN \ + --question-id <stable-id> \ + --question '<citation-only question>' \ + --output doc-experiment/results/probes/round-NN-<stable-id>.json +``` + +The local fallback runs each probe subject from a private non-repo directory, +embeds only the staged rendered docs and probe question in the prompt, ignores +project rules and user config, uses a read-only sandbox, and sets approval +policy `never`. Persist the probe output with the result artifacts and log +whether the subject found the relevant local contract. + +## 3. Execute + +For each trial, write the returned code to +`results/round-NN/<task>/trial-<n>/candidate.php`, then: + +```sh +php doc-experiment/harness/run-tests.php \ + results/round-NN/<task>/trial-<n>/candidate.php \ + doc-experiment/corpus/<task>/tests.json \ + > results/round-NN/<task>/trial-<n>/execution.json || true +``` + +(`run-tests.php` exits non-zero on failures; the JSON is still complete.) +`persist-trials.py` refuses to persist a trial if the harness output is not +valid execution JSON with `passed`, `total`, and `cases`; artifacts created for +that failed ingest attempt are removed before the ingest exits non-zero, so a +mid-batch harness failure does not leave partial trial artifacts behind. +After `persist-trials.py` succeeds, `ingest-trials.py` writes +`subject-isolation.json` atomically; if that write fails, it removes the trial +directories from the current ingest attempt before exiting non-zero. + +For metadata-backed rounds, `ingest-trials.py` rejects workflow outputs whose +task IDs, trial numbers, or structured-output fields do not match +`round-metadata.json`. Trial entries must include non-empty `code` and +`explanation` strings plus integer `confidence` 0-100, and `code` must be a +complete PHP file starting with `<?php`. Incomplete or malformed agent +responses are rejected before result files are written; ingestion does not +repair subject code. Malformed workflow envelopes, missing or invalid +`subject_isolation` attestations, non-array `result` payloads, and non-object +trial entries are rejected before ingestion reads or persists the payload. You +can run the same +preflight without writing files: + +```sh +python3 doc-experiment/tools/validate-workflow-output.py trials \ + <trials-output.json> round-NN +``` + +Skip this section for `discoverability-probe` rounds. For `shadow-doc-a/b`, +execute control and variant candidates separately and keep result directories +clearly labeled. + +## 4. Judge prompt template + +One `gpt-5.5` / `xhigh` / `priority` judge per task. The judge receives: the +task directory contents (task.md, reference.php, tests.json), every `trial-N` +directory for that task (candidate.php, explanation, confidence, +execution.json), and the two rendered markdown docs the subagents saw. The +judge may read the html-api source and run ad-hoc probes with the harness +bootstrap. + +For the bundled judge workflow script, generate args from the same metadata: + +```sh +python3 doc-experiment/tools/workflow-args.py judges round-NN +``` + +This performs the same scratch/hash preflight because judges must see the exact +rendered docs that subjects saw, and it revalidates the selected corpus +references before judge launch. + +If the Workflow runner is unavailable, use the local Codex CLI judge fallback: + +```sh +python3 doc-experiment/tools/run-codex-judges.py round-NN \ + --output doc-experiment/results/round-NN/codex-judges-output.json +python3 doc-experiment/tools/validate-workflow-output.py judges \ + doc-experiment/results/round-NN/codex-judges-output.json round-NN +python3 doc-experiment/tools/ingest-judges.py \ + doc-experiment/results/round-NN/codex-judges-output.json round-NN +python3 doc-experiment/tools/validate-round.py round-NN --require-scored +``` + +The local judge runner uses the same judge model policy, runs from the +repository root under a read-only sandbox, ignores project rules and user +config, and writes the same judge workflow-output shape consumed by +`ingest-judges.py`. + +The judge returns JSON: + +```json +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 0, + "hallucinated_methods": [], + "notes": "…" + } + ], + "failure_analysis": "Which misunderstandings caused failures, citing the docs passages (or absences) responsible.", + "doc_gaps": [ + { "location": "method or section", "problem": "…", "suggestion": "…" } + ] +} +``` + +For metadata-backed rounds, judge workflow preflight rejects missing task +coverage, missing trial verdicts, non-integer adherence, non-string +hallucinated method entries, empty trial notes, empty failure analysis, and +empty doc-gap fields before any `judge.json` or `round-summary.json` is +written. Malformed workflow envelopes, non-array `result` payloads, and +non-object judge entries are rejected before ingestion reads or persists the +payload. + +Adherence rubric (0-100): correct processor choice for the job (30), +no hallucinated/undocumented API usage (30), idiomatic use of documented +patterns — bookmarks, breadcrumbs, token walking (25), graceful handling +of edge cases the docs describe (15). Execution results measure +correctness separately; adherence is about HOW the API was used. + +For held-out tasks, judges may report regressions but their `doc_gaps` must be +tagged `held-out-only` and must not drive source edits unless the same issue +has train or probe evidence. + +For `shadow-doc-a/b`, ask judges to compare whether the variant changed +failure modes, hallucinated methods, local citations, or unnecessary fallback +branches. A variant "wins" only if it improves concept-level behavior or +discoverability without a clean regression. + +## 5. Aggregate and record + +Before aggregation, validate result completeness: + +```sh +python3 doc-experiment/tools/validate-round.py round-NN +``` + +It should report `judged` before aggregation. After aggregation, rerun it with +`--require-scored`; it should report `scored` before the score is trusted. +For metadata-backed rounds, validation also checks that staged scratch files +still match the SHA-256 hashes recorded at preparation time and that recorded +HTML API source digests match both their recorded git ref and the current +worktree. It also checks the current selected task prompts, references, and +hidden tests against the corpus file digests recorded at preparation time. +Trial artifacts are content-validated before a round can be considered +trial-complete: +`candidate.php` must be non-empty PHP, `response.json` must contain the +subject explanation/confidence shape, and `execution.json` must contain the +harness pass/total/cases shape. Persisted `judge.json` artifacts are +content-validated before a round can be considered judged or scored: every +expected trial must have an adherence score, hallucinated-method list, and +non-empty notes, and the task verdict must include non-empty failure analysis +plus structured doc-gap entries. +Lifecycle counts in `validate-round.py` include only valid artifacts; a +present but malformed `candidate.php`, `response.json`, `execution.json`, or +`judge.json` keeps the round incomplete and must be reconciled before +advancing. +`ingest-judges.py` validates trial completeness before writing judges and +judged-state completeness before writing a summary. It also preflights judge +workflow output shape: + +```sh +python3 doc-experiment/tools/validate-workflow-output.py judges \ + <judges-output.json> round-NN +``` + +`aggregate-round.py` refuses metadata-backed rounds with missing judges, +missing trial executions, or mismatched task sets. For metadata-backed scored +rounds, `validate-round.py --require-scored` recomputes the aggregate and +rejects a `round-summary.json` that no longer matches the persisted trial +executions, judge verdicts, metadata, and current corpus labels. +Trial and judge ingestion refuse to overwrite existing trial directories, +`subject-isolation.json`, `judge.json`, or `round-summary.json`. If an ingest +must be retried after a failed or invalid runner output, first record the +reconciliation in `LOG.md`, remove or quarantine the invalid artifacts +deliberately, and then rerun ingestion. +Judge ingestion removes artifacts it created in the current attempt if judge +writing, post-write validation, aggregation, or summary writing fails. + +```sh +python3 doc-experiment/tools/aggregate-round.py doc-experiment/results/round-NN +``` + +Record in LOG.md: round score, per-task scores, judge doc_gaps summary. +Commit results. For normal scored rounds, make source doc edits only when the +evidence supports a general hypothesis; commit one hypothesis at a time, +re-run the docs-only guard, and stage the next round. For calibration, +discoverability, or shadow-doc rounds, record the outcome and whether any +variant should be promoted; do not commit source docblock changes as part of +the same hypothesis. + +Before committing a source documentation hypothesis that includes examples, +verify the examples through `doc-experiment/harness/bootstrap.php` where +applicable. + +## Operational hazards + +- Workflow `args` may arrive as a JSON string; orchestration scripts should + parse defensively. +- Strong-judge session limits can kill a judge fan-out, sometimes returning an + empty result set with failures listed. Trial executions are already + persisted, so relaunch judges after reset rather than rerunning trials. +- Expected outputs are frozen. Regenerate them only when a reference + implementation intentionally changes, and review the diff before trusting the + new fixtures. +- Historical logs may use legacy labels such as "opus", "sonnet", or "haiku". + Treat those as historical role labels, not current model choices. + +## Storage layout + +``` +doc-experiment/results/round-NN/ + subject-isolation.json # runner-enforced docs-test-subject boundary attestation + <task-id>/ + trial-1/candidate.php + trial-1/response.json # explanation + confidence as returned + trial-1/execution.json + judge.json + round-summary.json # aggregate-round.py output +``` diff --git a/doc-experiment/README.md b/doc-experiment/README.md new file mode 100644 index 0000000000000..5db89e027aec2 --- /dev/null +++ b/doc-experiment/README.md @@ -0,0 +1,101 @@ +# Doc-improvement experiment + +## Process documents + +- `PLAN.md` — experiment contract, corpus rules, model policy, and + source-edit promotion criteria. +- `PROTOCOL.md` — operational runbook for scored rounds, weak-tier + calibration, discoverability probes, and shadow-doc A/B tests. +- `NEXT-HYPOTHESES.md` — post-round-17 hypothesis backlog: strong candidates, + signal-density tests, contrast cards, model ladder, and next sequence. +- `LOG.md` — round-by-round hypothesis and outcome narrative. + +## `render-docs-markdown.py` + +Deterministic JSON-to-Markdown renderer for phpdoc-parser output. Converts a +parsed PHP class (description, properties, methods, docblock tags) into a single +Markdown file optimized for an LLM agent reading the docs to write code against +the API. + +### Usage + +```sh +python3 render-docs-markdown.py -i input.json -o output.md +``` + +- `-i/--input` — phpdoc-parser JSON (array of file objects, each with `classes`). +- `-o/--output` — Markdown file to write (UTF-8, LF line endings). + +Standard library only; no dependencies. Python 3. + +### Output structure + +1. `# H1` class name + file-level description / long description. +2. `## Overview` — class doc, plus extends / implements / final / abstract. +3. `## Method Index` — navigation table (method, visibility, one-line description), source order. +4. `## Properties` — every property (all visibilities) with type from `@var` and description. +5. `## Methods` — one `### method()` per method in source order: PHP-style signature + (types from `@param` / `@return`), description, long description (HTML converted to + Markdown), then `@since` / `@param` / `@return` / `@throws` / `@see` / other tags. + +Line numbers, `uses` arrays, and `root` / `path` fields are excluded. + +### Guarantees and behavior + +- **Deterministic:** identical input bytes produce identical output bytes (JSON + order preserved; no timestamps, no randomness). +- **HTML to Markdown:** an `html.parser`-based converter handles the docblock tag + inventory (`p`, `br`, `pre`/`code` to fenced PHP, `code`, `em`, `strong`, + `ul`/`ol`/`li`, `h2`-`h4`, `blockquote`, tables, `a`). Entities are decoded. +- **Schema-drift guard:** an unknown HTML tag aborts loudly via `sys.exit` rather + than being silently dropped. (`<div>` in example prose is the one tolerated + non-structural tag and is re-emitted as literal text.) + +### Regenerate the sample outputs + +```sh +python3 render-docs-markdown.py \ + -i ../artifacts/html-tag-processor.json \ + -o /tmp/html-api-docs-eval-test/html-tag-processor.md + +python3 render-docs-markdown.py \ + -i ../artifacts/html-processor.json \ + -o /tmp/html-api-docs-eval-test/html-processor.md +``` + +<!-- The experiment harness documentation is appended below by a later step. --> + +## Round tools + +- `tools/audit-state.py` — read-only start-of-run audit for worktree drift, + latest trusted score, corpus comparability, prepared-round lifecycle, model + policy, valid current-policy baseline status, and next action. +- `tools/prepare-round.py` — preferred current entry point for a round. It + stages rendered docs, copies only selected `task.md` prompts into scratch, + and writes `results/round-NN/round-metadata.json`. +- `tools/verify-scratch-isolation.py` — checks a scratch directory exposes only + rendered docs and selected task prompts, never references, tests, plans, or + source files; it can also emit/verify SHA-256 hashes for staged files. +- `tools/source-digests.php` — emits raw-source and comment/whitespace-stripped + PHP token-stream SHA-256 fingerprints for the two HTML API source files. +- `tools/validate-corpus.py` — runs active corpus `reference.php` files against + their hidden `tests.json` fixtures and reports harness signal warnings. +- `tools/validate-round.py` — reports whether a round is prepared, partially + trialed, trial-complete, judged, or scored, verifies recorded scratch hashes, + verifies recorded source and corpus digests against the current worktree, + validates trial and judge artifact contents, recomputes metadata-backed + scored summaries, and lists missing artifacts. +- `tools/workflow-args.py` — emits trials or judges workflow JSON from + `round-metadata.json` so model policy and task IDs are not transcribed by + hand; it runs full round validation before emitting launch args, and can emit + a full launch manifest or atomically write the emitted JSON with `--output`. +- `tools/validate-workflow-output.py` — preflights trials or judges workflow + JSON envelopes, subject-isolation attestation, round metadata coverage, and + required payload shape before ingestion writes files. +- `tools/stage-round.sh` — low-level docs-only staging command used by + `prepare-round.py` and manual scratch variants. +- `tools/persist-trials.py` / `tools/ingest-trials.py` — persist subject + outputs, persist the runner isolation attestation, and execute candidates + against hidden tests. +- `tools/ingest-judges.py` / `tools/aggregate-round.py` — persist judge + verdicts and compute scored summaries. diff --git a/doc-experiment/corpus-retired/H01-strip-styles/reference.php b/doc-experiment/corpus-retired/H01-strip-styles/reference.php new file mode 100644 index 0000000000000..035103bf97ad0 --- /dev/null +++ b/doc-experiment/corpus-retired/H01-strip-styles/reference.php @@ -0,0 +1,9 @@ +<?php + +function strip_inline_styles( string $html ): string { + $processor = new WP_HTML_Tag_Processor( $html ); + while ( $processor->next_tag() ) { + $processor->remove_attribute( 'style' ); + } + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus-retired/H01-strip-styles/task.md b/doc-experiment/corpus-retired/H01-strip-styles/task.md new file mode 100644 index 0000000000000..9f00b8285407c --- /dev/null +++ b/doc-experiment/corpus-retired/H01-strip-styles/task.md @@ -0,0 +1,21 @@ +# Strip inline styles + +Write a single PHP function: + +```php +function strip_inline_styles( string $html ): string +``` + +Remove the `style` attribute from every tag in the document and return the +modified HTML. All other attributes and everything else in the document +must be preserved byte-for-byte; whitespace that surrounded a removed +attribute remains where it was. Attribute names are case-insensitive +(`STYLE="…"` is a `style` attribute). Content inside HTML comments is not +real markup and must not be modified. + +Example (note the leftover spaces where the attributes were removed): + +```php +strip_inline_styles( '<p style="color:red">Hi <b style="x">there</b></p>' ) +// => '<p >Hi <b >there</b></p>' +``` diff --git a/doc-experiment/corpus-retired/H01-strip-styles/tests.json b/doc-experiment/corpus-retired/H01-strip-styles/tests.json new file mode 100644 index 0000000000000..ab44b61bc1045 --- /dev/null +++ b/doc-experiment/corpus-retired/H01-strip-styles/tests.json @@ -0,0 +1,51 @@ +{ + "id": "H01-strip-styles", + "title": "Strip inline styles", + "difficulty": "basic", + "split": "holdout", + "function": "strip_inline_styles", + "cases": [ + { + "id": "simple", + "args": [ + "<p style=\"color:red\">Hi <b style=\"x\">there</b></p>" + ], + "expected": "<p >Hi <b >there</b></p>" + }, + { + "id": "uppercase-attribute", + "args": [ + "<div STYLE=\"margin:0\">x</div>" + ], + "expected": "<div >x</div>" + }, + { + "id": "other-attributes-preserved", + "args": [ + "<p id=\"a\" style=\"x\" class=\"b\">text</p>" + ], + "expected": "<p id=\"a\" class=\"b\">text</p>" + }, + { + "id": "no-styles-unchanged", + "args": [ + "<p class=\"clean\">nothing</p>" + ], + "expected": "<p class=\"clean\">nothing</p>" + }, + { + "id": "comment-untouched", + "args": [ + "<!-- <p style=\"x\">fake</p> --><p style=\"y\">real</p>" + ], + "expected": "<!-- <p style=\"x\">fake</p> --><p >real</p>" + }, + { + "id": "valueless-style", + "args": [ + "<p style>odd</p>" + ], + "expected": "<p >odd</p>" + } + ] +} diff --git a/doc-experiment/corpus-retired/H02-data-attributes/reference.php b/doc-experiment/corpus-retired/H02-data-attributes/reference.php new file mode 100644 index 0000000000000..d7c4563a069a4 --- /dev/null +++ b/doc-experiment/corpus-retired/H02-data-attributes/reference.php @@ -0,0 +1,16 @@ +<?php + +function get_data_attributes( string $html ): array { + $processor = new WP_HTML_Tag_Processor( $html ); + if ( ! $processor->next_tag( 'DIV' ) ) { + return array(); + } + + $data = array(); + $attributes = $processor->get_attribute_names_with_prefix( 'data-' ); + foreach ( $attributes ?? array() as $name ) { + $data[ $name ] = $processor->get_attribute( $name ); + } + + return $data; +} diff --git a/doc-experiment/corpus-retired/H02-data-attributes/task.md b/doc-experiment/corpus-retired/H02-data-attributes/task.md new file mode 100644 index 0000000000000..1e41242d55fe4 --- /dev/null +++ b/doc-experiment/corpus-retired/H02-data-attributes/task.md @@ -0,0 +1,22 @@ +# Read data attributes + +Write a single PHP function: + +```php +function get_data_attributes( string $html ): array +``` + +Find the first `DIV` tag in the document and return an associative array of +all its `data-*` attributes: keys are the full lowercase attribute names +(including the `data-` prefix), values are the decoded attribute values as +the HTML API reports them (a string, or `true` for an attribute written +without a value). Preserve the order in which the attributes appear in the +tag. Return an empty array if there is no `DIV` or it has no `data-*` +attributes. + +Example: + +```php +get_data_attributes( '<div id="x" data-post-id="42" data-featured>…</div>' ) +// => [ 'data-post-id' => '42', 'data-featured' => true ] +``` diff --git a/doc-experiment/corpus-retired/H02-data-attributes/tests.json b/doc-experiment/corpus-retired/H02-data-attributes/tests.json new file mode 100644 index 0000000000000..2670eb0ea60b5 --- /dev/null +++ b/doc-experiment/corpus-retired/H02-data-attributes/tests.json @@ -0,0 +1,61 @@ +{ + "id": "H02-data-attributes", + "title": "Read data attributes", + "difficulty": "basic", + "split": "holdout", + "function": "get_data_attributes", + "cases": [ + { + "id": "mixed", + "args": [ + "<div id=\"x\" data-post-id=\"42\" data-featured>content</div>" + ], + "expected": { + "data-post-id": "42", + "data-featured": true + } + }, + { + "id": "uppercase-names-lowercased", + "args": [ + "<div DATA-TYPE=\"post\" data-Other=\"x\">y</div>" + ], + "expected": { + "data-type": "post", + "data-other": "x" + } + }, + { + "id": "entities-in-values", + "args": [ + "<div data-title=\"Fish & Chips\">z</div>" + ], + "expected": { + "data-title": "Fish & Chips" + } + }, + { + "id": "no-data-attributes", + "args": [ + "<div id=\"plain\" class=\"c\">w</div>" + ], + "expected": [] + }, + { + "id": "no-div", + "args": [ + "<p data-x=\"1\">not a div</p>" + ], + "expected": [] + }, + { + "id": "first-div-only", + "args": [ + "<div data-a=\"1\">x</div><div data-b=\"2\">y</div>" + ], + "expected": { + "data-a": "1" + } + } + ] +} diff --git a/doc-experiment/corpus-retired/H03-img-alt-audit/reference.php b/doc-experiment/corpus-retired/H03-img-alt-audit/reference.php new file mode 100644 index 0000000000000..191b9da2b2843 --- /dev/null +++ b/doc-experiment/corpus-retired/H03-img-alt-audit/reference.php @@ -0,0 +1,20 @@ +<?php + +function find_images_missing_alt( string $html ): array { + $processor = new WP_HTML_Tag_Processor( $html ); + + $missing = array(); + while ( $processor->next_tag( 'IMG' ) ) { + $src = $processor->get_attribute( 'src' ); + if ( ! is_string( $src ) || '' === $src ) { + continue; + } + + $alt = $processor->get_attribute( 'alt' ); + if ( null === $alt || true === $alt || '' === $alt ) { + $missing[] = $src; + } + } + + return $missing; +} diff --git a/doc-experiment/corpus-retired/H03-img-alt-audit/task.md b/doc-experiment/corpus-retired/H03-img-alt-audit/task.md new file mode 100644 index 0000000000000..6b99c6948399f --- /dev/null +++ b/doc-experiment/corpus-retired/H03-img-alt-audit/task.md @@ -0,0 +1,22 @@ +# Audit image alt text + +Write a single PHP function: + +```php +function find_images_missing_alt( string $html ): array +``` + +Return a list (numeric array) of the `src` values of every `IMG` tag whose +alternative text is missing or empty, in document order. "Missing or empty" +means: the `alt` attribute is absent, is written without a value +(`<img alt>`), or has the empty string as its value (`alt=""`). An `alt` +containing only whitespace (`alt=" "`) is **present** and does not count. +Skip `IMG` tags that have no `src` attribute, or whose `src` has no value +(`src` or `src=""`). The `src` values are the decoded attribute values. + +Example: + +```php +find_images_missing_alt( '<img src="a.jpg"><img src="b.jpg" alt="A bee"><img src="c.jpg" alt="">' ) +// => [ 'a.jpg', 'c.jpg' ] +``` diff --git a/doc-experiment/corpus-retired/H03-img-alt-audit/tests.json b/doc-experiment/corpus-retired/H03-img-alt-audit/tests.json new file mode 100644 index 0000000000000..a3c233a4d5068 --- /dev/null +++ b/doc-experiment/corpus-retired/H03-img-alt-audit/tests.json @@ -0,0 +1,76 @@ +{ + "id": "H03-img-alt-audit", + "title": "Audit image alt text", + "difficulty": "intermediate", + "split": "holdout", + "function": "find_images_missing_alt", + "cases": [ + { + "id": "mixed-states", + "args": [ + "<img src=\"a.jpg\"><img src=\"b.jpg\" alt=\"A bee\"><img src=\"c.jpg\" alt=\"\">" + ], + "expected": [ + "a.jpg", + "c.jpg" + ] + }, + { + "id": "valueless-alt", + "args": [ + "<img src=\"a.jpg\" alt>" + ], + "expected": [ + "a.jpg" + ] + }, + { + "id": "whitespace-alt-is-present", + "args": [ + "<img src=\"a.jpg\" alt=\" \">" + ], + "expected": [] + }, + { + "id": "no-src-skipped", + "args": [ + "<img alt=\"\"><img src=\"real.jpg\">" + ], + "expected": [ + "real.jpg" + ] + }, + { + "id": "empty-and-valueless-src-skipped", + "args": [ + "<img src alt=\"\"><img src=\"\" alt=\"\"><img src=\"real.jpg\" alt=\"\">" + ], + "expected": [ + "real.jpg" + ] + }, + { + "id": "entity-in-src", + "args": [ + "<img src=\"/i?a=1&b=2\">" + ], + "expected": [ + "/i?a=1&b=2" + ] + }, + { + "id": "all-good", + "args": [ + "<img src=\"a.jpg\" alt=\"one\"><img src=\"b.jpg\" alt=\"two\">" + ], + "expected": [] + }, + { + "id": "no-images", + "args": [ + "<p>none</p>" + ], + "expected": [] + } + ] +} diff --git a/doc-experiment/corpus/H04-remove-empty-paragraphs/reference.php b/doc-experiment/corpus/H04-remove-empty-paragraphs/reference.php new file mode 100644 index 0000000000000..c200048f726de --- /dev/null +++ b/doc-experiment/corpus/H04-remove-empty-paragraphs/reference.php @@ -0,0 +1,60 @@ +<?php + +function remove_empty_paragraphs( string $html ): string { + $processor = WP_HTML_Processor::create_fragment( $html ); + if ( null === $processor ) { + return $html; + } + + $output = ''; + $has_current = false; + + while ( $has_current || $processor->next_token() ) { + $has_current = false; + + if ( + '#tag' !== $processor->get_token_type() || + $processor->is_tag_closer() || + 'P' !== $processor->get_tag() || + 'html' !== $processor->get_namespace() + ) { + $output .= $processor->serialize_token(); + continue; + } + + $paragraph_opener = $processor->serialize_token(); + + while ( true ) { + if ( ! $processor->next_token() ) { + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return $html; + } + + $output .= $paragraph_opener; + break 2; + } + + if ( + '#tag' === $processor->get_token_type() && + $processor->is_tag_closer() && + 'P' === $processor->get_tag() && + 'html' === $processor->get_namespace() + ) { + continue 2; + } + + // Ignore tokens that disappear from normalized output, e.g. #presumptuous-tag. + if ( '' === $processor->serialize_token() ) { + continue; + } + + $output .= $paragraph_opener; + $has_current = true; + continue 2; + } + } + + return ( null === $processor->get_last_error() && ! $processor->paused_at_incomplete_token() ) + ? $output + : $html; +} diff --git a/doc-experiment/corpus/H04-remove-empty-paragraphs/task.md b/doc-experiment/corpus/H04-remove-empty-paragraphs/task.md new file mode 100644 index 0000000000000..1ed9e34b76ea4 --- /dev/null +++ b/doc-experiment/corpus/H04-remove-empty-paragraphs/task.md @@ -0,0 +1,20 @@ +# Remove empty paragraphs + +Write a single PHP function: + +```php +function remove_empty_paragraphs( string $html ): string +``` + +Given an HTML fragment (as found inside `<body>`), remove every empty `P` +element, and return a normalized serialization of the result. A paragraph +is empty only when it contains nothing at all; whitespace or child elements +count as content. If the fragment cannot be fully processed, return the +original HTML unchanged. + +Example: + +```php +remove_empty_paragraphs( '<p>Keep <em>me</em></p><p></p><p> </p>' ) +// => '<p>Keep <em>me</em></p><p> </p>' +``` diff --git a/doc-experiment/corpus/H04-remove-empty-paragraphs/tests.json b/doc-experiment/corpus/H04-remove-empty-paragraphs/tests.json new file mode 100644 index 0000000000000..bcf5534d38b39 --- /dev/null +++ b/doc-experiment/corpus/H04-remove-empty-paragraphs/tests.json @@ -0,0 +1,90 @@ +{ + "id": "H04-remove-empty-paragraphs", + "title": "Remove empty paragraphs", + "difficulty": "hard", + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "serialization", + "processor": "html", + "function": "remove_empty_paragraphs", + "cases": [ + { + "id": "mixed-paragraphs", + "args": [ + "<p>Keep <em>me</em></p><p></p><p> </p><p><img src=\"x.jpg\"></p>" + ], + "expected": "<p>Keep <em>me</em></p><p> </p><p><img src=\"x.jpg\"></p>" + }, + { + "id": "empty-and-whitespace", + "args": [ + "<p></p><p>\n\t </p><p>Text</p>" + ], + "expected": "<p>\n\t </p><p>Text</p>" + }, + { + "id": "entity-content", + "args": [ + "<p> </p><p> </p><p>A B</p>" + ], + "expected": "<p> </p><p> </p><p>A B</p>" + }, + { + "id": "element-only-kept", + "args": [ + "<p><br></p><p><span></span></p><p></p>" + ], + "expected": "<p><br></p><p><span></span></p>" + }, + { + "id": "comment-and-script-kept", + "args": [ + "<p><!--x--></p><p><script></script></p><p></p>" + ], + "expected": "<p><!--x--></p><p><script></script></p>" + }, + { + "id": "self-closing-paragraph-syntax", + "args": [ + "<p/><p>keep</p>" + ], + "expected": "<p>keep</p>" + }, + { + "id": "implicit-paragraph-close", + "args": [ + "<p>One<p> <div>Block</div><p>Two" + ], + "expected": "<p>One</p><p> </p><div>Block</div><p>Two</p>" + }, + { + "id": "case-insensitive-source", + "args": [ + "<P>Keep</P><P> </P>" + ], + "expected": "<p>Keep</p><p> </p>" + }, + { + "id": "no-paragraphs", + "args": [ + "<div>Nothing to remove</div>" + ], + "expected": "<div>Nothing to remove</div>" + }, + { + "id": "incomplete-input-unchanged", + "args": [ + "<p></p><img src=\"x" + ], + "expected": "<p></p><img src=\"x" + }, + { + "id": "unsupported-input-unchanged", + "args": [ + "<p></p><a><div><a></div></a>" + ], + "expected": "<p></p><a><div><a></div></a>" + } + ] +} diff --git a/doc-experiment/corpus/N01-remove-external-class/reference.php b/doc-experiment/corpus/N01-remove-external-class/reference.php new file mode 100644 index 0000000000000..c15ad4af79a67 --- /dev/null +++ b/doc-experiment/corpus/N01-remove-external-class/reference.php @@ -0,0 +1,9 @@ +<?php + +function remove_external_class( string $html ): string { + $processor = new WP_HTML_Tag_Processor( $html ); + while ( $processor->next_tag( 'A' ) ) { + $processor->remove_class( 'external' ); + } + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/N01-remove-external-class/task.md b/doc-experiment/corpus/N01-remove-external-class/task.md new file mode 100644 index 0000000000000..5c209c15f281c --- /dev/null +++ b/doc-experiment/corpus/N01-remove-external-class/task.md @@ -0,0 +1,10 @@ +# Remove a class from links + +Write a single PHP function: + +```php +function remove_external_class( string $html ): string +``` + +Remove the class `external` from every `A` tag that has it, and return the +modified HTML. diff --git a/doc-experiment/corpus/N01-remove-external-class/tests.json b/doc-experiment/corpus/N01-remove-external-class/tests.json new file mode 100644 index 0000000000000..b2eb7a51b53ca --- /dev/null +++ b/doc-experiment/corpus/N01-remove-external-class/tests.json @@ -0,0 +1,62 @@ +{ + "id": "N01-remove-external-class", + "title": "Remove a class from links", + "difficulty": "basic", + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "function": "remove_external_class", + "cases": [ + { + "id": "among-others", + "args": [ + "<a class=\"external link\" href=\"/x\">go</a>" + ], + "expected": "<a class=\"link\" href=\"/x\">go</a>" + }, + { + "id": "only-class-removes-attribute", + "args": [ + "<a class=\"external\" href=\"/x\">go</a>" + ], + "expected": "<a href=\"/x\">go</a>" + }, + { + "id": "no-class-untouched", + "args": [ + "<a href=\"/y\">stay</a>" + ], + "expected": "<a href=\"/y\">stay</a>" + }, + { + "id": "case-sensitive-not-removed", + "args": [ + "<a class=\"EXTERNAL\">caps</a>" + ], + "expected": "<a class=\"EXTERNAL\">caps</a>" + }, + { + "id": "multiple-links", + "args": [ + "<a class=\"external a\">1</a><a class=\"b external\">2</a><a class=\"c\">3</a>" + ], + "expected": "<a class=\"a\">1</a><a class=\"b\">2</a><a class=\"c\">3</a>" + }, + { + "id": "non-link-untouched", + "args": [ + "<div class=\"external\">not a link</div><a class=\"external\">link</a>" + ], + "expected": "<div class=\"external\">not a link</div><a >link</a>" + }, + { + "id": "middle-of-list", + "args": [ + "<a class=\"one external two\">mid</a>" + ], + "expected": "<a class=\"one two\">mid</a>" + } + ] +} diff --git a/doc-experiment/corpus/N02-collect-figure-images/reference.php b/doc-experiment/corpus/N02-collect-figure-images/reference.php new file mode 100644 index 0000000000000..10ec6671d9e05 --- /dev/null +++ b/doc-experiment/corpus/N02-collect-figure-images/reference.php @@ -0,0 +1,23 @@ +<?php + +function collect_figure_images( string $html ): array { + $processor = WP_HTML_Processor::create_fragment( $html ); + if ( null === $processor ) { + return array(); + } + + $sources = array(); + while ( $processor->next_tag( 'IMG' ) ) { + $ancestors = array_slice( $processor->get_breadcrumbs(), 0, -1 ); + if ( ! in_array( 'FIGURE', $ancestors, true ) ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + if ( is_string( $src ) && '' !== $src ) { + $sources[] = $src; + } + } + + return $sources; +} diff --git a/doc-experiment/corpus/N02-collect-figure-images/task.md b/doc-experiment/corpus/N02-collect-figure-images/task.md new file mode 100644 index 0000000000000..9a7452755f9d9 --- /dev/null +++ b/doc-experiment/corpus/N02-collect-figure-images/task.md @@ -0,0 +1,20 @@ +# Collect images inside figures + +Write a single PHP function: + +```php +function collect_figure_images( string $html ): array +``` + +Given an HTML fragment (as found inside `<body>`), return a list (numeric +array) of the decoded `src` values of every `IMG` element that is inside a +`FIGURE` element — at any depth, not only as a direct child — in document +order. Images outside any figure are excluded. Skip `IMG` tags that have +no `src` attribute or whose `src` has no value. + +Example: + +```php +collect_figure_images( '<figure><img src="in.jpg"></figure><p><img src="out.jpg"></p>' ) +// => [ 'in.jpg' ] +``` diff --git a/doc-experiment/corpus/N02-collect-figure-images/tests.json b/doc-experiment/corpus/N02-collect-figure-images/tests.json new file mode 100644 index 0000000000000..d2fcc46d7e679 --- /dev/null +++ b/doc-experiment/corpus/N02-collect-figure-images/tests.json @@ -0,0 +1,96 @@ +{ + "id": "N02-collect-figure-images", + "title": "Collect images inside figures", + "difficulty": "intermediate", + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "function": "collect_figure_images", + "cases": [ + { + "id": "in-and-out", + "args": [ + "<figure><img src=\"in.jpg\"></figure><p><img src=\"out.jpg\"></p>" + ], + "expected": [ + "in.jpg" + ] + }, + { + "id": "nested-depth", + "args": [ + "<figure><div><a href=\"#\"><img src=\"deep.jpg\"></a></div></figure>" + ], + "expected": [ + "deep.jpg" + ] + }, + { + "id": "multiple-figures", + "args": [ + "<figure><img src=\"a.jpg\"></figure><figure><img src=\"b.jpg\"><img src=\"c.jpg\"></figure>" + ], + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ] + }, + { + "id": "no-figures", + "args": [ + "<p><img src=\"x.jpg\"></p>" + ], + "expected": [] + }, + { + "id": "no-src-skipped", + "args": [ + "<figure><img alt=\"no src\"><img src=\"yes.jpg\"></figure>" + ], + "expected": [ + "yes.jpg" + ] + }, + { + "id": "empty-and-valueless-src-skipped", + "args": [ + "<figure><img src><img src=\"\"><img src=\"yes.jpg\"></figure>" + ], + "expected": [ + "yes.jpg" + ] + }, + { + "id": "entity-decoded-src", + "args": [ + "<figure><img src=\"/i?a=1&b=2\"></figure>" + ], + "expected": [ + "/i?a=1&b=2" + ] + }, + { + "id": "figcaption-sibling", + "args": [ + "<figure><img src=\"pic.jpg\"><figcaption>caption <img src=\"cap.jpg\"></figcaption></figure>" + ], + "expected": [ + "pic.jpg", + "cap.jpg" + ] + }, + { + "id": "unclosed-figure", + "args": [ + "<figure><img src=\"open.jpg\"><p>text<img src=\"later.jpg\">" + ], + "expected": [ + "open.jpg", + "later.jpg" + ] + } + ] +} diff --git a/doc-experiment/corpus/N03-first-list-count/reference.php b/doc-experiment/corpus/N03-first-list-count/reference.php new file mode 100644 index 0000000000000..ab659f85d0a15 --- /dev/null +++ b/doc-experiment/corpus/N03-first-list-count/reference.php @@ -0,0 +1,50 @@ +<?php + +function add_first_list_item_count( string $html ): string { + $processor = WP_HTML_Processor::create_fragment( $html ); + if ( null === $processor ) { + return $html; + } + + while ( $processor->next_tag() ) { + if ( in_array( $processor->get_tag(), array( 'UL', 'OL' ), true ) ) { + break; + } + } + + if ( ! in_array( $processor->get_tag(), array( 'UL', 'OL' ), true ) ) { + return $html; + } + + if ( ! $processor->set_bookmark( 'list' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $count = 0; + + while ( $processor->next_tag( array( 'tag_closers' => 'visit' ) ) ) { + if ( $processor->get_current_depth() < $list_depth ) { + break; + } + + if ( + ! $processor->is_tag_closer() && + 'LI' === $processor->get_tag() && + $processor->get_current_depth() === $list_depth + 1 + ) { + ++$count; + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + if ( ! $processor->seek( 'list' ) ) { + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $count ); + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/N03-first-list-count/task.md b/doc-experiment/corpus/N03-first-list-count/task.md new file mode 100644 index 0000000000000..4177082a39b42 --- /dev/null +++ b/doc-experiment/corpus/N03-first-list-count/task.md @@ -0,0 +1,20 @@ +# Count items in the first list + +Write a single PHP function: + +```php +function add_first_list_item_count( string $html ): string +``` + +Given an HTML fragment (as found inside `<body>`), find the first `UL` or +`OL` element, count its direct `LI` children, add a `data-item-count` +attribute with that count to the list element, and return the modified +HTML. If there is no list, return the HTML unchanged. If the first list +cannot be fully scanned, return the HTML unchanged. + +Example: + +```php +add_first_list_item_count( '<ul><li>A</li><li>B</li><li>C</li></ul>' ) +// => '<ul data-item-count="3"><li>A</li><li>B</li><li>C</li></ul>' +``` diff --git a/doc-experiment/corpus/N03-first-list-count/tests.json b/doc-experiment/corpus/N03-first-list-count/tests.json new file mode 100644 index 0000000000000..e3ea75c92c8c1 --- /dev/null +++ b/doc-experiment/corpus/N03-first-list-count/tests.json @@ -0,0 +1,90 @@ +{ + "id": "N03-first-list-count", + "title": "Count items in the first list", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "function": "add_first_list_item_count", + "cases": [ + { + "id": "simple-ul", + "args": [ + "<ul><li>A</li><li>B</li><li>C</li></ul>" + ], + "expected": "<ul data-item-count=\"3\"><li>A</li><li>B</li><li>C</li></ul>" + }, + { + "id": "ol", + "args": [ + "<ol><li>A</li><li>B</li></ol>" + ], + "expected": "<ol data-item-count=\"2\"><li>A</li><li>B</li></ol>" + }, + { + "id": "no-list", + "args": [ + "<p>No list here.</p>" + ], + "expected": "<p>No list here.</p>" + }, + { + "id": "existing-count-overwritten", + "args": [ + "<ul data-item-count=\"99\"><li>A</li></ul>" + ], + "expected": "<ul data-item-count=\"1\"><li>A</li></ul>" + }, + { + "id": "omitted-li-closers", + "args": [ + "<ul><li>one<li>two" + ], + "expected": "<ul data-item-count=\"2\"><li>one<li>two" + }, + { + "id": "nested-list-counts-direct-children", + "args": [ + "<ul><li><ul><li>x</li></ul><li>y" + ], + "expected": "<ul data-item-count=\"2\"><li><ul><li>x</li></ul><li>y" + }, + { + "id": "incomplete-token-inside-list", + "args": [ + "<ul><li><img src=\"x" + ], + "expected": "<ul><li><img src=\"x" + }, + { + "id": "incomplete-comment-inside-list", + "args": [ + "<ul><li><!-- cut" + ], + "expected": "<ul><li><!-- cut" + }, + { + "id": "incomplete-token-after-closed-list", + "args": [ + "<ul><li>one</li></ul><img src=\"x" + ], + "expected": "<ul data-item-count=\"1\"><li>one</li></ul><img src=\"x" + }, + { + "id": "unsupported-inside-list", + "args": [ + "<ul><li><a><div><a></div></a>" + ], + "expected": "<ul><li><a><div><a></div></a>" + }, + { + "id": "unsupported-after-closed-list", + "args": [ + "<ul><li>ok</li></ul><a><div><a></div></a>" + ], + "expected": "<ul data-item-count=\"1\"><li>ok</li></ul><a><div><a></div></a>" + } + ] +} diff --git a/doc-experiment/corpus/N04-normalize-or-placeholder/reference.php b/doc-experiment/corpus/N04-normalize-or-placeholder/reference.php new file mode 100644 index 0000000000000..04ef6b2cf7abc --- /dev/null +++ b/doc-experiment/corpus/N04-normalize-or-placeholder/reference.php @@ -0,0 +1,7 @@ +<?php + +function normalize_or_placeholder( string $html ): string { + $normalized = WP_HTML_Processor::normalize( $html ); + + return null === $normalized ? '<p>Unsupported HTML</p>' : $normalized; +} diff --git a/doc-experiment/corpus/N04-normalize-or-placeholder/task.md b/doc-experiment/corpus/N04-normalize-or-placeholder/task.md new file mode 100644 index 0000000000000..42e1bc93ccca2 --- /dev/null +++ b/doc-experiment/corpus/N04-normalize-or-placeholder/task.md @@ -0,0 +1,22 @@ +# Normalize HTML with a fallback + +Write a single PHP function: + +```php +function normalize_or_placeholder( string $html ): string +``` + +Given an HTML fragment (as found inside `<body>`), return its normalized +HTML serialization. If the HTML API cannot normalize the fragment, return +this exact fallback HTML: + +```html +<p>Unsupported HTML</p> +``` + +Example: + +```php +normalize_or_placeholder( '<div><p>Hello' ) +// => '<div><p>Hello</p></div>' +``` diff --git a/doc-experiment/corpus/N04-normalize-or-placeholder/tests.json b/doc-experiment/corpus/N04-normalize-or-placeholder/tests.json new file mode 100644 index 0000000000000..086182353e7c5 --- /dev/null +++ b/doc-experiment/corpus/N04-normalize-or-placeholder/tests.json @@ -0,0 +1,62 @@ +{ + "id": "N04-normalize-or-placeholder", + "title": "Normalize HTML with a fallback", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "function": "normalize_or_placeholder", + "cases": [ + { + "id": "unclosed-tags-normalize", + "args": [ + "<div><p>Hello" + ], + "expected": "<div><p>Hello</p></div>" + }, + { + "id": "table-normalizes", + "args": [ + "<table><tr><td>ok</table>" + ], + "expected": "<table><tbody><tr><td>ok</td></tr></tbody></table>" + }, + { + "id": "attribute-quoting-normalizes", + "args": [ + "<a href=x class=test>go</a>" + ], + "expected": "<a href=\"x\" class=\"test\">go</a>" + }, + { + "id": "entities-normalize", + "args": [ + "<p>Fish & chips</p>" + ], + "expected": "<p>Fish & chips</p>" + }, + { + "id": "unsupported-misnested-formatting", + "args": [ + "<b>one<i>two</b>three</i>" + ], + "expected": "<p>Unsupported HTML</p>" + }, + { + "id": "unsupported-anchor-misnesting", + "args": [ + "<a><div><a></div></a>" + ], + "expected": "<p>Unsupported HTML</p>" + }, + { + "id": "empty-fragment", + "args": [ + "" + ], + "expected": "" + } + ] +} diff --git a/doc-experiment/corpus/N05-document-title/reference.php b/doc-experiment/corpus/N05-document-title/reference.php new file mode 100644 index 0000000000000..6334c77bd988d --- /dev/null +++ b/doc-experiment/corpus/N05-document-title/reference.php @@ -0,0 +1,16 @@ +<?php + +function get_document_title( string $html ): ?string { + $processor = WP_HTML_Processor::create_full_parser( $html ); + if ( null === $processor ) { + return null; + } + + while ( $processor->next_tag( 'TITLE' ) ) { + if ( 'html' === $processor->get_namespace() ) { + return $processor->get_modifiable_text(); + } + } + + return null; +} diff --git a/doc-experiment/corpus/N05-document-title/task.md b/doc-experiment/corpus/N05-document-title/task.md new file mode 100644 index 0000000000000..79642c4a30c78 --- /dev/null +++ b/doc-experiment/corpus/N05-document-title/task.md @@ -0,0 +1,18 @@ +# Extract the document title + +Write a single PHP function: + +```php +function get_document_title( string $html ): ?string +``` + +Given a complete HTML document, return the text of its `<title>` element, +or `null` if the document has no `<title>` element. An existing but empty +`<title>` returns the empty string, not `null`. + +Example: + +```php +get_document_title( 'My Site — Home' ) +// => 'My Site — Home' +``` diff --git a/doc-experiment/corpus/N05-document-title/tests.json b/doc-experiment/corpus/N05-document-title/tests.json new file mode 100644 index 0000000000000..d3a7d5fb0b365 --- /dev/null +++ b/doc-experiment/corpus/N05-document-title/tests.json @@ -0,0 +1,62 @@ +{ + "id": "N05-document-title", + "title": "Extract the document title", + "difficulty": "intermediate", + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "full-document", + "processor": "html", + "function": "get_document_title", + "cases": [ + { + "id": "standard-document", + "args": [ + "My Site — Home

x

" + ], + "expected": "My Site — Home" + }, + { + "id": "entities-decoded", + "args": [ + "Fish & Chips" + ], + "expected": "Fish & Chips" + }, + { + "id": "no-title-null", + "args": [ + "

not a title

" + ], + "expected": null + }, + { + "id": "empty-title", + "args": [ + "" + ], + "expected": "" + }, + { + "id": "no-doctype", + "args": [ + "Bare" + ], + "expected": "Bare" + }, + { + "id": "attributes-on-elements", + "args": [ + "With Attrs" + ], + "expected": "With Attrs" + }, + { + "id": "minimal-document", + "args": [ + "Implied structure

body content

" + ], + "expected": "Implied structure" + } + ] +} diff --git a/doc-experiment/corpus/N06-extract-toc/reference.php b/doc-experiment/corpus/N06-extract-toc/reference.php new file mode 100644 index 0000000000000..987bfd466a9d8 --- /dev/null +++ b/doc-experiment/corpus/N06-extract-toc/reference.php @@ -0,0 +1,31 @@ +next_tag() ) { + $tag_name = $processor->get_tag(); + if ( ! in_array( $tag_name, array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ), true ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $toc[] = array( + 'level' => (int) substr( $tag_name, 1 ), + 'text' => $text, + ); + } + + return $toc; +} diff --git a/doc-experiment/corpus/N06-extract-toc/task.md b/doc-experiment/corpus/N06-extract-toc/task.md new file mode 100644 index 0000000000000..adc9499e2f0a8 --- /dev/null +++ b/doc-experiment/corpus/N06-extract-toc/task.md @@ -0,0 +1,24 @@ +# Extract a table of contents + +Write a single PHP function: + +```php +function extract_toc( string $html ): array +``` + +Given an HTML fragment (as found inside ``), return a list (numeric +array) describing every heading from `H1` through `H6` in document order. +Each entry is an associative array with: + +- `'level'`: the heading level, from `1` through `6`. +- `'text'`: the heading's text content. + +Markup inside a heading contributes its text, but not its tags. Headings +with no text are included with an empty string. + +Example: + +```php +extract_toc( '

Intro

Text

Details here

' ) +// => [ ['level' => 1, 'text' => 'Intro'], ['level' => 3, 'text' => 'Details here'] ] +``` diff --git a/doc-experiment/corpus/N06-extract-toc/tests.json b/doc-experiment/corpus/N06-extract-toc/tests.json new file mode 100644 index 0000000000000..6ff940e9117a3 --- /dev/null +++ b/doc-experiment/corpus/N06-extract-toc/tests.json @@ -0,0 +1,128 @@ +{ + "id": "N06-extract-toc", + "title": "Extract a table of contents", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "function": "extract_toc", + "cases": [ + { + "id": "basic-h1-h3", + "args": [ + "

Intro

Text

Details here

" + ], + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ] + }, + { + "id": "all-heading-levels", + "args": [ + "

Title

Section

Subsection

Minor

Small
Tiny
" + ], + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ] + }, + { + "id": "nested-text-and-entities", + "args": [ + "

A B & C

" + ], + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ] + }, + { + "id": "empty-heading", + "args": [ + "

" + ], + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ] + }, + { + "id": "case-insensitive-source", + "args": [ + "

Upper

Lower
" + ], + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ] + }, + { + "id": "implied-heading-close", + "args": [ + "

One

Two" + ], + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ] + }, + { + "id": "no-matches", + "args": [ + "

No headings here.

" + ], + "expected": [] + } + ] +} diff --git a/doc-experiment/corpus/T01-add-image-class/reference.php b/doc-experiment/corpus/T01-add-image-class/reference.php new file mode 100644 index 0000000000000..702ec67973496 --- /dev/null +++ b/doc-experiment/corpus/T01-add-image-class/reference.php @@ -0,0 +1,9 @@ +next_tag( 'IMG' ) ) { + $processor->add_class( 'wp-image' ); + } + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/T01-add-image-class/task.md b/doc-experiment/corpus/T01-add-image-class/task.md new file mode 100644 index 0000000000000..691aae2a62983 --- /dev/null +++ b/doc-experiment/corpus/T01-add-image-class/task.md @@ -0,0 +1,25 @@ +# Add a class to every image + +Write a single PHP function: + +```php +function add_image_class( string $html ): string +``` + +Given an HTML document or fragment, add the class `wp-image` to every `IMG` +tag, and return the modified HTML. Everything else in the document must be +preserved byte-for-byte. If an `IMG` tag already has classes, `wp-image` is +added to them (do not remove or reorder existing classes). + +Images that appear inside HTML comments are not real tags and must not be +modified. Tag name matching is case-insensitive (`` is an `IMG` tag). + +Examples: + +```php +add_image_class( '

' ) +// => '

' + +add_image_class( '' ) +// => '' +``` diff --git a/doc-experiment/corpus/T01-add-image-class/tests.json b/doc-experiment/corpus/T01-add-image-class/tests.json new file mode 100644 index 0000000000000..5c13b5c99b665 --- /dev/null +++ b/doc-experiment/corpus/T01-add-image-class/tests.json @@ -0,0 +1,69 @@ +{ + "id": "T01-add-image-class", + "title": "Add a class to every image", + "difficulty": "basic", + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "function": "add_image_class", + "cases": [ + { + "id": "simple", + "args": [ + "

" + ], + "expected": "

" + }, + { + "id": "multiple", + "args": [ + "
" + ], + "expected": "
" + }, + { + "id": "existing-classes", + "args": [ + "" + ], + "expected": "" + }, + { + "id": "uppercase-tag", + "args": [ + "" + ], + "expected": "" + }, + { + "id": "inside-comment-ignored", + "args": [ + "" + ], + "expected": "" + }, + { + "id": "no-images", + "args": [ + "

Nothing here.

" + ], + "expected": "

Nothing here.

" + }, + { + "id": "unquoted-attributes", + "args": [ + "" + ], + "expected": "" + }, + { + "id": "incomplete-tag-at-end", + "args": [ + "

text

text

next_tag( 'A' ) ) { + if ( null !== $processor->get_attribute( 'href' ) ) { + $processor->set_attribute( 'target', '_blank' ); + } + } + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/T02-link-targets/task.md b/doc-experiment/corpus/T02-link-targets/task.md new file mode 100644 index 0000000000000..7f4ed4d763c1a --- /dev/null +++ b/doc-experiment/corpus/T02-link-targets/task.md @@ -0,0 +1,21 @@ +# Open links in a new tab + +Write a single PHP function: + +```php +function add_link_targets( string $html ): string +``` + +For every `A` tag that has an `href` attribute, set its `target` attribute to +`_blank`, and return the modified HTML. The `href` attribute counts as +present even when its value is the empty string (`href=""`) or when it is +written without a value (``). `A` tags without an `href` attribute +must not be modified. An existing `target` attribute is overwritten. +Everything else in the document must be preserved byte-for-byte. + +Example: + +```php +add_link_targets( 'go stay' ) +// => 'go stay' +``` diff --git a/doc-experiment/corpus/T02-link-targets/tests.json b/doc-experiment/corpus/T02-link-targets/tests.json new file mode 100644 index 0000000000000..763df6d981bdc --- /dev/null +++ b/doc-experiment/corpus/T02-link-targets/tests.json @@ -0,0 +1,69 @@ +{ + "id": "T02-link-targets", + "title": "Open links in a new tab", + "difficulty": "basic", + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "function": "add_link_targets", + "cases": [ + { + "id": "simple", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "no-href-skipped", + "args": [ + "staygo" + ], + "expected": "staygo" + }, + { + "id": "empty-href-counts", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "valueless-href-counts", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "existing-target-overwritten", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "uppercase-attribute", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "inside-comment-ignored", + "args": [ + "go" + ], + "expected": "go" + }, + { + "id": "nested-markup-in-link", + "args": [ + "bold move" + ], + "expected": "bold move" + } + ] +} diff --git a/doc-experiment/corpus/T03-first-h1-text/reference.php b/doc-experiment/corpus/T03-first-h1-text/reference.php new file mode 100644 index 0000000000000..11967ff25f38c --- /dev/null +++ b/doc-experiment/corpus/T03-first-h1-text/reference.php @@ -0,0 +1,22 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/corpus/T03-first-h1-text/task.md b/doc-experiment/corpus/T03-first-h1-text/task.md new file mode 100644 index 0000000000000..67bc376203954 --- /dev/null +++ b/doc-experiment/corpus/T03-first-h1-text/task.md @@ -0,0 +1,23 @@ +# Extract the first heading's text + +Write a single PHP function: + +```php +function get_first_h1_text( string $html ): ?string +``` + +Given an HTML fragment (as found inside ``), return the text content +of the first `H1` element: the concatenation of all text nodes inside it, +including text inside nested elements, with character references decoded +(`&` becomes `&`). Markup contributes nothing — an `H1` containing only +an image has text content `""` (empty string, not null). + +Return `null` only when the document contains no `H1` element. + +Examples: + +```php +get_first_h1_text( '

Hello

' ) // => 'Hello' +get_first_h1_text( '

A B C

' ) // => 'A B C' +get_first_h1_text( '

No headings here.

' ) // => null +``` diff --git a/doc-experiment/corpus/T03-first-h1-text/tests.json b/doc-experiment/corpus/T03-first-h1-text/tests.json new file mode 100644 index 0000000000000..4da8df4d62fa7 --- /dev/null +++ b/doc-experiment/corpus/T03-first-h1-text/tests.json @@ -0,0 +1,69 @@ +{ + "id": "T03-first-h1-text", + "title": "Extract the first heading's text", + "difficulty": "basic", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "function": "get_first_h1_text", + "cases": [ + { + "id": "simple", + "args": [ + "

Hello

" + ], + "expected": "Hello" + }, + { + "id": "nested-markup", + "args": [ + "

A B C

" + ], + "expected": "A B C" + }, + { + "id": "entities-decoded", + "args": [ + "

Fish & Chips — daily

" + ], + "expected": "Fish & Chips — daily" + }, + { + "id": "no-h1-null", + "args": [ + "

No headings here.

Sub

" + ], + "expected": null + }, + { + "id": "image-only-empty-string", + "args": [ + "

\"decorative\"

" + ], + "expected": "" + }, + { + "id": "first-of-two", + "args": [ + "

First

Second

" + ], + "expected": "First" + }, + { + "id": "nested-in-div", + "args": [ + "

Deep title

" + ], + "expected": "Deep title" + }, + { + "id": "unclosed-h1", + "args": [ + "

Runs to the end" + ], + "expected": "Runs to the end" + } + ] +} diff --git a/doc-experiment/corpus/T04-build-figure/reference.php b/doc-experiment/corpus/T04-build-figure/reference.php new file mode 100644 index 0000000000000..5f883ddce7f19 --- /dev/null +++ b/doc-experiment/corpus/T04-build-figure/reference.php @@ -0,0 +1,18 @@ +
.
' ); + + $processor->next_tag( 'IMG' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/T04-build-figure/task.md b/doc-experiment/corpus/T04-build-figure/task.md new file mode 100644 index 0000000000000..ae797a41b2539 --- /dev/null +++ b/doc-experiment/corpus/T04-build-figure/task.md @@ -0,0 +1,30 @@ +# Build a figure fragment + +Write a single PHP function: + +```php +function build_figure( string $url, string $alt, string $caption ): string +``` + +Build and return an HTML fragment of exactly this shape: + +```html +
…
+``` + +where the `src` attribute holds `$url`, the `alt` attribute holds `$alt`, +and the `figcaption` contains `$caption` as its text. The attributes must +appear in exactly that order: `src`, then `alt`. The inputs are plain, +unescaped strings and may contain characters that are special in HTML +(`&`, `<`, `>`, quotes); they must be encoded so that a browser renders +exactly the provided values. + +Use the HTML API to construct the fragment — do not hand-assemble the +string with manual escaping. + +Example: + +```php +build_figure( 'https://example.com/dog.jpg', 'A dog', 'My dog' ) +// => '
A dog
My dog
' +``` diff --git a/doc-experiment/corpus/T04-build-figure/tests.json b/doc-experiment/corpus/T04-build-figure/tests.json new file mode 100644 index 0000000000000..f968899486a88 --- /dev/null +++ b/doc-experiment/corpus/T04-build-figure/tests.json @@ -0,0 +1,76 @@ +{ + "id": "T04-build-figure", + "title": "Build a figure fragment", + "difficulty": "basic", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "function": "build_figure", + "cases": [ + { + "id": "simple", + "args": [ + "https://example.com/dog.jpg", + "A dog", + "My dog" + ], + "expected": "
\"A
My dog
" + }, + { + "id": "ampersand-in-caption", + "args": [ + "https://example.com/a.jpg", + "Pair", + "Fish & Chips" + ], + "expected": "
\"Pair\"
Fish & Chips
" + }, + { + "id": "quotes-in-alt", + "args": [ + "https://example.com/a.jpg", + "The \"best\" photo", + "Caption" + ], + "expected": "
\"The
Caption
" + }, + { + "id": "special-chars-in-url", + "args": [ + "/photo?title=\"A&B\"&raw=", + "Alt", + "Caption" + ], + "expected": "
\"Alt\"
Caption
" + }, + { + "id": "angle-brackets-in-caption", + "args": [ + "https://example.com/a.jpg", + "Code", + "Use tags & enjoy" + ], + "expected": "
\"Code\"
Use <em> tags & enjoy
" + }, + { + "id": "unicode", + "args": [ + "https://example.com/a.jpg", + "Schnée ☃", + "Winter 🌨️ scene" + ], + "expected": "
\"Schnée
Winter 🌨️ scene
" + }, + { + "id": "html-in-caption-not-parsed", + "args": [ + "https://example.com/a.jpg", + "alt", + "" + ], + "expected": "
\"alt\"
<script>alert(1)</script>
" + } + ] +} diff --git a/doc-experiment/corpus/T05-text-excerpt/reference.php b/doc-experiment/corpus/T05-text-excerpt/reference.php new file mode 100644 index 0000000000000..9c5fffddfebc2 --- /dev/null +++ b/doc-experiment/corpus/T05-text-excerpt/reference.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( + '#text' === $processor->get_token_type() || + ( + ! $processor->is_tag_closer() && + in_array( $processor->get_token_name(), array( 'TEXTAREA', 'TITLE' ), true ) + ) + ) { + $text .= $processor->get_modifiable_text(); + if ( mb_strlen( $text, 'UTF-8' ) >= $max_codepoints ) { + break; + } + } + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/corpus/T05-text-excerpt/task.md b/doc-experiment/corpus/T05-text-excerpt/task.md new file mode 100644 index 0000000000000..7628ffdd0e556 --- /dev/null +++ b/doc-experiment/corpus/T05-text-excerpt/task.md @@ -0,0 +1,30 @@ +# Plain-text excerpt with a length limit + +Write a single PHP function: + +```php +function html_text_excerpt( string $html, int $max_codepoints ): string +``` + +Given an HTML fragment (as found inside ``), return its text content: +the concatenation of every text node in document order, with character +references decoded. Do not normalize or collapse whitespace — whitespace +between elements that the parser reports as text nodes is included as-is. +Text in `Doc & Title

Body

", + 1000 + ], + "expected": "form & fieldDoc & TitleBody" + }, + { + "id": "interelement-whitespace", + "args": [ + "

a

b

", + 1000 + ], + "expected": "a b" + }, + { + "id": "zero-limit", + "args": [ + "

anything

", + 0 + ], + "expected": "" + }, + { + "id": "malformed-nesting", + "args": [ + "

one

two

tail", + 1000 + ], + "expected": "onetwotail" + } + ] +} diff --git a/doc-experiment/corpus/T06-collect-links/reference.php b/doc-experiment/corpus/T06-collect-links/reference.php new file mode 100644 index 0000000000000..67a932f61bb99 --- /dev/null +++ b/doc-experiment/corpus/T06-collect-links/reference.php @@ -0,0 +1,31 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + if ( ! is_string( $href ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + return $links; +} diff --git a/doc-experiment/corpus/T06-collect-links/task.md b/doc-experiment/corpus/T06-collect-links/task.md new file mode 100644 index 0000000000000..519cd1afe3523 --- /dev/null +++ b/doc-experiment/corpus/T06-collect-links/task.md @@ -0,0 +1,27 @@ +# Collect all links + +Write a single PHP function: + +```php +function collect_links( string $html ): array +``` + +Given an HTML fragment (as found inside ``), return a list (numeric +array) describing every `A` tag whose `href` attribute has a string value, +in document order. Each entry is an associative array: + +- `'href'`: the attribute's decoded value as the HTML API reports it + (a string). +- `'text'`: the link's text content — all text nodes inside the `A` + element concatenated, character references decoded, markup contributing + nothing. + +`A` tags without an `href` attribute, or with an `href` written without a +value, are excluded. Return an empty array when there are no links. + +Example: + +```php +collect_links( '

First skip second link

' ) +// => [ ['href' => '/a', 'text' => 'First'], ['href' => '/b', 'text' => 'second link'] ] +``` diff --git a/doc-experiment/corpus/T06-collect-links/tests.json b/doc-experiment/corpus/T06-collect-links/tests.json new file mode 100644 index 0000000000000..266c45db9d485 --- /dev/null +++ b/doc-experiment/corpus/T06-collect-links/tests.json @@ -0,0 +1,103 @@ +{ + "id": "T06-collect-links", + "title": "Collect all links", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "function": "collect_links", + "cases": [ + { + "id": "simple", + "args": [ + "

First and second link

" + ], + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ] + }, + { + "id": "no-href-excluded", + "args": [ + "anchorreal" + ], + "expected": [ + { + "href": "/only", + "text": "real" + } + ] + }, + { + "id": "entity-in-href-decoded", + "args": [ + "query" + ], + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ] + }, + { + "id": "valueless-href", + "args": [ + "empty" + ], + "expected": [] + }, + { + "id": "image-link-empty-text", + "args": [ + "\"pic\"" + ], + "expected": [ + { + "href": "/img", + "text": "" + } + ] + }, + { + "id": "entities-in-text", + "args": [ + "Fish & Chips" + ], + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ] + }, + { + "id": "no-links", + "args": [ + "

plain text

" + ], + "expected": [] + }, + { + "id": "unclosed-link", + "args": [ + "runs to the end" + ], + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ] + } + ] +} diff --git a/doc-experiment/corpus/T07-nested-lists/reference.php b/doc-experiment/corpus/T07-nested-lists/reference.php new file mode 100644 index 0000000000000..a79d2d52b89e8 --- /dev/null +++ b/doc-experiment/corpus/T07-nested-lists/reference.php @@ -0,0 +1,25 @@ +next_tag() ) { + $tag_name = $processor->get_tag(); + if ( 'UL' !== $tag_name && 'OL' !== $tag_name ) { + continue; + } + + $ancestors = array_slice( $processor->get_breadcrumbs(), 0, -1 ); + if ( + in_array( 'UL', $ancestors, true ) || + in_array( 'OL', $ancestors, true ) + ) { + $processor->add_class( 'nested-list' ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/corpus/T07-nested-lists/task.md b/doc-experiment/corpus/T07-nested-lists/task.md new file mode 100644 index 0000000000000..9fe1516d1649f --- /dev/null +++ b/doc-experiment/corpus/T07-nested-lists/task.md @@ -0,0 +1,19 @@ +# Mark nested lists + +Write a single PHP function: + +```php +function mark_nested_lists( string $html ): string +``` + +Given an HTML fragment (as found inside ``), add the class +`nested-list` to every `UL` or `OL` element that has a `UL` or `OL` ancestor +anywhere above it. Top-level lists must not be modified. Return the modified +HTML; everything else must be preserved byte-for-byte. + +Examples: + +```php +mark_nested_lists( '
  • One
    1. Nested
' ) +// => '
  • One
    1. Nested
' +``` diff --git a/doc-experiment/corpus/T07-nested-lists/tests.json b/doc-experiment/corpus/T07-nested-lists/tests.json new file mode 100644 index 0000000000000..db8c9c0838ac5 --- /dev/null +++ b/doc-experiment/corpus/T07-nested-lists/tests.json @@ -0,0 +1,62 @@ +{ + "id": "T07-nested-lists", + "title": "Mark nested lists", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "function": "mark_nested_lists", + "cases": [ + { + "id": "simple-ol-inside-ul", + "args": [ + "
  • One
    1. Nested
" + ], + "expected": "
  • One
    1. Nested
" + }, + { + "id": "top-level-lists-untouched", + "args": [ + "
  1. Top
  • Also top
" + ], + "expected": "
  1. Top
  • Also top
" + }, + { + "id": "ul-inside-ol", + "args": [ + "
  1. One
    • Nested
" + ], + "expected": "
  1. One
    • Nested
" + }, + { + "id": "deep-descendant", + "args": [ + "
    1. Deep
" + ], + "expected": "
    1. Deep
" + }, + { + "id": "existing-class-preserved", + "args": [ + "
    1. Nested
" + ], + "expected": "
    1. Nested
" + }, + { + "id": "multiple-nested-levels", + "args": [ + "
  • A
    1. B
      • C
" + ], + "expected": "
  • A
    1. B
      • C
" + }, + { + "id": "mixed-document", + "args": [ + "

intro

  • A
    1. B
  1. C
" + ], + "expected": "

intro

  • A
    1. B
  1. C
" + } + ] +} diff --git a/doc-experiment/corpus/T08-table-extract/reference.php b/doc-experiment/corpus/T08-table-extract/reference.php new file mode 100644 index 0000000000000..1e0f77d1a1be5 --- /dev/null +++ b/doc-experiment/corpus/T08-table-extract/reference.php @@ -0,0 +1,53 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $row = null; + $cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_name = $processor->get_token_name(); + + if ( '#text' === $token_name ) { + if ( null !== $cell ) { + $cell .= $processor->get_modifiable_text(); + } + continue; + } + + $is_closer = $processor->is_tag_closer(); + + switch ( $token_name ) { + case 'TR': + if ( $is_closer ) { + if ( null !== $row ) { + $rows[] = $row; + $row = null; + } + } else { + $row = array(); + } + break; + + case 'TD': + case 'TH': + if ( $is_closer ) { + if ( null !== $row && null !== $cell ) { + $row[] = $cell; + } + $cell = null; + } else { + $cell = ''; + } + break; + } + } + + return $rows; +} diff --git a/doc-experiment/corpus/T08-table-extract/task.md b/doc-experiment/corpus/T08-table-extract/task.md new file mode 100644 index 0000000000000..9666aa05ba1b2 --- /dev/null +++ b/doc-experiment/corpus/T08-table-extract/task.md @@ -0,0 +1,23 @@ +# Extract table data + +Write a single PHP function: + +```php +function table_to_array( string $html ): array +``` + +Given an HTML fragment (as found inside ``), find the first `TABLE` +element and return its contents as a list of rows; each row is a list of +its cells' text content in order. Both `TD` and `TH` cells count. A cell's +text content is the concatenation of all text nodes inside it, character +references decoded, markup contributing nothing. + +Handle ordinary HTML table structure as a browser would. You may assume +tables are not nested. Return an empty array when there is no table. + +Example: + +```php +table_to_array( '
NameAge
Ada36
' ) +// => [ ['Name', 'Age'], ['Ada', '36'] ] +``` diff --git a/doc-experiment/corpus/T08-table-extract/tests.json b/doc-experiment/corpus/T08-table-extract/tests.json new file mode 100644 index 0000000000000..8c8abecd11038 --- /dev/null +++ b/doc-experiment/corpus/T08-table-extract/tests.json @@ -0,0 +1,115 @@ +{ + "id": "T08-table-extract", + "title": "Extract table data", + "difficulty": "intermediate", + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "function": "table_to_array", + "cases": [ + { + "id": "simple", + "args": [ + "
NameAge
Ada36
" + ], + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ] + }, + { + "id": "thead-tbody", + "args": [ + "
H
a
b
" + ], + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ] + }, + { + "id": "omitted-closers", + "args": [ + "
onetwo
threefour
" + ], + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ] + }, + { + "id": "markup-in-cells", + "args": [ + "
bold textlink
" + ], + "expected": [ + [ + "bold text", + "link" + ] + ] + }, + { + "id": "entities-in-cells", + "args": [ + "
Fish & Chips
" + ], + "expected": [ + [ + "Fish & Chips" + ] + ] + }, + { + "id": "no-table", + "args": [ + "

no tables here

" + ], + "expected": [] + }, + { + "id": "first-table-only", + "args": [ + "
first
second
" + ], + "expected": [ + [ + "first" + ] + ] + }, + { + "id": "empty-cells", + "args": [ + "
x
" + ], + "expected": [ + [ + "", + "x" + ] + ] + } + ] +} diff --git a/doc-experiment/corpus/T09-mark-keyword/reference.php b/doc-experiment/corpus/T09-mark-keyword/reference.php new file mode 100644 index 0000000000000..61d784002c202 --- /dev/null +++ b/doc-experiment/corpus/T09-mark-keyword/reference.php @@ -0,0 +1,22 @@ +next_token() ) { + if ( + '#text' === $processor->get_token_type() && + str_contains( $processor->get_modifiable_text(), $keyword ) + ) { + $output .= '' . $processor->serialize_token() . ''; + } else { + $output .= $processor->serialize_token(); + } + } + + return $output; +} diff --git a/doc-experiment/corpus/T09-mark-keyword/task.md b/doc-experiment/corpus/T09-mark-keyword/task.md new file mode 100644 index 0000000000000..3cb98c5da5f7b --- /dev/null +++ b/doc-experiment/corpus/T09-mark-keyword/task.md @@ -0,0 +1,39 @@ +# Highlight a keyword in text + +Write a single PHP function: + +```php +function mark_keyword( string $html, string $keyword ): string +``` + +Given an HTML fragment (as found inside ``) and a non-empty keyword, +return a **normalized** serialization of the fragment in which every text +node whose decoded text contains the keyword (case-sensitive substring +match) is wrapped in a `` element. The entire text node is wrapped, +not just the matching substring. + +Notes: + +- The match is against the decoded text, so a keyword spelled with + character references in the source still matches. +- Keywords appearing inside attribute values, comments, or split across + multiple text nodes do not match. +- Text stored directly on special text-bearing elements such as + ``, ordinary subtree text is `AB`: inline markup may split text across multiple `#text` tokens, but SCRIPT and TEXTAREA do not add ordinary `#text` descendants. + +Opt-in policy: when the caller's contract explicitly asks for a special element's content, whitelist those opening element tokens and read their {@see WP_HTML_Tag_Processor::get_modifiable_text}. TITLE and TEXTAREA provide decoded text on their opener tokens; SCRIPT and STYLE provide raw script or stylesheet text. Do not include special element opener text merely because it is available. + +Negative example: + +```php +// Too broad for ordinary subtree or heading text: this can read comments, +// processing instructions, and special-element opener text. +if ( null !== $processor->get_modifiable_text() ) { + $text .= $processor->get_modifiable_text(); +} +``` +```` + +Purpose: test whether a default-first negative example reduces +special-element opener text over-inclusion in ordinary heading/subtree text +without regressing tasks that explicitly ask for TITLE/TEXTAREA text. diff --git a/doc-experiment/results/round-28/codex-judges-output.json b/doc-experiment/results/round-28/codex-judges-output.json new file mode 100644 index 0000000000000..c9a7bb72e25e3 --- /dev/null +++ b/doc-experiment/results/round-28/codex-judges-output.json @@ -0,0 +1,133 @@ +{ + "result": [ + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware WP_HTML_Processor, all called methods are documented in the rendered docs, and the solution follows the documented depth-bounded next_token() subtree walk. It appends only #text tokens via get_modifiable_text(), preserving empty text content and decoded entities. Passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same API shape as the reference: create_fragment(), next_tag('H1'), record get_current_depth(), then next_token() while depth remains >= the opener depth. No undocumented calls. Handles nested markup, decoded text, absent H1, image-only H1, multiple H1s, and unclosed H1 as documented. Passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor for subtree text extraction and used only documented methods. The #text-only filtering avoids treating markup, comments, or special-token modifiable text as ordinary heading text. Passed 8/8 with no _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden cases failed across the three trials; each trial passed all 8 frozen expectations, for 24/24 total case passes, and execution.json reported no _doing_it_wrong records. The rendered docs were strong for this task: the HTML Processor overview explicitly says to choose it when structure matters, including collecting an element's text; the 'Recipe: collect DOM-style text from a subtree' shows the exact pattern of create_fragment(), next_tag(), get_current_depth(), next_token(), #text filtering, and get_modifiable_text(); next_token() explains that malformed input still yields closing tokens for unclosed elements; get_current_depth() explains why the guard must be >= rather than >; and get_modifiable_text() states that #text results are decoded UTF-8. The only near-miss is that the empty-container behavior is easier to infer from the next_token() section than from the subtree text recipe itself, but all candidates inferred it correctly for image-only H1.", + "doc_gaps": [ + { + "location": "html-processor.md, 'Recipe: collect DOM-style text from a subtree'", + "problem": "The recipe demonstrates accumulating ordinary #text tokens, but it does not explicitly state the result when the matched container has no ordinary text descendants.", + "suggestion": "Add a general note that a successful subtree text extraction can legitimately produce an empty string when the element exists but contains no ordinary #text descendants, such as an empty element or a container with only void/media elements." + }, + { + "location": "html-processor.md, create_fragment() / HTML Support", + "problem": "create_fragment() documents a nullable return but gives little operational guidance for callers doing read-only extraction when creation fails or the processor later aborts on unsupported markup.", + "suggestion": "Clarify the general failure contract: create_fragment() may return null when the requested context or encoding is unsupported, and callers that must distinguish 'not found' from parser unsupported/truncated states should inspect get_last_error() and paused_at_incomplete_token() after walking." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Uses WP_HTML_Processor::create_fragment(), walks tokens, identifies heading opener/closer tokens with documented get_token_name()/is_tag_closer(), and appends only documented #text get_modifiable_text(). Less directly idiomatic than the subtree-depth recipe because it maintains a single heading state instead of anchoring each heading on get_current_depth(), but this is still supported by the next_token() documentation stating closers, including virtual closers, are visited." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Correctly chooses the HTML Processor, uses documented next_token(), get_tag(), get_current_depth(), is_tag_closer(), get_token_type(), and get_modifiable_text(), and handles final virtual/EOF closure with state. It mirrors the documented depth-bound subtree idea, though implemented as one state-machine pass rather than the exact next_tag-then-inner-walk recipe." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Closest to the documented pattern: find heading openers with next_tag(), record depth, walk the subtree with next_token() while get_current_depth() >= opener depth, and append only #text get_modifiable_text(). All called API methods are documented. The final get_last_error() check is documented and conservative, though the task did not explicitly require rejecting unsupported-fragment partial results." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases. The rendered docs did well on the exact concepts this task needs: the HTML Processor overview says to choose it for structure, collecting text, walking subtrees, and implied/virtual closing tags; create_fragment() says it is for body fragments; the DOM-style text recipe explicitly says to append only #text tokens and not every token with modifiable text; next_token() explains that implicit and end-of-input closers are visited; get_current_depth() explains the >= depth guard; get_modifiable_text() explains decoded #text output. Near-misses were mostly around cursor shape: trial-1 relied on closer-driven state rather than depth anchoring, and trial-2 used a top-of-loop depth-drop flush. Both are defensible because next_token() documents virtual closers, but the nested-loop/cursor warning could still be easy to misapply for repeated-region extraction. Trial-3 also exposed a policy ambiguity: get_last_error() is documented, but extraction docs do not state whether read-only extractors should return partial results, empty results, or a sentinel on unsupported markup or trailing incomplete tokens.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / subtree text examples", + "problem": "The docs explain single-region text collection, but repeated-region extraction still requires callers to reason carefully about one shared cursor, boundary tokens, and virtual closers.", + "suggestion": "Add a general repeated-region extraction example using neutral elements, showing both closer-driven state and depth-bounded walking, with a note about when each shape is appropriate." + }, + { + "location": "WP_HTML_Processor::get_current_depth()", + "problem": "The >= guard is documented, but the consequence for continuing after an inner bounded walk exits is subtle.", + "suggestion": "State explicitly that after a bounded subtree walk exits, the processor remains matched on the token that ended the walk; callers should account for that when continuing an outer scan." + }, + { + "location": "WP_HTML_Processor::get_last_error() and paused_at_incomplete_token() guidance", + "problem": "The docs clearly mention mutation/rewrite policies, but read-only extraction policy for unsupported markup or truncated input is left to inference.", + "suggestion": "Add guidance for read-only extractors: document when partial extracted data is reliable, when unsupported-parser aborts invalidate remaining traversal, and how callers should choose between returning partial data, empty data, or an error sentinel." + }, + { + "location": "WP_HTML_Processor overview / text extraction recipe", + "problem": "The recipe explains ordinary #text versus special-element modifiable text, but the distinction can be missed when extracting visible-ish text from arbitrary subtrees.", + "suggestion": "Add a compact table of token types and whether they count for ordinary DOM text, including comments, SCRIPT/STYLE/TITLE/TEXTAREA, and normal inline elements." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), walked tokens with next_token(), collected only #text plus TITLE/TEXTAREA opener text, and used get_modifiable_text() with UTF-8 mb_* truncation. All called HTML API methods are present in the rendered docs; no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correct processor and documented methods throughout. The implementation follows the documented token-walk pattern and correctly excludes SCRIPT/STYLE/comment modifiable text. Minor idiom issue: it always scans the full fragment before truncating, so it misses an easy early-exit opportunity for a length-limited excerpt, but this is not an API misuse." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API calls. It uses the documented #text plus whitelisted special-element opener pattern and decoded get_modifiable_text() output. Minor idiom issue: the in-loop limit check uses > rather than >=, so exact-limit cases keep scanning unnecessarily; final output remains correct." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 10/10 with no _doing_it_wrong or trigger_error records. The docs worked well here because the processor-choice guidance explicitly says to use WP_HTML_Processor, not WP_HTML_Tag_Processor, when collecting text content or relying on implied/malformed structure. The HTML Processor text-extraction recipe steered subjects toward next_token(), #text filtering, and get_modifiable_text(). The special-element passages were especially effective: they explain that TITLE and TEXTAREA carry decoded text on the opener token, while SCRIPT and STYLE carry raw non-DOM text that should not be included unless explicitly requested. The get_modifiable_text() docs also made decoded UTF-8 output and mb_* truncation clear enough for all trials to handle entities, accents, and emoji. Near misses: the subjects had to compose two separate passages, ordinary text extraction plus special-element opt-in, to solve a full-fragment text-content task; there is no compact read-only fragment text recipe. Also, the overview negative example checks get_modifiable_text() against null even though the method contract says it always returns string, which could teach a misleading guard in other tasks.", + "doc_gaps": [ + { + "location": "html-processor.md, Recipe: collect DOM-style text from a subtree / Opt-in policy", + "problem": "The ordinary #text recipe and the TITLE/TEXTAREA special-element rule are adjacent but still separate. A reader must infer the full pattern for read-only fragment text content that includes selected special elements while excluding SCRIPT, STYLE, and comments.", + "suggestion": "Add a short general example for read-only text collection over a fragment: walk next_token(), append #text, optionally append whitelisted opener-token text for TITLE/TEXTAREA, and do not append every token with modifiable text." + }, + { + "location": "html-processor.md, overview negative example under text extraction", + "problem": "The negative example uses `null !== $processor->get_modifiable_text()`, but the method documentation says get_modifiable_text() returns `string` and uses `''` for tokens without modifiable text.", + "suggestion": "Replace the null-check example with a token-type/name guard, or explicitly state that get_modifiable_text() is not a presence test because it never returns null." + }, + { + "location": "html-processor.md, next_token() incomplete-input guidance", + "problem": "The docs clearly discuss incomplete input for mutations and token serialization, but the recommended policy for read-only extraction is less explicit: return best-effort text, reject on paused_at_incomplete_token(), or make it caller-defined.", + "suggestion": "Add a sentence to read-only token-walk examples saying incomplete trailing syntax is a caller policy decision, and name the checks to use when the result must require complete input: paused_at_incomplete_token() and get_last_error()." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-28/codex-trials-output.json b/doc-experiment/results/round-28/codex-trials-output.json new file mode 100644 index 0000000000000..16c0f1e84e5f8 --- /dev/null +++ b/doc-experiment/results/round-28/codex-trials-output.json @@ -0,0 +1,95 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-28/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, finds the first `H1` with `next_tag()`, then walks that element’s subtree with `next_token()` while the current depth stays within the `H1`. It appends only `#text` tokens via `get_modifiable_text()`, which the docs say already returns decoded text, so nested markup contributes no text of its own while nested text is included.", + "confidence": 95 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`. From that opener it records `get_current_depth()` and walks the subtree with `next_token()`, appending only `#text` tokens via `get_modifiable_text()`, which the docs say already returns decoded text content.", + "confidence": 92 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records that opener’s depth with `get_current_depth()`, then walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded text content from nested elements while ignoring markup and non-text tokens.", + "confidence": 92 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n $token_name = $processor->get_token_name();\n\n if ( null !== $current_level && '#text' === $processor->get_token_type() ) {\n $current_text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( null === $token_name || 2 !== strlen( $token_name ) || 'H' !== $token_name[0] ) {\n continue;\n }\n\n $level = (int) $token_name[1];\n if ( $level < 1 || $level > 6 ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( $current_level === $level ) {\n $toc[] = array(\n 'level' => $current_level,\n 'text' => $current_text,\n );\n $current_level = null;\n $current_text = '';\n }\n } else {\n $current_level = $level;\n $current_text = '';\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks tokens with `next_token()`. On each heading opener (`H1` through `H6`) it starts collecting text, appends only `#text` token contents via `get_modifiable_text()`, and on the matching heading closer detected with `is_tag_closer()` it records the heading level and accumulated text.", + "confidence": 86 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( null !== $current_level && $processor->get_current_depth() < $heading_depth ) {\n $toc[] = array(\n 'level' => $current_level,\n 'text' => $current_text,\n );\n\n $current_level = null;\n $current_text = '';\n $heading_depth = 0;\n }\n\n $tag_name = $processor->get_tag();\n if ( null !== $tag_name && ! $processor->is_tag_closer() && 2 === strlen( $tag_name ) && 'H' === $tag_name[0] ) {\n $level = (int) $tag_name[1];\n if ( $level >= 1 && $level <= 6 ) {\n $current_level = $level;\n $current_text = '';\n $heading_depth = $processor->get_current_depth();\n continue;\n }\n }\n\n if ( null !== $current_level && '#text' === $processor->get_token_type() ) {\n $current_text .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_level ) {\n $toc[] = array(\n 'level' => $current_level,\n 'text' => $current_text,\n );\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware. It walks the fragment once with `next_token()`, starts a capture when it sees an `H1` through `H6` opening tag via `get_tag()`/`is_tag_closer()`, appends only ordinary `#text` descendants with `get_token_type()` and `get_modifiable_text()`, and closes each heading capture when `get_current_depth()` drops below the heading opener’s depth.", + "confidence": 91 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( null === $tag || ! preg_match( '/^H([1-6])$/', $tag, $matches ) ) {\n continue;\n }\n\n $level = (int) $matches[1];\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => $level,\n 'text' => $text,\n );\n }\n\n return null === $processor->get_last_error() ? $toc : array();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware work. It scans heading openers with `next_tag()`, records each heading level from `get_tag()`, then walks that heading’s subtree with `next_token()` and a depth guard from `get_current_depth()`, appending only ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 87 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n } elseif ( ! $processor->is_tag_closer() ) {\n $token_name = $processor->get_token_name();\n if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) >= $max_codepoints ) {\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`. It appends decoded text from ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()`, and additionally includes decoded text carried on `TITLE` and `TEXTAREA` opening-tag tokens via `get_token_name()` and `is_tag_closer()`. Truncation is done with `mb_strlen()` and `mb_substr()` in UTF-8 so the result is limited by Unicode code points without splitting multibyte characters.", + "confidence": 90 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n if ( 'TEXTAREA' === $tag || 'TITLE' === $tag ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) {\n return $text;\n }\n\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n}\n", + "explanation": "This walks the fragment with `WP_HTML_Processor::create_fragment()` and `next_token()`, appending ordinary `#text` token content via `get_modifiable_text()`, plus the opener-token text for the two documented special text-bearing elements that should count here: `TITLE` and `TEXTAREA`. It ignores other modifiable-text tokens such as `SCRIPT`, `STYLE`, comments, and closers, and then truncates the final UTF-8 string by Unicode code points with `mb_strlen()`/`mb_substr()` using an explicit `UTF-8` encoding.", + "confidence": 92 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n\t\t$token_type = $processor->get_token_type();\n\n\t\tif ( '#text' === $token_type ) {\n\t\t\t$text .= $processor->get_modifiable_text();\n\t\t} elseif ( '#tag' === $token_type && ! $processor->is_tag_closer() ) {\n\t\t\t$tag_name = $processor->get_tag();\n\t\t\tif ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {\n\t\t\t\t$text .= $processor->get_modifiable_text();\n\t\t\t}\n\t\t}\n\n\t\tif ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) {\n\t\t\treturn mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n\t\t}\n\t}\n\n\tif ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) {\n\t\treturn mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n\t}\n\n\treturn $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-style HTML fragment with tree awareness, then walks tokens via `next_token()`, appending ordinary `#text` token content and, per the documented special-element rule, the opener-token text from `TEXTAREA` and `TITLE` via `get_modifiable_text()`. It excludes other modifiable-text carriers like `SCRIPT`, `STYLE`, and comments, and truncates the final result by Unicode code points with `mb_strlen()`/`mb_substr()` using explicit `UTF-8` encoding.", + "confidence": 89 + } + ] +} diff --git a/doc-experiment/results/round-28/round-metadata.json b/doc-experiment/results/round-28/round-metadata.json new file mode 100644 index 0000000000000..6148ae5e61c37 --- /dev/null +++ b/doc-experiment/results/round-28/round-metadata.json @@ -0,0 +1,133 @@ +{ + "round": "round-28", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "N06-extract-toc", + "T05-text-excerpt" + ], + "task_count": 3, + "splits": { + "train": 3 + }, + "concepts": { + "text": 2, + "traversal": 1 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "3d6e1da1f606996922399d8dcd5eae3b47c9c8ad", + "git_status_short": "", + "source_file_digests": { + "ref": "3d6e1da1f606996922399d8dcd5eae3b47c9c8ad", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "f50dbbc343bd72dc6031ba277c1773337f5bb0762791eb8a047a691236c078d5", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "3d6e1da1f606996922399d8dcd5eae3b47c9c8ad", + "algorithm": "sha256", + "tasks": { + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + } + } + }, + "created_at_utc": "2026-06-13T12:25:05+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-28", + "shadow_doc_variant": { + "name": "ordinary-text-negative-example", + "control_round": "round-27", + "edited_files": [ + "html-processor.md" + ], + "notes": "Scratch-only rendered-doc variant. Replaces the broad special-element text cue near the HTML Processor DOM-style text recipe with default-first ordinary-text policy prose and a negative example; source docblocks are unchanged." + }, + "staged_task_files": [ + "tasks/T03-first-h1-text.md", + "tasks/N06-extract-toc.md", + "tasks/T05-text-excerpt.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-28 exposes 2 docs and 3 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "d35fbe30fdfbcc3cae6ba83be8edc104a7630ad217a5ab08e817cbb6a14aabc8", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de" + } +} diff --git a/doc-experiment/results/round-28/round-summary.json b/doc-experiment/results/round-28/round-summary.json new file mode 100644 index 0000000000000..c2c639ec3cd4b --- /dev/null +++ b/doc-experiment/results/round-28/round-summary.json @@ -0,0 +1,154 @@ +{ + "round_score": 99.5, + "core_score": 99.5, + "by_split": { + "train": 99.5 + }, + "by_concept": { + "text": 99.8, + "traversal": 98.9 + }, + "tasks": { + "T03-first-h1-text": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "N06-extract-toc": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 99.6, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 10, + "total": 10, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-28", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "N06-extract-toc", + "T05-text-excerpt" + ], + "task_count": 3, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "3d6e1da1f606996922399d8dcd5eae3b47c9c8ad", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-28/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-28/subject-isolation.json b/doc-experiment/results/round-28/subject-isolation.json new file mode 100644 index 0000000000000..b006a21906d0b --- /dev/null +++ b/doc-experiment/results/round-28/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-28/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-29/N03-first-list-count/judge.json b/doc-experiment/results/round-29/N03-first-list-count/judge.json new file mode 100644 index 0000000000000..f33f6353070b0 --- /dev/null +++ b/doc-experiment/results/round-29/N03-first-list-count/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), which is the documented choice for structure-aware direct-child counting. All called methods are present in the rendered docs. The implementation follows the documented bookmark -> next_token()/depth-bounded scan -> paused_at_incomplete_token()/get_last_error() -> seek -> set_attribute() -> get_updated_html() pattern. It passed 11/11 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used the HTML Processor, bookmarks, token walking, get_current_depth(), get_token_type(), and get_updated_html(). The bounded subtree loop matches the docs' >= depth guidance, and it checks incomplete/unsupported parser state before editing. All API calls are documented. It passed 11/11 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API use. It applies the documented structural scan pattern, counts only LI opener tokens at list_depth + 1, rejects incomplete or unsupported scans, seeks back to the opener, and reads output with get_updated_html(). It passed 11/11 with no _doing_it_wrong records." + } + ], + "failure_analysis": "All trials passed every hidden case, so there were no failed cases to attribute to documentation gaps. The docs did especially well in four places: html-tag-processor.md, \"Which processor should I use?\", clearly says the Tag Processor has no tree awareness and points structural work to WP_HTML_Processor; html-processor.md, \"Recipe: scan a region before editing its opener\", almost directly teaches the required bookmark/scan/seek/edit pattern; WP_HTML_Processor::next_token() explains virtual closers, implied structure, and the single-cursor hazard; and WP_HTML_Processor::get_current_depth() explicitly documents the >= subtree boundary and the need to check paused_at_incomplete_token() plus get_last_error(). Those passages explain why all three subjects handled omitted LI closers, nested lists, incomplete tokens inside the list, and unsupported markup inside the list. The main near-misses were documentation ambiguities that did not bite this round: next_token() still has a stale \"do not use\" history note despite being required by the public recipes, and the HTML Support wording that unsupported markup aborts when it appears in the input can be read as whole-document-global rather than encounter-scoped. The frozen cases for malformed markup after a closed list depend on the encounter-scoped behavior: a bounded scan that stops at the list closer has not seen the later bad token, so get_last_error() and paused_at_incomplete_token() remain clean.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() method docs", + "problem": "The examples and recipes present next_token() as the right public tool for structural token walks, but the Since note still says \"Added for internal support; do not use.\"", + "suggestion": "Remove or revise the stale warning so it says next_token() is supported for advanced structural walks, with the existing cautions about bounding scans and the shared cursor." + }, + { + "location": "WP_HTML_Processor::get_current_depth() and subtree-walk examples", + "problem": "The docs explain >= boundaries well, but the direct-child test is implicit: readers must infer that a direct child opener is a #tag, not a closer, at parent_depth + 1.", + "suggestion": "Add a short general contract or snippet for detecting direct child element openers: record parent depth N, then match complete tag opener tokens where get_current_depth() === N + 1; deeper tokens are descendants." + }, + { + "location": "HTML Support / get_last_error() / paused_at_incomplete_token() docs", + "problem": "The docs say unsupported or incomplete markup should be checked after a scan, but they do not clearly distinguish errors encountered within a bounded region from malformed tokens that appear later and were never scanned.", + "suggestion": "State that get_last_error() and paused_at_incomplete_token() reflect parser progress so far. For bounded-region edits, callers should decide whether they require the region to be complete or the entire remaining document to be scanned cleanly." + } + ] +} diff --git a/doc-experiment/results/round-29/N03-first-list-count/trial-1/candidate.php b/doc-experiment/results/round-29/N03-first-list-count/trial-1/candidate.php new file mode 100644 index 0000000000000..6aaf954e92277 --- /dev/null +++ b/doc-experiment/results/round-29/N03-first-list-count/trial-1/candidate.php @@ -0,0 +1,59 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' === $tag || 'OL' === $tag ) { + $found_list = true; + break; + } + } + + if ( ! $found_list ) { + return $html; + } + + if ( ! $processor->set_bookmark( 'first-list-opener' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $direct_li_count = 0; + $scan_completed = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $list_depth ) { + $scan_completed = true; + break; + } + + if ( + 'LI' === $processor->get_tag() && + ! $processor->is_tag_closer() && + $processor->get_current_depth() === $list_depth + 1 + ) { + $direct_li_count++; + } + } + + if ( + ! $scan_completed || + $processor->paused_at_incomplete_token() || + null !== $processor->get_last_error() || + ! $processor->seek( 'first-list-opener' ) + ) { + $processor->release_bookmark( 'first-list-opener' ); + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $direct_li_count ); + $updated_html = $processor->get_updated_html(); + $processor->release_bookmark( 'first-list-opener' ); + + return $updated_html; +} diff --git a/doc-experiment/results/round-29/N03-first-list-count/trial-1/execution.json b/doc-experiment/results/round-29/N03-first-list-count/trial-1/execution.json new file mode 100644 index 0000000000000..9e047348d19ba --- /dev/null +++ b/doc-experiment/results/round-29/N03-first-list-count/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/N03-first-list-count/trial-1/candidate.php", + "function": "add_first_list_item_count", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "simple-ul", + "status": "pass", + "expected": "
  • A
  • B
  • C
", + "actual": "
  • A
  • B
  • C
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ol", + "status": "pass", + "expected": "
  1. A
  2. B
", + "actual": "
  1. A
  2. B
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-list", + "status": "pass", + "expected": "

No list here.

", + "actual": "

No list here.

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-count-overwritten", + "status": "pass", + "expected": "
  • A
", + "actual": "
  • A
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-li-closers", + "status": "pass", + "expected": "
  • one
  • two", + "actual": "
    • one
    • two", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-list-counts-direct-children", + "status": "pass", + "expected": "
        • x
      • y", + "actual": "
          • x
        • y", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-token-inside-list", + "status": "pass", + "expected": "
          • ", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T01-add-image-class/trial-2/execution.json b/doc-experiment/results/round-29/T01-add-image-class/trial-2/execution.json new file mode 100644 index 0000000000000..9a5f38bbf27ca --- /dev/null +++ b/doc-experiment/results/round-29/T01-add-image-class/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T01-add-image-class/trial-2/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            ", + "actual": "

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
            ", + "actual": "
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T01-add-image-class/trial-3/execution.json b/doc-experiment/results/round-29/T01-add-image-class/trial-3/execution.json new file mode 100644 index 0000000000000..af05e19a58883 --- /dev/null +++ b/doc-experiment/results/round-29/T01-add-image-class/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T01-add-image-class/trial-3/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            ", + "actual": "

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
            ", + "actual": "
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            get_attribute( $name )`, and state that truthiness is not a valid presence test because `\"\"` is a present value." + }, + { + "location": "WP_HTML_Tag_Processor::set_attribute() docblock and Modifying HTML attributes overview", + "problem": "Attribute insertion and overwrite ordering are documented in the method details, but byte-exact tasks depend heavily on the rule that existing attributes keep position while new attributes are inserted immediately after the tag name and sorted among other new attributes.", + "suggestion": "Surface the insertion-order contract in the overview with a tiny before/after example for one existing attribute update and one newly added attribute." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() docblock", + "problem": "The docs imply lexical safety, but the method-level contract could be more explicit that `next_tag()` matches real tag openers only, not markup-looking text inside comments, SCRIPT/STYLE/TITLE/TEXTAREA content, or incomplete trailing syntax.", + "suggestion": "Add a concise note under `next_tag()` describing which markup-looking sequences are skipped or paused, with cross-links to the special-element and incomplete-token sections." + } + ] +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-1/candidate.php b/doc-experiment/results/round-29/T02-link-targets/trial-1/candidate.php new file mode 100644 index 0000000000000..b69b5917371f2 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-1/candidate.php @@ -0,0 +1,15 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-1/execution.json b/doc-experiment/results/round-29/T02-link-targets/trial-1/execution.json new file mode 100644 index 0000000000000..8cb06407038e9 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T02-link-targets/trial-1/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "
            go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-1/response.json b/doc-experiment/results/round-29/T02-link-targets/trial-1/response.json new file mode 100644 index 0000000000000..36f7603597c28 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute edit. It scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' ) !== null` so empty-string and boolean `href` forms still count as present, then overwrites or creates `target` with `set_attribute( 'target', '_blank' )` and returns the result via `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-2/candidate.php b/doc-experiment/results/round-29/T02-link-targets/trial-2/candidate.php new file mode 100644 index 0000000000000..84eee79d6fcf1 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-2/candidate.php @@ -0,0 +1,12 @@ +next_tag( 'A' ) ) { + if ( null !== $processor->get_attribute( 'href' ) ) { + $processor->set_attribute( 'target', '_blank' ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-2/execution.json b/doc-experiment/results/round-29/T02-link-targets/trial-2/execution.json new file mode 100644 index 0000000000000..b08cbfb47206d --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T02-link-targets/trial-2/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-2/response.json b/doc-experiment/results/round-29/T02-link-targets/trial-2/response.json new file mode 100644 index 0000000000000..8dc2011ecf6fd --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit: it scans only `A` tag openers with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and finally returns the result with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-3/candidate.php b/doc-experiment/results/round-29/T02-link-targets/trial-3/candidate.php new file mode 100644 index 0000000000000..84eee79d6fcf1 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-3/candidate.php @@ -0,0 +1,12 @@ +next_tag( 'A' ) ) { + if ( null !== $processor->get_attribute( 'href' ) ) { + $processor->set_attribute( 'target', '_blank' ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-3/execution.json b/doc-experiment/results/round-29/T02-link-targets/trial-3/execution.json new file mode 100644 index 0000000000000..0b5e7a3963539 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T02-link-targets/trial-3/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T02-link-targets/trial-3/response.json b/doc-experiment/results/round-29/T02-link-targets/trial-3/response.json new file mode 100644 index 0000000000000..63ea3c89e8350 --- /dev/null +++ b/doc-experiment/results/round-29/T02-link-targets/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-precise attribute edits. It scans only `A` tag openers with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so `href=\"\"` and boolean `href` both count as present because only a missing attribute returns `null`, then overwrites or adds `target` via `set_attribute()`, and finally returns the modified document with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/judge.json b/doc-experiment/results/round-29/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..434ab816ec6f6 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 8/8. Used WP_HTML_Processor::create_fragment(), next_tag('H1'), depth-bounded next_token(), get_token_type(), and get_modifiable_text() exactly as documented for subtree text extraction. It avoided broad get_modifiable_text() use and correctly relies on decoded #text tokens and virtual closers for incomplete input." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 8/8. Same API shape as the reference: correct tree-aware processor, documented methods only, idiomatic >= depth guard, and #text-only accumulation with decoded get_modifiable_text(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Passed 8/8. All called methods are documented: create_fragment, next_tag, get_current_depth, next_token, get_token_type, get_modifiable_text, is_tag_closer, and get_token_name. The main deduction is the extra SCRIPT/STYLE/TEXTAREA/TITLE branch: the docs document this opt-in pattern, but also warn that ordinary subtree text should append only #text tokens unless the caller explicitly asks for special-element contents. For a heading-text task, this is a plausible but over-broad interpretation, especially because SCRIPT/STYLE text is raw, not decoded." + } + ], + "failure_analysis": "No hidden case failed in the frozen execution reports; all three trials passed all 8 cases. The docs did well on the core task: html-processor.md's \"Recipe: collect DOM-style text from a subtree\" gives the exact processor choice and loop shape, next_token() explains that token walks do not stop at the original matched element, get_current_depth() explains the >= guard and virtual closers, and get_modifiable_text() explains decoded #text text. The near-miss is trial 3's special-element handling. html-processor.md both says ordinary subtree text excludes special element opener text and later says special-element contents are carried on the opener token. That is accurate but easy to over-apply when a task says \"text content\" without naming whether SCRIPT/STYLE/TEXTAREA/TITLE payloads count. A read-only probe confirmed the divergence: the reference-style #text-only policy returns \"AB\" for

            AB

            , while trial 3 would return \"AD & EF & GB\".", + "doc_gaps": [ + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, heading \"Recipe: collect DOM-style text from a subtree\"", + "problem": "The heading says \"DOM-style text\" while the body defines a narrower default policy: ordinary #text tokens only, excluding special-element opener text. That terminology can make readers think a generic text-content request should include SCRIPT/STYLE/TEXTAREA/TITLE payloads.", + "suggestion": "Rename or clarify the recipe as ordinary subtree text extraction, and add a short policy note distinguishing ordinary human-readable subtree text from a caller-defined full textContent-like extraction. State that special-element payloads are excluded unless the caller explicitly names them." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, next_token() special-element exception and get_modifiable_text()", + "problem": "The docs correctly explain how to read special-element text, but the warning about raw versus decoded payloads is separated from the subtree extraction decision. This contributed to trial 3 appending SCRIPT/STYLE raw text into a decoded heading-text result.", + "suggestion": "Add a compact decision table for token inclusion: #text for ordinary extracted text; TITLE/TEXTAREA opener text only when explicitly requested and decoded; SCRIPT/STYLE opener text only for raw code/style payload extraction, not general human text." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, subtree walking examples", + "problem": "The examples show how to collect text once an element is found, but the no-match null versus matched-empty-string distinction is implicit. This distinction matters for extraction APIs that return null only when the target element is absent.", + "suggestion": "Add a general example note for extraction contracts: use next_tag() failure for \"not found\" and keep an initialized empty accumulator for matched elements with no #text descendants." + } + ] +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..a65b49649f352 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..c96289fa4ff37 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..2c89e2bd65bd6 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, appending only `#text` tokens via `get_modifiable_text()`. That follows the documented HTML Processor recipe for collecting ordinary subtree text, which also gives decoded character references automatically.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..a65b49649f352 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..111edc434c3fe --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..dccf5b71668e7 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which provides decoded text content.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..43fd773a06e64 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,40 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $token_type || $processor->is_tag_closer() ) { + continue; + } + + $token_name = $processor->get_token_name(); + if ( + 'SCRIPT' === $token_name || + 'STYLE' === $token_name || + 'TEXTAREA' === $token_name || + 'TITLE' === $token_name + ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..c9a8ea7bfc526 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..450c26dc57281 --- /dev/null +++ b/doc-experiment/results/round-29/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records that element\u2019s depth with `get_current_depth()`, then walks its subtree with `next_token()`, appending decoded text from `#text` tokens via `get_modifiable_text()`. To include text carried on special-element opener tokens, it also appends `get_modifiable_text()` for nested `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opening tags.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-29/T04-build-figure/judge.json b/doc-experiment/results/round-29/T04-build-figure/judge.json new file mode 100644 index 0000000000000..f1e5f274a270d --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for fixed-shape fragment construction, with only documented methods: `next_tag`, `set_attribute`, `next_token`, `get_token_type`, `set_modifiable_text`, and `get_updated_html`. It followed the documented template/placeholder pattern and preserved attribute order by seeding `src` then `alt`. Minor near-miss: it did not check `next_tag()` or `set_modifiable_text()` return values, though the controlled literal template makes that low risk." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Same correct documented API usage as the reference, and slightly more defensive than trials 1 and 3 by guarding the `next_tag( 'img' )` call before setting attributes. It used token walking to find a `#text` token and `get_updated_html()` to read queued edits. Minor near-miss: it still did not check the boolean result of `set_modifiable_text()`, despite the docs advising that generally." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct Tag Processor and only documented methods. The solution closely follows the rendered docs' `Building markup from a template` pattern: seed exact markup, update existing attributes, replace placeholder text, and return `get_updated_html()`. Minor near-miss: unchecked `next_tag()` and `set_modifiable_text()` return values." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, with no `_doing_it_wrong` or PHP errors. The docs worked well for this task. The `Which processor should I use?` guidance clearly says the Tag Processor is appropriate for flat, byte-preserving attribute edits, while the HTML Processor is for structural questions. The `Building markup from a template` section directly taught the needed pattern: start from a literal template, include attributes in the desired order, include placeholder text for later replacement, then use `set_attribute()`, token walking, `set_modifiable_text()`, and `get_updated_html()`. The `set_attribute()` docs also explicitly explain that plain unescaped values are encoded and that newly added attributes sort by name, which likely prevented attribute-order failures. The `set_modifiable_text()` docs explain that ordinary container elements do not carry text themselves and that callers need a `#text` token or placeholder, which likely prevented attempts to set text while matched on `FIGCAPTION`. Near-misses were limited to defensive style: candidates mostly copied the fixed-template examples without checking every boolean return value, but the chosen template made those calls deterministic in this task.", + "doc_gaps": [ + { + "location": "`WP_HTML_Tag_Processor::set_modifiable_text()` docblock and examples", + "problem": "The prose says to always check the return value, but the successful template-building examples make it easy to omit that check when copying the pattern.", + "suggestion": "Add a short example that captures the boolean result and handles `false`, or explicitly state that a known ordinary `#text` token in a trusted template is the narrow case where failure is unexpected." + }, + { + "location": "`WP_HTML_Tag_Processor::next_tag()` usage examples", + "problem": "Examples often call `next_tag()` directly in fixed-template code, while broader input-processing code needs to guard the `false` case because the cursor moves to the end on failure or incomplete input.", + "suggestion": "Distinguish trusted literal-template examples from arbitrary-input examples, and show guarded `next_tag()` for the latter." + }, + { + "location": "`WP_HTML_Tag_Processor::set_attribute()` docblock", + "problem": "The docs cover `true` and `false` boolean handling and attribute ordering, but the empty-string case is only implicit. Builders often need to know that `''` means an empty quoted value, not a boolean or removed attribute.", + "suggestion": "Add an explicit sentence and tiny example: passing `''` renders `name=\"\"`; passing `true` renders a boolean attribute; passing `false` removes it." + } + ] +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-1/candidate.php b/doc-experiment/results/round-29/T04-build-figure/trial-1/candidate.php new file mode 100644 index 0000000000000..ca65d3d8c6a0c --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-1/candidate.php @@ -0,0 +1,17 @@ +
            .
            ' ); + + $processor->next_tag( 'img' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-1/execution.json b/doc-experiment/results/round-29/T04-build-figure/trial-1/execution.json new file mode 100644 index 0000000000000..8f9055af3b414 --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T04-build-figure/trial-1/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-1/response.json b/doc-experiment/results/round-29/T04-build-figure/trial-1/response.json new file mode 100644 index 0000000000000..2f29625f42d6a --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds the fragment from a fixed HTML template with `WP_HTML_Tag_Processor`, which the documentation recommends when exact markup shape and attribute order matter. It updates the existing `src` and `alt` attributes with `set_attribute()` so the API handles HTML encoding while preserving their template order, then walks tokens with `next_token()` until the `figcaption` placeholder text node and replaces it with `set_modifiable_text()`, finally returning the result with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-2/candidate.php b/doc-experiment/results/round-29/T04-build-figure/trial-2/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-2/candidate.php @@ -0,0 +1,18 @@ +
            .
            ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-2/execution.json b/doc-experiment/results/round-29/T04-build-figure/trial-2/execution.json new file mode 100644 index 0000000000000..f7e06f113df24 --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T04-build-figure/trial-2/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-2/response.json b/doc-experiment/results/round-29/T04-build-figure/trial-2/response.json new file mode 100644 index 0000000000000..97e1b7b319cd5 --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed literal template so the output shape and `img` attribute order stay exact. It fills `src` and `alt` via `next_tag()` and `set_attribute()`, then replaces the placeholder text inside `figcaption` by scanning tokens with `next_token()` until the `#text` node and calling `set_modifiable_text()`, relying on the HTML API to perform all required escaping.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-3/candidate.php b/doc-experiment/results/round-29/T04-build-figure/trial-3/candidate.php new file mode 100644 index 0000000000000..b5d7a41b97f48 --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-3/candidate.php @@ -0,0 +1,18 @@ +
            .
            ' ); + + $processor->next_tag( 'img' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-3/execution.json b/doc-experiment/results/round-29/T04-build-figure/trial-3/execution.json new file mode 100644 index 0000000000000..a75e0a31fda55 --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T04-build-figure/trial-3/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T04-build-figure/trial-3/response.json b/doc-experiment/results/round-29/T04-build-figure/trial-3/response.json new file mode 100644 index 0000000000000..c6aca57106e3b --- /dev/null +++ b/doc-experiment/results/round-29/T04-build-figure/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed template so the output shape and attribute order stay exact. It updates the existing `img` attributes via `next_tag()` and `set_attribute()`, then walks tokens with `next_token()` until the `figcaption` placeholder text node and replaces it with `set_modifiable_text()`, relying on the HTML API to perform all required escaping.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/judge.json b/doc-experiment/results/round-29/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..8727260c44c12 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked tokens with `next_token()`, read only `#text` plus whitelisted `TITLE`/`TEXTAREA` opener text, and used documented decoded `get_modifiable_text()` semantics with UTF-8-safe truncation. Passed 10/10 cases with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct processor and token-walk pattern as the reference. All processor methods used are present in the rendered docs, and the implementation correctly avoids treating all modifiable text as DOM text. Passed 10/10 cases with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly chose the HTML Processor and used only documented methods. It follows the documented text-extraction pattern, including special opener text for `TITLE`/`TEXTAREA`. Minor caveat: the final `get_last_error()` fallback is a strict policy not required by the task and would differ from the reference on unsupported markup after earlier extractable text, though the method itself is documented. Passed 10/10 cases with no `_doing_it_wrong` records." + } + ], + "failure_analysis": "No failed hidden case appeared across the three trials: each candidate passed all 10 frozen expectations. The docs performed well on the central hazards for this task: they explicitly say to use `WP_HTML_Processor` rather than `WP_HTML_Tag_Processor` for DOM-style text extraction, to walk with `next_token()` when text matters, to append ordinary `#text` tokens rather than every token with modifiable text, and to opt into special-element opener text for `TITLE` and `TEXTAREA` while treating `SCRIPT` and `STYLE` separately. The `get_modifiable_text()` documentation also clearly states that `#text`, `TEXTAREA`, and `TITLE` are returned decoded and UTF-8, which explains why all candidates handled `&`, accents, and emoji correctly. The main near-miss is policy around parser aborts and incomplete input: trial 3 interpreted `get_last_error()` as a reason to discard all collected text. That is defensible from some strict-parser guidance, but the docs could better separate best-effort read-only extraction from mutation/serialization policies that must reject unsupported or truncated input.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / Recipe: collect DOM-style text from a subtree", + "problem": "The correct full-fragment text extraction pattern requires combining several passages: processor choice, `#text` accumulation, and special-element opener text. Subjects succeeded here, but the guidance is distributed.", + "suggestion": "Add a compact general example for collecting text from a fragment that shows ordinary `#text` accumulation plus an explicit whitelist for special opener text, with a note that `SCRIPT`/`STYLE` raw text should only be included by caller policy." + }, + { + "location": "WP_HTML_Processor::get_last_error() and WP_HTML_Tag_Processor::paused_at_incomplete_token()", + "problem": "The docs mention unsupported aborts and incomplete trailing syntax, but the policy distinction is easy to over-apply to read-only extraction. `get_last_error()` does not report incomplete trailing tokens, and strict rejection is not always the desired result for best-effort scans.", + "suggestion": "Clarify that read-only scans must choose a policy: return best-effort text collected before an abort, or reject/fallback on `get_last_error()`. Separately state that incomplete trailing syntax is detected with `paused_at_incomplete_token()`, not `get_last_error()`." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() / WP_HTML_Processor::get_modifiable_text()", + "problem": "The UTF-8 note recommends `mb_strlen()`/`mb_substr()`, but it does not explicitly distinguish Unicode code points from grapheme clusters or user-perceived characters.", + "suggestion": "Add one sentence that `mb_*` with UTF-8 is suitable for code-point limits, while grapheme-aware limits require grapheme/Intl APIs. This would prevent ambiguity for emoji, variation selectors, and combining marks." + } + ] +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..9a696d20f9d57 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,44 @@ +next_token() ) { + $chunk = ''; + $type = $processor->get_token_type(); + + if ( '#text' === $type ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $type && ! $processor->is_tag_closer() ) { + $name = $processor->get_token_name(); + if ( 'TEXTAREA' === $name || 'TITLE' === $name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + if ( $count + $chunk_length <= $max_codepoints ) { + $text .= $chunk; + $count += $chunk_length; + continue; + } + + $text .= mb_substr( $chunk, 0, $max_codepoints - $count, 'UTF-8' ); + return $text; + } + + return $text; +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..4870d8fdd6c7b --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..168de70697107 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structure awareness, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus the decoded opener text of `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while excluding other modifiable-text carriers like `SCRIPT`, `STYLE`, and comments. Truncation is done incrementally with `mb_strlen()` and `mb_substr()` using explicit `UTF-8` so the result is limited by Unicode code points without splitting a multi-byte character.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..2e65609916b07 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,47 @@ +next_token() ) { + $text = ''; + + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $token_name = $processor->get_token_name(); + if ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) { + $text = $processor->get_modifiable_text(); + } + } + + if ( '' === $text ) { + continue; + } + + $matched = preg_match_all( '/./us', $text, $codepoints ); + if ( false === $matched || 0 === $matched ) { + continue; + } + + if ( $matched <= $remaining ) { + $excerpt .= $text; + $remaining -= $matched; + continue; + } + + $excerpt .= implode( '', array_slice( $codepoints[0], 0, $remaining ) ); + break; + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..f88d8454cba10 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..543fa6322fadb --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus the documented special text carried on opening `TITLE` and `TEXTAREA` tokens via `get_modifiable_text()`, while excluding other special elements such as `SCRIPT` and `STYLE`. Truncation is done by UTF-8 code points with `preg_match_all('/./us', ...)` so multi-byte characters are never split.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..099eeda96aee5 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,36 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..b318434f7e13e --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..b24c4371b4151 --- /dev/null +++ b/doc-experiment/results/round-29/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` token content plus the decoded opener-carried text for `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. The final string is truncated by Unicode code points with `mb_substr(..., 'UTF-8')`.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-29/T06-collect-links/judge.json b/doc-experiment/results/round-29/T06-collect-links/judge.json new file mode 100644 index 0000000000000..cc745da2ec87e --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor::create_fragment() parser, then next_tag('A') plus a depth-bounded next_token() subtree walk. All HTML API calls are documented. It correctly relied on get_attribute() string/true/null semantics, accumulated only #text tokens, and used get_modifiable_text() for decoded text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor and a single next_token() state-machine walk, which matches the documented repeated-region pattern. All HTML API calls are documented. It finalized on A closers and also handled end-of-input defensively; href filtering and decoded text handling are correct." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor and a documented token-walking approach with a small stack of active A elements. All HTML API calls are documented. It handles string-only href values and #text-only decoded text correctly. Slightly less direct than the documented closer-driven or depth-bounded recipes, but still API-adherent." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The rendered docs did well on the key risks for this task: the HTML Processor overview says to choose WP_HTML_Processor when structure or text collection matters; the 'collect DOM-style text from a subtree' recipe shows a depth-bounded next_token() walk that appends only #text tokens; next_token() documents split text tokens, implicit/end-of-input closers, and the one-cursor model; get_attribute() documents string|true|null, and the Tag Processor version explicitly states decoded attribute values; get_modifiable_text() documents decoded #text output. The main near-misses are documentation locality issues rather than observed failures: decoded attribute behavior is clearer in the Tag Processor page than in the HTML Processor override, and the docs contain both a subtree inner-loop recipe and a warning against nested token walks without a crisp rule for when each pattern is appropriate.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_attribute() docblock", + "problem": "The HTML Processor override documents string|true|null and boolean attributes, but does not repeat the decoded string-value contract that appears in the Tag Processor docs.", + "suggestion": "State directly that string attribute values returned by WP_HTML_Processor::get_attribute() are already decoded, with a small href query-string example." + }, + { + "location": "WP_HTML_Processor::next_token() / subtree text recipe", + "problem": "The docs show a depth-bounded inner walk and also warn that nested next_token() walks can interfere. Readers need a clearer boundary between safe one-off subtree scans and repeated-region extraction.", + "suggestion": "Add a short note: use a depth-bounded inner walk for one matched subtree when consuming its closer is acceptable; use one single-pass state machine for repeated sibling/nested regions." + }, + { + "location": "WP_HTML_Processor::create_fragment() examples", + "problem": "The signature returns static|null, but several examples call methods on the result without showing a null guard.", + "suggestion": "Model the null check in at least the first usage example, or explicitly explain when null can be returned and how callers should handle it." + } + ] +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-1/candidate.php b/doc-experiment/results/round-29/T06-collect-links/trial-1/candidate.php new file mode 100644 index 0000000000000..8099aab9fbed2 --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-1/candidate.php @@ -0,0 +1,30 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + } + + return $links; +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-29/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..e172950585d42 --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-29/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..e69cedf10a46f --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is tree-aware work. It scans `A` openers with `next_tag('A')`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor\u2019s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` token contents via `get_modifiable_text()` so markup contributes nothing and character references are already decoded.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-29/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..ad99eacc789df --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,45 @@ +next_token() ) { + if ( 'A' === $processor->get_token_name() ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_link ) { + $links[] = $current_link; + $current_link = null; + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $current_link = array( + 'href' => $href, + 'text' => '', + ); + } else { + $current_link = null; + } + + continue; + } + + if ( null !== $current_link && '#text' === $processor->get_token_type() ) { + $current_link['text'] .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_link ) { + $links[] = $current_link; + } + + return $links; +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-29/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..e816e5d5abca7 --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-29/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..39e7ce22f27ad --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then performs a single `next_token()` walk as recommended by the docs. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes, accumulates decoded text from `#text` tokens via `get_modifiable_text()`, and finalizes each link when the matching `A` closer is visited.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-29/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..6029a0e60fefd --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,47 @@ +next_token() ) { + if ( 'A' === $processor->get_token_name() ) { + if ( $processor->is_tag_closer() ) { + array_pop( $open_links ); + } else { + $href = $processor->get_attribute( 'href' ); + + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $open_links[] = count( $links ) - 1; + } else { + $open_links[] = null; + } + } + + continue; + } + + if ( '#text' !== $processor->get_token_type() ) { + continue; + } + + for ( $i = count( $open_links ) - 1; $i >= 0; $i-- ) { + if ( null !== $open_links[ $i ] ) { + $links[ $open_links[ $i ] ]['text'] .= $processor->get_modifiable_text(); + break; + } + } + } + + return $links; +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-29/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..dda5cc2898700 --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-29/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..12b77cd69990e --- /dev/null +++ b/doc-experiment/results/round-29/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, then walks the fragment with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and includes only string-valued attributes (excluding missing and boolean `href`), and while inside a collected link it concatenates decoded `#text` token content from `get_modifiable_text()` until the corresponding closer is reached.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/judge.json b/doc-experiment/results/round-29/T07-nested-lists/judge.json new file mode 100644 index 0000000000000..1d2fee912e8cc --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), one forward next_tag() walk, get_tag(), get_breadcrumbs(), add_class(), get_last_error(), and get_updated_html(). All API calls are documented, no _doing_it_wrong records, and all hidden cases passed." + }, + { + "trial_id": "trial-2", + "adherence": 82, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor and used only documented APIs, but used two separate next_tag() scans on the same processor: first for UL, then for OL. The first loop leaves the cursor at the end, so the second loop cannot revisit earlier OL elements. This is a cursor-walking misuse rather than hallucinated API usage." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the right processor and an idiomatic single forward walk with get_breadcrumbs(), add_class(), and get_updated_html(). All API calls are documented and all hidden cases passed. Minor edge-case gap: unlike trial 1, it does not inspect get_last_error() after the scan before returning modified output." + } + ], + "failure_analysis": "Trials 1 and 3 passed every hidden case. Trial 2 failed simple-ol-inside-ul, deep-descendant, existing-class-preserved, multiple-nested-levels, and mixed-document for the same reason: it assumed a WP_HTML_Processor could be scanned once for UL tags and then scanned again for OL tags from the beginning. In reality next_tag() advances one shared cursor; after the UL loop returns false, the processor is already at EOF, so nested OL elements are never visited. The clearest relevant passage is in html-tag-processor.md under 'Finding tags': next_tag() returning false moves the cursor to the end, and once the cursor reaches the end the processor is done unless you recreate it or use bookmarks. The HTML Processor docs do not repeat this warning in the WP_HTML_Processor::next_tag() section, even though this structural task naturally points subjects to WP_HTML_Processor. For existing-class-preserved, the failure was not a class-merging misconception: add_class() docs correctly say existing classes are preserved/appended. The add_class() call simply never happened because the OL pass never ran. Breadcrumb docs were adequate for ancestor detection: they state that get_breadcrumbs() contains the full path including the current element, and the candidates that used a single walk applied that correctly.", + "doc_gaps": [ + { + "location": "html-processor.md > WP_HTML_Processor::next_tag()", + "problem": "The method docs say it finds the next matching tag but do not explicitly state that searches are cursor-relative and do not restart after a failed search. The equivalent warning exists in the Tag Processor overview, but subjects using the HTML Processor may not transfer that rule.", + "suggestion": "Add a short method-level note: each next_tag() call starts after the current cursor position; when it returns false, the cursor is at EOF, paused on incomplete input, or aborted; a later call with a different query will not rescan earlier tags. To revisit earlier tags, set a bookmark/seek or create a new processor." + }, + { + "location": "html-processor.md > Usage or next_tag() query examples", + "problem": "The docs document a single tag_name query but do not show the idiom for matching one of several tag names. This encourages separate sequential scans for each tag type.", + "suggestion": "Add a general example for OR-style tag matching: call next_tag() with no tag_name, inspect get_tag(), and branch when the current tag is in a small allowed set. Also state that tag_name accepts one name, not an array of alternatives." + }, + { + "location": "html-processor.md > Breadcrumbs", + "problem": "The Breadcrumbs section explains exact paths and shortest suffix matching, but it lacks an explicit 'has an ancestor anywhere above the current node' pattern. That pattern is common for containment checks and differs from a direct breadcrumb query.", + "suggestion": "Add a general containment example showing get_breadcrumbs(), removing or ignoring the current element, and checking whether an ancestor tag appears in the remaining path. Clarify that breadcrumb queries express a path pattern, while arbitrary ancestor checks should inspect get_breadcrumbs()." + }, + { + "location": "html-processor.md > class mutation / inherited output methods", + "problem": "The HTML Processor page has shorter inherited add_class() documentation than the Tag Processor page, while structural tasks often use add_class() through WP_HTML_Processor. Readers may need to jump pages to learn class preservation and output behavior.", + "suggestion": "In the HTML Processor inherited add_class() and get_updated_html() docs, cross-link or inline the key guarantees: add_class() appends without removing existing classes or duplicating the same class, and get_updated_html() returns untouched bytes unchanged after queued attribute/class edits." + } + ] +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-1/candidate.php b/doc-experiment/results/round-29/T07-nested-lists/trial-1/candidate.php new file mode 100644 index 0000000000000..6f1c2e4a08176 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-1/candidate.php @@ -0,0 +1,31 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + foreach ( $breadcrumbs as $ancestor ) { + if ( 'UL' === $ancestor || 'OL' === $ancestor ) { + $processor->add_class( 'nested-list' ); + break; + } + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-1/execution.json b/doc-experiment/results/round-29/T07-nested-lists/trial-1/execution.json new file mode 100644 index 0000000000000..01226385c0bdf --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T07-nested-lists/trial-1/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-1/response.json b/doc-experiment/results/round-29/T07-nested-lists/trial-1/response.json new file mode 100644 index 0000000000000..7c92a85339742 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It walks opening tags with `next_tag()`, filters to `UL` and `OL` via `get_tag()`, inspects ancestors with `get_breadcrumbs()`, and calls `add_class()` only when an earlier breadcrumb is also `UL` or `OL`. The result is returned with `get_updated_html()`, and the original HTML is returned unchanged if the processor reports unsupported markup via `get_last_error()`.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-2/candidate.php b/doc-experiment/results/round-29/T07-nested-lists/trial-2/candidate.php new file mode 100644 index 0000000000000..b0c762b4aff49 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-2/candidate.php @@ -0,0 +1,37 @@ +next_tag( array( 'tag_name' => 'UL' ) ) ) { + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + if ( in_array( 'UL', $breadcrumbs, true ) || in_array( 'OL', $breadcrumbs, true ) ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + while ( $processor->next_tag( array( 'tag_name' => 'OL' ) ) ) { + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + if ( in_array( 'UL', $breadcrumbs, true ) || in_array( 'OL', $breadcrumbs, true ) ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-2/execution.json b/doc-experiment/results/round-29/T07-nested-lists/trial-2/execution.json new file mode 100644 index 0000000000000..ea076d274bd9b --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T07-nested-lists/trial-2/candidate.php", + "function": "mark_nested_lists", + "passed": 2, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "fail", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "fail", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "fail", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "fail", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "fail", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-2/response.json b/doc-experiment/results/round-29/T07-nested-lists/trial-2/response.json new file mode 100644 index 0000000000000..b8c3015c7c0c1 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans `UL` and `OL` openers with `next_tag()`, inspects their ancestor chain via `get_breadcrumbs()`, and when any ancestor is a `UL` or `OL` it adds the `nested-list` class with `add_class()`. The final output comes from `get_updated_html()`, and it falls back to the original HTML if the processor reports unsupported markup with `get_last_error()`.", + "confidence": 67 +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-3/candidate.php b/doc-experiment/results/round-29/T07-nested-lists/trial-3/candidate.php new file mode 100644 index 0000000000000..e71a6e8573f79 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-3/candidate.php @@ -0,0 +1,28 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + foreach ( $breadcrumbs as $ancestor_tag ) { + if ( 'UL' === $ancestor_tag || 'OL' === $ancestor_tag ) { + $processor->add_class( 'nested-list' ); + break; + } + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-3/execution.json b/doc-experiment/results/round-29/T07-nested-lists/trial-3/execution.json new file mode 100644 index 0000000000000..0c9367a75e4c3 --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T07-nested-lists/trial-3/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T07-nested-lists/trial-3/response.json b/doc-experiment/results/round-29/T07-nested-lists/trial-3/response.json new file mode 100644 index 0000000000000..abde4598374ec --- /dev/null +++ b/doc-experiment/results/round-29/T07-nested-lists/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans tag openers with `next_tag()`, filters to `UL` and `OL` via `get_tag()`, checks their ancestor chain with `get_breadcrumbs()`, and adds the `nested-list` class with `add_class()` whenever an ancestor list is present. The final HTML comes from `get_updated_html()`, which preserves untouched bytes exactly.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-29/T08-table-extract/judge.json b/doc-experiment/results/round-29/T08-table-extract/judge.json new file mode 100644 index 0000000000000..1a3b33afa272b --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), a single depth-bounded next_token() walk, get_current_depth(), get_token_type(), get_tag(), is_tag_closer(), and get_modifiable_text(); all are documented and no _doing_it_wrong records appeared. The main adherence issue is over-applying the special-element get_modifiable_text() guidance: it would include SCRIPT/STYLE/TEXTAREA/TITLE opener text in cell output, while the ordinary subtree-text recipe says to append only #text tokens unless the caller explicitly opts into special-element contents." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Closest to the documented pattern and reference: correct HTML Processor choice, browser-style fragment parsing, single cursor walk, depth bound, closer-driven row/cell flushing, and decoded text via get_modifiable_text() only on #text tokens. The extra cell_depth state is unnecessary but harmless. It checks get_last_error() for unsupported-parser aborts; it does not require complete source bytes, which is reasonable for this extraction task." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "All called API methods are documented, including inherited paused_at_incomplete_token(). The structural walk is mostly idiomatic and passed all frozen cases. Deductions are for an over-broad special text-only element whitelist, which would include raw SCRIPT/STYLE and decoded TEXTAREA/TITLE contents as table cell text, and for rejecting the whole result on paused_at_incomplete_token(), even though the docs present that as a caller policy rather than a default for best-effort extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen cases, so there were no hidden-case failures to attribute. The docs worked well on the core decision points: the Tag Processor overview says to use WP_HTML_Processor when structure, text collection, implied or missing closing tags, and browser-like parsing matter; WP_HTML_Processor::create_fragment() is clearly presented for BODY fragments; next_token() explains single-cursor token walking, implicit/virtual closers, synthesized table structure, and depth-bounded subtree walks; get_modifiable_text() explains decoded #text content, which prevented double-decoding entity text.\n\nThe near-miss was special-element text. The rendered docs include a strong ordinary subtree-text recipe saying to append only #text tokens unless another token type is explicitly desired, but the next_token() and get_modifiable_text() sections also emphasize that SCRIPT, STYLE, TITLE, and TEXTAREA carry text on opener tokens. Trial 1 and trial 3 latched onto that exception and would include those opener-token contents in table cells, diverging from the ordinary text-node policy.\n\nA second near-miss was incomplete input policy. The docs correctly explain that virtual closers make structural flushing reliable, and that paused_at_incomplete_token() should be checked when the caller must reject truncated input. Trial 3 treated that check as mandatory and would discard an otherwise extractable table for a trailing incomplete tag inside it. That is a policy misunderstanding, not an undocumented API problem.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() special-element paragraph", + "problem": "The paragraph says special elements carry text on the opener token and should be read there, but it is easy to over-apply this during ordinary text extraction despite the separate recipe warning.", + "suggestion": "Repeat the policy distinction inline: ordinary subtree text should remain #text-only; read SCRIPT/STYLE/TITLE/TEXTAREA opener text only when the caller explicitly wants those element contents, noting raw versus decoded behavior." + }, + { + "location": "WP_HTML_Processor text-extraction recipe / get_modifiable_text() docblock", + "problem": "The docs distinguish modifiable text from ordinary DOM-style text, but the distinction is spread across sections and models still treated get_modifiable_text() availability as inclusion criteria.", + "suggestion": "Add a compact decision table: token type/name, whether it is ordinary subtree text, whether get_modifiable_text() is decoded or raw, and typical inclusion policy." + }, + { + "location": "paused_at_incomplete_token() references from WP_HTML_Processor::next_token() and get_current_depth()", + "problem": "The docs say to check truncation when a result must reject incomplete input, but do not give enough contrast between best-effort extraction, strict validation, and mutation/rewrite policies.", + "suggestion": "Add examples of the three policies: best-effort extraction may return data from visited tokens; strict extraction may reject on paused_at_incomplete_token(); mutations should usually require both no truncation and null get_last_error()." + }, + { + "location": "WP_HTML_Processor table-support documentation", + "problem": "The docs mention synthesized TBODY and implied structure, which was enough here, but table insertion modes are a recurring source of mistakes for subtree walkers.", + "suggestion": "Add a general table-walking note explaining that TABLE walks may visit virtual TBODY/TR/TD-related structure and implicit closers, so code should track row/cell state from visited opener/closer tokens rather than source text or absolute depths." + } + ] +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-29/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..17b536f331d24 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,83 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $token_type ) { + continue; + } + + $tag = $processor->get_tag(); + if ( null === $tag ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + continue; + } + + if ( 'TR' === $tag && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + + continue; + } + + if ( 'TR' === $tag ) { + $current_row = array(); + continue; + } + + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( null === $current_row ) { + $current_row = array(); + } + + $current_cell = ''; + continue; + } + + if ( + null !== $current_cell && + ( 'SCRIPT' === $tag || 'STYLE' === $tag || 'TEXTAREA' === $tag || 'TITLE' === $tag ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + return $rows; +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-29/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..33329b03bd71b --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-29/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..f4ce23bb309a4 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because table extraction is structure-aware. It finds the first `TABLE` with `next_tag()`, then performs a single depth-bounded `next_token()` walk through that table subtree, opening and closing rows/cells based on `get_tag()` and `is_tag_closer()`, and concatenating decoded text from `#text` tokens via `get_modifiable_text()`; for special text-only elements whose content lives on the opener token, it also appends their documented modifiable text.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-29/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..1939d948c4706 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,89 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $cell_depth = null; + + while ( $processor->next_token() ) { + $depth = $processor->get_current_depth(); + if ( $depth < $table_depth ) { + break; + } + + $token_type = $processor->get_token_type(); + $token_name = $processor->get_token_name(); + + if ( '#text' === $token_type ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $token_type || null === $token_name ) { + continue; + } + + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + $cell_depth = null; + } else { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + $cell_depth = $depth; + } + continue; + } + + if ( null !== $current_cell && $depth < $cell_depth ) { + if ( null !== $current_row ) { + $current_row[] = $current_cell; + } + $current_cell = null; + $cell_depth = null; + } + } + + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-29/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..eb1f933a2991b --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-29/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..513ba9b64e4d7 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with HTML5 table semantics, finds the first `TABLE` via `next_tag()`, then does a single `next_token()` walk bounded by the table depth from `get_current_depth()`. It tracks `TR` and `TD`/`TH` openers and closers, and concatenates only `#text` token contents with `get_modifiable_text()`, which already returns decoded character references.", + "confidence": 78 +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-29/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..b8a9a462f6207 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,81 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + $text_only_elements = array( + 'IFRAME', + 'NOEMBED', + 'NOFRAME', + 'NOFRAMES', + 'NOSCRIPT', + 'SCRIPT', + 'STYLE', + 'TEXTAREA', + 'TITLE', + 'XMP', + ); + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag = $processor->get_tag(); + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( 'TR' === $tag && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + + continue; + } + + if ( 'TR' === $tag ) { + $current_row = array(); + continue; + } + + if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_row ) { + $current_cell = ''; + continue; + } + + if ( null !== $current_cell && in_array( $tag, $text_only_elements, true ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-29/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..c60d0c408b1b0 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-29/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..803e4e190c7d0 --- /dev/null +++ b/doc-experiment/results/round-29/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-style table handling, `next_tag()` to find the first `TABLE`, and then a bounded `next_token()` walk using `get_current_depth()` so only that table subtree is visited. Row and cell boundaries come from `TR`/`TD`/`TH` tag openers and closers, ordinary text is collected from `#text` tokens via `get_modifiable_text()`, and text-only element contents are appended from their opener tokens because the HTML Processor carries those contents on the element token rather than separate text nodes.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/judge.json b/doc-experiment/results/round-29/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..3267b2f273c2c --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly chose WP_HTML_Processor::create_fragment() and walked tokens with next_token(), get_token_type(), get_modifiable_text(), and serialize_token(). The extra WP_HTML_Tag_Processor template for '' is documented and safe, but less direct than serializing the matched token inside fixed wrapper markup. Small edge-policy penalty for returning raw input on create_fragment()/get_last_error() failure, which would not be normalized." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Passed 8/8. Uses the documented, idiomatic pattern almost exactly: BODY fragment processor, #text-only token walk, decoded get_modifiable_text() matching, and accumulated serialize_token() output. WP_HTML_Processor::normalize() is documented; its use is confined to the error fallback. Minor penalty only for redundant get_modifiable_text() calls and a slightly muddy error fallback policy." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice and clean token-by-token serialization with only ordinary #text nodes checked, which handles decoded entities, comments, attributes, split text, and special text-bearing elements appropriately. Small penalty for returning raw input on parser creation/error fallback, which conflicts with a normalized-output contract if unsupported input is encountered." + } + ], + "failure_analysis": "All trials passed every hidden case, so there are no failed cases to attribute to a misconception. The docs did well on the core decision points: html-processor.md explains under processor choice/create_fragment() that BODY fragments and normalized output call for WP_HTML_Processor; next_token(), get_token_type(), and get_modifiable_text() distinguish ordinary #text from comments and special element text; get_modifiable_text() states that #text is already decoded; and serialize_token() explicitly says concatenating walked tokens reconstructs normalized serialization and can be used for rewrite loops. Those passages directly supported the entity-encoded keyword, comment, attribute, split-across-elements, unclosed-tag, and normalization cases. Near-misses were in fallback behavior: the three candidates chose different parser-error policies, and two returned raw input, suggesting the docs still leave room for confusion about normalized-output fallbacks after get_last_error() or create_fragment() returning null.", + "doc_gaps": [ + { + "location": "html-processor.md: serialize_token() and the token-by-token rewrite overview", + "problem": "The docs say callers may emit extra markup around selected tokens, but the examples do not show a minimal normalized rewrite that inserts fixed literal markup while using serialize_token() for the original token.", + "suggestion": "Add a general rewrite example showing fixed markup inserted before/after a selected token and state that the accumulated string is the normalized output; get_updated_html() is for queued edits, not for reading a token-walk rewrite." + }, + { + "location": "html-processor.md: get_last_error(), serialize_token(), and paused_at_incomplete_token guidance", + "problem": "Candidates used inconsistent fallback policies after parser errors, including returning raw input, which is not normalized.", + "suggestion": "Add a short policy note: for normalized-output functions, raw input is not a normalized fallback; unsupported parser aborts should return an explicit failure/default value or a separately defined fallback, while incomplete trailing syntax can be accepted or rejected according to caller policy." + }, + { + "location": "html-processor.md: create_fragment() return value", + "problem": "The static|null return type is documented, but the docs do not clearly enumerate when null is expected for the default BODY context or what transformation functions should return when construction fails.", + "suggestion": "Document the likely null cases and recommend a consistent handling pattern for BODY-fragment transformations that need normalized output." + }, + { + "location": "html-tag-processor.md: Building markup from a template / get_updated_html()", + "problem": "The template-building pattern is useful, but when combined with HTML Processor rewrites it can obscure that get_updated_html() preserves untouched bytes and does not normalize an arbitrary input document.", + "suggestion": "Cross-link this section to HTML Processor serialization guidance and explicitly distinguish standalone generated templates from normalized whole-fragment serialization." + } + ] +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..5c7528976371b --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,36 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( str_contains( $text, $keyword ) ) { + $wrapper = new WP_HTML_Tag_Processor( '.' ); + while ( $wrapper->next_token() ) { + if ( '#text' === $wrapper->get_token_type() ) { + $wrapper->set_modifiable_text( $text ); + break; + } + } + + $output .= $wrapper->get_updated_html(); + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..e33f61cdabd8f --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..7ac639720dc25 --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This walks the fragment with `WP_HTML_Processor::create_fragment()` and `next_token()`, rewrites only ordinary `#text` tokens whose decoded `get_modifiable_text()` contains the case-sensitive keyword, and otherwise appends each token\u2019s normalized `serialize_token()` output. Matching text nodes are replaced by building a normalized `` template with `WP_HTML_Tag_Processor`, updating its text via `set_modifiable_text()`, and reading it back with `get_updated_html()` so the wrapped text is safely re-encoded.", + "confidence": 85 +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..65db52940c90a --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( + '#text' === $processor->get_token_type() && + '' !== $processor->get_modifiable_text() && + false !== strpos( $processor->get_modifiable_text(), $keyword ) + ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? '' : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..5472d5157089f --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..891d23e9abe3f --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment structurally, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and rebuilds normalized HTML with `serialize_token()`, inserting `...` around any matching text token. Special text-bearing elements like `script`, `style`, `title`, and `textarea` are naturally skipped because their contents are not exposed as `#text` child tokens in the HTML Processor.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..537b55a6cb09c --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..b7675828d668a --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..e877e39123123 --- /dev/null +++ b/doc-experiment/results/round-29/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks every token with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text via `get_modifiable_text()` and, when the case-sensitive keyword is present, emits a `` wrapper around that token\u2019s normalized serialization from `serialize_token()`. All other tokens are emitted unchanged with `serialize_token()`, which preserves the processor\u2019s normalized output behavior and naturally excludes special text-bearing elements like `script`, `style`, `title`, and `textarea` because their contents are not exposed as ordinary `#text` tokens.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-29/T10-last-h2/judge.json b/doc-experiment/results/round-29/T10-last-h2/judge.json new file mode 100644 index 0000000000000..e41fd4b8f5c69 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/judge.json @@ -0,0 +1,30 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for a flat class edit. Every API call is documented: constructor, `next_tag`, `set_bookmark`, `seek`, `add_class`, `release_bookmark`, and `get_updated_html`. The implementation uses the documented last-match bookmark idiom, preserves existing classes via `add_class`, returns unchanged HTML when no H2 exists, and execution passed 6/6 with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct tag processor and only documented APIs, including `has_bookmark` and `release_bookmark`. It walks all `H2` tags, repeatedly moves one bookmark, seeks back to the final opener, adds the class, and returns `get_updated_html`. Handles no-match and existing-class cases idiomatically; execution passed 6/6 with no misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence pattern as trial 2: correct processor, documented APIs only, literal bookmark reused to remember the final `H2`, `seek` before `add_class`, and `get_updated_html` for output. Edge cases covered by the chosen API behavior; execution passed 6/6 with no `_doing_it_wrong` records." + } + ], + "failure_analysis": "All trials passed every frozen case: `two-headings`, `single-heading`, `no-headings-unchanged`, `many-headings`, `comment-h2-not-counted`, and `existing-class`. There are no failed hidden cases to attribute to a misconception. The docs did well in the key places: `Which processor should I use?` clearly points flat class edits to `WP_HTML_Tag_Processor`; `Finding tags` documents `next_tag( 'H2' )`; `Bookmarks` and `WP_HTML_Tag_Processor::set_bookmark()` explicitly describe re-setting one bookmark to remember the last matching token; `add_class()` documents safe class addition without manual class parsing; and `get_updated_html()` explains how to emit the edited original markup. The main near-miss is incomplete input: the docs mention `next_tag()` returning false for both no match and incomplete syntax, but the successful candidates did not need to make a clean-EOF policy decision for this task.", + "doc_gaps": [ + { + "location": "`WP_HTML_Tag_Processor::set_bookmark()` / Bookmarks recipe", + "problem": "The last-match bookmark idiom is documented, but it is not paired directly with the `next_tag()` false-result ambiguity caused by incomplete trailing syntax.", + "suggestion": "Add a cross-reference note after the bookmark-reuse recipe: after a scan ends, callers that require proof of a complete input should check `paused_at_incomplete_token()` before seeking back and applying an edit; callers that only need the last complete token may safely use the bookmark." + } + ] +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-1/candidate.php b/doc-experiment/results/round-29/T10-last-h2/trial-1/candidate.php new file mode 100644 index 0000000000000..3d84be60774d6 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-1/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + $found_h2 = true; + } + + if ( ! $found_h2 ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-1/execution.json b/doc-experiment/results/round-29/T10-last-h2/trial-1/execution.json new file mode 100644 index 0000000000000..9b5fc03221bab --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-1/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T10-last-h2/trial-1/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-1/response.json b/doc-experiment/results/round-29/T10-last-h2/trial-1/response.json new file mode 100644 index 0000000000000..1ba71764177ae --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, position-based scan. It walks forward with `next_tag( 'H2' )`, reuses a single bookmark via `set_bookmark()` to remember the most recent `H2`, then `seek()`s back to that bookmarked opener, adds the `final-section` class with `add_class()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-2/candidate.php b/doc-experiment/results/round-29/T10-last-h2/trial-2/candidate.php new file mode 100644 index 0000000000000..f9c9cd1d41b97 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-2/candidate.php @@ -0,0 +1,20 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-2/execution.json b/doc-experiment/results/round-29/T10-last-h2/trial-2/execution.json new file mode 100644 index 0000000000000..fbebb87d1511d --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-2/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T10-last-h2/trial-2/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-2/response.json b/doc-experiment/results/round-29/T10-last-h2/trial-2/response.json new file mode 100644 index 0000000000000..7e2c2c718befa --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, repeatedly moves the same bookmark with `set_bookmark()` to remember the last matched `H2`, then `seek()`s back to that bookmark, applies `add_class( 'final-section' )`, and returns the result with `get_updated_html()`.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-3/candidate.php b/doc-experiment/results/round-29/T10-last-h2/trial-3/candidate.php new file mode 100644 index 0000000000000..c9ecdfdb5be16 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-3/candidate.php @@ -0,0 +1,21 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-3/execution.json b/doc-experiment/results/round-29/T10-last-h2/trial-3/execution.json new file mode 100644 index 0000000000000..d144d9db9b039 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-3/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T10-last-h2/trial-3/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T10-last-h2/trial-3/response.json b/doc-experiment/results/round-29/T10-last-h2/trial-3/response.json new file mode 100644 index 0000000000000..4dccae874ce10 --- /dev/null +++ b/doc-experiment/results/round-29/T10-last-h2/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a single linear scan, re-setting one bookmark on each matched `H2` so the bookmark ends up at the last `H2` opener. It then `seek()`s back to that bookmarked tag, adds the `final-section` class with `add_class()`, and returns the edited markup via `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/judge.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/judge.json new file mode 100644 index 0000000000000..76d8666c88ba1 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(), all documented. The approach matches the docs' flat attribute-edit pattern and handles case-insensitive attribute names, comments, no-match attributes, and byte-preserving output correctly." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented Tag Processor approach as the reference. No unsupported API use or _doing_it_wrong records. Correctly relies on the prefix helper rather than manual attribute parsing or normalization." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic implementation as trial 2. It uses the right processor for a flat attribute rewrite and returns queued edits with get_updated_html()." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, so there are no failed hidden cases to attribute to documentation failures. The docs did well in the key places: the Tag Processor overview says to use this class for flat attribute/class edits with byte-precise preservation; next_tag() documents linear walking, real-tag-only matching, comments/rawtext exclusion, and incomplete-token behavior; get_attribute_names_with_prefix() documents lowercase returned names and case-insensitive prefix matching; remove_attribute() and get_updated_html() document the edit-and-return workflow. Near miss: candidates all guarded against null from get_attribute_names_with_prefix(), which is correct after the scan ends, but the docs do not explicitly state that a matched tag with no matching attributes returns an empty array rather than null. That gap did not cause failures here.", + "doc_gaps": [ + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#get_attribute_names_with_prefix", + "problem": "The return contract distinguishes array|null, but only the no-current-tag null case is shown. It does not explicitly state the matched-tag/no-prefix-match case returns an empty array.", + "suggestion": "Add a short return-value table: matched tag with matches returns lowercase attribute names; matched tag with no matches returns array(); no matched tag opener returns null." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#remove_attribute", + "problem": "The method docblock does not prominently state that attribute targeting is ASCII case-insensitive, even though this matters when callers pass normalized names returned from get_attribute_names_with_prefix() to remove attributes written with different casing.", + "suggestion": "Add a sentence that remove_attribute() matches attribute names case-insensitively in HTML and can safely consume names returned by get_attribute_names_with_prefix()." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#modifying-html-attributes-for-a-found-tag", + "problem": "The overview shows removing one known attribute, but does not show the general pattern for bulk operations over discovered attribute names.", + "suggestion": "Add a generic recipe for enumerating attribute names from a read API, applying set/remove operations to that snapshot, and returning get_updated_html(), emphasizing that callers should not parse tag text manually." + } + ] +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/candidate.php b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/candidate.php new file mode 100644 index 0000000000000..b7b887dfc400c --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/candidate.php @@ -0,0 +1,19 @@ +next_tag() ) { + $attribute_names = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attribute_names ) { + continue; + } + + foreach ( $attribute_names as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/execution.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/execution.json new file mode 100644 index 0000000000000..ecd2aacdc8776 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/response.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/response.json new file mode 100644 index 0000000000000..dfa777f2752ba --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute edit on individual tag openers. It scans each tag with `next_tag()`, gets matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/candidate.php b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/execution.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/execution.json new file mode 100644 index 0000000000000..69818c64e3cac --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/response.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/response.json new file mode 100644 index 0000000000000..cbf153bae68f2 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute-rewrite pass over every tag opener with `next_tag()`. For each matched tag, it calls `get_attribute_names_with_prefix( 'data-track-' )` to find exactly the lowercase attribute names that start with that prefix, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/candidate.php b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/execution.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/execution.json new file mode 100644 index 0000000000000..128ef1cdb19d6 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/response.json b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/response.json new file mode 100644 index 0000000000000..a5b6f3777c9a5 --- /dev/null +++ b/doc-experiment/results/round-29/T11-strip-tracking-attributes/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute rewrite on individual tag openers. The function scans every tag with `next_tag()`, gets all matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/judge.json b/doc-experiment/results/round-29/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..ced21b8a31927 --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment() for a body fragment, walked all tokens with next_token(), skipped SPAN opener/closer tokens via documented get_tag(), and accumulated normalized output with serialize_token(). All called methods are present in the rendered docs; no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Same documented token-serialization pattern as the reference. Minor adherence penalty: on create_fragment() failure or get_last_error(), it returns the original input, which may violate a normalized-rewrite contract by preserving spans and non-normalized markup. This did not affect the hidden cases." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the documented HTML Processor rewrite pattern directly: create_fragment(), next_token(), get_tag(), serialize_token(), and get_last_error(). Correctly avoids Tag Processor get_updated_html() for a structural normalized rewrite; no undocumented API usage." + } + ], + "failure_analysis": "All three trials passed all seven hidden cases. The docs did well on the key distinction for this task: the HTML Processor overview says it adds structural awareness and normalized serialization, while the Tag Processor overview warns it has no tree awareness. The HTML Processor recipe 'rewrite while serializing tokens' and serialize_token() docs directly explain appending current-token serialization, skipping tokens to remove them, and not calling normalize() afterward. The serialize_token() section also includes a general example removing wrapper element tokens while preserving contents, which appears to have led all trials to the intended approach. Near-misses: all candidates relied on get_tag() returning null for non-tag tokens rather than explicitly checking get_token_type() === '#tag'. This is supported by the get_tag() docs and the serialize_token() example, but the safer token taxonomy is somewhat split between get_tag(), get_token_name(), and get_token_type(). Trial 2 also chose an original-input fallback on parser error; the docs say to reject or fall back when get_last_error() is non-null, but they do not tie fallback choice to whether the caller promised normalized rewritten output.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / rewriting-loop guidance", + "problem": "The docs say to 'reject or fall back' on get_last_error(), but do not define what fallback preserves the contract when the caller is producing normalized rewritten HTML.", + "suggestion": "Add a note distinguishing fail-closed rewrites from display filters: if the function contract requires normalized transformed output, returning the original input is not equivalent to a successful rewrite because it may keep skipped elements and unnormalized bytes." + }, + { + "location": "WP_HTML_Processor::get_tag(), get_token_name(), and get_token_type() docs", + "problem": "The relationship between tag-name checks and token-type checks is spread across separate method docs. Readers can infer that get_tag() is null on non-tags, but the preferred predicate for token-rewrite filters is not stated in one place.", + "suggestion": "Add a short cross-reference in rewrite examples: for element-only filtering, either check get_tag() for a tag name or check get_token_type() === '#tag' plus get_token_name()/get_tag(); explain when each form is clearer." + }, + { + "location": "WP_HTML_Processor::create_fragment() return value docs", + "problem": "The method signature documents static|null, but the common default-body case does not clearly say what practical conditions can produce null.", + "suggestion": "Document the null-return causes for create_fragment(), especially unsupported context or encoding, so callers can choose an appropriate failure value for their own return contract." + } + ] +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/candidate.php b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/candidate.php new file mode 100644 index 0000000000000..f7a01467eea5a --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..83ae975917b6a --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..c6226d57c3539 --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-fragment HTML, then walks every token with `next_token()` and rebuilds normalized output with `serialize_token()`. Any token whose tag name is `SPAN` is skipped, which removes both opening and closing span tags while preserving their contents, including nested spans; `get_last_error()` is checked so unsupported markup does not return a partial rewrite.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..cf94fddf1dcb7 --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..7faae4e0f4aca --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..e5e57b4c5c215 --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment HTML structurally, then walks every token with `next_token()`. For normalized output, it rebuilds the fragment token-by-token with `serialize_token()`, skipping any token whose tag is `SPAN`, which removes both span openers and closers while preserving their contents and normalizing the rest of the HTML.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..6e7cc87e73eed --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..602a0e0d7348d --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..d18a242b66797 --- /dev/null +++ b/doc-experiment/results/round-29/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then rewrites it token by token with `next_token()` and `serialize_token()`, skipping every `SPAN` token so both opening and closing span wrappers disappear while their contents remain. Using HTML Processor serialization produces the required normalized HTML output.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-29/codex-judges-output.json b/doc-experiment/results/round-29/codex-judges-output.json new file mode 100644 index 0000000000000..3af296484e765 --- /dev/null +++ b/doc-experiment/results/round-29/codex-judges-output.json @@ -0,0 +1,659 @@ +{ + "result": [ + { + "id": "N03-first-list-count", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), which is the documented choice for structure-aware direct-child counting. All called methods are present in the rendered docs. The implementation follows the documented bookmark -> next_token()/depth-bounded scan -> paused_at_incomplete_token()/get_last_error() -> seek -> set_attribute() -> get_updated_html() pattern. It passed 11/11 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used the HTML Processor, bookmarks, token walking, get_current_depth(), get_token_type(), and get_updated_html(). The bounded subtree loop matches the docs' >= depth guidance, and it checks incomplete/unsupported parser state before editing. All API calls are documented. It passed 11/11 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API use. It applies the documented structural scan pattern, counts only LI opener tokens at list_depth + 1, rejects incomplete or unsupported scans, seeks back to the opener, and reads output with get_updated_html(). It passed 11/11 with no _doing_it_wrong records." + } + ], + "failure_analysis": "All trials passed every hidden case, so there were no failed cases to attribute to documentation gaps. The docs did especially well in four places: html-tag-processor.md, \"Which processor should I use?\", clearly says the Tag Processor has no tree awareness and points structural work to WP_HTML_Processor; html-processor.md, \"Recipe: scan a region before editing its opener\", almost directly teaches the required bookmark/scan/seek/edit pattern; WP_HTML_Processor::next_token() explains virtual closers, implied structure, and the single-cursor hazard; and WP_HTML_Processor::get_current_depth() explicitly documents the >= subtree boundary and the need to check paused_at_incomplete_token() plus get_last_error(). Those passages explain why all three subjects handled omitted LI closers, nested lists, incomplete tokens inside the list, and unsupported markup inside the list. The main near-misses were documentation ambiguities that did not bite this round: next_token() still has a stale \"do not use\" history note despite being required by the public recipes, and the HTML Support wording that unsupported markup aborts when it appears in the input can be read as whole-document-global rather than encounter-scoped. The frozen cases for malformed markup after a closed list depend on the encounter-scoped behavior: a bounded scan that stops at the list closer has not seen the later bad token, so get_last_error() and paused_at_incomplete_token() remain clean.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() method docs", + "problem": "The examples and recipes present next_token() as the right public tool for structural token walks, but the Since note still says \"Added for internal support; do not use.\"", + "suggestion": "Remove or revise the stale warning so it says next_token() is supported for advanced structural walks, with the existing cautions about bounding scans and the shared cursor." + }, + { + "location": "WP_HTML_Processor::get_current_depth() and subtree-walk examples", + "problem": "The docs explain >= boundaries well, but the direct-child test is implicit: readers must infer that a direct child opener is a #tag, not a closer, at parent_depth + 1.", + "suggestion": "Add a short general contract or snippet for detecting direct child element openers: record parent depth N, then match complete tag opener tokens where get_current_depth() === N + 1; deeper tokens are descendants." + }, + { + "location": "HTML Support / get_last_error() / paused_at_incomplete_token() docs", + "problem": "The docs say unsupported or incomplete markup should be checked after a scan, but they do not clearly distinguish errors encountered within a bounded region from malformed tokens that appear later and were never scanned.", + "suggestion": "State that get_last_error() and paused_at_incomplete_token() reflect parser progress so far. For bounded-region edits, callers should decide whether they require the region to be complete or the entire remaining document to be scanned cleanly." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the right API: documented `WP_HTML_Processor::normalize()`. No undocumented calls. The strict `null === $normalized` check correctly treats unsupported markup as fallback while preserving valid empty-string output." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct implementation as trial 1. Processor choice, API usage, and fallback handling all match the rendered HTML Processor docs." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct implementation as trials 1 and 2. It uses the one-call normalization API and avoids unnecessary token walking or Tag Processor reconstruction." + } + ], + "failure_analysis": "No hidden case failed in any trial; all three passed 7/7. The docs did well here: the HTML Processor overview says to choose it for normalized serialization and structural HTML handling, the `normalize()` section says it assumes BODY-fragment context, lists normalization effects such as quoted attributes, omitted tags, table repair, text re-encoding, and trailing incomplete-token omission, and its return contract says `string|null` with `null` when unable to normalize. The unsupported-markup section also names mis-nested formatting as an unsupported case and says output-producing methods such as `serialize()` and `normalize()` return `null`. Near-misses: the empty-fragment case depended on using a strict null check rather than a truthiness check, and the docs do not explicitly call out that a successful normalization may be `''`. Also, execution records show unsupported cases going through the null path; the docs describe the return value but are less explicit about whether callers should expect warnings or other error-channel side effects from serialization failure.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::normalize()` return docs", + "problem": "The `string|null` contract is accurate but does not explicitly warn that valid normalization can return an empty string, so callers might write `if ( ! $normalized )` and misclassify empty input as failure.", + "suggestion": "Add a sentence stating that `null` alone indicates inability to normalize and that callers should use a strict null check because `''` can be a valid normalized result." + }, + { + "location": "`WP_HTML_Processor::normalize()` and `serialize()` failure docs", + "problem": "The docs say unsupported markup returns `null`, but they do not clearly state the expected warning/error side effects, despite serialization failure being observable in execution records.", + "suggestion": "Document whether normalization failure is intended to be a quiet `null` return or may also emit a warning, and give callers a general policy for handling that error channel." + }, + { + "location": "HTML Processor normalization guidance", + "problem": "The docs contain the right pieces across the overview, support section, and method docs, but the choice between `normalize()`, `serialize()`, `serialize_token()`, and `get_updated_html()` is spread out.", + "suggestion": "Add a compact public-API chooser note: use `normalize()` for an unchanged BODY-fragment normalized copy, `serialize()` for a freshly-created processor, `serialize_token()` for token-by-token rewrites, and `get_updated_html()` after queued edits." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() and used documented token APIs: next_token(), get_token_type(), get_tag(), is_tag_closer(), and get_modifiable_text(). No _doing_it_wrong records. The single-pass state machine matches the documented repeated-region pattern and handles implied heading closes in the frozen cases. Main adherence issue: it explicitly includes SCRIPT/STYLE/TEXTAREA/TITLE opener text inside headings, even though the DOM-style subtree-text recipe says ordinary text should be #text tokens unless the caller opts into special-element contents." + }, + { + "trial_id": "trial-2", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() and all API calls are documented or inherited in the rendered docs: next_tag(), next_token(), get_current_depth(), get_token_type(), get_modifiable_text(), is_tag_closer(), get_token_name(), paused_at_incomplete_token(), and get_last_error(). The depth-bounded subtree walk is the most reference-like solution. It still over-includes special-element opener text, and its truncation policy is stricter than the task/reference: an incomplete trailing comment would discard accumulated headings instead of returning best-effort extracted text." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() and used documented APIs. The one-pass token walk is broadly idiomatic and avoids unsafe regex parsing. It relies on manual heading state rather than a depth/breadcrumb boundary, but the docs do support closer-driven collection because the HTML Processor visits virtual closers. Like the others, it over-includes SCRIPT/STYLE/TEXTAREA/TITLE opener text, and its error policy is partial: get_last_error() is only checked when flushing a still-open final heading." + } + ], + "failure_analysis": "All three trials passed all frozen cases: basic-h1-h3, all-heading-levels, nested-text-and-entities, empty-heading, case-insensitive-source, implied-heading-close, and no-matches. The docs worked well for the central task: they made the processor choice clear by saying the HTML Processor is for tree-aware text extraction; they documented create_fragment() for body fragments; they documented uppercase get_tag() results; they documented #text token accumulation with get_modifiable_text(); and they documented virtual/implied closing tokens, which explains why malformed '

            One

            Two' can be handled structurally.\n\nNear-miss: every trial opted into special-element opener text for SCRIPT, STYLE, TEXTAREA, and TITLE inside headings. A probe shows the reference returns only ordinary #text text for '

            AD

            ' as 'AD', while all three candidates return 'AB &C &D'. The overview recipe 'collect DOM-style text from a subtree' says ordinary text is only #text tokens and says not to include special-element opener text merely because it is available. However, the next_token() method section also says special elements produce no #text children and to read their text from the opener, which appears to have encouraged subjects to treat that as part of generic text extraction rather than an opt-in policy.\n\nSecond near-miss: incomplete-input policy was interpreted inconsistently. Trial 2 checks paused_at_incomplete_token() and returns an empty array for an incomplete trailing comment after a heading, while the reference and the other trials return the heading text already collected. The docs correctly mention checking paused_at_incomplete_token() when a caller must reject truncation, but they do not make the policy boundary crisp for read-only extraction tasks that can return best-effort results.", + "doc_gaps": [ + { + "location": "html-processor.md, next_token(), paragraph beginning 'One important exception to the collect-#text-tokens recipe'", + "problem": "The paragraph can be read as a general instruction to include SCRIPT/STYLE/TITLE/TEXTAREA opener text whenever collecting element text, even though the overview recipe later says this is opt-in only.", + "suggestion": "Qualify the paragraph with 'if the caller's definition of text includes special-element contents' and point back to the ordinary subtree-text recipe. Include a short example where ordinary text excludes SCRIPT/TEXTAREA but an explicit all-modifiable-text policy includes them." + }, + { + "location": "html-processor.md, Recipe: collect DOM-style text from a subtree", + "problem": "The term 'DOM-style text' is easy to confuse with broader notions like DOM textContent or 'all text-like content', especially for special elements whose contents are exposed via get_modifiable_text().", + "suggestion": "Define the contract more explicitly as 'ordinary parsed text descendants represented by #text tokens' and contrast it with 'special-element contents' and 'all tokens with modifiable text'." + }, + { + "location": "html-processor.md, next_token() and get_current_depth() examples", + "problem": "The docs warn that nested walk loops can interfere, while also showing a next_tag() followed by a bounded next_token() subtree walk. Subjects need a sharper rule for when this pattern is safe.", + "suggestion": "Add a note that an immediate depth-bounded inner walk for one matched element is safe when the caller expects the cursor to advance to the element boundary, but repeated sibling extraction may be clearer as a single token loop with explicit state." + }, + { + "location": "html-processor.md, paused_at_incomplete_token() guidance in next_token()/get_current_depth()", + "problem": "The docs explain how to detect truncation but do not clearly separate validation/mutation policies from best-effort read-only extraction policies.", + "suggestion": "Add a small policy note: mutating or validation-oriented code should reject/fallback on truncation or get_last_error(); read-only collectors may return accumulated partial results if their contract allows it, but should document that choice." + } + ] + } + }, + { + "id": "T01-add-image-class", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, which the docs identify as the right tool for flat, byte-preserving attribute/class edits. Calls only documented API: constructor, next_tag(), add_class(), get_updated_html(). The loop is idiomatic and relies on documented next_tag() behavior for case-insensitive tag matching, comments, and incomplete trailing tags." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Identical to trial-1. Correct processor choice, fully documented method usage, and idiomatic scan/edit/return pattern. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Identical implementation. The response additionally mentions raw-text regions; that is supported by the next_tag() documentation stating tag-like text in raw text contents is not matched. No undocumented API usage or misuse." + } + ], + "failure_analysis": "All three trials passed all 8 hidden cases: simple, multiple, existing-classes, uppercase-tag, inside-comment-ignored, no-images, unquoted-attributes, and incomplete-tag-at-end. The docs did well in the relevant places: the Tag Processor overview explains it is appropriate for flat byte-preserving tag edits; the next_tag() docs explicitly cover string tag queries, ASCII case-insensitive matching, ignoring tag-like text inside comments/raw-text sections, and pausing before incomplete trailing syntax; add_class() is documented for class updates; get_updated_html() is documented as the correct way to retrieve queued edits while preserving untouched bytes. The only near-miss is that some crucial add_class() semantics are easier to find in overview/design prose than in the add_class() method section itself, so a reader relying only on the method entry could miss ordering/preservation details.", + "doc_gaps": [ + { + "location": "html-tag-processor.md add_class() method docs", + "problem": "The method section says it adds a class, but the most task-relevant guarantees are scattered elsewhere: creating class when absent, appending without reordering existing classes, preserving class ordering/whitespace as much as possible, and no-op behavior when already present.", + "suggestion": "Make the add_class() docblock self-contained by explicitly listing those class-list semantics and including one compact example for absent and existing class attributes." + }, + { + "location": "html-tag-processor.md next_tag() method docs", + "problem": "The docs explain string queries and case-insensitive matching, but the string shorthand is more prominent in the usage table than in the method contract.", + "suggestion": "In the next_tag() docblock, state directly that next_tag('img') is equivalent to querying tag_name => 'IMG' and that matching is ASCII case-insensitive while output preserves original tag-name casing." + }, + { + "location": "html-tag-processor.md get_updated_html() method docs", + "problem": "The method correctly states byte preservation, but readers may still confuse it with serialization APIs after seeing both processor docs.", + "suggestion": "Add a short cross-reference note in class-modification examples: after set_attribute(), add_class(), remove_class(), or set_modifiable_text(), return get_updated_html(); reserve serialize()/serialize_token() for normalized token-by-token rewrites." + } + ] + } + }, + { + "id": "T02-link-targets", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, which the docs recommend for flat, byte-preserving attribute edits. Called only documented APIs: constructor, next_tag(), get_attribute(), set_attribute(), and get_updated_html(). The null check correctly treats href=\"\" and valueless href as present while skipping absent href; set_attribute() correctly overwrites existing target." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct documented pattern as the reference: linear next_tag('A') walk, null !== get_attribute('href') for presence, set_attribute('target', '_blank') for add/overwrite, and get_updated_html() for byte-preserving output. No _doing_it_wrong records or undocumented API use." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and idiomatic documented API usage. The implementation handles the documented null/empty-string/true attribute semantics and relies on the processor to ignore comments and preserve untouched bytes. No hallucinated methods or misuse records." + } + ], + "failure_analysis": "All trials passed all hidden cases, so there are no failed cases to attribute to a documentation defect. The rendered docs did especially well in four places: the Tag Processor overview says to use this class for flat attribute/class edits and byte-precise preservation; the usage section shows constructing with new WP_HTML_Tag_Processor and walking with next_tag(); the get_attribute() documentation distinguishes null for missing, empty string for present-empty, and true for valueless boolean attributes; and set_attribute()/get_updated_html() document overwrite behavior plus byte-preserving output. The main near-miss is that the model explanations sometimes phrase the href test as just \"checks get_attribute('href')\"; the code used the correct null comparison, but a truthiness check would have failed empty-string href. The docs contain the needed contract, but an explicit presence-test idiom would make that safer.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute() docblock and high-level Custom queries section", + "problem": "The null/empty-string/true distinction is documented, but the common derived rule for attribute presence is implicit. Readers may still write a truthiness check and accidentally reject present-empty attributes.", + "suggestion": "Add a short general example showing presence testing with `null !== $processor->get_attribute( $name )`, and state that truthiness is not a valid presence test because `\"\"` is a present value." + }, + { + "location": "WP_HTML_Tag_Processor::set_attribute() docblock and Modifying HTML attributes overview", + "problem": "Attribute insertion and overwrite ordering are documented in the method details, but byte-exact tasks depend heavily on the rule that existing attributes keep position while new attributes are inserted immediately after the tag name and sorted among other new attributes.", + "suggestion": "Surface the insertion-order contract in the overview with a tiny before/after example for one existing attribute update and one newly added attribute." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() docblock", + "problem": "The docs imply lexical safety, but the method-level contract could be more explicit that `next_tag()` matches real tag openers only, not markup-looking text inside comments, SCRIPT/STYLE/TITLE/TEXTAREA content, or incomplete trailing syntax.", + "suggestion": "Add a concise note under `next_tag()` describing which markup-looking sequences are skipped or paused, with cross-links to the special-element and incomplete-token sections." + } + ] + } + }, + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 8/8. Used WP_HTML_Processor::create_fragment(), next_tag('H1'), depth-bounded next_token(), get_token_type(), and get_modifiable_text() exactly as documented for subtree text extraction. It avoided broad get_modifiable_text() use and correctly relies on decoded #text tokens and virtual closers for incomplete input." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 8/8. Same API shape as the reference: correct tree-aware processor, documented methods only, idiomatic >= depth guard, and #text-only accumulation with decoded get_modifiable_text(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Passed 8/8. All called methods are documented: create_fragment, next_tag, get_current_depth, next_token, get_token_type, get_modifiable_text, is_tag_closer, and get_token_name. The main deduction is the extra SCRIPT/STYLE/TEXTAREA/TITLE branch: the docs document this opt-in pattern, but also warn that ordinary subtree text should append only #text tokens unless the caller explicitly asks for special-element contents. For a heading-text task, this is a plausible but over-broad interpretation, especially because SCRIPT/STYLE text is raw, not decoded." + } + ], + "failure_analysis": "No hidden case failed in the frozen execution reports; all three trials passed all 8 cases. The docs did well on the core task: html-processor.md's \"Recipe: collect DOM-style text from a subtree\" gives the exact processor choice and loop shape, next_token() explains that token walks do not stop at the original matched element, get_current_depth() explains the >= guard and virtual closers, and get_modifiable_text() explains decoded #text text. The near-miss is trial 3's special-element handling. html-processor.md both says ordinary subtree text excludes special element opener text and later says special-element contents are carried on the opener token. That is accurate but easy to over-apply when a task says \"text content\" without naming whether SCRIPT/STYLE/TEXTAREA/TITLE payloads count. A read-only probe confirmed the divergence: the reference-style #text-only policy returns \"AB\" for

            AB

            , while trial 3 would return \"AD & EF & GB\".", + "doc_gaps": [ + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, heading \"Recipe: collect DOM-style text from a subtree\"", + "problem": "The heading says \"DOM-style text\" while the body defines a narrower default policy: ordinary #text tokens only, excluding special-element opener text. That terminology can make readers think a generic text-content request should include SCRIPT/STYLE/TEXTAREA/TITLE payloads.", + "suggestion": "Rename or clarify the recipe as ordinary subtree text extraction, and add a short policy note distinguishing ordinary human-readable subtree text from a caller-defined full textContent-like extraction. State that special-element payloads are excluded unless the caller explicitly names them." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, next_token() special-element exception and get_modifiable_text()", + "problem": "The docs correctly explain how to read special-element text, but the warning about raw versus decoded payloads is separated from the subtree extraction decision. This contributed to trial 3 appending SCRIPT/STYLE raw text into a decoded heading-text result.", + "suggestion": "Add a compact decision table for token inclusion: #text for ordinary extracted text; TITLE/TEXTAREA opener text only when explicitly requested and decoded; SCRIPT/STYLE opener text only for raw code/style payload extraction, not general human text." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-processor.md, subtree walking examples", + "problem": "The examples show how to collect text once an element is found, but the no-match null versus matched-empty-string distinction is implicit. This distinction matters for extraction APIs that return null only when the target element is absent.", + "suggestion": "Add a general example note for extraction contracts: use next_tag() failure for \"not found\" and keep an initialized empty accumulator for matched elements with no #text descendants." + } + ] + } + }, + { + "id": "T04-build-figure", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for fixed-shape fragment construction, with only documented methods: `next_tag`, `set_attribute`, `next_token`, `get_token_type`, `set_modifiable_text`, and `get_updated_html`. It followed the documented template/placeholder pattern and preserved attribute order by seeding `src` then `alt`. Minor near-miss: it did not check `next_tag()` or `set_modifiable_text()` return values, though the controlled literal template makes that low risk." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Same correct documented API usage as the reference, and slightly more defensive than trials 1 and 3 by guarding the `next_tag( 'img' )` call before setting attributes. It used token walking to find a `#text` token and `get_updated_html()` to read queued edits. Minor near-miss: it still did not check the boolean result of `set_modifiable_text()`, despite the docs advising that generally." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct Tag Processor and only documented methods. The solution closely follows the rendered docs' `Building markup from a template` pattern: seed exact markup, update existing attributes, replace placeholder text, and return `get_updated_html()`. Minor near-miss: unchecked `next_tag()` and `set_modifiable_text()` return values." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, with no `_doing_it_wrong` or PHP errors. The docs worked well for this task. The `Which processor should I use?` guidance clearly says the Tag Processor is appropriate for flat, byte-preserving attribute edits, while the HTML Processor is for structural questions. The `Building markup from a template` section directly taught the needed pattern: start from a literal template, include attributes in the desired order, include placeholder text for later replacement, then use `set_attribute()`, token walking, `set_modifiable_text()`, and `get_updated_html()`. The `set_attribute()` docs also explicitly explain that plain unescaped values are encoded and that newly added attributes sort by name, which likely prevented attribute-order failures. The `set_modifiable_text()` docs explain that ordinary container elements do not carry text themselves and that callers need a `#text` token or placeholder, which likely prevented attempts to set text while matched on `FIGCAPTION`. Near-misses were limited to defensive style: candidates mostly copied the fixed-template examples without checking every boolean return value, but the chosen template made those calls deterministic in this task.", + "doc_gaps": [ + { + "location": "`WP_HTML_Tag_Processor::set_modifiable_text()` docblock and examples", + "problem": "The prose says to always check the return value, but the successful template-building examples make it easy to omit that check when copying the pattern.", + "suggestion": "Add a short example that captures the boolean result and handles `false`, or explicitly state that a known ordinary `#text` token in a trusted template is the narrow case where failure is unexpected." + }, + { + "location": "`WP_HTML_Tag_Processor::next_tag()` usage examples", + "problem": "Examples often call `next_tag()` directly in fixed-template code, while broader input-processing code needs to guard the `false` case because the cursor moves to the end on failure or incomplete input.", + "suggestion": "Distinguish trusted literal-template examples from arbitrary-input examples, and show guarded `next_tag()` for the latter." + }, + { + "location": "`WP_HTML_Tag_Processor::set_attribute()` docblock", + "problem": "The docs cover `true` and `false` boolean handling and attribute ordering, but the empty-string case is only implicit. Builders often need to know that `''` means an empty quoted value, not a boolean or removed attribute.", + "suggestion": "Add an explicit sentence and tiny example: passing `''` renders `name=\"\"`; passing `true` renders a boolean attribute; passing `false` removes it." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked tokens with `next_token()`, read only `#text` plus whitelisted `TITLE`/`TEXTAREA` opener text, and used documented decoded `get_modifiable_text()` semantics with UTF-8-safe truncation. Passed 10/10 cases with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct processor and token-walk pattern as the reference. All processor methods used are present in the rendered docs, and the implementation correctly avoids treating all modifiable text as DOM text. Passed 10/10 cases with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly chose the HTML Processor and used only documented methods. It follows the documented text-extraction pattern, including special opener text for `TITLE`/`TEXTAREA`. Minor caveat: the final `get_last_error()` fallback is a strict policy not required by the task and would differ from the reference on unsupported markup after earlier extractable text, though the method itself is documented. Passed 10/10 cases with no `_doing_it_wrong` records." + } + ], + "failure_analysis": "No failed hidden case appeared across the three trials: each candidate passed all 10 frozen expectations. The docs performed well on the central hazards for this task: they explicitly say to use `WP_HTML_Processor` rather than `WP_HTML_Tag_Processor` for DOM-style text extraction, to walk with `next_token()` when text matters, to append ordinary `#text` tokens rather than every token with modifiable text, and to opt into special-element opener text for `TITLE` and `TEXTAREA` while treating `SCRIPT` and `STYLE` separately. The `get_modifiable_text()` documentation also clearly states that `#text`, `TEXTAREA`, and `TITLE` are returned decoded and UTF-8, which explains why all candidates handled `&`, accents, and emoji correctly. The main near-miss is policy around parser aborts and incomplete input: trial 3 interpreted `get_last_error()` as a reason to discard all collected text. That is defensible from some strict-parser guidance, but the docs could better separate best-effort read-only extraction from mutation/serialization policies that must reject unsupported or truncated input.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / Recipe: collect DOM-style text from a subtree", + "problem": "The correct full-fragment text extraction pattern requires combining several passages: processor choice, `#text` accumulation, and special-element opener text. Subjects succeeded here, but the guidance is distributed.", + "suggestion": "Add a compact general example for collecting text from a fragment that shows ordinary `#text` accumulation plus an explicit whitelist for special opener text, with a note that `SCRIPT`/`STYLE` raw text should only be included by caller policy." + }, + { + "location": "WP_HTML_Processor::get_last_error() and WP_HTML_Tag_Processor::paused_at_incomplete_token()", + "problem": "The docs mention unsupported aborts and incomplete trailing syntax, but the policy distinction is easy to over-apply to read-only extraction. `get_last_error()` does not report incomplete trailing tokens, and strict rejection is not always the desired result for best-effort scans.", + "suggestion": "Clarify that read-only scans must choose a policy: return best-effort text collected before an abort, or reject/fallback on `get_last_error()`. Separately state that incomplete trailing syntax is detected with `paused_at_incomplete_token()`, not `get_last_error()`." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() / WP_HTML_Processor::get_modifiable_text()", + "problem": "The UTF-8 note recommends `mb_strlen()`/`mb_substr()`, but it does not explicitly distinguish Unicode code points from grapheme clusters or user-perceived characters.", + "suggestion": "Add one sentence that `mb_*` with UTF-8 is suitable for code-point limits, while grapheme-aware limits require grapheme/Intl APIs. This would prevent ambiguity for emoji, variation selectors, and combining marks." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor::create_fragment() parser, then next_tag('A') plus a depth-bounded next_token() subtree walk. All HTML API calls are documented. It correctly relied on get_attribute() string/true/null semantics, accumulated only #text tokens, and used get_modifiable_text() for decoded text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor and a single next_token() state-machine walk, which matches the documented repeated-region pattern. All HTML API calls are documented. It finalized on A closers and also handled end-of-input defensively; href filtering and decoded text handling are correct." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor and a documented token-walking approach with a small stack of active A elements. All HTML API calls are documented. It handles string-only href values and #text-only decoded text correctly. Slightly less direct than the documented closer-driven or depth-bounded recipes, but still API-adherent." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The rendered docs did well on the key risks for this task: the HTML Processor overview says to choose WP_HTML_Processor when structure or text collection matters; the 'collect DOM-style text from a subtree' recipe shows a depth-bounded next_token() walk that appends only #text tokens; next_token() documents split text tokens, implicit/end-of-input closers, and the one-cursor model; get_attribute() documents string|true|null, and the Tag Processor version explicitly states decoded attribute values; get_modifiable_text() documents decoded #text output. The main near-misses are documentation locality issues rather than observed failures: decoded attribute behavior is clearer in the Tag Processor page than in the HTML Processor override, and the docs contain both a subtree inner-loop recipe and a warning against nested token walks without a crisp rule for when each pattern is appropriate.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_attribute() docblock", + "problem": "The HTML Processor override documents string|true|null and boolean attributes, but does not repeat the decoded string-value contract that appears in the Tag Processor docs.", + "suggestion": "State directly that string attribute values returned by WP_HTML_Processor::get_attribute() are already decoded, with a small href query-string example." + }, + { + "location": "WP_HTML_Processor::next_token() / subtree text recipe", + "problem": "The docs show a depth-bounded inner walk and also warn that nested next_token() walks can interfere. Readers need a clearer boundary between safe one-off subtree scans and repeated-region extraction.", + "suggestion": "Add a short note: use a depth-bounded inner walk for one matched subtree when consuming its closer is acceptable; use one single-pass state machine for repeated sibling/nested regions." + }, + { + "location": "WP_HTML_Processor::create_fragment() examples", + "problem": "The signature returns static|null, but several examples call methods on the result without showing a null guard.", + "suggestion": "Model the null check in at least the first usage example, or explicitly explain when null can be returned and how callers should handle it." + } + ] + } + }, + { + "id": "T07-nested-lists", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), one forward next_tag() walk, get_tag(), get_breadcrumbs(), add_class(), get_last_error(), and get_updated_html(). All API calls are documented, no _doing_it_wrong records, and all hidden cases passed." + }, + { + "trial_id": "trial-2", + "adherence": 82, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor and used only documented APIs, but used two separate next_tag() scans on the same processor: first for UL, then for OL. The first loop leaves the cursor at the end, so the second loop cannot revisit earlier OL elements. This is a cursor-walking misuse rather than hallucinated API usage." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the right processor and an idiomatic single forward walk with get_breadcrumbs(), add_class(), and get_updated_html(). All API calls are documented and all hidden cases passed. Minor edge-case gap: unlike trial 1, it does not inspect get_last_error() after the scan before returning modified output." + } + ], + "failure_analysis": "Trials 1 and 3 passed every hidden case. Trial 2 failed simple-ol-inside-ul, deep-descendant, existing-class-preserved, multiple-nested-levels, and mixed-document for the same reason: it assumed a WP_HTML_Processor could be scanned once for UL tags and then scanned again for OL tags from the beginning. In reality next_tag() advances one shared cursor; after the UL loop returns false, the processor is already at EOF, so nested OL elements are never visited. The clearest relevant passage is in html-tag-processor.md under 'Finding tags': next_tag() returning false moves the cursor to the end, and once the cursor reaches the end the processor is done unless you recreate it or use bookmarks. The HTML Processor docs do not repeat this warning in the WP_HTML_Processor::next_tag() section, even though this structural task naturally points subjects to WP_HTML_Processor. For existing-class-preserved, the failure was not a class-merging misconception: add_class() docs correctly say existing classes are preserved/appended. The add_class() call simply never happened because the OL pass never ran. Breadcrumb docs were adequate for ancestor detection: they state that get_breadcrumbs() contains the full path including the current element, and the candidates that used a single walk applied that correctly.", + "doc_gaps": [ + { + "location": "html-processor.md > WP_HTML_Processor::next_tag()", + "problem": "The method docs say it finds the next matching tag but do not explicitly state that searches are cursor-relative and do not restart after a failed search. The equivalent warning exists in the Tag Processor overview, but subjects using the HTML Processor may not transfer that rule.", + "suggestion": "Add a short method-level note: each next_tag() call starts after the current cursor position; when it returns false, the cursor is at EOF, paused on incomplete input, or aborted; a later call with a different query will not rescan earlier tags. To revisit earlier tags, set a bookmark/seek or create a new processor." + }, + { + "location": "html-processor.md > Usage or next_tag() query examples", + "problem": "The docs document a single tag_name query but do not show the idiom for matching one of several tag names. This encourages separate sequential scans for each tag type.", + "suggestion": "Add a general example for OR-style tag matching: call next_tag() with no tag_name, inspect get_tag(), and branch when the current tag is in a small allowed set. Also state that tag_name accepts one name, not an array of alternatives." + }, + { + "location": "html-processor.md > Breadcrumbs", + "problem": "The Breadcrumbs section explains exact paths and shortest suffix matching, but it lacks an explicit 'has an ancestor anywhere above the current node' pattern. That pattern is common for containment checks and differs from a direct breadcrumb query.", + "suggestion": "Add a general containment example showing get_breadcrumbs(), removing or ignoring the current element, and checking whether an ancestor tag appears in the remaining path. Clarify that breadcrumb queries express a path pattern, while arbitrary ancestor checks should inspect get_breadcrumbs()." + }, + { + "location": "html-processor.md > class mutation / inherited output methods", + "problem": "The HTML Processor page has shorter inherited add_class() documentation than the Tag Processor page, while structural tasks often use add_class() through WP_HTML_Processor. Readers may need to jump pages to learn class preservation and output behavior.", + "suggestion": "In the HTML Processor inherited add_class() and get_updated_html() docs, cross-link or inline the key guarantees: add_class() appends without removing existing classes or duplicating the same class, and get_updated_html() returns untouched bytes unchanged after queued attribute/class edits." + } + ] + } + }, + { + "id": "T08-table-extract", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), a single depth-bounded next_token() walk, get_current_depth(), get_token_type(), get_tag(), is_tag_closer(), and get_modifiable_text(); all are documented and no _doing_it_wrong records appeared. The main adherence issue is over-applying the special-element get_modifiable_text() guidance: it would include SCRIPT/STYLE/TEXTAREA/TITLE opener text in cell output, while the ordinary subtree-text recipe says to append only #text tokens unless the caller explicitly opts into special-element contents." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Closest to the documented pattern and reference: correct HTML Processor choice, browser-style fragment parsing, single cursor walk, depth bound, closer-driven row/cell flushing, and decoded text via get_modifiable_text() only on #text tokens. The extra cell_depth state is unnecessary but harmless. It checks get_last_error() for unsupported-parser aborts; it does not require complete source bytes, which is reasonable for this extraction task." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "All called API methods are documented, including inherited paused_at_incomplete_token(). The structural walk is mostly idiomatic and passed all frozen cases. Deductions are for an over-broad special text-only element whitelist, which would include raw SCRIPT/STYLE and decoded TEXTAREA/TITLE contents as table cell text, and for rejecting the whole result on paused_at_incomplete_token(), even though the docs present that as a caller policy rather than a default for best-effort extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen cases, so there were no hidden-case failures to attribute. The docs worked well on the core decision points: the Tag Processor overview says to use WP_HTML_Processor when structure, text collection, implied or missing closing tags, and browser-like parsing matter; WP_HTML_Processor::create_fragment() is clearly presented for BODY fragments; next_token() explains single-cursor token walking, implicit/virtual closers, synthesized table structure, and depth-bounded subtree walks; get_modifiable_text() explains decoded #text content, which prevented double-decoding entity text.\n\nThe near-miss was special-element text. The rendered docs include a strong ordinary subtree-text recipe saying to append only #text tokens unless another token type is explicitly desired, but the next_token() and get_modifiable_text() sections also emphasize that SCRIPT, STYLE, TITLE, and TEXTAREA carry text on opener tokens. Trial 1 and trial 3 latched onto that exception and would include those opener-token contents in table cells, diverging from the ordinary text-node policy.\n\nA second near-miss was incomplete input policy. The docs correctly explain that virtual closers make structural flushing reliable, and that paused_at_incomplete_token() should be checked when the caller must reject truncated input. Trial 3 treated that check as mandatory and would discard an otherwise extractable table for a trailing incomplete tag inside it. That is a policy misunderstanding, not an undocumented API problem.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() special-element paragraph", + "problem": "The paragraph says special elements carry text on the opener token and should be read there, but it is easy to over-apply this during ordinary text extraction despite the separate recipe warning.", + "suggestion": "Repeat the policy distinction inline: ordinary subtree text should remain #text-only; read SCRIPT/STYLE/TITLE/TEXTAREA opener text only when the caller explicitly wants those element contents, noting raw versus decoded behavior." + }, + { + "location": "WP_HTML_Processor text-extraction recipe / get_modifiable_text() docblock", + "problem": "The docs distinguish modifiable text from ordinary DOM-style text, but the distinction is spread across sections and models still treated get_modifiable_text() availability as inclusion criteria.", + "suggestion": "Add a compact decision table: token type/name, whether it is ordinary subtree text, whether get_modifiable_text() is decoded or raw, and typical inclusion policy." + }, + { + "location": "paused_at_incomplete_token() references from WP_HTML_Processor::next_token() and get_current_depth()", + "problem": "The docs say to check truncation when a result must reject incomplete input, but do not give enough contrast between best-effort extraction, strict validation, and mutation/rewrite policies.", + "suggestion": "Add examples of the three policies: best-effort extraction may return data from visited tokens; strict extraction may reject on paused_at_incomplete_token(); mutations should usually require both no truncation and null get_last_error()." + }, + { + "location": "WP_HTML_Processor table-support documentation", + "problem": "The docs mention synthesized TBODY and implied structure, which was enough here, but table insertion modes are a recurring source of mistakes for subtree walkers.", + "suggestion": "Add a general table-walking note explaining that TABLE walks may visit virtual TBODY/TR/TD-related structure and implicit closers, so code should track row/cell state from visited opener/closer tokens rather than source text or absolute depths." + } + ] + } + }, + { + "id": "T09-mark-keyword", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly chose WP_HTML_Processor::create_fragment() and walked tokens with next_token(), get_token_type(), get_modifiable_text(), and serialize_token(). The extra WP_HTML_Tag_Processor template for '' is documented and safe, but less direct than serializing the matched token inside fixed wrapper markup. Small edge-policy penalty for returning raw input on create_fragment()/get_last_error() failure, which would not be normalized." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Passed 8/8. Uses the documented, idiomatic pattern almost exactly: BODY fragment processor, #text-only token walk, decoded get_modifiable_text() matching, and accumulated serialize_token() output. WP_HTML_Processor::normalize() is documented; its use is confined to the error fallback. Minor penalty only for redundant get_modifiable_text() calls and a slightly muddy error fallback policy." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice and clean token-by-token serialization with only ordinary #text nodes checked, which handles decoded entities, comments, attributes, split text, and special text-bearing elements appropriately. Small penalty for returning raw input on parser creation/error fallback, which conflicts with a normalized-output contract if unsupported input is encountered." + } + ], + "failure_analysis": "All trials passed every hidden case, so there are no failed cases to attribute to a misconception. The docs did well on the core decision points: html-processor.md explains under processor choice/create_fragment() that BODY fragments and normalized output call for WP_HTML_Processor; next_token(), get_token_type(), and get_modifiable_text() distinguish ordinary #text from comments and special element text; get_modifiable_text() states that #text is already decoded; and serialize_token() explicitly says concatenating walked tokens reconstructs normalized serialization and can be used for rewrite loops. Those passages directly supported the entity-encoded keyword, comment, attribute, split-across-elements, unclosed-tag, and normalization cases. Near-misses were in fallback behavior: the three candidates chose different parser-error policies, and two returned raw input, suggesting the docs still leave room for confusion about normalized-output fallbacks after get_last_error() or create_fragment() returning null.", + "doc_gaps": [ + { + "location": "html-processor.md: serialize_token() and the token-by-token rewrite overview", + "problem": "The docs say callers may emit extra markup around selected tokens, but the examples do not show a minimal normalized rewrite that inserts fixed literal markup while using serialize_token() for the original token.", + "suggestion": "Add a general rewrite example showing fixed markup inserted before/after a selected token and state that the accumulated string is the normalized output; get_updated_html() is for queued edits, not for reading a token-walk rewrite." + }, + { + "location": "html-processor.md: get_last_error(), serialize_token(), and paused_at_incomplete_token guidance", + "problem": "Candidates used inconsistent fallback policies after parser errors, including returning raw input, which is not normalized.", + "suggestion": "Add a short policy note: for normalized-output functions, raw input is not a normalized fallback; unsupported parser aborts should return an explicit failure/default value or a separately defined fallback, while incomplete trailing syntax can be accepted or rejected according to caller policy." + }, + { + "location": "html-processor.md: create_fragment() return value", + "problem": "The static|null return type is documented, but the docs do not clearly enumerate when null is expected for the default BODY context or what transformation functions should return when construction fails.", + "suggestion": "Document the likely null cases and recommend a consistent handling pattern for BODY-fragment transformations that need normalized output." + }, + { + "location": "html-tag-processor.md: Building markup from a template / get_updated_html()", + "problem": "The template-building pattern is useful, but when combined with HTML Processor rewrites it can obscure that get_updated_html() preserves untouched bytes and does not normalize an arbitrary input document.", + "suggestion": "Cross-link this section to HTML Processor serialization guidance and explicitly distinguish standalone generated templates from normalized whole-fragment serialization." + } + ] + } + }, + { + "id": "T10-last-h2", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for a flat class edit. Every API call is documented: constructor, `next_tag`, `set_bookmark`, `seek`, `add_class`, `release_bookmark`, and `get_updated_html`. The implementation uses the documented last-match bookmark idiom, preserves existing classes via `add_class`, returns unchanged HTML when no H2 exists, and execution passed 6/6 with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct tag processor and only documented APIs, including `has_bookmark` and `release_bookmark`. It walks all `H2` tags, repeatedly moves one bookmark, seeks back to the final opener, adds the class, and returns `get_updated_html`. Handles no-match and existing-class cases idiomatically; execution passed 6/6 with no misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence pattern as trial 2: correct processor, documented APIs only, literal bookmark reused to remember the final `H2`, `seek` before `add_class`, and `get_updated_html` for output. Edge cases covered by the chosen API behavior; execution passed 6/6 with no `_doing_it_wrong` records." + } + ], + "failure_analysis": "All trials passed every frozen case: `two-headings`, `single-heading`, `no-headings-unchanged`, `many-headings`, `comment-h2-not-counted`, and `existing-class`. There are no failed hidden cases to attribute to a misconception. The docs did well in the key places: `Which processor should I use?` clearly points flat class edits to `WP_HTML_Tag_Processor`; `Finding tags` documents `next_tag( 'H2' )`; `Bookmarks` and `WP_HTML_Tag_Processor::set_bookmark()` explicitly describe re-setting one bookmark to remember the last matching token; `add_class()` documents safe class addition without manual class parsing; and `get_updated_html()` explains how to emit the edited original markup. The main near-miss is incomplete input: the docs mention `next_tag()` returning false for both no match and incomplete syntax, but the successful candidates did not need to make a clean-EOF policy decision for this task.", + "doc_gaps": [ + { + "location": "`WP_HTML_Tag_Processor::set_bookmark()` / Bookmarks recipe", + "problem": "The last-match bookmark idiom is documented, but it is not paired directly with the `next_tag()` false-result ambiguity caused by incomplete trailing syntax.", + "suggestion": "Add a cross-reference note after the bookmark-reuse recipe: after a scan ends, callers that require proof of a complete input should check `paused_at_incomplete_token()` before seeking back and applying an edit; callers that only need the last complete token may safely use the bookmark." + } + ] + } + }, + { + "id": "T11-strip-tracking-attributes", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(), all documented. The approach matches the docs' flat attribute-edit pattern and handles case-insensitive attribute names, comments, no-match attributes, and byte-preserving output correctly." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented Tag Processor approach as the reference. No unsupported API use or _doing_it_wrong records. Correctly relies on the prefix helper rather than manual attribute parsing or normalization." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic implementation as trial 2. It uses the right processor for a flat attribute rewrite and returns queued edits with get_updated_html()." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, so there are no failed hidden cases to attribute to documentation failures. The docs did well in the key places: the Tag Processor overview says to use this class for flat attribute/class edits with byte-precise preservation; next_tag() documents linear walking, real-tag-only matching, comments/rawtext exclusion, and incomplete-token behavior; get_attribute_names_with_prefix() documents lowercase returned names and case-insensitive prefix matching; remove_attribute() and get_updated_html() document the edit-and-return workflow. Near miss: candidates all guarded against null from get_attribute_names_with_prefix(), which is correct after the scan ends, but the docs do not explicitly state that a matched tag with no matching attributes returns an empty array rather than null. That gap did not cause failures here.", + "doc_gaps": [ + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#get_attribute_names_with_prefix", + "problem": "The return contract distinguishes array|null, but only the no-current-tag null case is shown. It does not explicitly state the matched-tag/no-prefix-match case returns an empty array.", + "suggestion": "Add a short return-value table: matched tag with matches returns lowercase attribute names; matched tag with no matches returns array(); no matched tag opener returns null." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#remove_attribute", + "problem": "The method docblock does not prominently state that attribute targeting is ASCII case-insensitive, even though this matters when callers pass normalized names returned from get_attribute_names_with_prefix() to remove attributes written with different casing.", + "suggestion": "Add a sentence that remove_attribute() matches attribute names case-insensitively in HTML and can safely consume names returned by get_attribute_names_with_prefix()." + }, + { + "location": "/tmp/html-api-docs-eval/round-29/html-tag-processor.md#modifying-html-attributes-for-a-found-tag", + "problem": "The overview shows removing one known attribute, but does not show the general pattern for bulk operations over discovered attribute names.", + "suggestion": "Add a generic recipe for enumerating attribute names from a read API, applying set/remove operations to that snapshot, and returning get_updated_html(), emphasizing that callers should not parse tag text manually." + } + ] + } + }, + { + "id": "T12-unwrap-spans", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment() for a body fragment, walked all tokens with next_token(), skipped SPAN opener/closer tokens via documented get_tag(), and accumulated normalized output with serialize_token(). All called methods are present in the rendered docs; no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Same documented token-serialization pattern as the reference. Minor adherence penalty: on create_fragment() failure or get_last_error(), it returns the original input, which may violate a normalized-rewrite contract by preserving spans and non-normalized markup. This did not affect the hidden cases." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the documented HTML Processor rewrite pattern directly: create_fragment(), next_token(), get_tag(), serialize_token(), and get_last_error(). Correctly avoids Tag Processor get_updated_html() for a structural normalized rewrite; no undocumented API usage." + } + ], + "failure_analysis": "All three trials passed all seven hidden cases. The docs did well on the key distinction for this task: the HTML Processor overview says it adds structural awareness and normalized serialization, while the Tag Processor overview warns it has no tree awareness. The HTML Processor recipe 'rewrite while serializing tokens' and serialize_token() docs directly explain appending current-token serialization, skipping tokens to remove them, and not calling normalize() afterward. The serialize_token() section also includes a general example removing wrapper element tokens while preserving contents, which appears to have led all trials to the intended approach. Near-misses: all candidates relied on get_tag() returning null for non-tag tokens rather than explicitly checking get_token_type() === '#tag'. This is supported by the get_tag() docs and the serialize_token() example, but the safer token taxonomy is somewhat split between get_tag(), get_token_name(), and get_token_type(). Trial 2 also chose an original-input fallback on parser error; the docs say to reject or fall back when get_last_error() is non-null, but they do not tie fallback choice to whether the caller promised normalized rewritten output.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / rewriting-loop guidance", + "problem": "The docs say to 'reject or fall back' on get_last_error(), but do not define what fallback preserves the contract when the caller is producing normalized rewritten HTML.", + "suggestion": "Add a note distinguishing fail-closed rewrites from display filters: if the function contract requires normalized transformed output, returning the original input is not equivalent to a successful rewrite because it may keep skipped elements and unnormalized bytes." + }, + { + "location": "WP_HTML_Processor::get_tag(), get_token_name(), and get_token_type() docs", + "problem": "The relationship between tag-name checks and token-type checks is spread across separate method docs. Readers can infer that get_tag() is null on non-tags, but the preferred predicate for token-rewrite filters is not stated in one place.", + "suggestion": "Add a short cross-reference in rewrite examples: for element-only filtering, either check get_tag() for a tag name or check get_token_type() === '#tag' plus get_token_name()/get_tag(); explain when each form is clearer." + }, + { + "location": "WP_HTML_Processor::create_fragment() return value docs", + "problem": "The method signature documents static|null, but the common default-body case does not clearly say what practical conditions can produce null.", + "suggestion": "Document the null-return causes for create_fragment(), especially unsupported context or encoding, so callers can choose an appropriate failure value for their own return contract." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-29/codex-trials-output.json b/doc-experiment/results/round-29/codex-trials-output.json new file mode 100644 index 0000000000000..f356218d301b7 --- /dev/null +++ b/doc-experiment/results/round-29/codex-trials-output.json @@ -0,0 +1,383 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-29/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "N03-first-list-count", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' === $tag || 'OL' === $tag ) {\n $found_list = true;\n break;\n }\n }\n\n if ( ! $found_list ) {\n return $html;\n }\n\n if ( ! $processor->set_bookmark( 'first-list-opener' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $direct_li_count = 0;\n $scan_completed = false;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $list_depth ) {\n $scan_completed = true;\n break;\n }\n\n if (\n 'LI' === $processor->get_tag() &&\n ! $processor->is_tag_closer() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $direct_li_count++;\n }\n }\n\n if (\n ! $scan_completed ||\n $processor->paused_at_incomplete_token() ||\n null !== $processor->get_last_error() ||\n ! $processor->seek( 'first-list-opener' )\n ) {\n $processor->release_bookmark( 'first-list-opener' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $direct_li_count );\n $updated_html = $processor->get_updated_html();\n $processor->release_bookmark( 'first-list-opener' );\n\n return $updated_html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because counting only direct `LI` children requires tree awareness. It finds the first `UL` or `OL` with `next_tag()`, bookmarks that opener, walks its subtree with `next_token()` and `get_current_depth()` to count only `LI` openers at exactly one level deeper, rejects the change if the scan did not finish cleanly via `paused_at_incomplete_token()` or `get_last_error()`, then seeks back and applies `set_attribute()` before returning `get_updated_html()`.", + "confidence": 92 + }, + { + "id": "N03-first-list-count", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n if ( $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_tag();\n if ( 'UL' !== $tag_name && 'OL' !== $tag_name ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans to the first `UL` or `OL` with `next_tag()`, bookmarks that opener with `set_bookmark()`, walks its subtree with `next_token()` and `get_current_depth()` to count only direct `LI` child opening tags, verifies the scan completed cleanly with `paused_at_incomplete_token()` and `get_last_error()`, then seeks back and adds `data-item-count` via `set_attribute()` before returning `get_updated_html()`.", + "confidence": 92 + }, + { + "id": "N03-first-list-count", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' === $tag || 'OL' === $tag ) {\n break;\n }\n }\n\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n return $html;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $item_count = 0;\n $scan_completed = false;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $list_depth ) {\n $scan_completed = true;\n break;\n }\n\n if (\n '#tag' === $processor->get_token_type() &&\n 'LI' === $processor->get_tag() &&\n ! $processor->is_tag_closer() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $item_count++;\n }\n }\n\n if ( ! $scan_completed || $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->set_attribute( 'data-item-count', (string) $item_count ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $updated_html = $processor->get_updated_html();\n $processor->release_bookmark( 'first-list' );\n\n return $updated_html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because counting direct `LI` children depends on document structure. It finds the first `UL` or `OL` with `next_tag()`, bookmarks that opener, walks forward with `next_token()` while comparing `get_current_depth()` to count only direct-child `LI` openers, rejects the change if the scan ends incomplete or unsupported via `paused_at_incomplete_token()` or `get_last_error()`, then seeks back, applies `set_attribute()`, and returns `get_updated_html()`.", + "confidence": 89 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 1, + "ok": true, + "code": "Unsupported HTML

            ';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which is specifically for normalizing HTML fragments in BODY context and returns `null` when the processor cannot normalize unsupported input. The function returns that normalized serialization when available, otherwise it returns the exact required fallback HTML.", + "confidence": 97 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 2, + "ok": true, + "code": "Unsupported HTML

            ';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when it cannot normalize unsupported input. The function simply returns that normalized serialization when available, or the exact required fallback placeholder HTML otherwise.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 3, + "ok": true, + "code": "Unsupported HTML

            ';\n\t}\n\n\treturn $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported markup. The function therefore returns the normalized serialization on success and the exact placeholder HTML on failure.", + "confidence": 98 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n $token_type = $processor->get_token_type();\n\n if ( '#tag' === $token_type ) {\n $tag = $processor->get_tag();\n\n if ( null === $tag ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_heading_tag && $tag === $current_heading_tag ) {\n $current_heading_tag = null;\n $current_index = null;\n }\n\n continue;\n }\n\n switch ( $tag ) {\n case 'H1':\n case 'H2':\n case 'H3':\n case 'H4':\n case 'H5':\n case 'H6':\n $toc[] = array(\n 'level' => (int) substr( $tag, 1 ),\n 'text' => '',\n );\n\n $current_index = count( $toc ) - 1;\n $current_heading_tag = $tag;\n break;\n\n case 'SCRIPT':\n case 'STYLE':\n case 'TEXTAREA':\n case 'TITLE':\n if ( null !== $current_index ) {\n $toc[ $current_index ]['text'] .= $processor->get_modifiable_text();\n }\n break;\n }\n\n continue;\n }\n\n if ( null !== $current_index && '#text' === $token_type ) {\n $toc[ $current_index ]['text'] .= $processor->get_modifiable_text();\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then walks the document once with `next_token()`. On each heading opener (`H1` through `H6`) it starts a TOC entry, appends heading text from `#text` tokens via `get_modifiable_text()`, and closes the active entry when the matching heading closer is reached with `is_tag_closer()` and `get_tag()`.", + "confidence": 84 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( null === $tag || ! in_array( $tag, array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ), true ) ) {\n continue;\n }\n\n $level = (int) substr( $tag, 1 );\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n continue;\n }\n\n $token_name = $processor->get_token_name();\n if ( in_array( $token_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n $toc[] = array(\n 'level' => $level,\n 'text' => $text,\n );\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware. It scans tags with `next_tag()`, filters `H1` through `H6`, then walks each heading’s subtree with `next_token()` bounded by `get_current_depth()`, appending text from `#text` tokens via `get_modifiable_text()` and also reading documented special-element text (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) from their opening tokens when encountered.", + "confidence": 86 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n $token_type = $processor->get_token_type();\n\n if ( '#text' === $token_type ) {\n if ( null !== $current_heading ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n }\n\n continue;\n }\n\n if ( '#tag' !== $token_type ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n\n if ( null === $tag ) {\n continue;\n }\n\n if ( null !== $current_heading && ! $processor->is_tag_closer() && in_array( $tag, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( 2 !== strlen( $tag ) || 'H' !== $tag[0] || $tag[1] < '1' || $tag[1] > '6' ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_heading && $current_heading['tag'] === $tag ) {\n $toc[] = array(\n 'level' => $current_heading['level'],\n 'text' => $current_heading['text'],\n );\n $current_heading = null;\n }\n\n continue;\n }\n\n if ( null !== $current_heading ) {\n $toc[] = array(\n 'level' => $current_heading['level'],\n 'text' => $current_heading['text'],\n );\n }\n\n $current_heading = array(\n 'tag' => $tag,\n 'level' => (int) $tag[1],\n 'text' => '',\n );\n }\n\n if ( null !== $current_heading && null === $processor->get_last_error() ) {\n $toc[] = array(\n 'level' => $current_heading['level'],\n 'text' => $current_heading['text'],\n );\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` pass so headings are collected in document order without nested cursor loops. It tracks heading openers/closers with `get_token_type()`, `get_tag()`, and `is_tag_closer()`, appends descendant text from `#text` tokens via `get_modifiable_text()`, and also includes text carried on special text-bearing element tokens like `TEXTAREA`, `TITLE`, `SCRIPT`, and `STYLE` when they occur inside a heading.", + "confidence": 83 + }, + { + "id": "T01-add-image-class", + "trial": 1, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute/class edit. The function constructs the processor with the input HTML, repeatedly finds `IMG` tags via `next_tag( 'img' )` using case-insensitive tag matching, adds `wp-image` with `add_class()`, and returns the result with `get_updated_html()`, which preserves untouched bytes and does not modify comment contents.", + "confidence": 98 + }, + { + "id": "T01-add-image-class", + "trial": 2, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute/class edit. It constructs the processor with the input HTML, repeatedly finds each `IMG` tag via `next_tag( 'img' )` (case-insensitive and ignoring comment contents), adds `wp-image` with `add_class()`, and returns the result with `get_updated_html()` so untouched bytes remain unchanged.", + "confidence": 98 + }, + { + "id": "T01-add-image-class", + "trial": 3, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute/class edit. The function scans forward with `next_tag( 'img' )`, which matches `IMG` case-insensitively and ignores tag-like text inside comments or special raw-text regions, then calls `add_class( 'wp-image' )` on each matched image and returns the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T02-link-targets", + "trial": 1, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute edit. It scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' ) !== null` so empty-string and boolean `href` forms still count as present, then overwrites or creates `target` with `set_attribute( 'target', '_blank' )` and returns the result via `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T02-link-targets", + "trial": 2, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null !== $processor->get_attribute( 'href' ) ) {\n $processor->set_attribute( 'target', '_blank' );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit: it scans only `A` tag openers with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and finally returns the result with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T02-link-targets", + "trial": 3, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null !== $processor->get_attribute( 'href' ) ) {\n $processor->set_attribute( 'target', '_blank' );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-precise attribute edits. It scans only `A` tag openers with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so `href=\"\"` and boolean `href` both count as present because only a missing attribute returns `null`, then overwrites or adds `target` via `set_attribute()`, and finally returns the modified document with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, appending only `#text` tokens via `get_modifiable_text()`. That follows the documented HTML Processor recipe for collecting ordinary subtree text, which also gives decoded character references automatically.", + "confidence": 95 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which provides decoded text content.", + "confidence": 95 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n $token_type = $processor->get_token_type();\n\n if ( '#text' === $token_type ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $token_type || $processor->is_tag_closer() ) {\n continue;\n }\n\n $token_name = $processor->get_token_name();\n if (\n 'SCRIPT' === $token_name ||\n 'STYLE' === $token_name ||\n 'TEXTAREA' === $token_name ||\n 'TITLE' === $token_name\n ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records that element’s depth with `get_current_depth()`, then walks its subtree with `next_token()`, appending decoded text from `#text` tokens via `get_modifiable_text()`. To include text carried on special-element opener tokens, it also appends `get_modifiable_text()` for nested `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opening tags.", + "confidence": 84 + }, + { + "id": "T04-build-figure", + "trial": 1, + "ok": true, + "code": "\"\"
            .
            ' );\n\n $processor->next_tag( 'img' );\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This builds the fragment from a fixed HTML template with `WP_HTML_Tag_Processor`, which the documentation recommends when exact markup shape and attribute order matter. It updates the existing `src` and `alt` attributes with `set_attribute()` so the API handles HTML encoding while preserving their template order, then walks tokens with `next_token()` until the `figcaption` placeholder text node and replaces it with `set_modifiable_text()`, finally returning the result with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T04-build-figure", + "trial": 2, + "ok": true, + "code": "\"\"
            .
            ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed literal template so the output shape and `img` attribute order stay exact. It fills `src` and `alt` via `next_tag()` and `set_attribute()`, then replaces the placeholder text inside `figcaption` by scanning tokens with `next_token()` until the `#text` node and calling `set_modifiable_text()`, relying on the HTML API to perform all required escaping.", + "confidence": 96 + }, + { + "id": "T04-build-figure", + "trial": 3, + "ok": true, + "code": "\"\"
            .
            ' );\n\n $processor->next_tag( 'img' );\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed template so the output shape and attribute order stay exact. It updates the existing `img` attributes via `next_tag()` and `set_attribute()`, then walks tokens with `next_token()` until the `figcaption` placeholder text node and replaces it with `set_modifiable_text()`, relying on the HTML API to perform all required escaping.", + "confidence": 94 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n $chunk = '';\n $type = $processor->get_token_type();\n\n if ( '#text' === $type ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $type && ! $processor->is_tag_closer() ) {\n $name = $processor->get_token_name();\n if ( 'TEXTAREA' === $name || 'TITLE' === $name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n $chunk_length = mb_strlen( $chunk, 'UTF-8' );\n if ( $count + $chunk_length <= $max_codepoints ) {\n $text .= $chunk;\n $count += $chunk_length;\n continue;\n }\n\n $text .= mb_substr( $chunk, 0, $max_codepoints - $count, 'UTF-8' );\n return $text;\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structure awareness, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus the decoded opener text of `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while excluding other modifiable-text carriers like `SCRIPT`, `STYLE`, and comments. Truncation is done incrementally with `mb_strlen()` and `mb_substr()` using explicit `UTF-8` so the result is limited by Unicode code points without splitting a multi-byte character.", + "confidence": 88 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n\t\t$text = '';\n\n\t\tif ( '#text' === $processor->get_token_type() ) {\n\t\t\t$text = $processor->get_modifiable_text();\n\t\t} elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n\t\t\t$token_name = $processor->get_token_name();\n\t\t\tif ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) {\n\t\t\t\t$text = $processor->get_modifiable_text();\n\t\t\t}\n\t\t}\n\n\t\tif ( '' === $text ) {\n\t\t\tcontinue;\n\t\t}\n\n\t\t$matched = preg_match_all( '/./us', $text, $codepoints );\n\t\tif ( false === $matched || 0 === $matched ) {\n\t\t\tcontinue;\n\t\t}\n\n\t\tif ( $matched <= $remaining ) {\n\t\t\t$excerpt .= $text;\n\t\t\t$remaining -= $matched;\n\t\t\tcontinue;\n\t\t}\n\n\t\t$excerpt .= implode( '', array_slice( $codepoints[0], 0, $remaining ) );\n\t\tbreak;\n\t}\n\n\treturn $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus the documented special text carried on opening `TITLE` and `TEXTAREA` tokens via `get_modifiable_text()`, while excluding other special elements such as `SCRIPT` and `STYLE`. Truncation is done by UTF-8 code points with `preg_match_all('/./us', ...)` so multi-byte characters are never split.", + "confidence": 91 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_token_name();\n if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` token content plus the decoded opener-carried text for `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. The final string is truncated by Unicode code points with `mb_substr(..., 'UTF-8')`.", + "confidence": 84 + }, + { + "id": "T06-collect-links", + "trial": 1, + "ok": true, + "code": "next_tag( 'A' ) ) {\n $href = $processor->get_attribute( 'href' );\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => $text,\n );\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is tree-aware work. It scans `A` openers with `next_tag('A')`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor’s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` token contents via `get_modifiable_text()` so markup contributes nothing and character references are already decoded.", + "confidence": 90 + }, + { + "id": "T06-collect-links", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'A' === $processor->get_token_name() ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_link ) {\n $links[] = $current_link;\n $current_link = null;\n }\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $current_link = array(\n 'href' => $href,\n 'text' => '',\n );\n } else {\n $current_link = null;\n }\n\n continue;\n }\n\n if ( null !== $current_link && '#text' === $processor->get_token_type() ) {\n $current_link['text'] .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_link ) {\n $links[] = $current_link;\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then performs a single `next_token()` walk as recommended by the docs. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes, accumulates decoded text from `#text` tokens via `get_modifiable_text()`, and finalizes each link when the matching `A` closer is visited.", + "confidence": 90 + }, + { + "id": "T06-collect-links", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( 'A' === $processor->get_token_name() ) {\n if ( $processor->is_tag_closer() ) {\n array_pop( $open_links );\n } else {\n $href = $processor->get_attribute( 'href' );\n\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $open_links[] = count( $links ) - 1;\n } else {\n $open_links[] = null;\n }\n }\n\n continue;\n }\n\n if ( '#text' !== $processor->get_token_type() ) {\n continue;\n }\n\n for ( $i = count( $open_links ) - 1; $i >= 0; $i-- ) {\n if ( null !== $open_links[ $i ] ) {\n $links[ $open_links[ $i ] ]['text'] .= $processor->get_modifiable_text();\n break;\n }\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, then walks the fragment with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and includes only string-valued attributes (excluding missing and boolean `href`), and while inside a collected link it concatenates decoded `#text` token content from `get_modifiable_text()` until the corresponding closer is reached.", + "confidence": 89 + }, + { + "id": "T07-nested-lists", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n array_pop( $breadcrumbs );\n\n foreach ( $breadcrumbs as $ancestor ) {\n if ( 'UL' === $ancestor || 'OL' === $ancestor ) {\n $processor->add_class( 'nested-list' );\n break;\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It walks opening tags with `next_tag()`, filters to `UL` and `OL` via `get_tag()`, inspects ancestors with `get_breadcrumbs()`, and calls `add_class()` only when an earlier breadcrumb is also `UL` or `OL`. The result is returned with `get_updated_html()`, and the original HTML is returned unchanged if the processor reports unsupported markup via `get_last_error()`.", + "confidence": 90 + }, + { + "id": "T07-nested-lists", + "trial": 2, + "ok": true, + "code": "next_tag( array( 'tag_name' => 'UL' ) ) ) {\n $breadcrumbs = $processor->get_breadcrumbs();\n array_pop( $breadcrumbs );\n\n if ( in_array( 'UL', $breadcrumbs, true ) || in_array( 'OL', $breadcrumbs, true ) ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n while ( $processor->next_tag( array( 'tag_name' => 'OL' ) ) ) {\n $breadcrumbs = $processor->get_breadcrumbs();\n array_pop( $breadcrumbs );\n\n if ( in_array( 'UL', $breadcrumbs, true ) || in_array( 'OL', $breadcrumbs, true ) ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans `UL` and `OL` openers with `next_tag()`, inspects their ancestor chain via `get_breadcrumbs()`, and when any ancestor is a `UL` or `OL` it adds the `nested-list` class with `add_class()`. The final output comes from `get_updated_html()`, and it falls back to the original HTML if the processor reports unsupported markup with `get_last_error()`.", + "confidence": 67 + }, + { + "id": "T07-nested-lists", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n array_pop( $breadcrumbs );\n\n foreach ( $breadcrumbs as $ancestor_tag ) {\n if ( 'UL' === $ancestor_tag || 'OL' === $ancestor_tag ) {\n $processor->add_class( 'nested-list' );\n break;\n }\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans tag openers with `next_tag()`, filters to `UL` and `OL` via `get_tag()`, checks their ancestor chain with `get_breadcrumbs()`, and adds the `nested-list` class with `add_class()` whenever an ancestor list is present. The final HTML comes from `get_updated_html()`, which preserves untouched bytes exactly.", + "confidence": 91 + }, + { + "id": "T08-table-extract", + "trial": 1, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n $token_type = $processor->get_token_type();\n\n if ( '#text' === $token_type ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $token_type ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n if ( null === $tag ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n continue;\n }\n\n if ( 'TR' === $tag && null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n\n continue;\n }\n\n if ( 'TR' === $tag ) {\n $current_row = array();\n continue;\n }\n\n if ( 'TD' === $tag || 'TH' === $tag ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n\n $current_cell = '';\n continue;\n }\n\n if (\n null !== $current_cell &&\n ( 'SCRIPT' === $tag || 'STYLE' === $tag || 'TEXTAREA' === $tag || 'TITLE' === $tag )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because table extraction is structure-aware. It finds the first `TABLE` with `next_tag()`, then performs a single depth-bounded `next_token()` walk through that table subtree, opening and closing rows/cells based on `get_tag()` and `is_tag_closer()`, and concatenating decoded text from `#text` tokens via `get_modifiable_text()`; for special text-only elements whose content lives on the opener token, it also appends their documented modifiable text.", + "confidence": 87 + }, + { + "id": "T08-table-extract", + "trial": 2, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n $cell_depth = null;\n\n while ( $processor->next_token() ) {\n $depth = $processor->get_current_depth();\n if ( $depth < $table_depth ) {\n break;\n }\n\n $token_type = $processor->get_token_type();\n $token_name = $processor->get_token_name();\n\n if ( '#text' === $token_type ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $token_type || null === $token_name ) {\n continue;\n }\n\n if ( 'TR' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n continue;\n }\n\n if ( 'TD' === $token_name || 'TH' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n }\n $current_cell = null;\n $cell_depth = null;\n } else {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_cell = '';\n $cell_depth = $depth;\n }\n continue;\n }\n\n if ( null !== $current_cell && $depth < $cell_depth ) {\n if ( null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n $current_cell = null;\n $cell_depth = null;\n }\n }\n\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with HTML5 table semantics, finds the first `TABLE` via `next_tag()`, then does a single `next_token()` walk bounded by the table depth from `get_current_depth()`. It tracks `TR` and `TD`/`TH` openers and closers, and concatenates only `#text` token contents with `get_modifiable_text()`, which already returns decoded character references.", + "confidence": 78 + }, + { + "id": "T08-table-extract", + "trial": 3, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n $text_only_elements = array(\n 'IFRAME',\n 'NOEMBED',\n 'NOFRAME',\n 'NOFRAMES',\n 'NOSCRIPT',\n 'SCRIPT',\n 'STYLE',\n 'TEXTAREA',\n 'TITLE',\n 'XMP',\n );\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $table_depth ) {\n break;\n }\n\n if ( '#text' === $processor->get_token_type() ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n\n if ( $processor->is_tag_closer() ) {\n if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n } elseif ( 'TR' === $tag && null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n\n continue;\n }\n\n if ( 'TR' === $tag ) {\n $current_row = array();\n continue;\n }\n\n if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_row ) {\n $current_cell = '';\n continue;\n }\n\n if ( null !== $current_cell && in_array( $tag, $text_only_elements, true ) ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-style table handling, `next_tag()` to find the first `TABLE`, and then a bounded `next_token()` walk using `get_current_depth()` so only that table subtree is visited. Row and cell boundaries come from `TR`/`TD`/`TH` tag openers and closers, ordinary text is collected from `#text` tokens via `get_modifiable_text()`, and text-only element contents are appended from their opener tokens because the HTML Processor carries those contents on the element token rather than separate text nodes.", + "confidence": 84 + }, + { + "id": "T09-mark-keyword", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( str_contains( $text, $keyword ) ) {\n $wrapper = new WP_HTML_Tag_Processor( '.' );\n while ( $wrapper->next_token() ) {\n if ( '#text' === $wrapper->get_token_type() ) {\n $wrapper->set_modifiable_text( $text );\n break;\n }\n }\n\n $output .= $wrapper->get_updated_html();\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This walks the fragment with `WP_HTML_Processor::create_fragment()` and `next_token()`, rewrites only ordinary `#text` tokens whose decoded `get_modifiable_text()` contains the case-sensitive keyword, and otherwise appends each token’s normalized `serialize_token()` output. Matching text nodes are replaced by building a normalized `` template with `WP_HTML_Tag_Processor`, updating its text via `set_modifiable_text()`, and reading it back with `get_updated_html()` so the wrapped text is safely re-encoded.", + "confidence": 85 + }, + { + "id": "T09-mark-keyword", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if (\n '#text' === $processor->get_token_type() &&\n '' !== $processor->get_modifiable_text() &&\n false !== strpos( $processor->get_modifiable_text(), $keyword )\n ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n $normalized = WP_HTML_Processor::normalize( $html );\n return null === $normalized ? '' : $normalized;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment structurally, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and rebuilds normalized HTML with `serialize_token()`, inserting `...` around any matching text token. Special text-bearing elements like `script`, `style`, `title`, and `textarea` are naturally skipped because their contents are not exposed as `#text` child tokens in the HTML Processor.", + "confidence": 86 + }, + { + "id": "T09-mark-keyword", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks every token with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text via `get_modifiable_text()` and, when the case-sensitive keyword is present, emits a `` wrapper around that token’s normalized serialization from `serialize_token()`. All other tokens are emitted unchanged with `serialize_token()`, which preserves the processor’s normalized output behavior and naturally excludes special text-bearing elements like `script`, `style`, `title`, and `textarea` because their contents are not exposed as ordinary `#text` tokens.", + "confidence": 86 + }, + { + "id": "T10-last-h2", + "trial": 1, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n $found_h2 = true;\n }\n\n if ( ! $found_h2 ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, position-based scan. It walks forward with `next_tag( 'H2' )`, reuses a single bookmark via `set_bookmark()` to remember the most recent `H2`, then `seek()`s back to that bookmarked opener, adds the `final-section` class with `add_class()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T10-last-h2", + "trial": 2, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n }\n\n if ( ! $processor->has_bookmark( 'last-h2' ) ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, repeatedly moves the same bookmark with `set_bookmark()` to remember the last matched `H2`, then `seek()`s back to that bookmark, applies `add_class( 'final-section' )`, and returns the result with `get_updated_html()`.", + "confidence": 95 + }, + { + "id": "T10-last-h2", + "trial": 3, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n }\n\n if ( ! $processor->has_bookmark( 'last-h2' ) ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a single linear scan, re-setting one bookmark on each matched `H2` so the bookmark ends up at the last `H2` opener. It then `seek()`s back to that bookmarked tag, adds the `final-section` class with `add_class()`, and returns the edited markup via `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $attribute_names = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attribute_names ) {\n continue;\n }\n\n foreach ( $attribute_names as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute edit on individual tag openers. It scans each tag with `next_tag()`, gets matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute-rewrite pass over every tag opener with `next_tag()`. For each matched tag, it calls `get_attribute_names_with_prefix( 'data-track-' )` to find exactly the lowercase attribute names that start with that prefix, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute rewrite on individual tag openers. The function scans every tag with `next_tag()`, gets all matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T12-unwrap-spans", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-fragment HTML, then walks every token with `next_token()` and rebuilds normalized output with `serialize_token()`. Any token whose tag name is `SPAN` is skipped, which removes both opening and closing span tags while preserving their contents, including nested spans; `get_last_error()` is checked so unsupported markup does not return a partial rewrite.", + "confidence": 90 + }, + { + "id": "T12-unwrap-spans", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment HTML structurally, then walks every token with `next_token()`. For normalized output, it rebuilds the fragment token-by-token with `serialize_token()`, skipping any token whose tag is `SPAN`, which removes both span openers and closers while preserving their contents and normalizing the rest of the HTML.", + "confidence": 86 + }, + { + "id": "T12-unwrap-spans", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then rewrites it token by token with `next_token()` and `serialize_token()`, skipping every `SPAN` token so both opening and closing span wrappers disappear while their contents remain. Using HTML Processor serialization produces the required normalized HTML output.", + "confidence": 87 + } + ] +} diff --git a/doc-experiment/results/round-29/round-metadata.json b/doc-experiment/results/round-29/round-metadata.json new file mode 100644 index 0000000000000..3605858b4cdf6 --- /dev/null +++ b/doc-experiment/results/round-29/round-metadata.json @@ -0,0 +1,333 @@ +{ + "round": "round-29", + "mode": "scored-train", + "task_ids": [ + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 15, + "splits": { + "train": 15 + }, + "concepts": { + "attributes": 3, + "classes": 1, + "normalization": 1, + "serialization": 2, + "text": 3, + "traversal": 5 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95173a4486717c852b3e9cc69cb6c4ff227854ec", + "git_status_short": "", + "source_file_digests": { + "ref": "95173a4486717c852b3e9cc69cb6c4ff227854ec", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "a8d7ce78fc9dd5548b6012747db1deed5da67b4facd12feb1b4a50b4365041b7", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "95173a4486717c852b3e9cc69cb6c4ff227854ec", + "algorithm": "sha256", + "tasks": { + "N03-first-list-count": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N03-first-list-count/task.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "doc-experiment/corpus/N03-first-list-count/reference.php": "588468ac5c4009fd1a1a27a530efa208e1ce08025d34a0e4353ea5b976d705ba", + "doc-experiment/corpus/N03-first-list-count/tests.json": "e0467ccc5475d2c87cc11eb54b4f2cb40fe4d44ef82c761f745d1cca91101314" + } + }, + "N04-normalize-or-placeholder": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N04-normalize-or-placeholder/task.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "doc-experiment/corpus/N04-normalize-or-placeholder/reference.php": "e47feb6d3be887e1ee5df77e39160ffe812cb8322ca8600898de2a34e56ddeed", + "doc-experiment/corpus/N04-normalize-or-placeholder/tests.json": "417d2f770b341fc46bb8b7b46df899f1fd1e65d343aa8cc6c4e39307afcb4d18" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + }, + "T01-add-image-class": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T01-add-image-class/task.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "doc-experiment/corpus/T01-add-image-class/reference.php": "8c9cf2ac9194dfce36a45c2f987d7bbfcdb334020aa94a86fb388bb1ce28171f", + "doc-experiment/corpus/T01-add-image-class/tests.json": "0ab7c7527432604d31e5e8d57e810a0167730a111c018d8a05a1bbad24617787" + } + }, + "T02-link-targets": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T02-link-targets/task.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "doc-experiment/corpus/T02-link-targets/reference.php": "8942a4ef55241d6d1deda51046430c3d8cd7c4ca716c8bc79bdf9aa723fdb6f6", + "doc-experiment/corpus/T02-link-targets/tests.json": "c21515f7d43962b86975ccee1b3a39747170edc75ec1af26416494d312870a4a" + } + }, + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "T04-build-figure": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T04-build-figure/task.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "doc-experiment/corpus/T04-build-figure/reference.php": "973bcd94265496fe52fe7f9b895c266b4ae91c99460abbfd9088f1931a5d590e", + "doc-experiment/corpus/T04-build-figure/tests.json": "45cea68d9b4606600a7d57cd46e84348edb5599ec43a1fec44002424a0820d3a" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + }, + "T06-collect-links": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T06-collect-links/task.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "doc-experiment/corpus/T06-collect-links/reference.php": "c0e5ae9ec5d4448a525962f1e27250a7243541d613209e3bbdc86913ff7e7a81", + "doc-experiment/corpus/T06-collect-links/tests.json": "34460376a147fa18739e7743d341479489f46a88b58b9f5aa5fe0d901f2b3140" + } + }, + "T07-nested-lists": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T07-nested-lists/task.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "doc-experiment/corpus/T07-nested-lists/reference.php": "f082fca4f2419435d0d8fe275068a0d486bda2359a668dc969b6cddf78100a61", + "doc-experiment/corpus/T07-nested-lists/tests.json": "79632d22e7c44696e5779df8ea2351c36e335c20aa316437a8edc885d20c40cd" + } + }, + "T08-table-extract": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T08-table-extract/task.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "doc-experiment/corpus/T08-table-extract/reference.php": "289b8c6d1b69d4ef4dd7810dbdcba35445c30a7f26b631a00835ef737b78709e", + "doc-experiment/corpus/T08-table-extract/tests.json": "cdec7559d63ee541f1b0dad34a4f9f0982ce66e58ddfea98ec482c425e1b2638" + } + }, + "T09-mark-keyword": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T09-mark-keyword/task.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "doc-experiment/corpus/T09-mark-keyword/reference.php": "0141ececbe459f50729e657f59dd75e8b74911cd1d70940b8fd5072d2f1efb60", + "doc-experiment/corpus/T09-mark-keyword/tests.json": "946bb96c922fea63cda3c31b94840b05ffded9f2e1dfa4b1547c755b99cda0c5" + } + }, + "T10-last-h2": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T10-last-h2/task.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "doc-experiment/corpus/T10-last-h2/reference.php": "dabda32a9cf4ffb8a1b6ca613feb0a87791619be3bdfd2502b93f5972ba454a5", + "doc-experiment/corpus/T10-last-h2/tests.json": "4b3be0b61681e587c88a2d2ae0b9dec88306dcdad5b0f41a9b3a6a5883aceb07" + } + }, + "T11-strip-tracking-attributes": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T11-strip-tracking-attributes/task.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "doc-experiment/corpus/T11-strip-tracking-attributes/reference.php": "6ab0dceded92febd085f447eafd1ded777bf7ad6d184c6dc05b265ec41f420f0", + "doc-experiment/corpus/T11-strip-tracking-attributes/tests.json": "65032bd058bfa4819f040c3d8f9590224684a6cfff28434ad6708ab1f6ce94fc" + } + }, + "T12-unwrap-spans": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T12-unwrap-spans/task.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b", + "doc-experiment/corpus/T12-unwrap-spans/reference.php": "93d6828161817ec9a0b90765d91e7dae1c48f3b3aac35f6cad6f785b1c715797", + "doc-experiment/corpus/T12-unwrap-spans/tests.json": "fcade4b9f46bbb824b0095f58b99d4c71c8b3e696093ab0d0ae25d666aae5f53" + } + } + } + }, + "created_at_utc": "2026-06-13T12:51:27+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-29", + "staged_task_files": [ + "tasks/N03-first-list-count.md", + "tasks/N04-normalize-or-placeholder.md", + "tasks/N06-extract-toc.md", + "tasks/T01-add-image-class.md", + "tasks/T02-link-targets.md", + "tasks/T03-first-h1-text.md", + "tasks/T04-build-figure.md", + "tasks/T05-text-excerpt.md", + "tasks/T06-collect-links.md", + "tasks/T07-nested-lists.md", + "tasks/T08-table-extract.md", + "tasks/T09-mark-keyword.md", + "tasks/T10-last-h2.md", + "tasks/T11-strip-tracking-attributes.md", + "tasks/T12-unwrap-spans.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-29 exposes 2 docs and 15 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "485d2b4a540833a79ba97b67b85bd7d266f25745e2ffa292801210cead6fa3f5", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N03-first-list-count.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "tasks/N04-normalize-or-placeholder.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T01-add-image-class.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "tasks/T02-link-targets.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T04-build-figure.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "tasks/T06-collect-links.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "tasks/T07-nested-lists.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "tasks/T08-table-extract.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "tasks/T09-mark-keyword.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "tasks/T10-last-h2.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "tasks/T11-strip-tracking-attributes.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "tasks/T12-unwrap-spans.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b" + } +} diff --git a/doc-experiment/results/round-29/round-summary.json b/doc-experiment/results/round-29/round-summary.json new file mode 100644 index 0000000000000..e2cd4c9d9d803 --- /dev/null +++ b/doc-experiment/results/round-29/round-summary.json @@ -0,0 +1,566 @@ +{ + "round_score": 98.31, + "core_score": 98.05, + "by_split": { + "train": 98.31 + }, + "by_concept": { + "attributes": 99.83, + "classes": 100.0, + "normalization": 100.0, + "serialization": 99.5, + "text": 99.7, + "traversal": 95.41 + }, + "tasks": { + "N03-first-list-count": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "N04-normalize-or-placeholder": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "split": "train" + } + }, + "N06-extract-toc": { + "score": 97.6, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 92, + "score": 97.6 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 93, + "score": 97.9 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 91, + "score": 97.3 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T01-add-image-class": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "split": "train" + } + }, + "T02-link-targets": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T03-first-h1-text": { + "score": 99.4, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T04-build-figure": { + "score": 99.5, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 99.8, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T06-collect-links": { + "score": 99.9, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 99, + "score": 99.7 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T07-nested-lists": { + "score": 81.13, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 2, + "total": 7, + "adherence": 82, + "score": 44.6 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 96, + "score": 98.8 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T08-table-extract": { + "score": 98.3, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 91, + "score": 97.3 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T09-mark-keyword": { + "score": 99.2, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "T10-last-h2": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag", + "split": "train" + } + }, + "T11-strip-tracking-attributes": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T12-unwrap-spans": { + "score": 99.8, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-29", + "mode": "scored-train", + "task_ids": [ + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 15, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95173a4486717c852b3e9cc69cb6c4ff227854ec", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-29/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-29/subject-isolation.json b/doc-experiment/results/round-29/subject-isolation.json new file mode 100644 index 0000000000000..6ba8cbe03bc08 --- /dev/null +++ b/doc-experiment/results/round-29/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-29/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-30/N03-first-list-count/judge.json b/doc-experiment/results/round-30/N03-first-list-count/judge.json new file mode 100644 index 0000000000000..13460c3899718 --- /dev/null +++ b/doc-experiment/results/round-30/N03-first-list-count/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used `WP_HTML_Processor::create_fragment()` for a structure-sensitive task, then followed the documented bookmark, depth-bounded `next_token()`, clean-scan check, `seek()`, `set_attribute()`, and `get_updated_html()` pattern. Every API method called appears in the rendered docs, and execution recorded no `_doing_it_wrong` notices." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct HTML Processor and closely matched the documented 'scan a region before editing its opener' recipe. The depth guard, direct-child depth comparison, incomplete-token and parser-error checks, bookmark release, and `get_updated_html()` output path were all documented and idiomatic. No undocumented calls or misuse were recorded." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented pattern as trial 2: HTML Processor fragment parsing, first list bookmark, subtree walk bounded by `get_current_depth()`, direct-child `LI` counting, clean-scan rejection, seek back, attribute update, and updated HTML return. No hallucinated methods and no `_doing_it_wrong` records." + } + ], + "failure_analysis": "All trials passed all 11 frozen cases, so there were no failed hidden cases to attribute to documentation gaps. The docs did well in the exact areas this task needed: the HTML Processor overview says to choose `WP_HTML_Processor` when document structure matters; the 'Recipe: scan a region before editing its opener' heading gives the bookmark-walk-clean-check-seek-edit pattern; `next_token()` explains structural token walking and implicit/virtual closers; `get_current_depth()` explicitly teaches the `>=` subtree guard and warns against `>`; `paused_at_incomplete_token()` and `get_last_error()` explain truncation and unsupported-markup rejection; and `set_attribute()` plus `get_updated_html()` document overwrite semantics and how to retrieve patched markup. Near-misses were minor: the candidates had to infer the direct-child formula from depth semantics, and trial 1's extra `$closed` flag suggests some uncertainty about whether a depth-bounded walk will reliably reach the container boundary via virtual closers. Trials 2 and 3 also relied on strict `get_tag()` comparisons on all token types, which is valid because non-tag tokens return `null`, but the docs could make that scanning idiom more explicit.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_current_depth()` / `WP_HTML_Processor::next_token()` docs", + "problem": "The docs explain how to stay inside a subtree, but they do not explicitly state the general direct-child test: an opening element is a direct child when it is not a closer and its depth is exactly the container opener depth plus one.", + "suggestion": "Add a small general example for filtering or counting direct child elements of a container using recorded depth, without using this specific list-counting task." + }, + { + "location": "Bounded-scan recipe and `paused_at_incomplete_token()` docs", + "problem": "The docs say to check truncation and parser errors after a scan, but the distinction between 'this subtree was scanned completely' and 'the entire input was scanned completely' is implicit.", + "suggestion": "Add a note that after a depth-bounded walk stops at the container's own closer, the clean-scan checks apply to what has been encountered; callers should drain to EOF only when their contract requires validating trailing markup too." + }, + { + "location": "`WP_HTML_Processor::next_token()` virtual-closer discussion", + "problem": "The docs state that implicit and end-of-input closers are visited, but the practical consequence for region scans is easy to miss.", + "suggestion": "Add a generic malformed-but-parseable example showing that omitted end tags still produce structural closer events, while incomplete syntax still requires `paused_at_incomplete_token()` checking." + }, + { + "location": "`WP_HTML_Processor::get_tag()` docs", + "problem": "Examples use `get_tag()` inside token walks, but the safe idiom for non-tag tokens is not highlighted near the method itself.", + "suggestion": "Clarify that `get_tag()` returns `null` on non-tag tokens, so strict string comparisons are safe, and mention that callers can additionally guard with `get_token_type() === '#tag'` when only tag tokens should be considered." + } + ] +} diff --git a/doc-experiment/results/round-30/N03-first-list-count/trial-1/candidate.php b/doc-experiment/results/round-30/N03-first-list-count/trial-1/candidate.php new file mode 100644 index 0000000000000..7a2038a765d80 --- /dev/null +++ b/doc-experiment/results/round-30/N03-first-list-count/trial-1/candidate.php @@ -0,0 +1,60 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + if ( ! $processor->set_bookmark( 'first-list' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $count = 0; + $closed = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $list_depth ) { + $closed = true; + break; + } + + if ( + '#tag' === $processor->get_token_type() && + 'LI' === $processor->get_tag() && + ! $processor->is_tag_closer() && + $processor->get_current_depth() === $list_depth + 1 + ) { + $count++; + } + } + + $scan_finished_cleanly = + $closed && + ! $processor->paused_at_incomplete_token() && + null === $processor->get_last_error(); + + if ( ! $scan_finished_cleanly ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + if ( ! $processor->seek( 'first-list' ) ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $count ); + $processor->release_bookmark( 'first-list' ); + + return $processor->get_updated_html(); + } + + return $html; +} diff --git a/doc-experiment/results/round-30/N03-first-list-count/trial-1/execution.json b/doc-experiment/results/round-30/N03-first-list-count/trial-1/execution.json new file mode 100644 index 0000000000000..1a221076ebf3e --- /dev/null +++ b/doc-experiment/results/round-30/N03-first-list-count/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-30/N03-first-list-count/trial-1/candidate.php", + "function": "add_first_list_item_count", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "simple-ul", + "status": "pass", + "expected": "
            • A
            • B
            • C
            ", + "actual": "
            • A
            • B
            • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ol", + "status": "pass", + "expected": "
            1. A
            2. B
            ", + "actual": "
            1. A
            2. B
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-list", + "status": "pass", + "expected": "

            No list here.

            ", + "actual": "

            No list here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-count-overwritten", + "status": "pass", + "expected": "
            • A
            ", + "actual": "
            • A
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-li-closers", + "status": "pass", + "expected": "
            • one
            • two", + "actual": "
              • one
              • two", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-list-counts-direct-children", + "status": "pass", + "expected": "
                  • x
                • y", + "actual": "
                    • x
                  • y", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-token-inside-list", + "status": "pass", + "expected": "
                    • ", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T01-add-image-class/trial-2/execution.json b/doc-experiment/results/round-32/T01-add-image-class/trial-2/execution.json new file mode 100644 index 0000000000000..adbb9adae90a1 --- /dev/null +++ b/doc-experiment/results/round-32/T01-add-image-class/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T01-add-image-class/trial-2/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                      ", + "actual": "
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T01-add-image-class/trial-3/execution.json b/doc-experiment/results/round-32/T01-add-image-class/trial-3/execution.json new file mode 100644 index 0000000000000..857cfa826afdb --- /dev/null +++ b/doc-experiment/results/round-32/T01-add-image-class/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T01-add-image-class/trial-3/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                      ", + "actual": "
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-1/execution.json b/doc-experiment/results/round-32/T02-link-targets/trial-1/execution.json new file mode 100644 index 0000000000000..59777e2d660a7 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T02-link-targets/trial-1/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-1/response.json b/doc-experiment/results/round-32/T02-link-targets/trial-1/response.json new file mode 100644 index 0000000000000..5fddbd6d8c6f0 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` values still count as present, and overwrites or inserts `target` via `set_attribute( 'target', '_blank' )` before returning the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-2/candidate.php b/doc-experiment/results/round-32/T02-link-targets/trial-2/candidate.php new file mode 100644 index 0000000000000..b69b5917371f2 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-2/candidate.php @@ -0,0 +1,15 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-2/execution.json b/doc-experiment/results/round-32/T02-link-targets/trial-2/execution.json new file mode 100644 index 0000000000000..d7902581f17e9 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T02-link-targets/trial-2/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-2/response.json b/doc-experiment/results/round-32/T02-link-targets/trial-2/response.json new file mode 100644 index 0000000000000..0877a65a987d0 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute edit. It scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present while missing `href` returns `null`, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )` and returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-3/candidate.php b/doc-experiment/results/round-32/T02-link-targets/trial-3/candidate.php new file mode 100644 index 0000000000000..972d891e974db --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-3/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-3/execution.json b/doc-experiment/results/round-32/T02-link-targets/trial-3/execution.json new file mode 100644 index 0000000000000..3611da013f916 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T02-link-targets/trial-3/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T02-link-targets/trial-3/response.json b/doc-experiment/results/round-32/T02-link-targets/trial-3/response.json new file mode 100644 index 0000000000000..69e69d36412a9 --- /dev/null +++ b/doc-experiment/results/round-32/T02-link-targets/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit pass: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so present-empty (`\"\"`) and boolean (`true` for ``) still count as present, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )` and returns the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/judge.json b/doc-experiment/results/round-32/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..02e4d85d577dd --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware WP_HTML_Processor with create_fragment(), next_tag('H1'), a recorded get_current_depth(), and a depth-bounded next_token() walk. Every called method is present in the rendered docs and execution recorded no _doing_it_wrong notices. Minor deduction: it also whitelists SCRIPT, STYLE, TEXTAREA, and TITLE opener modifiable text. The docs' DOM-style text recipe says ordinary subtree text should append only #text tokens unless the caller explicitly opts into special-element contents; this task did not require that. Passed 8/8 frozen cases." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "This matches the documented and canonical pattern exactly: create a fragment processor, find the first H1, record its depth, walk tokens while depth stays >= the opener depth, and append get_modifiable_text() only for #text tokens. It handles decoded text, image-only empty string, missing H1 as null, nested markup, and the unclosed H1 case without undocumented calls. Passed 8/8 frozen cases with no _doing_it_wrong notices." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence solution as trial 2. It chooses WP_HTML_Processor for structure, uses only documented methods, applies the documented subtree text walk with the correct >= depth guard, and relies on get_modifiable_text() for decoded #text content. Passed 8/8 frozen cases with no _doing_it_wrong notices." + } + ], + "failure_analysis": "No hidden case failed in any trial; all candidates passed all 8 frozen expectations. The docs did well in several places: Tag Processor > Which processor should I use? explicitly directs text-content extraction and subtree walking to WP_HTML_Processor; HTML Processor > Recipe: collect DOM-style text from a subtree gives almost exactly the needed pattern; next_token() and get_current_depth() explain why the walk must be bounded and why the guard must be >=; get_modifiable_text() documents decoded #text output; and the depth/virtual-closer behavior supports the unclosed-H1 case. The only near-miss is trial-1's special-element handling. It likely overgeneralized HTML Processor > next_token(), which says SCRIPT, STYLE, TITLE, and TEXTAREA have no #text child tokens and their text is carried on the opener. The more controlling passage is HTML Processor > Recipe: collect DOM-style text from a subtree, especially the default policy saying ordinary subtree text is only reached #text tokens and special-element opener text should be opt-in. A test such as an H1 containing SCRIPT or TEXTAREA would distinguish that interpretation from the canonical policy.", + "doc_gaps": [ + { + "location": "html-processor.md > next_token() special-element exception", + "problem": "The paragraph correctly explains that special elements carry modifiable text on their opener token, but outside the subtree-text recipe it can read like a general instruction to include that text during element text extraction.", + "suggestion": "Add a cross-reference sentence: read special-element opener text only when the caller explicitly wants those element contents; for ordinary DOM-style subtree text, continue collecting only #text tokens as shown in the recipe." + }, + { + "location": "html-processor.md > Recipe: collect DOM-style text from a subtree", + "problem": "The recipe is strong, but the contract could be named more explicitly so readers can distinguish ordinary descendant text from visible text, all modifiable text, comments, and special-element raw/plaintext contents.", + "suggestion": "Precede the example with a compact contract statement: ordinary subtree text means descendant #text tokens reached by a depth- or breadcrumb-bounded HTML Processor walk; comments, processing instructions, and special-element opener text are excluded unless deliberately whitelisted." + }, + { + "location": "html-processor.md > get_current_depth() / subtree walk guidance", + "problem": "Incomplete input is discussed mainly for mutations and clean scans, while read-only extraction readers may not know whether an unclosed container should be rejected or parsed best-effort.", + "suggestion": "Add a read-only note: a bounded walk can return best-effort text from the parsed tree even when trailing markup is unclosed; check paused_at_incomplete_token only when the caller requires proof of complete source or before applying mutations." + } + ] +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..1fa6d223c7de9 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,39 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( + 'SCRIPT' === $tag_name || + 'STYLE' === $tag_name || + 'TEXTAREA' === $tag_name || + 'TITLE' === $tag_name + ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..f2a1c06a3f547 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..bbe5e48e90ffc --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`, concatenating decoded text from `#text` tokens via `get_modifiable_text()` and also opting in to special-element text carried on opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) as documented by the HTML Processor.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..a65b49649f352 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..2e64d237ed639 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..17a9ba20b6a39 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure. It finds the first `H1` with `next_tag()`, records that element's depth with `get_current_depth()`, then walks forward with `next_token()` while still inside that subtree, appending only `#text` token content via `get_modifiable_text()`, which already returns decoded text.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..06135c971fa91 --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..65bb07af6974a --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..aec5a5c385f0a --- /dev/null +++ b/doc-experiment/results/round-32/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware task. It finds the first `H1` with `next_tag()`, records that element\u2019s depth with `get_current_depth()`, then walks its subtree with `next_token()` and concatenates only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-32/T04-build-figure/judge.json b/doc-experiment/results/round-32/T04-build-figure/judge.json new file mode 100644 index 0000000000000..32870d43c951f --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/judge.json @@ -0,0 +1,35 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for byte-exact template filling. Every called method is documented: `next_tag`, `set_attribute`, `next_token`, `get_token_type`, `set_modifiable_text`, and `get_updated_html`. The approach follows the documented template pattern, preserves attribute order by predeclaring attributes, and relies on API encoding for attributes and text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor choice, no undocumented API calls, idiomatic token walk to the placeholder `#text` node, and correct use of `get_updated_html()` after queued edits." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Handles the documented escaping edge cases through `set_attribute()` and `set_modifiable_text()` with plain, unescaped input values; no `_doing_it_wrong` records were emitted." + } + ], + "failure_analysis": "All trials passed all 7 hidden cases, so there were no functional failures to attribute to documentation gaps. The docs did especially well in `WP_HTML_Tag_Processor` > `Building markup from a template`, which directly explained using a literal shape, preexisting empty attributes for stable attribute order, placeholder text for later replacement, `next_token()` plus `#text`, and `get_updated_html()`. The `set_attribute()` section also clearly states that callers provide plain unescaped values and that new attributes sort by name, while existing attributes retain position. The `set_modifiable_text()` section clearly says it accepts plaintext and encodes as needed, and warns that empty elements have no text token to replace. Near-miss: all candidates ignored the documented advice to check `set_modifiable_text()`'s boolean return value. In this fixed-template case the `#text` guard makes failure unlikely, but the examples themselves also omit the check, so models may learn to ignore the return contract in riskier contexts.", + "doc_gaps": [ + { + "location": "html-tag-processor.md: `WP_HTML_Tag_Processor::set_modifiable_text()` examples and `Building markup from a template` recipe", + "problem": "The prose says to always check the boolean return value, but the nearby examples call `set_modifiable_text()` without checking it. This weakens the contract even though the submitted solutions happened to be safe for the fixed template.", + "suggestion": "Make example code consistent with the contract: either check the return value or explicitly state when a prior `#text` token guard plus known template makes omission acceptable." + }, + { + "location": "html-tag-processor.md: `Building markup from a template` recipe", + "problem": "The recipe scans for the first `#text` token. That is fine for compact single-placeholder templates, but general templates with whitespace, multiple placeholders, or preexisting text nodes can make 'first text token' the wrong target.", + "suggestion": "Add a general note that placeholder text should be uniquely reachable, and that more complex templates should first navigate to the intended region or use structural checks rather than replacing the first text token blindly." + } + ] +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-1/candidate.php b/doc-experiment/results/round-32/T04-build-figure/trial-1/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-1/candidate.php @@ -0,0 +1,18 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-1/execution.json b/doc-experiment/results/round-32/T04-build-figure/trial-1/execution.json new file mode 100644 index 0000000000000..adc594f4030fd --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T04-build-figure/trial-1/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-1/response.json b/doc-experiment/results/round-32/T04-build-figure/trial-1/response.json new file mode 100644 index 0000000000000..f8f1750bf1618 --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds from a fixed HTML template with `WP_HTML_Tag_Processor`, which is the documented way to get exact markup shape and attribute order. It updates the existing `src` and `alt` attributes with `set_attribute()`, then walks tokens with `next_token()` to find the placeholder `#text` node inside `figcaption` and replaces it with `set_modifiable_text()`, letting the HTML API handle all required escaping.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-2/candidate.php b/doc-experiment/results/round-32/T04-build-figure/trial-2/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-2/candidate.php @@ -0,0 +1,18 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-2/execution.json b/doc-experiment/results/round-32/T04-build-figure/trial-2/execution.json new file mode 100644 index 0000000000000..faa17d9950d0a --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T04-build-figure/trial-2/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-2/response.json b/doc-experiment/results/round-32/T04-build-figure/trial-2/response.json new file mode 100644 index 0000000000000..758f32c3e937f --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds the fragment from a fixed HTML template using `WP_HTML_Tag_Processor`, which the docs explicitly recommend for safely filling untrusted values into known markup. The function preserves the required `img` attribute order by including `src` and `alt` in the template, sets those attributes with `set_attribute()`, replaces the placeholder figcaption text by scanning tokens with `next_token()`, checking `get_token_type()` for `#text`, and calling `set_modifiable_text()`, then returns the result with `get_updated_html()`.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-3/candidate.php b/doc-experiment/results/round-32/T04-build-figure/trial-3/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-3/candidate.php @@ -0,0 +1,18 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-3/execution.json b/doc-experiment/results/round-32/T04-build-figure/trial-3/execution.json new file mode 100644 index 0000000000000..c1c023f476aa4 --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T04-build-figure/trial-3/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T04-build-figure/trial-3/response.json b/doc-experiment/results/round-32/T04-build-figure/trial-3/response.json new file mode 100644 index 0000000000000..e4af41b25214a --- /dev/null +++ b/doc-experiment/results/round-32/T04-build-figure/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds from a fixed HTML template so the output shape and attribute order stay exact, then uses `WP_HTML_Tag_Processor` to safely fill unescaped values through the documented `set_attribute()`, `next_token()`, `get_token_type()`, `set_modifiable_text()`, and `get_updated_html()` APIs. The placeholder text inside `figcaption` is replaced via the text-token API so caption content is encoded correctly.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/judge.json b/doc-experiment/results/round-32/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..aff5918c6cad3 --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a body fragment, walked tokens with documented next_token(), gated ordinary text by get_token_type() === '#text', and explicitly whitelisted TITLE/TEXTAREA opener tokens before calling get_modifiable_text(). All API calls appear in the rendered docs; execution had no _doing_it_wrong records. Accumulating the full text before truncating is less efficient than necessary but not an API-adherence problem." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct documented API pattern as the reference, with an efficient running mb_strlen()/mb_substr() truncation path. It follows the docs' distinction between ordinary #text tokens and opt-in special element text, and avoids raw SCRIPT/STYLE modifiable text. No undocumented methods or misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses only documented methods, including get_last_error(), and otherwise follows the documented fragment/token/text walk pattern. The final get_last_error() fallback is conservative and not required by the task, but it is a documented post-scan concern rather than a hallucinated API use. No _doing_it_wrong records." + } + ], + "failure_analysis": "No failed hidden cases across trials. All three passed 10/10 with no _doing_it_wrong or trigger_error entries. The docs did well in three places: the Tag Processor overview explicitly says to use the HTML Processor for collecting an element's text content; WP_HTML_Processor::next_token() explains that text may be split across #text tokens and that SCRIPT, STYLE, TITLE, and TEXTAREA carry text on the element token instead of child #text tokens; and get_modifiable_text() states that #text, TITLE, and TEXTAREA are decoded UTF-8 while SCRIPT/STYLE are raw. The HTML Processor recipe also warns not to append get_modifiable_text() from every token and instead to whitelist token types. The only near-miss was trial-3's empty-string fallback on get_last_error(): reasonable from the docs' scan-safety language, but the docs do not fully define the expected policy for read-only text extraction after unsupported markup or incomplete trailing syntax.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text()", + "problem": "The method accurately describes all tokens with modifiable text, but that broad contract can still tempt callers to treat it as DOM textContent.", + "suggestion": "Add a prominent note that get_modifiable_text() is not a text-content predicate: callers should first decide eligible token types, usually #text plus explicit special-element opener opt-ins." + }, + { + "location": "WP_HTML_Processor::next_token() and scan recipes", + "problem": "The docs mention get_last_error() and paused_at_incomplete_token(), but do not clearly separate policies for mutations/rewrites from best-effort read-only extraction.", + "suggestion": "Document post-scan policy choices: when partial accumulated data is valid, when callers should reject or fallback, and what is guaranteed after unsupported markup or incomplete trailing syntax." + }, + { + "location": "Text handling examples around next_token()/get_modifiable_text()", + "problem": "The docs recommend mb_substr(..., 'UTF-8') but do not fully spell out length measurement and code-point versus grapheme-cluster expectations.", + "suggestion": "Pair truncation examples with mb_strlen(..., 'UTF-8') and clarify that mb_* slicing is suitable for Unicode code-point limits, while grapheme_* APIs are needed for user-perceived character limits." + } + ] +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..e8b082966f16a --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,33 @@ +next_token() ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $token_type || $processor->is_tag_closer() ) { + continue; + } + + $token_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..684583c1ea58b --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..bc0ae6b57e895 --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks every token with `next_token()`, appending decoded text from ordinary `#text` tokens plus the special decoded opener-token text for `TEXTAREA` and `TITLE` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. The result is truncated by Unicode code points with `mb_substr(..., 'UTF-8')`.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..4207920424d5e --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,42 @@ +next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } else { + $token_name = $processor->get_token_name(); + if ( ! $processor->is_tag_closer() && ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + if ( $length + $chunk_length > $max_codepoints ) { + return $text . mb_substr( $chunk, 0, $max_codepoints - $length, 'UTF-8' ); + } + + $text .= $chunk; + $length += $chunk_length; + } + + return $text; +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..6f5923720184d --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..fd779bf3b0ed4 --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the HTML fragment with structure awareness, then walks tokens with `next_token()`, appending ordinary `#text` token text via `get_modifiable_text()` and explicitly whitelisting opener tokens for `TITLE` and `TEXTAREA`, whose text is carried on the element token itself. It excludes script/style content by never reading special-element text except for those two tags, and truncates by Unicode code points with `mb_strlen()` and `mb_substr()` using UTF-8.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..ce6cde1bc0b32 --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,37 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_token_name(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..610e534feec61 --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..d30097996bd9e --- /dev/null +++ b/doc-experiment/results/round-32/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with full HTML structure awareness, walks tokens with `next_token()`, appends ordinary `#text` token content via `get_modifiable_text()`, and explicitly opt-ins `TITLE` and `TEXTAREA` opener tokens so their decoded text is included while `SCRIPT` and `STYLE` remain excluded. It then truncates by Unicode code points with `mb_strlen()` and `mb_substr()` using UTF-8, as the docs recommend.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-32/T06-collect-links/judge.json b/doc-experiment/results/round-32/T06-collect-links/judge.json new file mode 100644 index 0000000000000..3f19649b77be2 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), walked tokens, filtered href with is_string(), appended only #text get_modifiable_text(), and relied on documented virtual/end-of-input closers. All HTML API methods used are present in the rendered docs; no _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Essentially matches the documented subtree-text recipe and canonical reference: next_tag('A'), get_attribute(), get_current_depth(), bounded next_token() walk with >= depth, #text guard, get_modifiable_text(). All API calls are documented; no _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used the correct HTML Processor and a documented single-pass token walk with depth state. get_tag(), is_tag_closer(), get_current_depth(), get_attribute(), get_token_type(), and get_modifiable_text() are all documented. Minor reservation: it records the link on opener rather than flushing on structural close, but its depth reset follows the documented closer-depth contract. No _doing_it_wrong records; passed 8/8." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs were effective for this task because they directly covered the required decisions: the Tag Processor overview says to use WP_HTML_Processor for collecting element text and missing/implied closers; the HTML Processor subtree-text recipe shows the key next_tag + get_current_depth + next_token + #text + get_modifiable_text pattern; get_attribute documents string|true|null so subjects used is_string() and excluded missing/boolean href; get_modifiable_text documents decoded text for #text nodes; and next_token/get_current_depth document virtual/end-of-input closers and >= depth bounds, which explains the unclosed-link case. Near misses: trial-1 depended on closer-driven flushing, but the next_token section’s DT example and closer guarantee made that a documented pattern. trial-2 used an inner bounded walk despite the broader warning about nested next_token loops; it is safe here because the outer scan is next_tag('A'), but the warning could be read too broadly. trial-3 used a depth-drop state machine rather than the exact recipe, and get_current_depth’s closer-depth explanation was enough to make it correct.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_attribute() docblock", + "problem": "The HTML Processor method entry lists string|true|null but omits the decoded-value explanation that appears in the Tag Processor docs. Readers using only the method entry may not know attribute strings are already entity-decoded.", + "suggestion": "Repeat the inherited contract in the HTML Processor entry: string values are decoded; valueless attributes return true; absent/unavailable attributes return null; callers that require a real value should test is_string()." + }, + { + "location": "WP_HTML_Processor::next_token() section, nested-loop warning", + "problem": "The warning correctly discourages nested next_token loops for repeated regions, but it does not distinguish that a next_tag() outer scan plus a bounded next_token() subtree walk can be appropriate for independent matched elements.", + "suggestion": "Add a short clarification of when bounded subtree walks compose safely with next_tag(), and when repeated extraction should instead use a single token loop with state." + }, + { + "location": "WP_HTML_Processor subtree-text recipe", + "problem": "The recipe says ordinary text is only #text tokens, but examples do not explicitly call out that descendant element attributes such as img alt are not DOM text content.", + "suggestion": "Add one general example showing inline markup text is concatenated while void/replaced elements and their attributes contribute no text unless the caller explicitly reads attributes." + }, + { + "location": "Incomplete-input guidance in next_token()/get_current_depth docs", + "problem": "The docs mention checking paused_at_incomplete_token() when a result must reject truncated input, but the distinction between structural best-effort extraction and complete-source validation is easy to miss.", + "suggestion": "State explicitly that virtual closers make read-only structural extraction possible for unclosed elements, while paused_at_incomplete_token() is a policy check for callers that require complete source or are about to mutate/serialize output." + } + ] +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-1/candidate.php b/doc-experiment/results/round-32/T06-collect-links/trial-1/candidate.php new file mode 100644 index 0000000000000..e281e35946361 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-1/candidate.php @@ -0,0 +1,40 @@ +next_token() ) { + $token_name = $processor->get_token_name(); + + if ( 'A' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_link ) { + $links[] = $current_link; + $current_link = null; + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $current_link = array( + 'href' => $href, + 'text' => '', + ); + } + + continue; + } + + if ( null !== $current_link && '#text' === $processor->get_token_type() ) { + $current_link['text'] .= $processor->get_modifiable_text(); + } + } + + return $links; +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-32/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..76f147428e658 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-32/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..ec29850a6e6e5 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because collecting an element\u2019s text content is a tree-aware task. It walks the fragment with `next_token()`, starts a result entry when it sees an `A` opener whose `href` from `get_attribute()` is a string (excluding missing and boolean-valued `href`), appends decoded text from descendant `#text` tokens via `get_modifiable_text()`, and finalizes each link when the matching `A` closer is visited with `is_tag_closer()`.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-32/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..5457a096209c1 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,32 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + if ( ! is_string( $href ) ) { + continue; + } + + $text = ''; + $a_depth = $processor->get_current_depth(); + + while ( $processor->next_token() && $processor->get_current_depth() >= $a_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + return $links; +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-32/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..1193eb9ea154f --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-32/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..e99ca1988b41b --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure. It scans `A` openers with `next_tag( 'A' )`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor\u2019s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()` so nested markup contributes no text of its own.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-32/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..c943fb4c32a74 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,42 @@ +next_token() ) { + if ( null !== $current_link_index && $processor->get_current_depth() < $current_link_depth ) { + $current_link_index = null; + $current_link_depth = null; + } + + if ( 'A' === $processor->get_tag() && ! $processor->is_tag_closer() ) { + $href = $processor->get_attribute( 'href' ); + + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + + $current_link_index = count( $links ) - 1; + $current_link_depth = $processor->get_current_depth(); + } + + continue; + } + + if ( null !== $current_link_index && '#text' === $processor->get_token_type() ) { + $links[ $current_link_index ]['text'] .= $processor->get_modifiable_text(); + } + } + + return $links; +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-32/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..d2d30262a4569 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-32/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..ef1d23b4cf815 --- /dev/null +++ b/doc-experiment/results/round-32/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is tree-aware work. It walks the fragment once with `next_token()`, starts a new result entry on each `A` opener whose `href` from `get_attribute()` is a string (excluding missing and boolean `href`), tracks that anchor by its `get_current_depth()`, and concatenates decoded text from descendant `#text` tokens via `get_modifiable_text()` until the walk leaves that anchor.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/judge.json b/doc-experiment/results/round-32/T07-nested-lists/judge.json new file mode 100644 index 0000000000000..c7686e91b8e19 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware `WP_HTML_Processor::create_fragment()` path, walked open tags with `next_tag()`, checked `get_breadcrumbs()` excluding the current element, used documented `add_class()`, and returned via `get_updated_html()`. Also checked `get_last_error()`. Minor edge-case gap: it does not check `paused_at_incomplete_token()`, though that is not needed for the frozen cases." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Essentially the same high-adherence implementation as trial 1. Processor choice, breadcrumb ancestor logic, class mutation, and output retrieval all match documented API patterns. No undocumented calls or `_doing_it_wrong` records. Same small omission around incomplete-token detection." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "All API calls are documented, including inherited `paused_at_incomplete_token()`. Correctly uses `WP_HTML_Processor`, breadcrumbs, `add_class()`, and `get_updated_html()`. The preliminary full-document pass is conservative and documented-adjacent, but slightly over-broad for this task because it rejects any incomplete trailing syntax instead of editing complete visited tokens." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases, so there are no failed hidden cases to attribute to misconceptions. The docs did well on the central decision: the Tag Processor overview explicitly says it has no tree awareness and that `get_breadcrumbs()` belongs to `WP_HTML_Processor`, while the HTML Processor overview points to structure-aware parsing. The `next_tag()` docs also clearly warn that `tag_name` is not a list of alternatives, which likely pushed candidates toward scanning all tags and branching on `get_tag()`. The `get_breadcrumbs()` docs were sufficient for candidates to infer that the current element is included and must be excluded for ancestor-only checks. The main near-miss is incomplete input: trials 1 and 2 ignore `paused_at_incomplete_token()`, while trial 3 preflights and rejects incomplete input wholesale. That variance suggests the docs describe the mechanism but not the recommended mutation policy for byte-preserving filters.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_breadcrumbs() / Breadcrumbs overview", + "problem": "The docs state that breadcrumbs include the current matched node, but they do not explicitly name the common ancestor-only idiom. Implementers must infer that containment checks should ignore the final breadcrumb.", + "suggestion": "Add a short note and generic example: for ancestor checks, inspect `array_slice( $processor->get_breadcrumbs(), 0, -1 )`; the final item is the current token, not an ancestor." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and HTML Processor recipes", + "problem": "The docs explain how to detect truncated syntax, but not how that state should affect class/attribute mutation workflows that otherwise preserve untouched bytes.", + "suggestion": "Document the policy distinction: `get_updated_html()` preserves unvisited trailing incomplete syntax, while callers needing all-or-nothing or complete-subtree results should check `paused_at_incomplete_token()` after draining the processor and fall back." + }, + { + "location": "WP_HTML_Processor::get_last_error() / HTML Support", + "problem": "The unsupported-markup guidance says the parser aborts and exposes `get_last_error()`, but it is not explicit whether queued edits before the abort should be returned or discarded by mutating filters.", + "suggestion": "Add guidance for mutating callbacks: after a scan, check `get_last_error()` if partial edits are unacceptable; otherwise `get_updated_html()` returns queued edits plus untouched input bytes." + } + ] +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-1/candidate.php b/doc-experiment/results/round-32/T07-nested-lists/trial-1/candidate.php new file mode 100644 index 0000000000000..498593fcda109 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-1/candidate.php @@ -0,0 +1,36 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $is_nested = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $is_nested = true; + break; + } + } + + if ( $is_nested ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-1/execution.json b/doc-experiment/results/round-32/T07-nested-lists/trial-1/execution.json new file mode 100644 index 0000000000000..ad21a94923189 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T07-nested-lists/trial-1/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-1/response.json b/doc-experiment/results/round-32/T07-nested-lists/trial-1/response.json new file mode 100644 index 0000000000000..26b4a89d482cb --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on tree structure. It scans list openers with `next_tag()`, checks `get_breadcrumbs()` to see whether any ancestor before the current element is `UL` or `OL`, adds the class with `add_class()`, and returns the byte-preserving edited fragment via `get_updated_html()`. If the HTML Processor aborts on unsupported markup, it returns the original HTML unchanged.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-2/candidate.php b/doc-experiment/results/round-32/T07-nested-lists/trial-2/candidate.php new file mode 100644 index 0000000000000..db78c7d42786a --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-2/candidate.php @@ -0,0 +1,36 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $has_list_ancestor = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $has_list_ancestor = true; + break; + } + } + + if ( $has_list_ancestor ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-2/execution.json b/doc-experiment/results/round-32/T07-nested-lists/trial-2/execution.json new file mode 100644 index 0000000000000..38c5003076ba0 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T07-nested-lists/trial-2/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-2/response.json b/doc-experiment/results/round-32/T07-nested-lists/trial-2/response.json new file mode 100644 index 0000000000000..cb8133a434c91 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans tags with `next_tag()`, checks each `UL`/`OL` opener's ancestor path via `get_breadcrumbs()`, and calls `add_class()` only when an earlier breadcrumb is also `UL` or `OL`. The final HTML is returned with `get_updated_html()`, and if the processor aborts on unsupported markup via `get_last_error()`, the original fragment is returned unchanged.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-3/candidate.php b/doc-experiment/results/round-32/T07-nested-lists/trial-3/candidate.php new file mode 100644 index 0000000000000..123873dea9604 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-3/candidate.php @@ -0,0 +1,39 @@ +next_tag() ) { + continue; + } + + if ( $probe->paused_at_incomplete_token() || null !== $probe->get_last_error() ) { + return $html; + } + + $processor = WP_HTML_Processor::create_fragment( $html ); + if ( null === $processor ) { + return $html; + } + + while ( $processor->next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + foreach ( $breadcrumbs as $ancestor ) { + if ( 'UL' === $ancestor || 'OL' === $ancestor ) { + $processor->add_class( 'nested-list' ); + break; + } + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-3/execution.json b/doc-experiment/results/round-32/T07-nested-lists/trial-3/execution.json new file mode 100644 index 0000000000000..0d2605faafe67 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T07-nested-lists/trial-3/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T07-nested-lists/trial-3/response.json b/doc-experiment/results/round-32/T07-nested-lists/trial-3/response.json new file mode 100644 index 0000000000000..ed5af190e9947 --- /dev/null +++ b/doc-experiment/results/round-32/T07-nested-lists/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. A first pass with `next_tag()`, `paused_at_incomplete_token()`, and `get_last_error()` rejects incomplete or unsupported fragments so no partial edits are applied. A second pass scans tags with `next_tag()`, checks each `UL`/`OL` opener\u2019s ancestors via `get_breadcrumbs()`, and adds the `nested-list` class with `add_class()`, then returns the byte-preserving result from `get_updated_html()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-32/T08-table-extract/judge.json b/doc-experiment/results/round-32/T08-table-extract/judge.json new file mode 100644 index 0000000000000..be97bff024955 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used the right structural API: `WP_HTML_Processor::create_fragment()`, `next_tag('TABLE')`, a single depth-bounded `next_token()` loop, tag closer handling, and `get_modifiable_text()` only on `#text` tokens. All called methods are documented in the two rendered files and no `_doing_it_wrong` records appeared. Minor issue: the incomplete-input check only runs when the table boundary was not observed; docs note virtual closers can still appear before `paused_at_incomplete_token()` is true." + }, + { + "trial_id": "trial-2", + "adherence": 89, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API usage. The main walk is idiomatic and depth-bounded. The main near-miss is including `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opener modifiable text inside cells. The docs describe that as an opt-in policy, while the task/reference use ordinary `#text` descendants only; for `SCRIPT`/`STYLE` this also appends raw, undecoded text. It also has no explicit incomplete-input policy." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Used the documented HTML Processor APIs correctly with a single table-depth walk and decoded `#text` extraction. All method calls are documented and execution produced no misuse records. Slightly less explicit than trial 1 because it relies on `get_tag()` nullness rather than checking `#tag`, and its `paused_at_incomplete_token()`/`get_last_error()` check is bypassed once virtual table closers are observed." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases: simple table, THEAD/TBODY, omitted closers, inline markup in cells, decoded entities, no table, first table only, and empty cells. The docs did well on the central decisions: the Tag Processor overview explicitly says to use the HTML Processor when structure, text collection, or omitted closing tags matter; the HTML Processor `next_token()` docs explain implied/virtual tokens, synthesized table structure such as TBODY, single-loop state tracking for repeated regions, and `>=` depth-bounded walks; `get_modifiable_text()` documents decoded text for `#text` nodes. Near-misses were outside the frozen suite. Trial 2 appears to have over-applied the special-element exception from `next_token()`/`get_modifiable_text()`, appending opener text for SCRIPT/STYLE/TEXTAREA/TITLE even though the ordinary subtree text recipe says to include only `#text` tokens unless the caller explicitly opts in. Trials 1 and 3 attempted incomplete-input handling, but in a way the docs make easy to get subtly wrong: a depth-bounded walk can see virtual closers and still leave `paused_at_incomplete_token()` true, so tying the check to a local `completed`/`finished_table` flag does not actually reject truncation if that was the intended policy.", + "doc_gaps": [ + { + "location": "html-processor.md: `next_token()` and `get_current_depth()` incomplete-input notes", + "problem": "The docs say to check `paused_at_incomplete_token()` when completeness matters, but do not make it concrete that virtual closers may be visited and the subtree boundary may be reached while the processor is still paused at truncated input.", + "suggestion": "Add a short trace example such as `
                      oknext_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $completed = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + $completed = true; + break; + } + + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag = $processor->get_tag(); + if ( null === $tag ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + continue; + } + + if ( 'TR' === $tag ) { + if ( null !== $current_row ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + $rows[] = $current_row; + $current_row = null; + } + } + + continue; + } + + if ( 'TR' === $tag ) { + if ( null !== $current_row ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + $rows[] = $current_row; + } + $current_row = array(); + continue; + } + + if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_row ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = ''; + } + } + + if ( ! $completed && null !== $processor->get_last_error() ) { + return array(); + } + + if ( ! $completed ) { + if ( null !== $current_row ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + } + $rows[] = $current_row; + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-32/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..b9f0b295f86c4 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-32/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..df75f020281c6 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because table extraction is structure-sensitive, then finds the first `TABLE` with `next_tag()`. It walks the table subtree with a single `next_token()` loop, using `get_current_depth()` to stop at the table boundary, `get_token_type()`/`get_tag()`/`is_tag_closer()` to detect rows and cells, and `get_modifiable_text()` only on `#text` tokens so cell text matches ordinary DOM-style text with character references already decoded.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-32/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..736ce82f89cdc --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,79 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + $token_type = $processor->get_token_type(); + $token_name = $processor->get_token_name(); + + if ( '#tag' === $token_type ) { + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } elseif ( null !== $current_row ) { + $current_cell = ''; + } + + continue; + } + + if ( + null !== $current_cell && + ! $processor->is_tag_closer() && + ( + 'SCRIPT' === $token_name || + 'STYLE' === $token_name || + 'TEXTAREA' === $token_name || + 'TITLE' === $token_name + ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( '#text' === $token_type && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-32/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..94179a7b28e22 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-32/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..ae8f277ce727f --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because table extraction is structure-sensitive, then finds the first `TABLE` with `next_tag()`, records its depth with `get_current_depth()`, and walks tokens with `next_token()` until leaving that subtree. It collects rows and cells from `TR`/`TD`/`TH` tag openers and closers via `get_token_type()`, `get_token_name()`, and `is_tag_closer()`, appending decoded text from `#text` tokens with `get_modifiable_text()` and also reading special text-only element contents (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) from their opening tokens when they appear inside a cell.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-32/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..7f9f7fc3181f7 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,67 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $finished_table = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + $finished_table = true; + break; + } + + $tag = $processor->get_tag(); + if ( null !== $tag ) { + if ( 'TR' === $tag ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + } else { + if ( null === $current_row ) { + $current_row = array(); + } + + $current_cell = ''; + } + + continue; + } + } + + if ( null !== $current_cell && '#text' === $processor->get_token_type() ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( ! $finished_table && ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-32/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..15ba7e33d6bd5 --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-32/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..eb1177939a9db --- /dev/null +++ b/doc-experiment/results/round-32/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-style HTML structure, including implied table elements. It finds the first `TABLE` with `next_tag()`, then does a single `next_token()` walk bounded by the table depth, building rows on `TR` open/close, cells on `TD`/`TH` open/close, and concatenating only `#text` tokens with `get_modifiable_text()` so character references are already decoded and markup contributes nothing.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/judge.json b/doc-experiment/results/round-32/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..53ef88455712e --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment for BODY-fragment parsing, walked tokens with next_token(), gated matching on get_token_type() === '#text', used get_modifiable_text() for decoded text, and emitted normalized output with serialize_token(). All called HTML API methods are documented in the two rendered files. The get_last_error() fallback is documented as a policy choice after token serialization and did not produce misuse records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same strong documented pattern as trial-1, with str_contains() for the task-level substring check. It correctly avoids attributes, comments, and special text-bearing elements by only wrapping ordinary #text tokens, and uses serialize_token() rather than get_updated_html() for a token-rewrite output stream." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Matches the reference approach most closely: create_fragment(), next_token(), #text filtering, decoded get_modifiable_text(), and serialize_token() wrapping. No undocumented methods or _doing_it_wrong records. Returning an empty string on processor creation/error is a reasonable string-returning rejection policy for this task." + } + ], + "failure_analysis": "All trials passed all frozen cases. The docs did well in three specific places: the HTML Processor overview explicitly steers BODY fragments to WP_HTML_Processor::create_fragment(); the text-extraction recipe says ordinary DOM text is only #text tokens and warns that get_modifiable_text() on every token is too broad; and serialize_token() is documented as the token-walking rewrite mechanism for wrapping, dropping, or adding output while preserving normalized serialization. The get_modifiable_text() docs also clearly state that #text text is already decoded, which explains why all candidates handled character references correctly. Near-misses were around policy rather than API misunderstanding: trial-1 and trial-2 return the original unnormalized input if create_fragment() fails or get_last_error() becomes non-null, while trial-3 returns ''. The docs say to reject or fall back after get_last_error(), but they do not give much guidance for string-returning normalizers where returning raw input can violate a normalized-output contract.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() docblock / rewrite-while-serializing recipe", + "problem": "The docs say to reject or fall back on get_last_error(), but do not distinguish safe fallbacks for functions whose contract promises normalized serialization. This leaves room for returning raw input after a partial parser abort.", + "suggestion": "Add a short note that if the caller promises normalized output, falling back to the original input may violate that contract; prefer a documented sentinel policy such as null, empty string, or a separately normalized fallback chosen by the caller." + }, + { + "location": "WP_HTML_Processor::create_fragment() returns section", + "problem": "The null return is documented, but the common failure conditions and recommended handling for default BODY/UTF-8 parsing are not concrete. Candidates defensively chose inconsistent null policies.", + "suggestion": "Clarify when create_fragment() can return null under currently supported defaults and show a minimal guard that ties the fallback to the caller's return type and output contract." + }, + { + "location": "Text extraction guidance around get_token_type() and get_modifiable_text()", + "problem": "The docs successfully warn that modifiable text is broader than ordinary text, but the Tag Processor token example uses get_token_name() for #text while other guidance uses get_token_type().", + "suggestion": "Add a concise cross-reference: use get_token_type() === '#text' when the distinction is token kind, and reserve get_token_name() for tag names or DOM-style node names. This would reduce ambiguity in text-walking examples." + } + ] +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..537b55a6cb09c --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..7ee71c8566e78 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "
                      somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..ebb997bc7a50f --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then rewrites the normalized serialization token-by-token with `next_token()` and `serialize_token()`. Only ordinary `#text` tokens are considered, using decoded text from `get_modifiable_text()` for the case-sensitive substring check, and matching text tokens are wrapped as `...`, which naturally excludes attributes, comments, and special text-bearing elements like `script`, `style`, `title`, and `textarea` because those are not exposed as ordinary `#text` child tokens.", + "confidence": 83 +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..3902460ebfbc3 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..a575aabf6467b --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..891d754339577 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks every token with `next_token()` and rebuilds normalized HTML with `serialize_token()`. Only ordinary `#text` tokens are considered for wrapping, and their decoded text is checked with `get_modifiable_text()`, so character references match correctly while special text-bearing elements like `script`, `style`, `title`, and `textarea` are left untouched.", + "confidence": 81 +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..4127f28a48343 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,28 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..0fea277d0c659 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..9a26e88976885 --- /dev/null +++ b/doc-experiment/results/round-32/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a token-by-token rewrite loop with `next_token()` plus `serialize_token()` so the result is normalized HTML. It wraps only ordinary `#text` tokens whose decoded text from `get_modifiable_text()` contains the case-sensitive keyword, which automatically excludes attributes, comments, and special text-bearing elements like `script`, `style`, `title`, and `textarea` because those are not exposed as ordinary `#text` child tokens.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-32/T10-last-h2/judge.json b/doc-experiment/results/round-32/T10-last-h2/judge.json new file mode 100644 index 0000000000000..66f73e776efef --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor for a flat position-based class edit. All called methods are documented: next_tag, set_bookmark, seek, add_class, release_bookmark, and get_updated_html. The repeated literal bookmark pattern is idiomatic; the extra found_h2 flag is redundant but harmless. Passed 6/6 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and canonical bookmark approach: scan H2 tags, keep moving one bookmark, seek back, add_class, release, then get_updated_html. All methods are present in the rendered docs. Passed 6/6 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct documented pattern as trial-2. It uses the Tag Processor, a single reusable bookmark, has_bookmark/seek checks, add_class for preserving existing classes, and get_updated_html for byte-preserving output. Passed 6/6 with no _doing_it_wrong records." + } + ], + "failure_analysis": "No failed hidden cases across trials. All three passed two-headings, single-heading, no-headings-unchanged, many-headings, comment-h2-not-counted, and existing-class. The docs did well here: the Tag Processor overview explicitly says it is the right tool for flat tag/class edits; next_tag documents forward token walking and tag-name queries; the bookmarks section explicitly describes re-setting the same bookmark name to remember the last matching tag; add_class documents creating/appending/preserving classes; get_updated_html documents returning queued edits while preserving untouched bytes. The only near-miss was trial-1 carrying a separate found_h2 flag instead of relying solely on has_bookmark, but that is still documented and correct.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::set_bookmark() rendered method docs", + "problem": "The HTML Processor bookmark docs do not mirror the Tag Processor's explicit statement that setting an existing bookmark name moves it. A reader starting from the Processor docs could miss the reusable-bookmark idiom.", + "suggestion": "Repeat or cross-reference the bookmark contract: reusing a bookmark name moves it to the current token, and this is the supported way to track the most recent matching token." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() method docs", + "problem": "The comment/rawtext non-match behavior is inferable but spread across sections. The hidden comment case depends on knowing that tag-like text inside comments is not returned as a tag.", + "suggestion": "Add a concise method-level note that next_tag only matches complete HTML tag tokens in parsed syntax, not tag-shaped text inside comments, text nodes, or rawtext/plaintext content." + }, + { + "location": "WP_HTML_Tag_Processor::get_updated_html() method docs", + "problem": "The no-op case is implicit. Some solutions early-returned the original HTML when no bookmark existed, which is fine, but readers may not know get_updated_html is also safe with no queued updates.", + "suggestion": "State explicitly that if no updates were enqueued, get_updated_html returns the original input bytes unchanged." + } + ] +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-1/candidate.php b/doc-experiment/results/round-32/T10-last-h2/trial-1/candidate.php new file mode 100644 index 0000000000000..3d84be60774d6 --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-1/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + $found_h2 = true; + } + + if ( ! $found_h2 ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-1/execution.json b/doc-experiment/results/round-32/T10-last-h2/trial-1/execution.json new file mode 100644 index 0000000000000..801c7209f4108 --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-1/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T10-last-h2/trial-1/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-1/response.json b/doc-experiment/results/round-32/T10-last-h2/trial-1/response.json new file mode 100644 index 0000000000000..fdfe8ee98b91a --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a single linear scan, because the task is position-based rather than tree-based. It repeatedly finds `H2` openers with `next_tag( 'H2' )`, moves one named bookmark to the latest match with `set_bookmark()`, seeks back to that final match with `seek()`, adds the class via `add_class()`, and returns the edited markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-2/candidate.php b/doc-experiment/results/round-32/T10-last-h2/trial-2/candidate.php new file mode 100644 index 0000000000000..f87b3e865c40e --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-2/candidate.php @@ -0,0 +1,21 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( ! $processor->seek( 'last-h2' ) ) { + return $html; + } + + $processor->add_class( 'final-section' ); + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-2/execution.json b/doc-experiment/results/round-32/T10-last-h2/trial-2/execution.json new file mode 100644 index 0000000000000..c41a7ea3e970b --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-2/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T10-last-h2/trial-2/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-2/response.json b/doc-experiment/results/round-32/T10-last-h2/trial-2/response.json new file mode 100644 index 0000000000000..68f1d91f82f9c --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a single linear scan of the HTML, repeatedly moving the same bookmark to each matched `H2` with `next_tag()` and `set_bookmark()`. After the scan, if a bookmark exists, it seeks back to that last `H2`, adds the `final-section` class with `add_class()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-3/candidate.php b/doc-experiment/results/round-32/T10-last-h2/trial-3/candidate.php new file mode 100644 index 0000000000000..f87b3e865c40e --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-3/candidate.php @@ -0,0 +1,21 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( ! $processor->seek( 'last-h2' ) ) { + return $html; + } + + $processor->add_class( 'final-section' ); + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-3/execution.json b/doc-experiment/results/round-32/T10-last-h2/trial-3/execution.json new file mode 100644 index 0000000000000..82d466a70993f --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-3/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T10-last-h2/trial-3/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T10-last-h2/trial-3/response.json b/doc-experiment/results/round-32/T10-last-h2/trial-3/response.json new file mode 100644 index 0000000000000..60201a2cdd41d --- /dev/null +++ b/doc-experiment/results/round-32/T10-last-h2/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, position-based scan: `next_tag( 'H2' )` finds each `H2`, `set_bookmark()` repeatedly moves the same bookmark to the most recent match, `has_bookmark()` and `seek()` return to the final `H2`, `add_class()` appends `final-section`, and `get_updated_html()` returns the modified HTML while preserving all untouched bytes.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/judge.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/judge.json new file mode 100644 index 0000000000000..19ae7ac09f37c --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor for a flat attribute-editing task. All called APIs are documented in the supplied markdown: constructor usage, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(). The loop follows the documented tag-walking/update pattern, handles the documented null return from get_attribute_names_with_prefix(), relies on documented case-insensitive prefix matching, and returns byte-preserving updated HTML." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor choice, no undocumented API calls, no _doing_it_wrong records, and idiomatic use of next_tag(), prefix attribute discovery, remove_attribute(), and get_updated_html(). Edge behavior around case-insensitive attributes, no matching prefix, comments, and preserving untouched bytes is aligned with the docs." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Uses the documented Tag Processor path for per-tag attribute edits, avoids Processor serialization/normalization APIs that would be inappropriate here, and makes only documented calls. Execution recorded no warnings or misuse." + } + ], + "failure_analysis": "All three trials passed all hidden cases: single-link, multiple-tags, multiple-matching-attributes, similar-prefixes-kept, uppercase-source-attribute, comments-untouched, and no-matches. The docs worked well for this task because the Tag Processor overview explicitly says to use it for flat attribute/class edits with byte-preserving output, the usage section shows new WP_HTML_Tag_Processor($html) plus next_tag(), get_attribute_names_with_prefix() documents case-insensitive prefix matching and lowercase returned names, remove_attribute() documents safe attribute removal, and get_updated_html() documents that untouched bytes are preserved. The main near-miss is that get_attribute_names_with_prefix() is clearer about null when no tag opener is matched than about the matched-tag/no-prefix case; the candidates wrote a null guard and relied on foreach over an empty array, which worked, but the exact empty-array contract should be more explicit.", + "doc_gaps": [ + { + "location": "html-tag-processor.md#get_attribute_names_with_prefix", + "problem": "The return contract distinguishes null for no matched opener, but does not prominently state what is returned when a tag is matched and no attributes have the requested prefix.", + "suggestion": "State explicitly that a matched opener with no prefix matches returns an empty array, while null means the processor is not currently matched on a tag opener. Add a small example for both cases." + }, + { + "location": "html-tag-processor.md#remove_attribute", + "problem": "The method docs are thin and do not directly connect normalized lowercase names returned from get_attribute_names_with_prefix() with removing source attributes that used different casing.", + "suggestion": "Document that attribute-name matching for removal is ASCII case-insensitive in HTML context and that names returned by get_attribute_names_with_prefix() are safe to pass to remove_attribute()." + }, + { + "location": "html-tag-processor.md#Modifying HTML attributes for a found tag", + "problem": "The section explains setting/removing individual known attributes, but does not include a general pattern for bulk edits over discovered attribute names.", + "suggestion": "Add a generic bulk-attribute-edit recipe showing: walk tags with next_tag(), collect matching attribute names via a documented discovery API, mutate each attribute, then return get_updated_html(). Keep the example domain-neutral rather than task-specific." + } + ] +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/candidate.php b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/candidate.php new file mode 100644 index 0000000000000..71af3ac08c942 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/candidate.php @@ -0,0 +1,19 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/execution.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/execution.json new file mode 100644 index 0000000000000..b21439cb76b34 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/response.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/response.json new file mode 100644 index 0000000000000..0bed7728a9c72 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag with `next_tag()`, finds matching attributes via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the minimally changed result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/candidate.php b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/candidate.php new file mode 100644 index 0000000000000..71af3ac08c942 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/candidate.php @@ -0,0 +1,19 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/execution.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/execution.json new file mode 100644 index 0000000000000..555009d042e2f --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/response.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/response.json new file mode 100644 index 0000000000000..01f22713ed864 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag with `next_tag()`, then calls the documented `get_attribute_names_with_prefix( 'data-track-' )` helper to find only attributes whose names begin with that exact prefix and removes each one with `remove_attribute()`. The final HTML is returned with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/candidate.php b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/execution.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/execution.json new file mode 100644 index 0000000000000..2d481e2e6a04c --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/response.json b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/response.json new file mode 100644 index 0000000000000..62b5025adb5a9 --- /dev/null +++ b/doc-experiment/results/round-32/T11-strip-tracking-attributes/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag with `next_tag()`, then uses the documented `get_attribute_names_with_prefix( 'data-track-' )` helper to find exactly the matching attributes on each opener and removes them with `remove_attribute()`. The final HTML is returned with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/judge.json b/doc-experiment/results/round-32/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..b725083a5889e --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked with `next_token()`, skipped `SPAN` tokens using documented `get_tag()`, and built normalized output with `serialize_token()`. All called methods are present in the rendered docs and no `_doing_it_wrong` records appeared. Minor deduction only for using `''` as an undocumented rejection sentinel on parser abort." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Same documented token-serialization approach as the reference and passed all cases. All API calls are documented. The weaker point is fallback policy: returning raw original `$html` on factory failure or parser abort is a fallback, but it can silently keep spans and non-normalized markup, so it is less aligned with the task contract than rejecting with a clear sentinel." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Matches the documented HTML Processor rewrite pattern: fragment parser, `next_token()`, skip tag tokens by `get_tag()`, append `serialize_token()`, then check `get_last_error()`. No hallucinated methods or runtime misuse. Same small sentinel-policy caveat as trial-1." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs did well in three places: the processor-choice guidance says to use the HTML Processor for structure and normalized output; the `next_token()` docs explain that closers, including implicit/end-of-input closers, are visited; and the `serialize_token()` section gives a near-isomorphic example: remove every element of a given tag while keeping contents by skipping both opener and closer and appending serialized tokens. The only near-miss was error policy. The candidates split between returning an empty string and returning original HTML on `get_last_error()`, which reflects that the docs say to reject or fall back but do not define a clear contract for typed string-returning rewrite helpers.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::serialize_token()` docblock / rewrite recipe", + "problem": "The docs say to reject or fall back when `get_last_error()` is non-null, but do not clarify that accumulated output before an unsupported-parser abort is only partial, nor what fallback means for transforms that promise normalized rewritten HTML.", + "suggestion": "Add a short contract note: after parser abort, token-accumulated output must not be treated as a complete normalized serialization; callers should choose an explicit policy such as return `null`, throw, or return original only if unchanged raw fallback is acceptable." + }, + { + "location": "`WP_HTML_Processor::get_tag()` docblock", + "problem": "The method says it returns the uppercase matched tag name, but the closer behavior is only obvious from the `serialize_token()` example, not from the method contract itself.", + "suggestion": "Add an example showing `next_token()` on `x` where both opener and closer return `SPAN`, `is_tag_closer()` distinguishes them, and non-tag tokens return `null`." + }, + { + "location": "`WP_HTML_Processor::next_token()` / `paused_at_incomplete_token()` docs", + "problem": "The distinction between an unclosed element that receives virtual closers and a truly incomplete trailing syntax token is easy to miss.", + "suggestion": "Add paired examples: `

                      x` serializes with virtual closers and is not paused; `

                      next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..e845ebe6bcea7 --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..8651345fb139e --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with full HTML structure awareness, then walks every token via `next_token()`. For any `SPAN` token, it skips serialization so both the opener and closer are removed, while all other tokens are appended with `serialize_token()`, which produces the normalized HTML serialization directly.", + "confidence": 79 +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..91a98469c1c14 --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..e37f3570e5b76 --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..4b734ba6cbbba --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks every token with `next_token()`. For each token, it skips `SPAN` tags entirely and appends every other token\u2019s normalized serialization with `serialize_token()`, which preserves contents while removing both span openers and closers and produces normalized HTML output.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..1f382bbd3a117 --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,23 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..5228a056f057d --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..48ca66a2b2338 --- /dev/null +++ b/doc-experiment/results/round-32/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a body fragment, then walks the token stream with `next_token()`. For every token except `SPAN` openers and closers, it appends the normalized token text from `serialize_token()`, which preserves the fragment in normalized HTML form while unwrapping all spans and keeping their contents in place.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-32/codex-judges-output.json b/doc-experiment/results/round-32/codex-judges-output.json new file mode 100644 index 0000000000000..aa7832d822239 --- /dev/null +++ b/doc-experiment/results/round-32/codex-judges-output.json @@ -0,0 +1,654 @@ +{ + "result": [ + { + "id": "N03-first-list-count", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), documented token walking, depth-bounded subtree scanning, bookmarks, seek(), set_attribute(), and get_updated_html(). All called methods appear in the rendered docs, and execution recorded no _doing_it_wrong misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct structural approach as the reference: HTML Processor, bookmark the list opener, walk tokens by depth, count direct LI openers, reject incomplete/unsupported scans, seek back, and update with get_updated_html(). get_token_type() use is documented and appropriate." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and documented API usage throughout. The bookmark/depth/token-walk pattern follows the rendered recipe closely, handles incomplete and unsupported markup, and uses get_updated_html() rather than serialization for the queued attribute update." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The docs did especially well in four places: the processor-choice guidance says to use WP_HTML_Processor when structure matters; next_tag() explicitly says tag_name is not a list of alternatives and shows the scan-and-branch pattern for UL/OL; the \"scan a region before editing its opener\" recipe describes bookmark, walk, clean-scan check, seek, and edit; and get_current_depth()/next_token() explain why bounded subtree walks use >= and must still check paused_at_incomplete_token() and get_last_error(). Near-misses: trial-1 followed the recipe's get_tag()-inside-next_token() style without first checking get_token_type(), which is valid here but could be ambiguous for less obvious token loops. Also, paused_at_incomplete_token() is heavily relied on from HTML Processor examples while its method documentation lives under the Tag Processor, so users may need to connect inherited APIs across files.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / WP_HTML_Processor::get_current_depth()", + "problem": "The docs explain bounded subtree walks, but the direct-child predicate is implicit. Users must infer that a direct child opener is a tag opener at parent_depth + 1 while deeper matching tags are descendants.", + "suggestion": "Add a small generic example showing how to distinguish direct child elements from deeper descendants using a recorded parent depth, get_token_type() == '#tag', ! is_tag_closer(), and get_current_depth() === parent_depth + 1." + }, + { + "location": "WP_HTML_Processor inherited methods / paused_at_incomplete_token() references", + "problem": "HTML Processor examples rely on paused_at_incomplete_token(), but the primary method entry is in the Tag Processor docs. The HTML Processor method index does not make this inherited availability obvious enough.", + "suggestion": "Add an inherited-method cross-reference or short HTML Processor subsection for paused_at_incomplete_token(), clarifying that it is available on WP_HTML_Processor and should be paired with get_last_error() after bounded scans that drive mutations." + }, + { + "location": "WP_HTML_Processor::next_token() clean-scan guidance", + "problem": "The docs say to reject truncated or unsupported scans, but they could more explicitly distinguish completing the target region from validating the entire remaining document.", + "suggestion": "State that after a depth-bounded walk exits because the target element closed, paused_at_incomplete_token() and get_last_error() reflect parser state reached during that walk; unvisited trailing markup does not need to invalidate a mutation whose contract only depends on the scanned region." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct documented API, `WP_HTML_Processor::normalize()`, and handled its `null` return with a strict check. This is the intended HTML Processor path for BODY-context fragment normalization; no undocumented calls or `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical implementation: correct processor choice, documented method use, idiomatic normalization path, and correct `null` fallback handling. No unnecessary token walking or mutation APIs." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical implementation: `WP_HTML_Processor::normalize()` is documented in the rendered HTML Processor docs and directly matches the task. Handles unsupported input via `null` and preserves empty-string normalization behavior." + } + ], + "failure_analysis": "No hidden case failed in any trial. The documentation did well on the important decision points: the Tag Processor docs say to use the HTML Processor for implied or missing closing tags and normalized output, and the HTML Processor `normalize()` docs state that it normalizes BODY-context fragments, double-quotes attributes, inserts omitted tags, re-encodes text, omits incomplete trailing syntax, and returns `string|null` with `null` when unable to normalize. The unsupported misnesting cases were handled because candidates trusted that `null` contract. The only near-miss is that the rendered docs do not make the warning side effect obvious: the unsupported cases passed but execution recorded `WP_HTML_Processor::serialize` warnings emitted internally by `normalize()` before returning `null`. That did not indicate candidate misuse here, but it is a behavior callers may need to understand.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::normalize()` docblock", + "problem": "The `null` return is documented, but unsupported-markup behavior is abstract and examples only show successful normalization or incomplete trailing syntax being omitted.", + "suggestion": "Add a short general example where unsupported structural markup returns `null`, and cross-reference `get_last_error()` / `get_unsupported_exception()` for diagnosing why normalization could not complete." + }, + { + "location": "`WP_HTML_Processor::normalize()` and `serialize()` docblocks", + "problem": "The rendered docs say output methods return `null` when unable to normalize, but do not state that the `serialize()` path may emit a user warning before returning `null`. Hidden execution surfaced this side effect on unsupported input.", + "suggestion": "Document the warning behavior on the `null` path, or explicitly state whether callers should expect `normalize()` / `serialize()` to be warning-emitting APIs when unsupported markup is encountered." + }, + { + "location": "HTML Processor overview / normalization docs", + "problem": "The docs correctly distinguish normalization from byte-preserving updates, but the distinction is split across class overview, `serialize()`, and Tag Processor `get_updated_html()` docs.", + "suggestion": "Add one concise cross-reference near `normalize()` saying normalization produces a new browser-style serialization and is not the API for retrieving queued attribute/class/text edits; use `get_updated_html()` for those edits." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses WP_HTML_Processor::create_fragment(), scans heading openers, records depth, and collects only descendant #text tokens with get_modifiable_text(). This closely matches the documented subtree-text recipe and handles decoded entities, empty headings, case normalization, implied heading closes, and incomplete trailing syntax." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Uses the correct processor and only documented APIs. The single-pass next_token() state machine is supported by the docs' closer-driven repeated-region pattern. Minor reservation: it relies on a single current-heading state rather than an explicit depth/breadcrumb boundary, but virtual closers make it work for the tested malformed heading cases." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Uses the correct processor and only documented APIs, with a documented closer-driven token walk. Slightly weaker edge posture than trial-2 because it only flushes on a heading closer and has no final/error fallback; normal incomplete headings still work because the HTML Processor emits virtual closers, but an unsupported-parser abort inside a heading would drop the partial heading." + } + ], + "failure_analysis": "No hidden case failed in execution.json: all three trials passed 7/7 with no _doing_it_wrong records. The docs appear to have worked well for this task: the processor-selection guidance explicitly says to use WP_HTML_Processor for collecting element text and handling implied/missing closing tags; the subtree text recipe shows next_tag(), get_current_depth(), next_token(), get_token_type() === '#text', and get_modifiable_text(); the next_token() docs explain virtual closers and malformed input; get_modifiable_text() explains decoded text, which prevented double-decoding entities. Near-misses: trial-1 included an unnecessary is_tag_closer() check after plain next_tag(), suggesting the default closer-skipping behavior may be easy to miss; trials 2 and 3 used the documented single-pass closer pattern instead of depth bounds, which is valid here but depends on readers understanding virtual closer guarantees; trial-3 would lose a heading if parsing aborts on unsupported markup before a closer is emitted.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_tag()", + "problem": "The fact that plain next_tag() visits only openers is present in the parameter table, but easy to miss.", + "suggestion": "Move a short sentence near the method summary and usage examples: by default next_tag() skips tag closers; pass array( 'tag_closers' => 'visit' ) only when closer events are part of the algorithm." + }, + { + "location": "WP_HTML_Processor::next_token() and get_current_depth()", + "problem": "The docs include both a warning about nested token walks and examples of depth-bounded subtree walks; the boundary between safe repeated subtree scans and unsafe nested scans could be clearer.", + "suggestion": "Add a general note explaining when an outer next_tag() plus one depth-bounded inner next_token() scan is safe, and when a single-pass state machine is preferred because sibling boundary tokens must be observed." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() / collect DOM-style text recipe", + "problem": "The docs say 'DOM-style text' while recommending #text-only collection that excludes special-element opener text such as SCRIPT, STYLE, TITLE, and TEXTAREA unless opted in.", + "suggestion": "Name the policies explicitly: ordinary element text uses only #text tokens; full textContent-like extraction must also whitelist special element openers and read their get_modifiable_text()." + }, + { + "location": "WP_HTML_Processor incomplete/unsupported input guidance", + "problem": "The docs explain paused_at_incomplete_token() and get_last_error() mostly for mutations and rewrites, leaving read-only extractors without an explicit default policy.", + "suggestion": "Add guidance for extractors: either return best-effort data from visited tokens or reject/return null when completeness matters, and show checking paused_at_incomplete_token() and get_last_error() in that context." + } + ] + } + }, + { + "id": "T01-add-image-class", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the documented Tag Processor path: `new WP_HTML_Tag_Processor`, `next_tag( 'img' )`, `add_class()`, and `get_updated_html()`. This matches the docs' flat, byte-preserving attribute/class-edit pattern. No `_doing_it_wrong` records; all 8 hidden cases passed." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Identical correct use of the documented API. Processor choice, loop shape, class helper, and final serialization are all idiomatic for this task. No undocumented methods or runtime misuse; all 8 hidden cases passed." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Identical correct implementation. It relied on documented behavior for case-insensitive tag queries, comment/raw-text exclusion, class appending, incomplete-token non-matching, and byte-preserving `get_updated_html()`. No hallucinated API; all 8 hidden cases passed." + } + ], + "failure_analysis": "All trials passed every hidden case. The docs did well on the exact decision points this task required: the Tag Processor overview explicitly recommends it for flat tag/class edits and byte-precise preservation; `next_tag()` documents the shorthand string query, ASCII case-insensitive tag-name matching, exclusion of tag-like text inside comments/raw-text elements, and incomplete-token pausing; `add_class()` documents creating a class attribute when absent, appending without removing or reordering existing classes, and avoiding duplicates; `get_updated_html()` documents that untouched bytes are preserved exactly. Near-miss: the high-level class-modification section says removing the only class removes the whole attribute, which is about `remove_class()` but appears in a paragraph about adding/removing generally. The later `add_class()` method detail clarifies this, so the trials were not misled.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor > Modifying CSS classes for a found tag", + "problem": "The section-level prose combines add and remove semantics, and the sentence about removing the only class could be misread as applying to class helpers generally.", + "suggestion": "Split the add and remove contracts into separate short paragraphs: `add_class()` creates/appends/no-ops on duplicates and never removes; `remove_class()` removes matching classes and removes the attribute only when the final class is removed." + }, + { + "location": "WP_HTML_Tag_Processor > Finding tags", + "problem": "The quick query table shows `next_tag( 'img' )`, but the edge-case guarantees that made this task safe are mainly in the later method detail.", + "suggestion": "Add one sentence after the quick table: string tag-name queries are ASCII case-insensitive and match only real tag tokens, not comments, text, or raw-text contents." + }, + { + "location": "WP_HTML_Tag_Processor > get_updated_html()", + "problem": "The byte-preservation contract is documented, but it is distant from the common `while next_tag/add_class` pattern.", + "suggestion": "Add a compact end-to-end class-edit example that ends with `get_updated_html()` and states that only the edited attribute bytes are rewritten while unrelated markup remains unchanged." + } + ] + } + }, + { + "id": "T02-link-targets", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Tag_Processor for a byte-preserving flat attribute edit. All called APIs are documented: constructor, next_tag(), get_attribute(), set_attribute(), and get_updated_html(). Uses the documented null check for attribute presence, so empty-string and valueless attributes are handled." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as the reference: linear A-tag scan, null-only missing-attribute test, set_attribute() overwrite/insert, and get_updated_html() for byte-preserving output. No undocumented API usage or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct and idiomatic Tag Processor use. The explanation explicitly recognizes boolean href as true and empty href as present. No hallucinated methods; all frozen cases passed without API misuse records." + } + ], + "failure_analysis": "No hidden case failed in any trial. The rendered docs worked well for this task: the Tag Processor overview says it is for flat attribute/class edits that preserve bytes; the Usage section shows construction with new WP_HTML_Tag_Processor($html), next_tag(), set_attribute(), and get_updated_html(); get_attribute() documents null for missing attributes, empty string for present-empty attributes, and true for valueless/boolean attributes; set_attribute() documents overwriting existing attributes and insertion placement; next_tag() documents case-insensitive tag-name matching and ignoring tag-like text in comments/raw text. The main near-miss is that the correct presence idiom depends on comparing against null rather than using truthiness, but the docs were explicit enough that all subjects followed it.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute()", + "problem": "The return-value contract is present, but the safest general presence-test idiom is not emphasized as a standalone rule.", + "suggestion": "Add a short note: to test whether an attribute exists, compare the return value with null; do not use truthiness because empty strings and true both represent present attributes." + }, + { + "location": "WP_HTML_Tag_Processor::set_attribute() / get_updated_html()", + "problem": "Byte preservation and attribute placement are documented, but they are split across sections, which can make expected before/after ordering harder to infer quickly.", + "suggestion": "Add a compact before/after example showing a new attribute inserted after the tag name while untouched attributes keep original spelling, quoting, and order." + } + ] + } + }, + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware WP_HTML_Processor with create_fragment(), next_tag('H1'), a recorded get_current_depth(), and a depth-bounded next_token() walk. Every called method is present in the rendered docs and execution recorded no _doing_it_wrong notices. Minor deduction: it also whitelists SCRIPT, STYLE, TEXTAREA, and TITLE opener modifiable text. The docs' DOM-style text recipe says ordinary subtree text should append only #text tokens unless the caller explicitly opts into special-element contents; this task did not require that. Passed 8/8 frozen cases." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "This matches the documented and canonical pattern exactly: create a fragment processor, find the first H1, record its depth, walk tokens while depth stays >= the opener depth, and append get_modifiable_text() only for #text tokens. It handles decoded text, image-only empty string, missing H1 as null, nested markup, and the unclosed H1 case without undocumented calls. Passed 8/8 frozen cases with no _doing_it_wrong notices." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence solution as trial 2. It chooses WP_HTML_Processor for structure, uses only documented methods, applies the documented subtree text walk with the correct >= depth guard, and relies on get_modifiable_text() for decoded #text content. Passed 8/8 frozen cases with no _doing_it_wrong notices." + } + ], + "failure_analysis": "No hidden case failed in any trial; all candidates passed all 8 frozen expectations. The docs did well in several places: Tag Processor > Which processor should I use? explicitly directs text-content extraction and subtree walking to WP_HTML_Processor; HTML Processor > Recipe: collect DOM-style text from a subtree gives almost exactly the needed pattern; next_token() and get_current_depth() explain why the walk must be bounded and why the guard must be >=; get_modifiable_text() documents decoded #text output; and the depth/virtual-closer behavior supports the unclosed-H1 case. The only near-miss is trial-1's special-element handling. It likely overgeneralized HTML Processor > next_token(), which says SCRIPT, STYLE, TITLE, and TEXTAREA have no #text child tokens and their text is carried on the opener. The more controlling passage is HTML Processor > Recipe: collect DOM-style text from a subtree, especially the default policy saying ordinary subtree text is only reached #text tokens and special-element opener text should be opt-in. A test such as an H1 containing SCRIPT or TEXTAREA would distinguish that interpretation from the canonical policy.", + "doc_gaps": [ + { + "location": "html-processor.md > next_token() special-element exception", + "problem": "The paragraph correctly explains that special elements carry modifiable text on their opener token, but outside the subtree-text recipe it can read like a general instruction to include that text during element text extraction.", + "suggestion": "Add a cross-reference sentence: read special-element opener text only when the caller explicitly wants those element contents; for ordinary DOM-style subtree text, continue collecting only #text tokens as shown in the recipe." + }, + { + "location": "html-processor.md > Recipe: collect DOM-style text from a subtree", + "problem": "The recipe is strong, but the contract could be named more explicitly so readers can distinguish ordinary descendant text from visible text, all modifiable text, comments, and special-element raw/plaintext contents.", + "suggestion": "Precede the example with a compact contract statement: ordinary subtree text means descendant #text tokens reached by a depth- or breadcrumb-bounded HTML Processor walk; comments, processing instructions, and special-element opener text are excluded unless deliberately whitelisted." + }, + { + "location": "html-processor.md > get_current_depth() / subtree walk guidance", + "problem": "Incomplete input is discussed mainly for mutations and clean scans, while read-only extraction readers may not know whether an unclosed container should be rejected or parsed best-effort.", + "suggestion": "Add a read-only note: a bounded walk can return best-effort text from the parsed tree even when trailing markup is unclosed; check paused_at_incomplete_token only when the caller requires proof of complete source or before applying mutations." + } + ] + } + }, + { + "id": "T04-build-figure", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for byte-exact template filling. Every called method is documented: `next_tag`, `set_attribute`, `next_token`, `get_token_type`, `set_modifiable_text`, and `get_updated_html`. The approach follows the documented template pattern, preserves attribute order by predeclaring attributes, and relies on API encoding for attributes and text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor choice, no undocumented API calls, idiomatic token walk to the placeholder `#text` node, and correct use of `get_updated_html()` after queued edits." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Handles the documented escaping edge cases through `set_attribute()` and `set_modifiable_text()` with plain, unescaped input values; no `_doing_it_wrong` records were emitted." + } + ], + "failure_analysis": "All trials passed all 7 hidden cases, so there were no functional failures to attribute to documentation gaps. The docs did especially well in `WP_HTML_Tag_Processor` > `Building markup from a template`, which directly explained using a literal shape, preexisting empty attributes for stable attribute order, placeholder text for later replacement, `next_token()` plus `#text`, and `get_updated_html()`. The `set_attribute()` section also clearly states that callers provide plain unescaped values and that new attributes sort by name, while existing attributes retain position. The `set_modifiable_text()` section clearly says it accepts plaintext and encodes as needed, and warns that empty elements have no text token to replace. Near-miss: all candidates ignored the documented advice to check `set_modifiable_text()`'s boolean return value. In this fixed-template case the `#text` guard makes failure unlikely, but the examples themselves also omit the check, so models may learn to ignore the return contract in riskier contexts.", + "doc_gaps": [ + { + "location": "html-tag-processor.md: `WP_HTML_Tag_Processor::set_modifiable_text()` examples and `Building markup from a template` recipe", + "problem": "The prose says to always check the boolean return value, but the nearby examples call `set_modifiable_text()` without checking it. This weakens the contract even though the submitted solutions happened to be safe for the fixed template.", + "suggestion": "Make example code consistent with the contract: either check the return value or explicitly state when a prior `#text` token guard plus known template makes omission acceptable." + }, + { + "location": "html-tag-processor.md: `Building markup from a template` recipe", + "problem": "The recipe scans for the first `#text` token. That is fine for compact single-placeholder templates, but general templates with whitespace, multiple placeholders, or preexisting text nodes can make 'first text token' the wrong target.", + "suggestion": "Add a general note that placeholder text should be uniquely reachable, and that more complex templates should first navigate to the intended region or use structural checks rather than replacing the first text token blindly." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a body fragment, walked tokens with documented next_token(), gated ordinary text by get_token_type() === '#text', and explicitly whitelisted TITLE/TEXTAREA opener tokens before calling get_modifiable_text(). All API calls appear in the rendered docs; execution had no _doing_it_wrong records. Accumulating the full text before truncating is less efficient than necessary but not an API-adherence problem." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct documented API pattern as the reference, with an efficient running mb_strlen()/mb_substr() truncation path. It follows the docs' distinction between ordinary #text tokens and opt-in special element text, and avoids raw SCRIPT/STYLE modifiable text. No undocumented methods or misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses only documented methods, including get_last_error(), and otherwise follows the documented fragment/token/text walk pattern. The final get_last_error() fallback is conservative and not required by the task, but it is a documented post-scan concern rather than a hallucinated API use. No _doing_it_wrong records." + } + ], + "failure_analysis": "No failed hidden cases across trials. All three passed 10/10 with no _doing_it_wrong or trigger_error entries. The docs did well in three places: the Tag Processor overview explicitly says to use the HTML Processor for collecting an element's text content; WP_HTML_Processor::next_token() explains that text may be split across #text tokens and that SCRIPT, STYLE, TITLE, and TEXTAREA carry text on the element token instead of child #text tokens; and get_modifiable_text() states that #text, TITLE, and TEXTAREA are decoded UTF-8 while SCRIPT/STYLE are raw. The HTML Processor recipe also warns not to append get_modifiable_text() from every token and instead to whitelist token types. The only near-miss was trial-3's empty-string fallback on get_last_error(): reasonable from the docs' scan-safety language, but the docs do not fully define the expected policy for read-only text extraction after unsupported markup or incomplete trailing syntax.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text()", + "problem": "The method accurately describes all tokens with modifiable text, but that broad contract can still tempt callers to treat it as DOM textContent.", + "suggestion": "Add a prominent note that get_modifiable_text() is not a text-content predicate: callers should first decide eligible token types, usually #text plus explicit special-element opener opt-ins." + }, + { + "location": "WP_HTML_Processor::next_token() and scan recipes", + "problem": "The docs mention get_last_error() and paused_at_incomplete_token(), but do not clearly separate policies for mutations/rewrites from best-effort read-only extraction.", + "suggestion": "Document post-scan policy choices: when partial accumulated data is valid, when callers should reject or fallback, and what is guaranteed after unsupported markup or incomplete trailing syntax." + }, + { + "location": "Text handling examples around next_token()/get_modifiable_text()", + "problem": "The docs recommend mb_substr(..., 'UTF-8') but do not fully spell out length measurement and code-point versus grapheme-cluster expectations.", + "suggestion": "Pair truncation examples with mb_strlen(..., 'UTF-8') and clarify that mb_* slicing is suitable for Unicode code-point limits, while grapheme_* APIs are needed for user-perceived character limits." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), walked tokens, filtered href with is_string(), appended only #text get_modifiable_text(), and relied on documented virtual/end-of-input closers. All HTML API methods used are present in the rendered docs; no _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Essentially matches the documented subtree-text recipe and canonical reference: next_tag('A'), get_attribute(), get_current_depth(), bounded next_token() walk with >= depth, #text guard, get_modifiable_text(). All API calls are documented; no _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used the correct HTML Processor and a documented single-pass token walk with depth state. get_tag(), is_tag_closer(), get_current_depth(), get_attribute(), get_token_type(), and get_modifiable_text() are all documented. Minor reservation: it records the link on opener rather than flushing on structural close, but its depth reset follows the documented closer-depth contract. No _doing_it_wrong records; passed 8/8." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs were effective for this task because they directly covered the required decisions: the Tag Processor overview says to use WP_HTML_Processor for collecting element text and missing/implied closers; the HTML Processor subtree-text recipe shows the key next_tag + get_current_depth + next_token + #text + get_modifiable_text pattern; get_attribute documents string|true|null so subjects used is_string() and excluded missing/boolean href; get_modifiable_text documents decoded text for #text nodes; and next_token/get_current_depth document virtual/end-of-input closers and >= depth bounds, which explains the unclosed-link case. Near misses: trial-1 depended on closer-driven flushing, but the next_token section’s DT example and closer guarantee made that a documented pattern. trial-2 used an inner bounded walk despite the broader warning about nested next_token loops; it is safe here because the outer scan is next_tag('A'), but the warning could be read too broadly. trial-3 used a depth-drop state machine rather than the exact recipe, and get_current_depth’s closer-depth explanation was enough to make it correct.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_attribute() docblock", + "problem": "The HTML Processor method entry lists string|true|null but omits the decoded-value explanation that appears in the Tag Processor docs. Readers using only the method entry may not know attribute strings are already entity-decoded.", + "suggestion": "Repeat the inherited contract in the HTML Processor entry: string values are decoded; valueless attributes return true; absent/unavailable attributes return null; callers that require a real value should test is_string()." + }, + { + "location": "WP_HTML_Processor::next_token() section, nested-loop warning", + "problem": "The warning correctly discourages nested next_token loops for repeated regions, but it does not distinguish that a next_tag() outer scan plus a bounded next_token() subtree walk can be appropriate for independent matched elements.", + "suggestion": "Add a short clarification of when bounded subtree walks compose safely with next_tag(), and when repeated extraction should instead use a single token loop with state." + }, + { + "location": "WP_HTML_Processor subtree-text recipe", + "problem": "The recipe says ordinary text is only #text tokens, but examples do not explicitly call out that descendant element attributes such as img alt are not DOM text content.", + "suggestion": "Add one general example showing inline markup text is concatenated while void/replaced elements and their attributes contribute no text unless the caller explicitly reads attributes." + }, + { + "location": "Incomplete-input guidance in next_token()/get_current_depth docs", + "problem": "The docs mention checking paused_at_incomplete_token() when a result must reject truncated input, but the distinction between structural best-effort extraction and complete-source validation is easy to miss.", + "suggestion": "State explicitly that virtual closers make read-only structural extraction possible for unclosed elements, while paused_at_incomplete_token() is a policy check for callers that require complete source or are about to mutate/serialize output." + } + ] + } + }, + { + "id": "T07-nested-lists", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware `WP_HTML_Processor::create_fragment()` path, walked open tags with `next_tag()`, checked `get_breadcrumbs()` excluding the current element, used documented `add_class()`, and returned via `get_updated_html()`. Also checked `get_last_error()`. Minor edge-case gap: it does not check `paused_at_incomplete_token()`, though that is not needed for the frozen cases." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Essentially the same high-adherence implementation as trial 1. Processor choice, breadcrumb ancestor logic, class mutation, and output retrieval all match documented API patterns. No undocumented calls or `_doing_it_wrong` records. Same small omission around incomplete-token detection." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "All API calls are documented, including inherited `paused_at_incomplete_token()`. Correctly uses `WP_HTML_Processor`, breadcrumbs, `add_class()`, and `get_updated_html()`. The preliminary full-document pass is conservative and documented-adjacent, but slightly over-broad for this task because it rejects any incomplete trailing syntax instead of editing complete visited tokens." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases, so there are no failed hidden cases to attribute to misconceptions. The docs did well on the central decision: the Tag Processor overview explicitly says it has no tree awareness and that `get_breadcrumbs()` belongs to `WP_HTML_Processor`, while the HTML Processor overview points to structure-aware parsing. The `next_tag()` docs also clearly warn that `tag_name` is not a list of alternatives, which likely pushed candidates toward scanning all tags and branching on `get_tag()`. The `get_breadcrumbs()` docs were sufficient for candidates to infer that the current element is included and must be excluded for ancestor-only checks. The main near-miss is incomplete input: trials 1 and 2 ignore `paused_at_incomplete_token()`, while trial 3 preflights and rejects incomplete input wholesale. That variance suggests the docs describe the mechanism but not the recommended mutation policy for byte-preserving filters.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_breadcrumbs() / Breadcrumbs overview", + "problem": "The docs state that breadcrumbs include the current matched node, but they do not explicitly name the common ancestor-only idiom. Implementers must infer that containment checks should ignore the final breadcrumb.", + "suggestion": "Add a short note and generic example: for ancestor checks, inspect `array_slice( $processor->get_breadcrumbs(), 0, -1 )`; the final item is the current token, not an ancestor." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and HTML Processor recipes", + "problem": "The docs explain how to detect truncated syntax, but not how that state should affect class/attribute mutation workflows that otherwise preserve untouched bytes.", + "suggestion": "Document the policy distinction: `get_updated_html()` preserves unvisited trailing incomplete syntax, while callers needing all-or-nothing or complete-subtree results should check `paused_at_incomplete_token()` after draining the processor and fall back." + }, + { + "location": "WP_HTML_Processor::get_last_error() / HTML Support", + "problem": "The unsupported-markup guidance says the parser aborts and exposes `get_last_error()`, but it is not explicit whether queued edits before the abort should be returned or discarded by mutating filters.", + "suggestion": "Add guidance for mutating callbacks: after a scan, check `get_last_error()` if partial edits are unacceptable; otherwise `get_updated_html()` returns queued edits plus untouched input bytes." + } + ] + } + }, + { + "id": "T08-table-extract", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used the right structural API: `WP_HTML_Processor::create_fragment()`, `next_tag('TABLE')`, a single depth-bounded `next_token()` loop, tag closer handling, and `get_modifiable_text()` only on `#text` tokens. All called methods are documented in the two rendered files and no `_doing_it_wrong` records appeared. Minor issue: the incomplete-input check only runs when the table boundary was not observed; docs note virtual closers can still appear before `paused_at_incomplete_token()` is true." + }, + { + "trial_id": "trial-2", + "adherence": 89, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API usage. The main walk is idiomatic and depth-bounded. The main near-miss is including `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opener modifiable text inside cells. The docs describe that as an opt-in policy, while the task/reference use ordinary `#text` descendants only; for `SCRIPT`/`STYLE` this also appends raw, undecoded text. It also has no explicit incomplete-input policy." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Used the documented HTML Processor APIs correctly with a single table-depth walk and decoded `#text` extraction. All method calls are documented and execution produced no misuse records. Slightly less explicit than trial 1 because it relies on `get_tag()` nullness rather than checking `#tag`, and its `paused_at_incomplete_token()`/`get_last_error()` check is bypassed once virtual table closers are observed." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases: simple table, THEAD/TBODY, omitted closers, inline markup in cells, decoded entities, no table, first table only, and empty cells. The docs did well on the central decisions: the Tag Processor overview explicitly says to use the HTML Processor when structure, text collection, or omitted closing tags matter; the HTML Processor `next_token()` docs explain implied/virtual tokens, synthesized table structure such as TBODY, single-loop state tracking for repeated regions, and `>=` depth-bounded walks; `get_modifiable_text()` documents decoded text for `#text` nodes. Near-misses were outside the frozen suite. Trial 2 appears to have over-applied the special-element exception from `next_token()`/`get_modifiable_text()`, appending opener text for SCRIPT/STYLE/TEXTAREA/TITLE even though the ordinary subtree text recipe says to include only `#text` tokens unless the caller explicitly opts in. Trials 1 and 3 attempted incomplete-input handling, but in a way the docs make easy to get subtly wrong: a depth-bounded walk can see virtual closers and still leave `paused_at_incomplete_token()` true, so tying the check to a local `completed`/`finished_table` flag does not actually reject truncation if that was the intended policy.", + "doc_gaps": [ + { + "location": "html-processor.md: `next_token()` and `get_current_depth()` incomplete-input notes", + "problem": "The docs say to check `paused_at_incomplete_token()` when completeness matters, but do not make it concrete that virtual closers may be visited and the subtree boundary may be reached while the processor is still paused at truncated input.", + "suggestion": "Add a short trace example such as `
                      ok

returns ABCDEF, while the reference returns AF." + } + ], + "failure_analysis": "All trials passed every frozen hidden case. The docs were effective on the main contract: html-processor.md's 'Recipe: collect DOM-style text from a subtree' gives the exact shape needed, and html-tag-processor.md's 'Which processor should I use?' warns that the Tag Processor has no tree awareness. The get_modifiable_text() section clearly states that #text values are decoded, which prevented double-decoding in the entities case. The next_token() and get_current_depth() passages explain virtual closers, implied structure, and the >= recorded-depth boundary, which covered nested markup, deep nesting, first-of-two, image-only, and the unclosed-H1 case. Near-misses: trial 1 copied get_last_error() cleanup from clean mutation/rewrite patterns, although the extraction task did not ask to reject unsupported parser aborts. Trial 3 overgeneralized the special-elements passage: the docs mention opener-carried text for SCRIPT/STYLE/TEXTAREA/TITLE, but the relevant ordinary text recipe says to append only #text unless the caller explicitly opts into those token types.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The method-level text explains that special elements carry modifiable text, but a reader can overgeneralize that into ordinary subtree text extraction.", + "suggestion": "Repeat the opt-in warning in the docblock with a compact example showing ordinary #text extraction excluding SCRIPT/STYLE/TEXTAREA/TITLE opener text, and a separate example for callers that intentionally include special-element contents." + }, + { + "location": "WP_HTML_Processor::get_last_error() docblock and clean-scan recipe references", + "problem": "Clean-scan checks are easy to copy into read-only extraction tasks, changing a caller's not-found semantics into parser-error semantics.", + "suggestion": "Add guidance that get_last_error() is a policy check: use it when the caller requires a complete supported parse or before applying mutations, but read-only best-effort extraction may choose a different contract." + }, + { + "location": "WP_HTML_Processor::next_token() / get_current_depth() docblocks", + "problem": "The subtree-boundary idiom is crucial and was learned here, but it is spread across overview recipes and method docs.", + "suggestion": "Include a short method-level subtree walk example that records opener depth and continues while current depth is >= that depth, explicitly noting that this also works for implied or virtual closers in malformed input." + } + ] +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..083cdbc29734c --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,27 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return null; + } + + return $text; +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..9ad8e6b879a69 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..f15ec44b5a710 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, appending only `#text` tokens via `get_modifiable_text()`. That follows the documented HTML Processor recipe for DOM-style subtree text and relies on its decoded text handling for character references.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..af8bd483bc392 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..60908206981b7 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, appending only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded ordinary text content across nested elements while ignoring markup-only content like images.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..92ebc112e3810 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,37 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $token_name = $processor->get_token_name(); + + if ( + 'SCRIPT' === $token_name || + 'STYLE' === $token_name || + 'TEXTAREA' === $token_name || + 'TITLE' === $token_name + ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + return $text; +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..c08269d89ae83 --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..ed1787b9ac33c --- /dev/null +++ b/doc-experiment/results/round-38/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, `next_tag( 'H1' )` to find the first heading, `get_current_depth()` plus a `next_token()` walk to stay inside that H1 subtree, and `get_modifiable_text()` to concatenate decoded text from `#text` tokens. Because the HTML Processor documents that `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` carry their text on the opening tag instead of separate `#text` children, the function explicitly includes those opener-carried texts as well.", + "confidence": 85 +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/judge.json b/doc-experiment/results/round-38/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..0b7e1feebe23c --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Used the correct primary API: WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), is_tag_closer(), get_tag(), and get_modifiable_text(), all documented. Correctly whitelisted #text plus TITLE/TEXTAREA opener text and used UTF-8 codepoint truncation. Minor adherence loss: the fallback to WP_HTML_Tag_Processor is documented but discouraged for DOM-style fragment text extraction, because it loses HTML Processor tree semantics on unsupported input." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Clean, documented HTML Processor token walk. Correctly chose create_fragment(), included only #text and whitelisted TITLE/TEXTAREA openers, excluded SCRIPT/STYLE by not broadly appending modifiable text, and truncated with UTF-8-aware APIs. Minor near-miss: it does not inspect get_last_error() after a scan, so unsupported markup would silently produce whatever text was seen before the abort." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Strongest documentation adherence. It uses only documented APIs, chooses WP_HTML_Processor::create_fragment(), walks tokens directly, distinguishes token type from token name, whitelists TITLE/TEXTAREA opener-carried decoded text, and rejects unsupported-parser aborts with get_last_error(). Only small gap is that it does not separately consider paused_at_incomplete_token(), though the task and reference did not require rejection of incomplete trailing syntax." + } + ], + "failure_analysis": "All three trials passed all 10 hidden/frozen cases, with no _doing_it_wrong records. The docs worked well for the core challenge: the processor-selection guidance says to use the HTML Processor when collecting text content and handling implied or missing closing tags; next_token() documents that text may be split across multiple #text tokens and that malformed input still produces structural closers; get_modifiable_text() documents decoded UTF-8 text for #text, TITLE, and TEXTAREA, and raw text for SCRIPT/STYLE. Those passages led every trial to use create_fragment(), walk tokens, append #text, specially include TITLE/TEXTAREA opener text, and avoid double-decoding entities.\n\nNear-misses were policy-related rather than test failures. Trial 1 added a lexical Tag Processor fallback even though the Tag Processor docs explicitly say it is not parsed BODY-fragment text-content extraction. Trial 2 omitted get_last_error(), so an unsupported-parser abort would look like successful end-of-input. Trial 3 returned an empty string on get_last_error(), which is defensible but not clearly mandated for read-only extraction. None of the trials checked paused_at_incomplete_token(); probes confirmed incomplete trailing syntax can pause with get_last_error() still null, so the docs need to keep those states distinct for extraction code, not only for mutation or serialization code.", + "doc_gaps": [ + { + "location": "html-processor.md / WP_HTML_Processor::next_token() and the text-collection recipe", + "problem": "The docs explain subtree text and special-element text, but they do not present a compact general pattern for fragment-wide text-like extraction where ordinary #text is included and specific special-element opener text is opt-in.", + "suggestion": "Add a general decision table or short example showing how to choose token categories: #text for ordinary DOM text; TITLE/TEXTAREA opener text when the caller explicitly wants those decoded contents; SCRIPT/STYLE only when raw script/style text is explicitly desired." + }, + { + "location": "html-processor.md / get_last_error(), next_token(), paused_at_incomplete_token references", + "problem": "Unsupported-parser aborts and incomplete trailing syntax are documented, but read-only extraction policy is unclear. Candidates made different choices: ignore errors, reject on get_last_error(), or fall back lexically.", + "suggestion": "State that next_token() returning false can mean normal end, unsupported abort, or paused incomplete input; document the separate checks and give general policy guidance for best-effort extraction versus complete-input-required extraction." + }, + { + "location": "html-tag-processor.md / Tokens and finer-grained processing", + "problem": "The lexical text-scan example is close enough to DOM text extraction that a reader may copy it as a fallback, despite nearby warnings that Tag Processor does not apply BODY fragment parsing or implied-closing semantics.", + "suggestion": "Label the example as lexical-only in the heading or code comment, and cross-link to the HTML Processor text-walk recipe for parsed fragment text extraction." + }, + { + "location": "html-processor.md / WP_HTML_Tag_Processor::get_modifiable_text inherited docs", + "problem": "The method correctly warns that modifiable text is broader than DOM text, but the contract is spread across paragraphs and can be missed when readers are solving extraction tasks.", + "suggestion": "Add a concise table listing token name/type, whether get_modifiable_text() returns decoded or raw text, and whether it should normally count as DOM text." + } + ] +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..c4b8ae8007773 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,64 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_tag(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + if ( null === $processor->get_last_error() ) { + return $truncate( $text ); + } + } + + $text = ''; + $processor = new WP_HTML_Tag_Processor( $html ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_tag(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + return $truncate( $text ); +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..47002bf5e5369 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..351039d356103 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This walks the HTML with `WP_HTML_Processor::create_fragment()` and `next_token()`, appending only ordinary `#text` token text plus opener-carried text from `TITLE` and `TEXTAREA`, using `get_modifiable_text()` so character references are already decoded where HTML defines that. If the HTML Processor aborts on unsupported markup, it falls back to a lexical `WP_HTML_Tag_Processor` token scan with the same inclusion rules, then truncates the UTF-8 result to the requested number of Unicode code points.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..3bdf0174904dc --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,38 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag = $processor->get_tag(); + if ( 'TEXTAREA' === $tag || 'TITLE' === $tag ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + if ( function_exists( 'mb_substr' ) ) { + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); + } + + if ( preg_match_all( '/./us', $text, $matches ) ) { + return implode( '', array_slice( $matches[0], 0, $max_codepoints ) ); + } + + return ''; +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..32997943896ff --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..68dd8fded6166 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks tokens with `next_token()`, appending ordinary text from `#text` tokens via `get_modifiable_text()`. Because `TITLE` and `TEXTAREA` carry their decoded text on the opening tag token rather than separate text-node children, it also whitelists those opening tags and appends their `get_modifiable_text()`. The final string is truncated to the requested number of UTF-8 code points with `mb_substr()` and a regex fallback.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..7f35c4372126a --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,54 @@ +next_token() ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $token_type || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + if ( function_exists( 'mb_substr' ) && function_exists( 'mb_strlen' ) ) { + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); + } + + if ( 1 === preg_match_all( '/./us', $text, $matches ) ) { + if ( count( $matches[0] ) <= $max_codepoints ) { + return $text; + } + + return implode( '', array_slice( $matches[0], 0, $max_codepoints ) ); + } + + return ''; +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..aa0605da1f408 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..a010ddbe6b6d2 --- /dev/null +++ b/doc-experiment/results/round-38/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks tokens with `next_token()`, appending ordinary `#text` token text plus opener-carried text from `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. It then truncates by Unicode code points with `mb_*` in UTF-8, with a regex fallback that also preserves whole UTF-8 characters.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-38/T08-table-extract/judge.json b/doc-experiment/results/round-38/T08-table-extract/judge.json new file mode 100644 index 0000000000000..fd2d7f9bb85a2 --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly chose WP_HTML_Processor::create_fragment(), used documented methods only, and followed the documented single next_token() depth-bounded state-machine pattern. Deduction: it also appends opener-carried SCRIPT/STYLE/TEXTAREA/TITLE modifiable text inside cells, despite the docs warning that ordinary subtree text should be #text tokens only unless the contract explicitly asks for special-element contents." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 8/8. Best adherence: correct processor, no undocumented methods, single cursor walk, table-depth boundary, virtual closer handling, and decoded #text collection via get_modifiable_text(). Minor residual risk: it does not state or enforce a strict policy for unsupported/truncated input after the scan, though that was not required by the task and matches the reference's best-effort behavior." + }, + { + "trial_id": "trial-3", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor and documented API usage, with an idiomatic single-pass token walk bounded by current depth. Same semantic near-miss as trial-1: it opts into special-element opener text for SCRIPT/STYLE/TEXTAREA/TITLE, which the docs describe as separate from ordinary #text subtree extraction." + } + ], + "failure_analysis": "No frozen hidden case failed in any trial: each execution report shows 8/8 passing and no _doing_it_wrong records. The docs appear to have done the important things well: they steer structural work away from WP_HTML_Tag_Processor and toward WP_HTML_Processor; create_fragment() is clearly positioned for BODY fragments; next_token() explains why text extraction needs a token walk; get_current_depth() documents the >= depth-bound pattern; and get_modifiable_text() explains decoded #text output, which prevented double-decoding of entities. The main near-miss is special text-bearing elements. Trials 1 and 3 included SCRIPT/STYLE/TEXTAREA/TITLE opener-carried text inside cells. A probe with AC returns ABC for those trials but AC for the reference. This was not caused by a missing method doc: the rendered docs explicitly warn under 'Recipe: collect DOM-style text from a subtree', next_token(), and get_modifiable_text() that ordinary subtree text is #text only and special-element modifiable text must be opt-in. The remaining documentation weakness is that this distinction is easy to override when a task uses the phrase 'text content'.", + "doc_gaps": [ + { + "location": "html-processor.md: 'Recipe: collect DOM-style text from a subtree' and WP_HTML_Processor::next_token()", + "problem": "Two subjects still treated special-element opener modifiable text as part of ordinary subtree text, even though the warning exists.", + "suggestion": "Add a compact decision table: ordinary text descendants => visit only #text tokens; caller explicitly wants SCRIPT/STYLE/TEXTAREA/TITLE contents => whitelist opener tokens and note decoded-vs-raw behavior." + }, + { + "location": "html-tag-processor.md / html-processor.md: WP_HTML_Tag_Processor::get_modifiable_text()", + "problem": "The method's availability on comments and special-element openers can be mistaken for semantic membership in an element's text result.", + "suggestion": "Add an admonition that non-empty modifiable text is not a predicate for DOM-style text inclusion, with a tiny contrasting example showing #text-only collection versus explicit special-element collection." + }, + { + "location": "WP_HTML_Processor::get_last_error() and WP_HTML_Tag_Processor::paused_at_incomplete_token() sections", + "problem": "Candidates did not consistently express an input-completeness policy; one checked get_last_error() only, while others checked neither.", + "suggestion": "Add a short 'After a bounded scan' note clarifying that get_last_error() detects unsupported-parser aborts, paused_at_incomplete_token() detects trailing incomplete syntax, and read-only extractors should choose explicitly between best-effort output and strict rejection." + } + ] +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-38/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..cbea08c6ba231 --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,72 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#tag' === $processor->get_token_type() ) { + $tag_name = $processor->get_tag(); + + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } else { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + } + + continue; + } + + if ( + null !== $current_cell && + ! $processor->is_tag_closer() && + in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( null !== $current_cell && '#text' === $processor->get_token_type() ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-38/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..40322e6f3091c --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-38/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..000d51b932d2b --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because table extraction depends on browser-like structure, including implied table elements. It finds the first `TABLE` with `next_tag()`, then makes a single `next_token()` pass bounded by the table depth from `get_current_depth()`, opening and closing rows on `TR` tokens, cells on `TD`/`TH` tokens, and appending cell text from `#text` tokens via `get_modifiable_text()`; it also includes opener-carried text for special text-only elements (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) when they appear inside a cell.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-38/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..2f6cb0fe4ee89 --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,57 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#tag' === $processor->get_token_type() ) { + $tag_name = $processor->get_tag(); + + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } else { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + } + + continue; + } + } + + if ( null !== $current_cell && '#text' === $processor->get_token_type() ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-38/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..3844c2095af6e --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-38/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..db3ba2a08d00a --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-style table handling, `next_tag( 'TABLE' )` to find the first table, and a single `next_token()` walk bounded by the table depth. It tracks `TR`, `TD`, and `TH` openers/closers with `get_tag()`, `is_tag_closer()`, `get_current_depth()`, and concatenates only `#text` token content via `get_modifiable_text()`, which already returns decoded text.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-38/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..4c87274f6ed5d --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,72 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $text_like_tags = array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ); + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag = $processor->get_tag(); + if ( null === $tag ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag || 'TH' === $tag ) && null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( 'TR' === $tag && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + + continue; + } + + if ( 'TR' === $tag ) { + if ( null === $current_row ) { + $current_row = array(); + } + continue; + } + + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + continue; + } + + if ( null !== $current_cell && in_array( $tag, $text_like_tags, true ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-38/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..ae58c13a31830 --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-38/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..893dc1f89b2bc --- /dev/null +++ b/doc-experiment/results/round-38/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-like table handling, then finds the first `TABLE` via `next_tag()`. From that point it performs a single `next_token()` walk bounded by the table depth from `get_current_depth()`, collecting rows on `TR` open/close, cells on `TD`/`TH` open/close, and appending decoded text from `#text` tokens via `get_modifiable_text()`; it also includes opener-carried text for special text-bearing tags like `TEXTAREA`, `SCRIPT`, `STYLE`, and `TITLE` when they appear inside a cell.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/judge.json b/doc-experiment/results/round-38/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..7b5b95fc92b39 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Uses the right processor and documented methods: create_fragment(), next_token(), get_token_type(), get_modifiable_text(), serialize_token(), get_last_error(), and normalize(). Main loop is idiomatic and handles decoded #text matching, comments, attributes, split text nodes, special-element text, and normalization. Deductions: on parser error it calls normalize($html) after building rewritten output, which the serialize_token() docs explicitly warn will discard emitted changes; if normalization fails it returns raw input. It also returns raw input if create_fragment() returns null." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses WP_HTML_Processor::create_fragment() for a body fragment, walks tokens with next_token(), limits matching to ordinary #text tokens, reads decoded text via get_modifiable_text(), and emits normalized output with serialize_token(). All API calls are documented, there are no _doing_it_wrong records, and the get_last_error() rejection path matches the documented rewrite-loop guidance." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Same correct documented token-serialization pattern as the reference: HTML Processor, #text guard, decoded get_modifiable_text(), serialize_token() for normalized output. Deductions are for the same near-miss as trial-1: after a rewrite loop it falls back to normalize($html) on parser error, which intentionally drops any wrappers already emitted. Returning empty string if normalization fails is safer than trial-1's raw-input fallback, but the normalize-after-rewrite pattern is still non-idiomatic." + } + ], + "failure_analysis": "All three trials passed all 8 frozen cases, so there are no hidden-case failures to attribute. The docs did well on the core decisions: the processor-choice sections in both docs point users to WP_HTML_Processor for body fragments, structure, implied/missing closing tags, and normalized output; next_token() explains why text requires token walking and why special elements do not expose ordinary #text children; get_modifiable_text() clearly states that #text is decoded and that the method is not a predicate for ordinary text; serialize_token() explains the exact rewrite pattern of appending each current token's normalized serialization while inserting extra markup around selected tokens. The near-miss was error handling: trials 1 and 3 called normalize($html) after accumulating rewritten output. The serialize_token() docs already warn against this, but the models still invented that fallback. It was not exercised by the hidden cases; on unsupported markup it would abandon the emitted wrappers, and trial-1 can return raw unnormalized input.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() and the 'Recipe: rewrite while serializing tokens' section", + "problem": "The docs contain the necessary warning, but models still added normalize($html) as an error fallback after a token-by-token rewrite, which silently discards emitted edits.", + "suggestion": "Add an explicit error-path note: after a rewrite loop, normalize($original_html) or serialize() on a fresh processor produces an unmodified normalized copy, not the accumulated rewrite. Show acceptable generic policies such as returning null/empty, throwing, or returning the accumulated best-effort output only when the caller contract allows it." + }, + { + "location": "WP_HTML_Processor::create_fragment() return value", + "problem": "The method says it can return null but does not make the common null causes and recommended caller policy prominent. Trial fallbacks varied between raw input and empty string.", + "suggestion": "Document the practical null cases for the default BODY/UTF-8/string path and add guidance for string-returning filters: choose an explicit fallback consistent with the function contract, and avoid returning raw input when the caller promises normalized output." + }, + { + "location": "WP_HTML_Processor::get_last_error() / paused_at_incomplete_token() cross-reference", + "problem": "The distinction between unsupported parser aborts, supported malformed HTML with virtual closers, and trailing incomplete syntax is spread across sections. This can lead to unnecessary or inconsistent fallback logic.", + "suggestion": "Add a compact decision table: supported malformed markup still serializes normally; get_last_error() means the processor aborted on unsupported markup; paused_at_incomplete_token() is the separate signal for truncated trailing syntax when the caller requires complete input." + } + ] +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..079afadd741d8 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,26 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() && str_contains( $processor->get_modifiable_text(), $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? $html : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..1234c162e2866 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..d6ed3dd966877 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and serializes the normalized result token-by-token with `serialize_token()`, inserting `...` around matching text tokens.", + "confidence": 83 +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..4d1bc5af50712 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,26 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() && str_contains( $processor->get_modifiable_text(), $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..466feb5370263 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..129c93f7b398f --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`. For each ordinary `#text` token, it checks the decoded text via `get_modifiable_text()` and, when the case-sensitive keyword is present, emits `` around that token\u2019s normalized serialization from `serialize_token()`; all other tokens are emitted unchanged through `serialize_token()`, yielding normalized HTML with only the added wrappers.", + "confidence": 74 +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..a3a29be8d5b29 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? '' : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..2d217e502d99b --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-38/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-38/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..4291087483a03 --- /dev/null +++ b/doc-experiment/results/round-38/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and emits normalized output token-by-token with `serialize_token()`, inserting `...` around any matching text token so special text-bearing elements like `script`, `style`, `title`, and `textarea` are not wrapped.", + "confidence": 78 +} diff --git a/doc-experiment/results/round-38/VARIANT.md b/doc-experiment/results/round-38/VARIANT.md new file mode 100644 index 0000000000000..2daa10a71a43b --- /dev/null +++ b/doc-experiment/results/round-38/VARIANT.md @@ -0,0 +1,32 @@ +# Round 38 Scratch Variant + +Variant name: `html-processor-method-local-text-policy-clarification` + +Control round: `round-37` + +Edited rendered file: `/tmp/html-api-docs-eval/round-38/html-processor.md` + +Source docblocks were not edited. This is a scratch-only rendered-doc A/B +variant. The staged `html-processor.md` SHA-256 recorded in +`round-metadata.json` is: + +```text +3f695d2cb2d43f14de27b3824edcbe600bb4d4f14c8650424840a0b4d9fe0b5b +``` + +Changed the method-local `WP_HTML_Processor::next_token()` special-elements +paragraph from an "important exception" framing to an explicit caller-policy +framing: special elements do not produce ordinary `#text` child tokens, and +their opener-carried text should be included only when the caller explicitly +asks for special-element contents. + +Added a method-local warning to `WP_HTML_Processor::get_modifiable_text()`: +the method is not a predicate for ordinary text content; ordinary DOM-style +element text should first require `get_token_type() === '#text'`, while +comments, processing instructions, and special-element openers should be +included only by explicit caller policy. + +Purpose: test whether moving the ordinary-text versus special-element +opt-in boundary to the method sections reduces special-element over-inclusion +in text extraction and text-node-only serialization tasks without editing +source docblocks. diff --git a/doc-experiment/results/round-38/codex-judges-output.json b/doc-experiment/results/round-38/codex-judges-output.json new file mode 100644 index 0000000000000..0882740f1d491 --- /dev/null +++ b/doc-experiment/results/round-38/codex-judges-output.json @@ -0,0 +1,224 @@ +{ + "result": [ + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly used WP_HTML_Processor::create_fragment(), next_tag('H1'), depth-bounded next_token(), get_token_type() === '#text', and get_modifiable_text(). All called methods are documented. Minor deduction: the final get_last_error() guard is documented but slightly over-applies clean-scan guidance from mutation/rewrite contexts to a read-only extractor whose spec says null only means no H1." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 8/8. This is essentially the canonical documented pattern: fragment parser, first H1 opener, record current depth, walk tokens while depth stays within the subtree, append only #text modifiable text. It handles nested markup, decoded entities, image-only headings, multiple H1s, deep nesting, and unclosed H1 input idiomatically." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Passed 8/8 and all called methods are documented. The core traversal is correct, but it adds SCRIPT, STYLE, TEXTAREA, and TITLE opener-carried modifiable text. The docs say to opt into those only when the caller explicitly wants special-element contents; for ordinary subtree text this is too broad. A probe on

AEF

returns ABCDEF, while the reference returns AF." + } + ], + "failure_analysis": "All trials passed every frozen hidden case. The docs were effective on the main contract: html-processor.md's 'Recipe: collect DOM-style text from a subtree' gives the exact shape needed, and html-tag-processor.md's 'Which processor should I use?' warns that the Tag Processor has no tree awareness. The get_modifiable_text() section clearly states that #text values are decoded, which prevented double-decoding in the entities case. The next_token() and get_current_depth() passages explain virtual closers, implied structure, and the >= recorded-depth boundary, which covered nested markup, deep nesting, first-of-two, image-only, and the unclosed-H1 case. Near-misses: trial 1 copied get_last_error() cleanup from clean mutation/rewrite patterns, although the extraction task did not ask to reject unsupported parser aborts. Trial 3 overgeneralized the special-elements passage: the docs mention opener-carried text for SCRIPT/STYLE/TEXTAREA/TITLE, but the relevant ordinary text recipe says to append only #text unless the caller explicitly opts into those token types.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The method-level text explains that special elements carry modifiable text, but a reader can overgeneralize that into ordinary subtree text extraction.", + "suggestion": "Repeat the opt-in warning in the docblock with a compact example showing ordinary #text extraction excluding SCRIPT/STYLE/TEXTAREA/TITLE opener text, and a separate example for callers that intentionally include special-element contents." + }, + { + "location": "WP_HTML_Processor::get_last_error() docblock and clean-scan recipe references", + "problem": "Clean-scan checks are easy to copy into read-only extraction tasks, changing a caller's not-found semantics into parser-error semantics.", + "suggestion": "Add guidance that get_last_error() is a policy check: use it when the caller requires a complete supported parse or before applying mutations, but read-only best-effort extraction may choose a different contract." + }, + { + "location": "WP_HTML_Processor::next_token() / get_current_depth() docblocks", + "problem": "The subtree-boundary idiom is crucial and was learned here, but it is spread across overview recipes and method docs.", + "suggestion": "Include a short method-level subtree walk example that records opener depth and continues while current depth is >= that depth, explicitly noting that this also works for implied or virtual closers in malformed input." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Used the correct primary API: WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), is_tag_closer(), get_tag(), and get_modifiable_text(), all documented. Correctly whitelisted #text plus TITLE/TEXTAREA opener text and used UTF-8 codepoint truncation. Minor adherence loss: the fallback to WP_HTML_Tag_Processor is documented but discouraged for DOM-style fragment text extraction, because it loses HTML Processor tree semantics on unsupported input." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Clean, documented HTML Processor token walk. Correctly chose create_fragment(), included only #text and whitelisted TITLE/TEXTAREA openers, excluded SCRIPT/STYLE by not broadly appending modifiable text, and truncated with UTF-8-aware APIs. Minor near-miss: it does not inspect get_last_error() after a scan, so unsupported markup would silently produce whatever text was seen before the abort." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Strongest documentation adherence. It uses only documented APIs, chooses WP_HTML_Processor::create_fragment(), walks tokens directly, distinguishes token type from token name, whitelists TITLE/TEXTAREA opener-carried decoded text, and rejects unsupported-parser aborts with get_last_error(). Only small gap is that it does not separately consider paused_at_incomplete_token(), though the task and reference did not require rejection of incomplete trailing syntax." + } + ], + "failure_analysis": "All three trials passed all 10 hidden/frozen cases, with no _doing_it_wrong records. The docs worked well for the core challenge: the processor-selection guidance says to use the HTML Processor when collecting text content and handling implied or missing closing tags; next_token() documents that text may be split across multiple #text tokens and that malformed input still produces structural closers; get_modifiable_text() documents decoded UTF-8 text for #text, TITLE, and TEXTAREA, and raw text for SCRIPT/STYLE. Those passages led every trial to use create_fragment(), walk tokens, append #text, specially include TITLE/TEXTAREA opener text, and avoid double-decoding entities.\n\nNear-misses were policy-related rather than test failures. Trial 1 added a lexical Tag Processor fallback even though the Tag Processor docs explicitly say it is not parsed BODY-fragment text-content extraction. Trial 2 omitted get_last_error(), so an unsupported-parser abort would look like successful end-of-input. Trial 3 returned an empty string on get_last_error(), which is defensible but not clearly mandated for read-only extraction. None of the trials checked paused_at_incomplete_token(); probes confirmed incomplete trailing syntax can pause with get_last_error() still null, so the docs need to keep those states distinct for extraction code, not only for mutation or serialization code.", + "doc_gaps": [ + { + "location": "html-processor.md / WP_HTML_Processor::next_token() and the text-collection recipe", + "problem": "The docs explain subtree text and special-element text, but they do not present a compact general pattern for fragment-wide text-like extraction where ordinary #text is included and specific special-element opener text is opt-in.", + "suggestion": "Add a general decision table or short example showing how to choose token categories: #text for ordinary DOM text; TITLE/TEXTAREA opener text when the caller explicitly wants those decoded contents; SCRIPT/STYLE only when raw script/style text is explicitly desired." + }, + { + "location": "html-processor.md / get_last_error(), next_token(), paused_at_incomplete_token references", + "problem": "Unsupported-parser aborts and incomplete trailing syntax are documented, but read-only extraction policy is unclear. Candidates made different choices: ignore errors, reject on get_last_error(), or fall back lexically.", + "suggestion": "State that next_token() returning false can mean normal end, unsupported abort, or paused incomplete input; document the separate checks and give general policy guidance for best-effort extraction versus complete-input-required extraction." + }, + { + "location": "html-tag-processor.md / Tokens and finer-grained processing", + "problem": "The lexical text-scan example is close enough to DOM text extraction that a reader may copy it as a fallback, despite nearby warnings that Tag Processor does not apply BODY fragment parsing or implied-closing semantics.", + "suggestion": "Label the example as lexical-only in the heading or code comment, and cross-link to the HTML Processor text-walk recipe for parsed fragment text extraction." + }, + { + "location": "html-processor.md / WP_HTML_Tag_Processor::get_modifiable_text inherited docs", + "problem": "The method correctly warns that modifiable text is broader than DOM text, but the contract is spread across paragraphs and can be missed when readers are solving extraction tasks.", + "suggestion": "Add a concise table listing token name/type, whether get_modifiable_text() returns decoded or raw text, and whether it should normally count as DOM text." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path for BODY-fragment structure, then followed the documented depth-bounded subtree text walk with `next_tag()`, `get_current_depth()`, `next_token()`, `get_token_type()`, and `get_modifiable_text()`. All called API methods are present in the rendered docs, and execution recorded no `_doing_it_wrong`. Minor edge-policy gap: it checks `get_last_error()` but does not check `paused_at_incomplete_token()` when a caller might care about truncated input." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct HTML Processor and only documented APIs. The single `next_token()` loop with explicit heading state is idiomatic per the docs' repeated-region guidance, and relying on `is_tag_closer()` is supported because the HTML Processor emits virtual closers for implied/end-of-input closures. It correctly limits text to `#text` tokens. Minor gap: no explicit unsupported/truncated-input policy after the scan." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Used the correct processor and only documented methods. The one-pass token loop is generally idiomatic and handles decoded text, empty headings, and implied heading closes. The main adherence weakness is edge handling: after any `get_last_error()` it returns `array()`, which conflates unsupported input with a real no-match result and discards partial findings; a read-only probe with unsupported table repair returned `[]` while the reference returned the partial heading text. It also does not check `paused_at_incomplete_token()`." + } + ], + "failure_analysis": "No hidden case failed across the three trials: every `execution.json` reports 7/7 passed, with empty `_doing_it_wrong` and `trigger_error` records. The docs did well on the central contracts: the `Which processor should I use?` guidance pushed models to `WP_HTML_Processor` for structure and text extraction; `Recipe: collect DOM-style text from a subtree` showed appending only `#text` tokens; `get_modifiable_text()` documented decoded text; `next_token()` documented virtual closers for implicit/unclosed elements; and `get_current_depth()` documented the `>=` subtree boundary rule. Near misses were around policy rather than API discovery: none of the trials checked `paused_at_incomplete_token()`, and trial-3 used `get_last_error()` in a way that turns unsupported markup into an empty TOC. The docs mention both mechanisms, but they do not give a clear read-only extraction policy for partial results versus explicit failure when the function's return type cannot signal errors.", + "doc_gaps": [ + { + "location": "html-processor.md / `WP_HTML_Processor::get_last_error()`", + "problem": "The docs explain how to detect unsupported-parser aborts, but not how read-only extraction code should avoid conflating an abort with a valid empty result.", + "suggestion": "Add a short extraction-oriented note: after a scan stops with non-null `get_last_error()`, callers should make an explicit policy choice such as returning partial results, returning `null`/an error wrapper, or falling back to another parser; they should not silently report the same value used for 'no matches' unless that is intentional." + }, + { + "location": "html-processor.md / `next_token()` and `get_current_depth()`", + "problem": "The docs separately describe virtual closers and incomplete trailing syntax, but the distinction is easy to miss: unclosed elements still produce closing tokens, while an incomplete final syntax token is omitted and only detectable after draining the scan.", + "suggestion": "Add a compact example contrasting `

text` with `

text

' + : $normalized; +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/execution.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/execution.json new file mode 100644 index 0000000000000..a1eadf3447299 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/response.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/response.json new file mode 100644 index 0000000000000..ffac496142a1c --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot normalize unsupported input. The function returns that normalized serialization when available, otherwise the exact fallback placeholder HTML.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/candidate.php b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/candidate.php new file mode 100644 index 0000000000000..0a1a6ba870259 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/candidate.php @@ -0,0 +1,11 @@ +Unsupported HTML

'; + } + + return $normalized; +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/execution.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/execution.json new file mode 100644 index 0000000000000..6e34eb5e33996 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/response.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/response.json new file mode 100644 index 0000000000000..0029aaae1aafa --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported input. The function simply returns that normalized serialization or the exact fallback placeholder when normalization fails.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/candidate.php b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/candidate.php new file mode 100644 index 0000000000000..8c34c70408724 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/candidate.php @@ -0,0 +1,10 @@ +Unsupported HTML

'; + } + + return $normalized; +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/execution.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/execution.json new file mode 100644 index 0000000000000..31f75dc07a173 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/response.json b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/response.json new file mode 100644 index 0000000000000..ce8634968ba87 --- /dev/null +++ b/doc-experiment/results/round-40/N04-normalize-or-placeholder/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::normalize()` because the task is to normalize an HTML fragment in BODY context, which the documentation says this static method handles directly and returns `null` when normalization is not possible due to unsupported markup. The function therefore returns the documented fallback placeholder only when `normalize()` returns `null`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/judge.json b/doc-experiment/results/round-40/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..4ec924749b537 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), walked tokens with next_token(), limited matching to #text tokens, used get_modifiable_text() for decoded text, and rebuilt normalized output with serialize_token(). All HTML API methods used are documented; execution recorded no _doing_it_wrong entries." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Essentially the canonical pattern: HTML Processor fragment parser, token walk, #text guard, decoded text via get_modifiable_text(), token-by-token serialization with inserted wrappers, and get_last_error() fallback. All called API methods appear in the rendered docs; no misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Uses the right processor and documented token-serialization pattern. Returning the original input on create_fragment() failure or get_last_error() is a defensible fallback but slightly less aligned with the task’s normalized-output contract than rejecting with an empty string or another explicit failure policy. No undocumented API calls or _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 8 hidden/frozen cases, so there are no failed cases to attribute to API misunderstanding. The docs did well in four places: the HTML Processor overview explicitly says to use WP_HTML_Processor, not WP_HTML_Tag_Processor, when structure or normalized output matters; the “collect DOM-style text from a subtree” recipe says ordinary text is only #text tokens and warns that comments and special-element opener tokens can also have modifiable text; get_modifiable_text() documents that #text, TITLE, and TEXTAREA text is already decoded; and serialize_token() documents the exact rewrite pattern the candidates needed, including wrapping selected tokens while accumulating normalized serialization. Near-misses: all candidates added get_last_error() fallback logic even though the task did not specify unsupported-markup behavior, and none checked paused_at_incomplete_token(); the serialize_token() docs say this is a caller policy decision, which likely prevented a functional issue here.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / rewrite-while-serializing recipe", + "problem": "The docs explain token-by-token rewrites but do not give a minimal example that inserts wrapper markup around ordinary text tokens specifically.", + "suggestion": "Add a general example showing how to wrap or annotate selected #text tokens while serializing, emphasizing that the emitted wrapper string plus serialize_token() is the output and get_updated_html() is not involved." + }, + { + "location": "WP_HTML_Processor::get_last_error() and serialize_token() incomplete-input notes", + "problem": "The docs say to reject or fall back on get_last_error() and separately decide on paused_at_incomplete_token(), but they do not define common output policies for normalizing filters versus strict validators.", + "suggestion": "Add a short policy table: best-effort normalizer may omit unvisited incomplete trailing syntax; strict transformations should reject when paused_at_incomplete_token() is true; unsupported-parser aborts require an explicit fallback." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() / WP_HTML_Processor text recipes", + "problem": "The distinction between ordinary DOM text and other modifiable text is documented well, but it is spread across both class docs.", + "suggestion": "Add a compact cross-link near get_modifiable_text(): “For DOM text matching, first require get_token_type() === '#text'; otherwise comments and special elements may match too.”" + } + ] +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..98b313460bb2b --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( '' !== $keyword && str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..885c82c52faae --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..d3ac9b9c02876 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, walks every token with `next_token()`, detects only ordinary `#text` tokens via `get_token_type()`, checks the decoded text from `get_modifiable_text()` with a case-sensitive `str_contains()`, and emits normalized output by concatenating `serialize_token()`, inserting `...` only around matching text tokens. Because special elements like `script`, `style`, `title`, and `textarea` do not produce ordinary `#text` child tokens, they are left unwrapped as required.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..085c53f9a41f3 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..b605f696a781d --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..16e90b4ffbb29 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, then walks every token with `next_token()` and rebuilds normalized output with `serialize_token()`. Only ordinary `#text` tokens are tested, using decoded text from `get_modifiable_text()` and a case-sensitive `str_contains()` check; matching text tokens are emitted inside `...`, while special text-bearing elements like `script`, `style`, `title`, and `textarea` are left alone because they are not `#text` tokens.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..dd5f45c241055 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( '' !== $text && str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..b8e73b3d28a7c --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..27dbf64dd5a15 --- /dev/null +++ b/doc-experiment/results/round-40/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and rebuilds normalized HTML with `serialize_token()`, inserting `...` around any matching text token so special text-bearing elements like `script`, `style`, `title`, and `textarea` are left untouched.", + "confidence": 76 +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/judge.json b/doc-experiment/results/round-40/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..3e499271af52f --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a BODY fragment, walked all tokens with next_token(), skipped SPAN opener and closer tokens via documented get_tag() behavior, and rebuilt normalized output with serialize_token(). All called methods are present in the rendered docs and execution recorded no _doing_it_wrong entries." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Main path is correct and fully documented: HTML Processor fragment parsing, token walking, get_tag(), serialize_token(), and get_last_error(). The only adherence issue is the error fallback: calling WP_HTML_Processor::normalize( $html ) on the original input after a rewrite is exactly the pattern the serialize_token() docs warn can discard emitted changes, although it did not affect these tests." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correct processor choice and idiomatic token serialization; the #tag guard is documented and conservative. All methods are documented and no _doing_it_wrong entries occurred. The weakness is error handling: returning raw $html on create_fragment() failure or get_last_error() violates the normalized-output contract and is not a graceful fallback for unsupported markup." + } + ], + "failure_analysis": "No hidden case failed: every trial passed 7/7. The docs did well on the core path. The HTML Processor overview and HTML Support sections clearly point users to WP_HTML_Processor for structure and normalized output; create_fragment() identifies BODY-fragment parsing; next_token() explains visiting text, openers, closers, implied closers, and unclosed elements; serialize_token() gives a near-direct general recipe for token-by-token rewrites that skip element tokens while preserving contents. The near-misses were around fallback policy. Trial 2 used normalize() on the original input in an error branch despite the serialize_token() warning that this discards loop changes. Trial 3 returned raw input on parser failure, which the docs discourage indirectly but do not make concrete enough for string-returning filters that promise normalized output.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / Recipe: rewrite while serializing tokens", + "problem": "The docs say to reject or fall back on get_last_error(), but 'fall back' is underspecified. Models may normalize the original input or return raw input, both of which abandon the token-rewrite decisions.", + "suggestion": "Add a short fallback contract: after a token-rewrite loop, fallback must either signal failure according to the caller contract or reproduce the same transformation with another parser; returning the original input or normalizing the original input discards the rewrite." + }, + { + "location": "WP_HTML_Processor::create_fragment() return docs", + "problem": "The null return is documented but not tied to caller output obligations. This encouraged ad hoc raw-input fallback in one trial.", + "suggestion": "Document the conditions under which null can be returned and state that callers promising normalized output should not treat unprocessed input as a normalized fallback." + }, + { + "location": "WP_HTML_Processor::get_tag()", + "problem": "The get_tag() contract says it returns the uppercase matched tag, but the opener/closer behavior is clearer in the serialize_token() example than in the method contract itself.", + "suggestion": "State directly that get_tag() returns the element name for both opener and closer tag tokens, and null for non-tag tokens; pair this with is_tag_closer() only when opener/closer distinction matters." + }, + { + "location": "WP_HTML_Processor::next_token() / incomplete input guidance", + "problem": "The docs discuss virtual closers and incomplete trailing syntax, but the distinction is easy to miss: unclosed elements can still produce closing tokens, while truly incomplete trailing syntax may never be visited.", + "suggestion": "Add a compact example contrasting an unclosed but tokenizable element with an incomplete trailing token, showing serialize_token() output and when paused_at_incomplete_token() changes the caller's policy." + } + ] +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/candidate.php b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/candidate.php new file mode 100644 index 0000000000000..6e7cc87e73eed --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..e4d79f5937db4 --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..154cf6040c71b --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks every token with `next_token()` and rebuilds normalized HTML with `serialize_token()`, skipping both opening and closing `SPAN` tokens so their contents remain in place. It also checks `get_last_error()` because the HTML Processor aborts on unsupported markup.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..477a02063697d --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,26 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? '' : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..a1bec70ac8ace --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..6e871903eb4c1 --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, walks every token with `next_token()`, skips both opening and closing `SPAN` tokens by checking `get_tag()`, and appends all other tokens via `serialize_token()`. That produces normalized HTML while preserving the contents formerly inside spans in place.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..86babdb1d3f96 --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..eaa741b9c840e --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..de6da4dd4d9b1 --- /dev/null +++ b/doc-experiment/results/round-40/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, walks every token with `next_token()`, skips any token whose tag name is `SPAN` so both span openers and closers are removed, and appends every other token\u2019s normalized serialization with `serialize_token()`. It also checks `get_last_error()` because the HTML Processor documentation says unsupported markup should be rejected or handled by fallback.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-40/codex-judges-output.json b/doc-experiment/results/round-40/codex-judges-output.json new file mode 100644 index 0000000000000..4aa6f84796837 --- /dev/null +++ b/doc-experiment/results/round-40/codex-judges-output.json @@ -0,0 +1,143 @@ +{ + "result": [ + { + "id": "T09-mark-keyword", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), walked tokens with next_token(), limited matching to #text tokens, used get_modifiable_text() for decoded text, and rebuilt normalized output with serialize_token(). All HTML API methods used are documented; execution recorded no _doing_it_wrong entries." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Essentially the canonical pattern: HTML Processor fragment parser, token walk, #text guard, decoded text via get_modifiable_text(), token-by-token serialization with inserted wrappers, and get_last_error() fallback. All called API methods appear in the rendered docs; no misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Uses the right processor and documented token-serialization pattern. Returning the original input on create_fragment() failure or get_last_error() is a defensible fallback but slightly less aligned with the task’s normalized-output contract than rejecting with an empty string or another explicit failure policy. No undocumented API calls or _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 8 hidden/frozen cases, so there are no failed cases to attribute to API misunderstanding. The docs did well in four places: the HTML Processor overview explicitly says to use WP_HTML_Processor, not WP_HTML_Tag_Processor, when structure or normalized output matters; the “collect DOM-style text from a subtree” recipe says ordinary text is only #text tokens and warns that comments and special-element opener tokens can also have modifiable text; get_modifiable_text() documents that #text, TITLE, and TEXTAREA text is already decoded; and serialize_token() documents the exact rewrite pattern the candidates needed, including wrapping selected tokens while accumulating normalized serialization. Near-misses: all candidates added get_last_error() fallback logic even though the task did not specify unsupported-markup behavior, and none checked paused_at_incomplete_token(); the serialize_token() docs say this is a caller policy decision, which likely prevented a functional issue here.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / rewrite-while-serializing recipe", + "problem": "The docs explain token-by-token rewrites but do not give a minimal example that inserts wrapper markup around ordinary text tokens specifically.", + "suggestion": "Add a general example showing how to wrap or annotate selected #text tokens while serializing, emphasizing that the emitted wrapper string plus serialize_token() is the output and get_updated_html() is not involved." + }, + { + "location": "WP_HTML_Processor::get_last_error() and serialize_token() incomplete-input notes", + "problem": "The docs say to reject or fall back on get_last_error() and separately decide on paused_at_incomplete_token(), but they do not define common output policies for normalizing filters versus strict validators.", + "suggestion": "Add a short policy table: best-effort normalizer may omit unvisited incomplete trailing syntax; strict transformations should reject when paused_at_incomplete_token() is true; unsupported-parser aborts require an explicit fallback." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() / WP_HTML_Processor text recipes", + "problem": "The distinction between ordinary DOM text and other modifiable text is documented well, but it is spread across both class docs.", + "suggestion": "Add a compact cross-link near get_modifiable_text(): “For DOM text matching, first require get_token_type() === '#text'; otherwise comments and special elements may match too.”" + } + ] + } + }, + { + "id": "T12-unwrap-spans", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a BODY fragment, walked all tokens with next_token(), skipped SPAN opener and closer tokens via documented get_tag() behavior, and rebuilt normalized output with serialize_token(). All called methods are present in the rendered docs and execution recorded no _doing_it_wrong entries." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Main path is correct and fully documented: HTML Processor fragment parsing, token walking, get_tag(), serialize_token(), and get_last_error(). The only adherence issue is the error fallback: calling WP_HTML_Processor::normalize( $html ) on the original input after a rewrite is exactly the pattern the serialize_token() docs warn can discard emitted changes, although it did not affect these tests." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correct processor choice and idiomatic token serialization; the #tag guard is documented and conservative. All methods are documented and no _doing_it_wrong entries occurred. The weakness is error handling: returning raw $html on create_fragment() failure or get_last_error() violates the normalized-output contract and is not a graceful fallback for unsupported markup." + } + ], + "failure_analysis": "No hidden case failed: every trial passed 7/7. The docs did well on the core path. The HTML Processor overview and HTML Support sections clearly point users to WP_HTML_Processor for structure and normalized output; create_fragment() identifies BODY-fragment parsing; next_token() explains visiting text, openers, closers, implied closers, and unclosed elements; serialize_token() gives a near-direct general recipe for token-by-token rewrites that skip element tokens while preserving contents. The near-misses were around fallback policy. Trial 2 used normalize() on the original input in an error branch despite the serialize_token() warning that this discards loop changes. Trial 3 returned raw input on parser failure, which the docs discourage indirectly but do not make concrete enough for string-returning filters that promise normalized output.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / Recipe: rewrite while serializing tokens", + "problem": "The docs say to reject or fall back on get_last_error(), but 'fall back' is underspecified. Models may normalize the original input or return raw input, both of which abandon the token-rewrite decisions.", + "suggestion": "Add a short fallback contract: after a token-rewrite loop, fallback must either signal failure according to the caller contract or reproduce the same transformation with another parser; returning the original input or normalizing the original input discards the rewrite." + }, + { + "location": "WP_HTML_Processor::create_fragment() return docs", + "problem": "The null return is documented but not tied to caller output obligations. This encouraged ad hoc raw-input fallback in one trial.", + "suggestion": "Document the conditions under which null can be returned and state that callers promising normalized output should not treat unprocessed input as a normalized fallback." + }, + { + "location": "WP_HTML_Processor::get_tag()", + "problem": "The get_tag() contract says it returns the uppercase matched tag, but the opener/closer behavior is clearer in the serialize_token() example than in the method contract itself.", + "suggestion": "State directly that get_tag() returns the element name for both opener and closer tag tokens, and null for non-tag tokens; pair this with is_tag_closer() only when opener/closer distinction matters." + }, + { + "location": "WP_HTML_Processor::next_token() / incomplete input guidance", + "problem": "The docs discuss virtual closers and incomplete trailing syntax, but the distinction is easy to miss: unclosed elements can still produce closing tokens, while truly incomplete trailing syntax may never be visited.", + "suggestion": "Add a compact example contrasting an unclosed but tokenizable element with an incomplete trailing token, showing serialize_token() output and when paused_at_incomplete_token() changes the caller's policy." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct structural processor and the documented `WP_HTML_Processor::normalize()` shortcut for BODY-context fragment normalization. The method exists in `html-processor.md`; no undocumented calls or `_doing_it_wrong` records. Correctly treats only `null` as unsupported, preserving valid empty-string output." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented approach as the reference: `WP_HTML_Processor::normalize()` followed by a strict `null` fallback check. No hallucinated API usage, no `_doing_it_wrong`, and the implementation relies on the documented normalization contract instead of unnecessary token walking." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly selected `WP_HTML_Processor` for normalized output and used the documented static `normalize()` method. No undocumented methods. The strict `null === $normalized` check handles unsupported markup without confusing empty normalized output with failure." + } + ], + "failure_analysis": "All trials passed all 7 hidden cases. The docs succeeded mainly because `html-tag-processor.md` explicitly says to use the HTML Processor for normalized output, while `html-processor.md` documents `WP_HTML_Processor::normalize()` as a BODY-context fragment normalizer returning `string|null`. The `normalize()` section also lists normalization effects such as quoted attributes, inserted omitted tags, text re-encoding, and omitted incomplete trailing syntax, which directly covers the successful table, attribute, entity, and unclosed-tag cases. The unsupported-markup overview explains that unsupported input aborts processing and output-producing methods such as `serialize()` and `normalize()` return `null`, which explains the fallback behavior for misnested formatting and anchor misnesting. Near miss: unsupported cases emitted `trigger_error` records from internal serialization, but there were no `_doing_it_wrong` records and the candidates handled the returned `null` correctly. The docs could be clearer that these warnings may accompany a `null` result.", + "doc_gaps": [ + { + "location": "`html-processor.md` `normalize()` docblock", + "problem": "The return contract says `string|null`, but it does not explicitly warn that an empty input can legitimately normalize to `''` while unsupported input returns `null`. Less careful readers could use a truthiness check and incorrectly replace empty valid output with a fallback.", + "suggestion": "Add a contract note: callers should test `null === WP_HTML_Processor::normalize( $html )` for failure; empty strings are valid normalized output." + }, + { + "location": "`html-processor.md` unsupported-markup overview and `normalize()`/`serialize()` docs", + "problem": "Execution shows unsupported normalization may also emit a `trigger_error` from serialization while returning `null`. The rendered docs describe the `null` result but not the possible warning side effect.", + "suggestion": "Document whether `normalize()`/`serialize()` may emit a warning on unsupported markup, and how callers should handle that if they intentionally use `null` as a recoverable fallback signal." + }, + { + "location": "`html-processor.md` `normalize()` examples", + "problem": "The examples demonstrate successful normalization but not the distinct unsupported-output path. The unsupported behavior is described elsewhere, so readers must connect two sections.", + "suggestion": "Add a general unsupported-markup example showing that normalization can return `null`, without prescribing any task-specific fallback HTML." + }, + { + "location": "`html-processor.md` `normalize()` docblock", + "problem": "The distinction between recoverable malformed HTML, incomplete trailing syntax, and unsupported parser aborts is spread across bullets and the broader overview.", + "suggestion": "Add a compact summary: ordinary omitted tags are completed, incomplete trailing syntax may be omitted, and only unsupported parser states cause a `null` result." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-40/codex-trials-output.json b/doc-experiment/results/round-40/codex-trials-output.json new file mode 100644 index 0000000000000..c741728e85995 --- /dev/null +++ b/doc-experiment/results/round-40/codex-trials-output.json @@ -0,0 +1,95 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-40/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "T09-mark-keyword", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( '' !== $keyword && str_contains( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, walks every token with `next_token()`, detects only ordinary `#text` tokens via `get_token_type()`, checks the decoded text from `get_modifiable_text()` with a case-sensitive `str_contains()`, and emits normalized output by concatenating `serialize_token()`, inserting `...` only around matching text tokens. Because special elements like `script`, `style`, `title`, and `textarea` do not produce ordinary `#text` child tokens, they are left unwrapped as required.", + "confidence": 86 + }, + { + "id": "T09-mark-keyword", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( str_contains( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, then walks every token with `next_token()` and rebuilds normalized output with `serialize_token()`. Only ordinary `#text` tokens are tested, using decoded text from `get_modifiable_text()` and a case-sensitive `str_contains()` check; matching text tokens are emitted inside `...`, while special text-bearing elements like `script`, `style`, `title`, and `textarea` are left alone because they are not `#text` tokens.", + "confidence": 86 + }, + { + "id": "T09-mark-keyword", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( '' !== $text && str_contains( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text from `get_modifiable_text()`, and rebuilds normalized HTML with `serialize_token()`, inserting `...` around any matching text token so special text-bearing elements like `script`, `style`, `title`, and `textarea` are left untouched.", + "confidence": 76 + }, + { + "id": "T12-unwrap-spans", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks every token with `next_token()` and rebuilds normalized HTML with `serialize_token()`, skipping both opening and closing `SPAN` tokens so their contents remain in place. It also checks `get_last_error()` because the HTML Processor aborts on unsupported markup.", + "confidence": 84 + }, + { + "id": "T12-unwrap-spans", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n $normalized = WP_HTML_Processor::normalize( $html );\n return null === $normalized ? '' : $normalized;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, walks every token with `next_token()`, skips both opening and closing `SPAN` tokens by checking `get_tag()`, and appends all other tokens via `serialize_token()`. That produces normalized HTML while preserving the contents formerly inside spans in place.", + "confidence": 86 + }, + { + "id": "T12-unwrap-spans", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, walks every token with `next_token()`, skips any token whose tag name is `SPAN` so both span openers and closers are removed, and appends every other token’s normalized serialization with `serialize_token()`. It also checks `get_last_error()` because the HTML Processor documentation says unsupported markup should be rejected or handled by fallback.", + "confidence": 84 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 1, + "ok": true, + "code": "Unsupported HTML

'\n\t\t: $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot normalize unsupported input. The function returns that normalized serialization when available, otherwise the exact fallback placeholder HTML.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 2, + "ok": true, + "code": "Unsupported HTML

';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported input. The function simply returns that normalized serialization or the exact fallback placeholder when normalization fails.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 3, + "ok": true, + "code": "Unsupported HTML

';\n\t}\n\n\treturn $normalized;\n}\n", + "explanation": "This uses `WP_HTML_Processor::normalize()` because the task is to normalize an HTML fragment in BODY context, which the documentation says this static method handles directly and returns `null` when normalization is not possible due to unsupported markup. The function therefore returns the documented fallback placeholder only when `normalize()` returns `null`.", + "confidence": 98 + } + ] +} diff --git a/doc-experiment/results/round-40/round-metadata.json b/doc-experiment/results/round-40/round-metadata.json new file mode 100644 index 0000000000000..b07982751f6f0 --- /dev/null +++ b/doc-experiment/results/round-40/round-metadata.json @@ -0,0 +1,125 @@ +{ + "round": "round-40", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T09-mark-keyword", + "T12-unwrap-spans", + "N04-normalize-or-placeholder" + ], + "task_count": 3, + "splits": { + "train": 3 + }, + "concepts": { + "normalization": 1, + "serialization": 2 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "git_status_short": "", + "source_file_digests": { + "ref": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "b115e956af65f69b4e07c7e761ccc9a49464ba3caf1f66944ed8eb3794dce472", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "algorithm": "sha256", + "tasks": { + "T09-mark-keyword": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T09-mark-keyword/task.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "doc-experiment/corpus/T09-mark-keyword/reference.php": "0141ececbe459f50729e657f59dd75e8b74911cd1d70940b8fd5072d2f1efb60", + "doc-experiment/corpus/T09-mark-keyword/tests.json": "946bb96c922fea63cda3c31b94840b05ffded9f2e1dfa4b1547c755b99cda0c5" + } + }, + "T12-unwrap-spans": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T12-unwrap-spans/task.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b", + "doc-experiment/corpus/T12-unwrap-spans/reference.php": "93d6828161817ec9a0b90765d91e7dae1c48f3b3aac35f6cad6f785b1c715797", + "doc-experiment/corpus/T12-unwrap-spans/tests.json": "fcade4b9f46bbb824b0095f58b99d4c71c8b3e696093ab0d0ae25d666aae5f53" + } + }, + "N04-normalize-or-placeholder": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N04-normalize-or-placeholder/task.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "doc-experiment/corpus/N04-normalize-or-placeholder/reference.php": "e47feb6d3be887e1ee5df77e39160ffe812cb8322ca8600898de2a34e56ddeed", + "doc-experiment/corpus/N04-normalize-or-placeholder/tests.json": "417d2f770b341fc46bb8b7b46df899f1fd1e65d343aa8cc6c4e39307afcb4d18" + } + } + } + }, + "created_at_utc": "2026-06-13T15:07:08+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-40", + "staged_task_files": [ + "tasks/T09-mark-keyword.md", + "tasks/T12-unwrap-spans.md", + "tasks/N04-normalize-or-placeholder.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-40 exposes 2 docs and 3 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "4a4e64bbb3c43c248cb948ca752a01674a3dedc4eb77843d6fb7e63ea0a1f6ea", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N04-normalize-or-placeholder.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "tasks/T09-mark-keyword.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "tasks/T12-unwrap-spans.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b" + } +} diff --git a/doc-experiment/results/round-40/round-summary.json b/doc-experiment/results/round-40/round-summary.json new file mode 100644 index 0000000000000..f69bda6a0b7c7 --- /dev/null +++ b/doc-experiment/results/round-40/round-summary.json @@ -0,0 +1,154 @@ +{ + "round_score": 99.57, + "core_score": 99.57, + "by_split": { + "train": 99.57 + }, + "by_concept": { + "normalization": 100.0, + "serialization": 99.35 + }, + "tasks": { + "T09-mark-keyword": { + "score": 99.8, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "T12-unwrap-spans": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 94, + "score": 98.2 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "N04-normalize-or-placeholder": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-40", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T09-mark-keyword", + "T12-unwrap-spans", + "N04-normalize-or-placeholder" + ], + "task_count": 3, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-40/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-40/subject-isolation.json b/doc-experiment/results/round-40/subject-isolation.json new file mode 100644 index 0000000000000..f74229fb07592 --- /dev/null +++ b/doc-experiment/results/round-40/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-40/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/judge.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/judge.json new file mode 100644 index 0000000000000..9e68f04d74446 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose `WP_HTML_Processor` and the documented static `normalize()` API for BODY-fragment normalization. The strict `null` check preserves valid empty output. No undocumented calls or `_doing_it_wrong` records; unsupported-case warnings came from the API's internal serialization path, not candidate misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully adherent implementation as trial-1. Uses the documented `WP_HTML_Processor::normalize(string): string|null` contract directly and handles `null` separately from `''`." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully adherent implementation as trial-1. Correct processor choice, no hallucinated methods, and idiomatic use of the documented whole-fragment normalization shortcut." + } + ], + "failure_analysis": "All trials passed all 7 hidden cases, so there were no failed hidden cases to attribute to misconceptions. The docs did well in three places: the HTML Processor overview says to choose it for normalized output; the unsupported-markup section says output-producing methods such as `serialize()` and `normalize()` return `null`; and the `normalize()` method section documents BODY-fragment context, normalization effects such as quoted attributes and omitted tags, incomplete trailing syntax omission, and the `string|null` return. The main near-miss is that the successful path depends on readers finding the `normalize()` method rather than over-applying the general create/find/change workflow. Another near-miss is the distinction between `null` failure and valid empty-string output: the candidates handled it correctly, but the docs rely on the return type rather than an explicit example.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::normalize()` docblock", + "problem": "The `string|null` return contract is documented, but there is no explicit example showing failure handling or distinguishing `null` from valid empty-string normalized output.", + "suggestion": "Add a short general example that checks `null === WP_HTML_Processor::normalize( $html )` before using the result, and state that an empty input fragment normalizes to `''`, not `null`." + }, + { + "location": "`WP_HTML_Processor` Usage overview", + "problem": "The overview's three-step workflow emphasizes creating a processor, finding a token, and requesting changes, which can obscure the simpler whole-fragment normalization API.", + "suggestion": "Add one sentence near Usage saying that callers who only need a normalized BODY-fragment serialization can call `WP_HTML_Processor::normalize()` directly." + }, + { + "location": "`WP_HTML_Processor::normalize()` / unsupported-markup docs", + "problem": "The docs say unsupported markup causes output methods to return `null`, but they do not make the boundary between unsupported markup and incomplete trailing syntax especially prominent.", + "suggestion": "Clarify that unsupported parser errors return `null`, while incomplete trailing syntax may still produce a normalized string with the incomplete token omitted." + } + ] +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/candidate.php b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/candidate.php new file mode 100644 index 0000000000000..0a1a6ba870259 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/candidate.php @@ -0,0 +1,11 @@ +Unsupported HTML

'; + } + + return $normalized; +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/execution.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/execution.json new file mode 100644 index 0000000000000..aa5c79c185d05 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/response.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/response.json new file mode 100644 index 0000000000000..44e39dbe25ad4 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This calls the documented `WP_HTML_Processor::normalize()` helper, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot normalize unsupported input; in that failure case, the function returns the exact required fallback placeholder HTML.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/candidate.php b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/candidate.php new file mode 100644 index 0000000000000..0a1a6ba870259 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/candidate.php @@ -0,0 +1,11 @@ +Unsupported HTML

'; + } + + return $normalized; +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/execution.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/execution.json new file mode 100644 index 0000000000000..3dd3aabfb5b4a --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/response.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/response.json new file mode 100644 index 0000000000000..8393921e90a44 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` static method, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot produce normalized output. The function therefore returns the normalized serialization on success and the required `

Unsupported HTML

` fallback when normalization is unavailable.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/candidate.php b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/candidate.php new file mode 100644 index 0000000000000..0a1a6ba870259 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/candidate.php @@ -0,0 +1,11 @@ +Unsupported HTML

'; + } + + return $normalized; +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/execution.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/execution.json new file mode 100644 index 0000000000000..63bf09f2a3f22 --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/execution.json @@ -0,0 +1,83 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/candidate.php", + "function": "normalize_or_placeholder", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "unclosed-tags-normalize", + "status": "pass", + "expected": "

Hello

", + "actual": "

Hello

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "table-normalizes", + "status": "pass", + "expected": "
ok
", + "actual": "
ok
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attribute-quoting-normalizes", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-normalize", + "status": "pass", + "expected": "

Fish & chips

", + "actual": "

Fish & chips

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unsupported-misnested-formatting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "unsupported-anchor-misnesting", + "status": "pass", + "expected": "

Unsupported HTML

", + "actual": "

Unsupported HTML

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [ + { + "function": "WP_HTML_Processor::serialize", + "message": "Cannot serialize HTML Processor with parsing error: unsupported.", + "level": 512 + } + ] + }, + { + "id": "empty-fragment", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/response.json b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/response.json new file mode 100644 index 0000000000000..2f291e8cddb4d --- /dev/null +++ b/doc-experiment/results/round-41/N04-normalize-or-placeholder/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported markup. The function therefore returns the normalized serialization on success and the exact fallback placeholder when normalization fails.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/judge.json b/doc-experiment/results/round-41/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..691301442a564 --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), get_modifiable_text(), and serialize_token() for a BODY-fragment token rewrite. All API calls are documented and execution recorded no _doing_it_wrong entries. Minor adherence issue: after a rewrite loop it falls back to WP_HTML_Processor::normalize($html) when get_last_error() is non-null, which the serialize_token()/normalize docs warn can discard emitted rewrite changes." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correct processor and documented token-walking pattern. It restricts matching to #text tokens, uses decoded get_modifiable_text(), emits normalized tokens with serialize_token(), and returns an explicit empty-string sentinel on parser error, which the docs allow. Minor inefficiency: serialize_token() is called before knowing whether a #text token matches and may be called again for nonmatching text, but this is not API misuse." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correct processor and documented token-walking pattern. It checks only #text tokens, reads decoded modifiable text, wraps the current token serialization, and returns an explicit empty-string sentinel on parser error. The extra empty-text guard is redundant because the task says keyword is non-empty, but it does not change the API usage." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 8/8 with no _doing_it_wrong records. The docs appear to have successfully led subjects to the key contracts: the 'Which processor should I use?' guidance points normalized output and implied/missing closing tags to WP_HTML_Processor; the 'Recipe: collect DOM-style text from a subtree' passage says ordinary text is #text only and warns not to treat every token with modifiable text as DOM text; get_modifiable_text() states that #text is decoded while SCRIPT/STYLE/comment text may be raw or non-DOM; and serialize_token() explains the exact token-by-token rewrite pattern. The only near-miss was trial-1's normalize($html) fallback after a rewrite loop, despite the serialize_token()/normalize warnings that normalizing the original fragment is not a way to finish a rewrite. Trials 2 and 3 followed the documented error-policy options more closely by returning a caller-defined empty-string sentinel on get_last_error().", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() and WP_HTML_Processor::normalize()", + "problem": "The docs warn not to call normalize($html) after accumulating a token rewrite, but trial-1 still used it as an error fallback.", + "suggestion": "Add a small anti-pattern/corrected-pattern pair: after a serialize_token() rewrite loop, return the accumulated output on success; on get_last_error(), return the caller's sentinel/null/exception, not normalize($html)." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() and 'Recipe: collect DOM-style text from a subtree'", + "problem": "The distinction between ordinary DOM #text and broader modifiable text is central and easy to miss because it is described in prose across multiple sections.", + "suggestion": "Add a compact table mapping token type/name to whether get_modifiable_text() is decoded or raw and whether it represents ordinary DOM text. Cross-link it from next_token()." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and token-rewrite recipes", + "problem": "Incomplete trailing syntax behavior is documented, but the strict-vs-best-effort policy after token serialization is not shown as a reusable branch.", + "suggestion": "Add a short post-loop snippet showing both policies: accept accumulated serialization for best-effort normalization, or reject when paused_at_incomplete_token() or get_last_error() is set for complete-source rewrites." + } + ] +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..a3c2bec7ec9d1 --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + if ( str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? '' : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..5e4be5f396e22 --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..f99be0d7718bf --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, walks every token with `next_token()`, and rewrites the normalized output with `serialize_token()`. Only ordinary `#text` tokens are considered, and their decoded text is read with `get_modifiable_text()` so character references match correctly; when a token contains the keyword, its normalized serialized text is wrapped in ``.", + "confidence": 72 +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..181596fcd385a --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $serialized_text = $processor->serialize_token(); + + if ( str_contains( $processor->get_modifiable_text(), $keyword ) ) { + $output .= '' . $serialized_text . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..8a6867ba5a41c --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..970cd8d84f26c --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks every token with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text via `get_modifiable_text()` and wraps the token\u2019s normalized serialization from `serialize_token()` in `` when the case-sensitive keyword substring is present; all other tokens, including special text-bearing elements like `script`, `style`, `title`, and `textarea`, are emitted unchanged through `serialize_token()`.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..a7c6b047b6438 --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( '' !== $text && false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..3854b8d1a38c5 --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

hello world

", + "actual": "

hello world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

alpha beta

beta gamma

delta

", + "actual": "

alpha beta

beta gamma

delta

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

world peace

", + "actual": "

world peace

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

world

", + "actual": "

world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

World world

", + "actual": "

World world

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
bold world

unclosed & markup

", + "actual": "
bold world

unclosed & markup

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..373a01286543e --- /dev/null +++ b/doc-experiment/results/round-41/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks every token with `next_token()`. Only ordinary `#text` tokens are considered for wrapping; their decoded text is read with `get_modifiable_text()`, matched with `strpos()`, and emitted as `` plus the token\u2019s normalized serialization from `serialize_token()`. All other tokens are copied with `serialize_token()`, which preserves the processor\u2019s normalized HTML output while leaving special text-bearing elements like `script`, `style`, `title`, and `textarea` unwrapped.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/judge.json b/doc-experiment/results/round-41/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..f5ac5f2047a3e --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Correctly used WP_HTML_Processor::create_fragment() for a BODY fragment, walked all tokens with next_token(), skipped SPAN openers and closers via get_tag(), and accumulated normalized output with serialize_token(). get_last_error() is documented and the empty-string fallback is a documented caller policy." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Same high-adherence implementation: documented processor choice, documented token-walking rewrite pattern, documented serialize_token() output path, and documented get_last_error() check. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Uses only documented methods from the rendered HTML Processor docs and follows the serialize_token() remove-wrapper pattern idiomatically. Handles unclosed span content through the processor's virtual closer behavior." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The rendered docs did especially well in the serialize_token() section: it explains that walking every token and concatenating serialize_token() reconstructs normalized serialization, that skipped elements' closing tokens must also be skipped, and gives a general remove-element-but-keep-contents example. The next_token() docs also explain that the HTML Processor visits closing tokens for implicit and end-of-input closes, which directly supports the unclosed-span case. Near miss: all trials added a final get_last_error() empty-string fallback. That is documented as an allowed caller policy, but the docs leave the policy choice broad enough that models may cargo-cult empty string for every string-returning normalizer, even when a caller would prefer throwing, null, or explicit propagation.", + "doc_gaps": [ + { + "location": "html-processor.md, serialize_token(), 'Choose error policy explicitly'", + "problem": "The section lists possible unsupported-markup fallbacks but does not strongly tie the choice to the caller's contract, so models may treat empty string as the standard post-loop response.", + "suggestion": "Clarify that get_last_error() indicates partial output after parser abort, and that empty string is only one caller-defined sentinel, not a default recommendation." + }, + { + "location": "html-processor.md, get_tag()", + "problem": "The HTML Processor get_tag() section's example uses WP_HTML_Tag_Processor and does not locally show behavior on closing tokens, even though token-rewrite code often relies on the same tag name for opener and closer.", + "suggestion": "Add an HTML Processor next_token() example showing get_tag() returning the uppercase element name on both opener and closer and null on non-tag tokens." + }, + { + "location": "html-processor.md, create_fragment()/next_token()/serialize_token() incomplete-input notes", + "problem": "The distinction between an unclosed element at EOF, a trailing incomplete syntax token, and unsupported markup is spread across sections. These cases require different caller decisions.", + "suggestion": "Add a compact comparison table with examples and outcomes for next_token(), serialize_token(), paused_at_incomplete_token(), and get_last_error()." + } + ] +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/candidate.php b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/candidate.php new file mode 100644 index 0000000000000..f7a01467eea5a --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..3ea2b49b1f85a --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..cf2f0c96b0f15 --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, walks every token with `next_token()`, skips any token whose tag is `SPAN` so both span openers and closers are removed, and concatenates `serialize_token()` for everything else. That produces the normalized serialization directly, including proper closing of optional tags, canonical quoting, and normalized text encoding.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..f7a01467eea5a --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..091651abce913 --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..8b178e871bf6b --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, then walks every token with `next_token()`, skips any token whose tag is `SPAN` (which removes both opening and closing span wrappers), and appends `serialize_token()` for everything else so the result is the normalized HTML serialization. It also checks `get_last_error()` to avoid returning partial output after an unsupported-markup abort.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..a181d2e93f569 --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..95534ead50b6c --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

a b c d

", + "actual": "

a b c d

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

outer inner tail

", + "actual": "

outer inner tail

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

plain & simple

", + "actual": "

plain & simple

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

ab

", + "actual": "

ab

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
before after
", + "actual": "
before after
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

runs to end

", + "actual": "

runs to end

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..f94c73c8d3fa5 --- /dev/null +++ b/doc-experiment/results/round-41/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment input, then walks every token with `next_token()`. For each token, it skips any `SPAN` opener or closer by checking `get_tag()`, and appends all other tokens with `serialize_token()`, which yields normalized HTML serialization. After the walk, it checks `get_last_error()` to avoid returning partial output from unsupported markup.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-41/VARIANT.md b/doc-experiment/results/round-41/VARIANT.md new file mode 100644 index 0000000000000..55c910fc346a4 --- /dev/null +++ b/doc-experiment/results/round-41/VARIANT.md @@ -0,0 +1,33 @@ +# Round 41 Scratch Variant + +Variant name: `html-processor-serialization-fallback-policy-card` + +Control round: `round-40` + +Edited rendered file: `/tmp/html-api-docs-eval/round-41/html-processor.md` + +Source docblocks were not edited. This is a scratch-only rendered-doc A/B +variant. The staged `html-processor.md` SHA-256 recorded in +`round-metadata.json` is: + +```text +4aba1668246294ef9130b083b13360c9a12f7a6cfe54276b2bf9fe2e9470a76c +``` + +Changed rendered documentation in three places: + +- `WP_HTML_Processor::create_fragment()` now says `null` means no processor + was created, while a non-null processor can still later abort and should be + checked with `get_last_error()` after the relevant scan. +- `WP_HTML_Processor::normalize()` now says it normalizes the original + fragment and is not a way to finish a token-by-token rewrite; normalizing + the original input discards emitted rewrite changes. +- `WP_HTML_Processor::serialize_token()` now has an explicit fallback-policy + card: accumulated output is the rewrite, `serialize()` after scanning + returns `null`, raw original input is not normalized output, non-null + `get_last_error()` is unsupported parser abort, and + `paused_at_incomplete_token()` is a separate complete-input policy check. + +Purpose: test whether method-local fallback guidance improves transfer in +normalized-output tasks where subjects previously improvised raw-input or +`normalize( $html )` fallbacks after token-by-token rewriting. diff --git a/doc-experiment/results/round-41/codex-judges-output.json b/doc-experiment/results/round-41/codex-judges-output.json new file mode 100644 index 0000000000000..c962d15f0eb56 --- /dev/null +++ b/doc-experiment/results/round-41/codex-judges-output.json @@ -0,0 +1,133 @@ +{ + "result": [ + { + "id": "T09-mark-keyword", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), get_modifiable_text(), and serialize_token() for a BODY-fragment token rewrite. All API calls are documented and execution recorded no _doing_it_wrong entries. Minor adherence issue: after a rewrite loop it falls back to WP_HTML_Processor::normalize($html) when get_last_error() is non-null, which the serialize_token()/normalize docs warn can discard emitted rewrite changes." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correct processor and documented token-walking pattern. It restricts matching to #text tokens, uses decoded get_modifiable_text(), emits normalized tokens with serialize_token(), and returns an explicit empty-string sentinel on parser error, which the docs allow. Minor inefficiency: serialize_token() is called before knowing whether a #text token matches and may be called again for nonmatching text, but this is not API misuse." + }, + { + "trial_id": "trial-3", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correct processor and documented token-walking pattern. It checks only #text tokens, reads decoded modifiable text, wraps the current token serialization, and returns an explicit empty-string sentinel on parser error. The extra empty-text guard is redundant because the task says keyword is non-empty, but it does not change the API usage." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 8/8 with no _doing_it_wrong records. The docs appear to have successfully led subjects to the key contracts: the 'Which processor should I use?' guidance points normalized output and implied/missing closing tags to WP_HTML_Processor; the 'Recipe: collect DOM-style text from a subtree' passage says ordinary text is #text only and warns not to treat every token with modifiable text as DOM text; get_modifiable_text() states that #text is decoded while SCRIPT/STYLE/comment text may be raw or non-DOM; and serialize_token() explains the exact token-by-token rewrite pattern. The only near-miss was trial-1's normalize($html) fallback after a rewrite loop, despite the serialize_token()/normalize warnings that normalizing the original fragment is not a way to finish a rewrite. Trials 2 and 3 followed the documented error-policy options more closely by returning a caller-defined empty-string sentinel on get_last_error().", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() and WP_HTML_Processor::normalize()", + "problem": "The docs warn not to call normalize($html) after accumulating a token rewrite, but trial-1 still used it as an error fallback.", + "suggestion": "Add a small anti-pattern/corrected-pattern pair: after a serialize_token() rewrite loop, return the accumulated output on success; on get_last_error(), return the caller's sentinel/null/exception, not normalize($html)." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() and 'Recipe: collect DOM-style text from a subtree'", + "problem": "The distinction between ordinary DOM #text and broader modifiable text is central and easy to miss because it is described in prose across multiple sections.", + "suggestion": "Add a compact table mapping token type/name to whether get_modifiable_text() is decoded or raw and whether it represents ordinary DOM text. Cross-link it from next_token()." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and token-rewrite recipes", + "problem": "Incomplete trailing syntax behavior is documented, but the strict-vs-best-effort policy after token serialization is not shown as a reusable branch.", + "suggestion": "Add a short post-loop snippet showing both policies: accept accumulated serialization for best-effort normalization, or reject when paused_at_incomplete_token() or get_last_error() is set for complete-source rewrites." + } + ] + } + }, + { + "id": "T12-unwrap-spans", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Correctly used WP_HTML_Processor::create_fragment() for a BODY fragment, walked all tokens with next_token(), skipped SPAN openers and closers via get_tag(), and accumulated normalized output with serialize_token(). get_last_error() is documented and the empty-string fallback is a documented caller policy." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Same high-adherence implementation: documented processor choice, documented token-walking rewrite pattern, documented serialize_token() output path, and documented get_last_error() check. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Passed 7/7 hidden cases. Uses only documented methods from the rendered HTML Processor docs and follows the serialize_token() remove-wrapper pattern idiomatically. Handles unclosed span content through the processor's virtual closer behavior." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The rendered docs did especially well in the serialize_token() section: it explains that walking every token and concatenating serialize_token() reconstructs normalized serialization, that skipped elements' closing tokens must also be skipped, and gives a general remove-element-but-keep-contents example. The next_token() docs also explain that the HTML Processor visits closing tokens for implicit and end-of-input closes, which directly supports the unclosed-span case. Near miss: all trials added a final get_last_error() empty-string fallback. That is documented as an allowed caller policy, but the docs leave the policy choice broad enough that models may cargo-cult empty string for every string-returning normalizer, even when a caller would prefer throwing, null, or explicit propagation.", + "doc_gaps": [ + { + "location": "html-processor.md, serialize_token(), 'Choose error policy explicitly'", + "problem": "The section lists possible unsupported-markup fallbacks but does not strongly tie the choice to the caller's contract, so models may treat empty string as the standard post-loop response.", + "suggestion": "Clarify that get_last_error() indicates partial output after parser abort, and that empty string is only one caller-defined sentinel, not a default recommendation." + }, + { + "location": "html-processor.md, get_tag()", + "problem": "The HTML Processor get_tag() section's example uses WP_HTML_Tag_Processor and does not locally show behavior on closing tokens, even though token-rewrite code often relies on the same tag name for opener and closer.", + "suggestion": "Add an HTML Processor next_token() example showing get_tag() returning the uppercase element name on both opener and closer and null on non-tag tokens." + }, + { + "location": "html-processor.md, create_fragment()/next_token()/serialize_token() incomplete-input notes", + "problem": "The distinction between an unclosed element at EOF, a trailing incomplete syntax token, and unsupported markup is spread across sections. These cases require different caller decisions.", + "suggestion": "Add a compact comparison table with examples and outcomes for next_token(), serialize_token(), paused_at_incomplete_token(), and get_last_error()." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose `WP_HTML_Processor` and the documented static `normalize()` API for BODY-fragment normalization. The strict `null` check preserves valid empty output. No undocumented calls or `_doing_it_wrong` records; unsupported-case warnings came from the API's internal serialization path, not candidate misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully adherent implementation as trial-1. Uses the documented `WP_HTML_Processor::normalize(string): string|null` contract directly and handles `null` separately from `''`." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully adherent implementation as trial-1. Correct processor choice, no hallucinated methods, and idiomatic use of the documented whole-fragment normalization shortcut." + } + ], + "failure_analysis": "All trials passed all 7 hidden cases, so there were no failed hidden cases to attribute to misconceptions. The docs did well in three places: the HTML Processor overview says to choose it for normalized output; the unsupported-markup section says output-producing methods such as `serialize()` and `normalize()` return `null`; and the `normalize()` method section documents BODY-fragment context, normalization effects such as quoted attributes and omitted tags, incomplete trailing syntax omission, and the `string|null` return. The main near-miss is that the successful path depends on readers finding the `normalize()` method rather than over-applying the general create/find/change workflow. Another near-miss is the distinction between `null` failure and valid empty-string output: the candidates handled it correctly, but the docs rely on the return type rather than an explicit example.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::normalize()` docblock", + "problem": "The `string|null` return contract is documented, but there is no explicit example showing failure handling or distinguishing `null` from valid empty-string normalized output.", + "suggestion": "Add a short general example that checks `null === WP_HTML_Processor::normalize( $html )` before using the result, and state that an empty input fragment normalizes to `''`, not `null`." + }, + { + "location": "`WP_HTML_Processor` Usage overview", + "problem": "The overview's three-step workflow emphasizes creating a processor, finding a token, and requesting changes, which can obscure the simpler whole-fragment normalization API.", + "suggestion": "Add one sentence near Usage saying that callers who only need a normalized BODY-fragment serialization can call `WP_HTML_Processor::normalize()` directly." + }, + { + "location": "`WP_HTML_Processor::normalize()` / unsupported-markup docs", + "problem": "The docs say unsupported markup causes output methods to return `null`, but they do not make the boundary between unsupported markup and incomplete trailing syntax especially prominent.", + "suggestion": "Clarify that unsupported parser errors return `null`, while incomplete trailing syntax may still produce a normalized string with the incomplete token omitted." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-41/codex-trials-output.json b/doc-experiment/results/round-41/codex-trials-output.json new file mode 100644 index 0000000000000..05ce6e48f612c --- /dev/null +++ b/doc-experiment/results/round-41/codex-trials-output.json @@ -0,0 +1,95 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-41/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "T09-mark-keyword", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n if ( str_contains( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n $normalized = WP_HTML_Processor::normalize( $html );\n return null === $normalized ? '' : $normalized;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, walks every token with `next_token()`, and rewrites the normalized output with `serialize_token()`. Only ordinary `#text` tokens are considered, and their decoded text is read with `get_modifiable_text()` so character references match correctly; when a token contains the keyword, its normalized serialized text is wrapped in ``.", + "confidence": 72 + }, + { + "id": "T09-mark-keyword", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $serialized_text = $processor->serialize_token();\n\n if ( str_contains( $processor->get_modifiable_text(), $keyword ) ) {\n $output .= '' . $serialized_text . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks every token with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text via `get_modifiable_text()` and wraps the token’s normalized serialization from `serialize_token()` in `` when the case-sensitive keyword substring is present; all other tokens, including special text-bearing elements like `script`, `style`, `title`, and `textarea`, are emitted unchanged through `serialize_token()`.", + "confidence": 89 + }, + { + "id": "T09-mark-keyword", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( '' !== $text && false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks every token with `next_token()`. Only ordinary `#text` tokens are considered for wrapping; their decoded text is read with `get_modifiable_text()`, matched with `strpos()`, and emitted as `` plus the token’s normalized serialization from `serialize_token()`. All other tokens are copied with `serialize_token()`, which preserves the processor’s normalized HTML output while leaving special text-bearing elements like `script`, `style`, `title`, and `textarea` unwrapped.", + "confidence": 86 + }, + { + "id": "T12-unwrap-spans", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, walks every token with `next_token()`, skips any token whose tag is `SPAN` so both span openers and closers are removed, and concatenates `serialize_token()` for everything else. That produces the normalized serialization directly, including proper closing of optional tags, canonical quoting, and normalized text encoding.", + "confidence": 86 + }, + { + "id": "T12-unwrap-spans", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, then walks every token with `next_token()`, skips any token whose tag is `SPAN` (which removes both opening and closing span wrappers), and appends `serialize_token()` for everything else so the result is the normalized HTML serialization. It also checks `get_last_error()` to avoid returning partial output after an unsupported-markup abort.", + "confidence": 88 + }, + { + "id": "T12-unwrap-spans", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment input, then walks every token with `next_token()`. For each token, it skips any `SPAN` opener or closer by checking `get_tag()`, and appends all other tokens with `serialize_token()`, which yields normalized HTML serialization. After the walk, it checks `get_last_error()` to avoid returning partial output from unsupported markup.", + "confidence": 88 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 1, + "ok": true, + "code": "Unsupported HTML

';\n }\n\n return $normalized;\n}\n", + "explanation": "This calls the documented `WP_HTML_Processor::normalize()` helper, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot normalize unsupported input; in that failure case, the function returns the exact required fallback placeholder HTML.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 2, + "ok": true, + "code": "Unsupported HTML

';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` static method, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot produce normalized output. The function therefore returns the normalized serialization on success and the required `

Unsupported HTML

` fallback when normalization is unavailable.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 3, + "ok": true, + "code": "Unsupported HTML

';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported markup. The function therefore returns the normalized serialization on success and the exact fallback placeholder when normalization fails.", + "confidence": 98 + } + ] +} diff --git a/doc-experiment/results/round-41/round-metadata.json b/doc-experiment/results/round-41/round-metadata.json new file mode 100644 index 0000000000000..feade99aa73d0 --- /dev/null +++ b/doc-experiment/results/round-41/round-metadata.json @@ -0,0 +1,133 @@ +{ + "round": "round-41", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T09-mark-keyword", + "T12-unwrap-spans", + "N04-normalize-or-placeholder" + ], + "task_count": 3, + "splits": { + "train": 3 + }, + "concepts": { + "normalization": 1, + "serialization": 2 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "git_status_short": "?? doc-experiment/results/round-40/", + "source_file_digests": { + "ref": "working-tree", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "b115e956af65f69b4e07c7e761ccc9a49464ba3caf1f66944ed8eb3794dce472", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "working-tree", + "algorithm": "sha256", + "tasks": { + "T09-mark-keyword": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T09-mark-keyword/task.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "doc-experiment/corpus/T09-mark-keyword/reference.php": "0141ececbe459f50729e657f59dd75e8b74911cd1d70940b8fd5072d2f1efb60", + "doc-experiment/corpus/T09-mark-keyword/tests.json": "946bb96c922fea63cda3c31b94840b05ffded9f2e1dfa4b1547c755b99cda0c5" + } + }, + "T12-unwrap-spans": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T12-unwrap-spans/task.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b", + "doc-experiment/corpus/T12-unwrap-spans/reference.php": "93d6828161817ec9a0b90765d91e7dae1c48f3b3aac35f6cad6f785b1c715797", + "doc-experiment/corpus/T12-unwrap-spans/tests.json": "fcade4b9f46bbb824b0095f58b99d4c71c8b3e696093ab0d0ae25d666aae5f53" + } + }, + "N04-normalize-or-placeholder": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N04-normalize-or-placeholder/task.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "doc-experiment/corpus/N04-normalize-or-placeholder/reference.php": "e47feb6d3be887e1ee5df77e39160ffe812cb8322ca8600898de2a34e56ddeed", + "doc-experiment/corpus/N04-normalize-or-placeholder/tests.json": "417d2f770b341fc46bb8b7b46df899f1fd1e65d343aa8cc6c4e39307afcb4d18" + } + } + } + }, + "created_at_utc": "2026-06-13T15:07:16+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-41", + "staged_task_files": [ + "tasks/T09-mark-keyword.md", + "tasks/T12-unwrap-spans.md", + "tasks/N04-normalize-or-placeholder.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-41 exposes 2 docs and 3 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "4aba1668246294ef9130b083b13360c9a12f7a6cfe54276b2bf9fe2e9470a76c", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N04-normalize-or-placeholder.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "tasks/T09-mark-keyword.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "tasks/T12-unwrap-spans.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b" + }, + "shadow_doc_variant": { + "name": "html-processor-serialization-fallback-policy-card", + "control_round": "round-40", + "edited_files": [ + "html-processor.md" + ], + "notes": "Scratch-only rendered-doc variant. Adds method-local fallback policy guidance around create_fragment(), normalize(), and serialize_token(): construction failure is separate from later parser abort, accumulated serialize_token output is the rewrite, normalize($html) discards emitted changes, raw input is not normalized output, and paused_at_incomplete_token() is a complete-input policy check. Source docblocks are unchanged." + } +} diff --git a/doc-experiment/results/round-41/round-summary.json b/doc-experiment/results/round-41/round-summary.json new file mode 100644 index 0000000000000..1b2964d3c2ef1 --- /dev/null +++ b/doc-experiment/results/round-41/round-summary.json @@ -0,0 +1,154 @@ +{ + "round_score": 99.83, + "core_score": 99.83, + "by_split": { + "train": 99.83 + }, + "by_concept": { + "normalization": 100.0, + "serialization": 99.75 + }, + "tasks": { + "T09-mark-keyword": { + "score": 99.5, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 99, + "score": 99.7 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "T12-unwrap-spans": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "N04-normalize-or-placeholder": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-41", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T09-mark-keyword", + "T12-unwrap-spans", + "N04-normalize-or-placeholder" + ], + "task_count": 3, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "95739cdec1a49c4d597caa10d028dedde3251ae8", + "git_status_short": "?? doc-experiment/results/round-40/" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-41/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-41/subject-isolation.json b/doc-experiment/results/round-41/subject-isolation.json new file mode 100644 index 0000000000000..a7a3d8fb03e85 --- /dev/null +++ b/doc-experiment/results/round-41/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-41/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/judge.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/judge.json new file mode 100644 index 0000000000000..2a65b1db0d1f9 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 88, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), used a single next_token() walk, documented structural calls, serialize_token() for most output, and checked both paused_at_incomplete_token() and get_last_error(). All API methods used are documented and execution recorded no _doing_it_wrong calls. Main adherence weakness: when a pending P proves non-empty it emits a literal

instead of the stored serialize_token() result, so the implementation is not fully following the documented token-serialization pattern and would drop attributes in broader cases." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Strong adherence. It uses the HTML Processor, buffers the serialized opener with serialize_token(), walks tokens once, identifies the closing P with documented is_tag_closer() and get_current_depth() semantics, and falls back on incomplete or unsupported input. No undocumented API calls or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Strong adherence. It uses the HTML Processor, next_token(), serialize_token(), documented token/type/depth APIs, and the correct incomplete/error checks. The paragraph stack is more complex than necessary for HTML P parsing, but it remains within documented token-walking patterns and did not misuse the API." + } + ], + "failure_analysis": "All trials passed all 11 frozen cases, with no _doing_it_wrong records. The docs appear to have succeeded on the major points: the processor-choice guidance clearly directs structure-sensitive and normalized-output work to WP_HTML_Processor; the rewrite recipe for serialize_token() maps directly to dropping selected tokens while concatenating the rest; get_current_depth() explains closer-depth semantics well enough for the candidates to handle implicit paragraph closes; and the incomplete/error guidance led all trials to return the original input for truncated or unsupported markup. The main near-miss was trial-1's hand-built

emission after delaying a paragraph opener. That passed because the tests used un-attributed paragraphs, but a broader case with attributes would lose normalized opener details. This suggests the serialization docs are good but could be more explicit about storing serialized tokens when emission is deferred.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() docs and rewrite recipe", + "problem": "The docs say token-by-token rewriting can skip or emit tokens, but they do not explicitly warn that delayed emission should keep the exact serialize_token() result. A model hand-emitted

, which would drop attributes and other normalized opener details.", + "suggestion": "Add a short note and example: when buffering a token for possible later output, store `$serialized = $processor->serialize_token()` and emit that string later; do not reconstruct the tag name manually unless intentionally creating new markup." + }, + { + "location": "WP_HTML_Processor::get_current_depth() / is_tag_closer() docs", + "problem": "The closer-depth explanation is strong, but readers still have to derive the common predicate for identifying the closing token corresponding to a previously recorded opener.", + "suggestion": "Add a compact recipe for matching an element's own closer after recording opener depth: same tag name, is_tag_closer(), and depth below the opener depth, with a note that child closers can report the opener depth and must not end the subtree walk." + }, + { + "location": "WP_HTML_Processor overview or rewrite recipe", + "problem": "The docs discuss rejecting incomplete or unsupported input after a rewrite, but examples often return null rather than showing the common all-or-nothing filter policy of returning the original HTML unchanged.", + "suggestion": "Add a generic all-or-nothing rewrite skeleton that accumulates serialize_token() output and then returns the original input when paused_at_incomplete_token() is true or get_last_error() is non-null." + }, + { + "location": "WP_HTML_Processor::get_namespace() and tag-matching examples", + "problem": "The reference implementation guards P matching with get_namespace(), but the candidates matched only get_tag(). The docs list get_namespace(), yet examples of semantic tag matching rarely show a namespace guard.", + "suggestion": "In examples that transform HTML element semantics by tag name, include `html === $processor->get_namespace()` or a note explaining when tag-name checks should also verify namespace, especially around SVG and MathML content." + } + ] +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/candidate.php b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/candidate.php new file mode 100644 index 0000000000000..7fb42587191b9 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/candidate.php @@ -0,0 +1,56 @@ +next_token() ) { + $token_type = $processor->get_token_type(); + $tag_name = $processor->get_tag(); + $depth = $processor->get_current_depth(); + + if ( null !== $skip_p_depth ) { + if ( 'P' === $tag_name && $processor->is_tag_closer() && $depth === $skip_p_depth - 1 ) { + $skip_p_depth = null; + } + continue; + } + + if ( $pending_empty_p ) { + if ( 'P' === $tag_name && $processor->is_tag_closer() && $depth === $pending_p_depth - 1 ) { + $pending_empty_p = false; + $pending_p_depth = null; + continue; + } + + $output .= '

'; + $pending_empty_p = false; + $pending_p_depth = null; + } + + if ( '#tag' === $token_type && 'P' === $tag_name && ! $processor->is_tag_closer() ) { + $pending_empty_p = true; + $pending_p_depth = $depth; + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + if ( $pending_empty_p ) { + $output .= '

'; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/execution.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/execution.json new file mode 100644 index 0000000000000..3a6ed3ce2ed59 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

Keep me

", + "actual": "

Keep me

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

\n\t

Text

", + "actual": "

\n\t

Text

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

 

A B

", + "actual": "

 

A B

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


", + "actual": "


", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

keep

", + "actual": "

keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

One

Block

Two

", + "actual": "

One

Block

Two

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

Keep

", + "actual": "

Keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
Nothing to remove
", + "actual": "
Nothing to remove
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/response.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/response.json new file mode 100644 index 0000000000000..cdaf722632b69 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk, building normalized output with `serialize_token()`. A `P` opener is held back until the next token proves whether it is immediately followed by its own closer; if so, both tokens are skipped as an empty paragraph, otherwise the opener is emitted and processing continues. Per the docs, the function falls back to the original HTML when parsing cannot complete cleanly by checking `paused_at_incomplete_token()` and `get_last_error()`.", + "confidence": 82 +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/candidate.php b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/candidate.php new file mode 100644 index 0000000000000..c653c987b7063 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/candidate.php @@ -0,0 +1,48 @@ +next_token() ) { + if ( null !== $pending_p ) { + $is_pending_p_closer = + '#tag' === $processor->get_token_type() && + 'P' === $processor->get_tag() && + $processor->is_tag_closer() && + $processor->get_current_depth() < $pending_p['depth']; + + if ( $is_pending_p_closer ) { + $pending_p = null; + continue; + } + + $output .= $pending_p['opener']; + $pending_p = null; + } + + if ( + '#tag' === $processor->get_token_type() && + 'P' === $processor->get_tag() && + ! $processor->is_tag_closer() + ) { + $pending_p = array( + 'depth' => $processor->get_current_depth(), + 'opener' => $processor->serialize_token(), + ); + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $pending_p || $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/execution.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/execution.json new file mode 100644 index 0000000000000..51327acfa4a96 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

Keep me

", + "actual": "

Keep me

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

\n\t

Text

", + "actual": "

\n\t

Text

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

 

A B

", + "actual": "

 

A B

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


", + "actual": "


", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

keep

", + "actual": "

keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

One

Block

Two

", + "actual": "

One

Block

Two

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

Keep

", + "actual": "

Keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
Nothing to remove
", + "actual": "
Nothing to remove
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/response.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/response.json new file mode 100644 index 0000000000000..dd183ef1c87c6 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks tokens with `next_token()` and rebuilds a normalized serialization via `serialize_token()`. A `P` opener is held back until the next token proves whether the paragraph is truly empty; if the next relevant token is that paragraph\u2019s closer, both are omitted, otherwise the opener is emitted and processing continues. If parsing aborts or ends on incomplete syntax, checked with `get_last_error()` and `paused_at_incomplete_token()`, the original HTML is returned unchanged.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/candidate.php b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/candidate.php new file mode 100644 index 0000000000000..83d116d4e321a --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/candidate.php @@ -0,0 +1,66 @@ +next_token() ) { + $is_p_tag = '#tag' === $processor->get_token_type() && 'P' === $processor->get_tag(); + + if ( $is_p_tag && ! $processor->is_tag_closer() ) { + $count = count( $paragraph_stack ); + for ( $i = 0; $i < $count; $i++ ) { + $paragraph_stack[ $i ]['has_content'] = true; + if ( ! $paragraph_stack[ $i ]['emitted'] ) { + $output .= $paragraph_stack[ $i ]['opener']; + $paragraph_stack[ $i ]['emitted'] = true; + } + } + + $paragraph_stack[] = array( + 'opener' => $processor->serialize_token(), + 'has_content' => false, + 'emitted' => false, + ); + continue; + } + + if ( $is_p_tag && $processor->is_tag_closer() ) { + if ( 0 === count( $paragraph_stack ) ) { + $output .= $processor->serialize_token(); + continue; + } + + $paragraph = array_pop( $paragraph_stack ); + if ( $paragraph['has_content'] ) { + if ( ! $paragraph['emitted'] ) { + $output .= $paragraph['opener']; + } + $output .= $processor->serialize_token(); + } + continue; + } + + $count = count( $paragraph_stack ); + for ( $i = 0; $i < $count; $i++ ) { + $paragraph_stack[ $i ]['has_content'] = true; + if ( ! $paragraph_stack[ $i ]['emitted'] ) { + $output .= $paragraph_stack[ $i ]['opener']; + $paragraph_stack[ $i ]['emitted'] = true; + } + } + + $output .= $processor->serialize_token(); + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/execution.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/execution.json new file mode 100644 index 0000000000000..66860674e41cb --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

Keep me

", + "actual": "

Keep me

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

\n\t

Text

", + "actual": "

\n\t

Text

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

 

A B

", + "actual": "

 

A B

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


", + "actual": "


", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

keep

", + "actual": "

keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

One

Block

Two

", + "actual": "

One

Block

Two

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

Keep

", + "actual": "

Keep

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
Nothing to remove
", + "actual": "
Nothing to remove
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

", + "actual": "

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/response.json b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/response.json new file mode 100644 index 0000000000000..d83702dd3e569 --- /dev/null +++ b/doc-experiment/results/round-42/H04-remove-empty-paragraphs/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` rewrite loop with `serialize_token()` to produce normalized output while selectively skipping empty `

` elements. Each paragraph opener is buffered until a child token is seen; if no token appears before its closing `P`, both opener and closer are omitted, and if processing aborts or pauses on incomplete input via `get_last_error()` or `paused_at_incomplete_token()`, the original HTML is returned unchanged.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/judge.json b/doc-experiment/results/round-42/N01-remove-external-class/judge.json new file mode 100644 index 0000000000000..4ba1caa6b91ce --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Tag_Processor for a flat class edit. All called APIs and query keys are documented: constructor/new usage, next_tag(), tag_name, class_name, remove_class(), and get_updated_html(). The loop and final readback match documented patterns, and execution passed 7/7 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor, documented combined tag/class query, documented class-removal helper, and documented get_updated_html() output path. Execution passed 7/7 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1 with only formatting differences. API usage is fully documented and idiomatic for this task. Execution passed 7/7 with no _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden cases failed across the three trials. The docs worked well for this task: the Tag Processor overview explicitly says to use it for flat tag/class/attribute edits; the Finding tags table documents next_tag() with both tag_name and class_name; the CSS class section says removing the only class removes the whole class attribute; and get_updated_html() is documented as the readback path after queued class changes. The main near-miss is class-name case semantics: the candidates happened to get the case-sensitive EXTERNAL case right, but next_tag()'s class_name parameter does not state the case/compat-mode behavior at the point of use, and has_class() documentation says ASCII case-insensitive even though default no-quirks behavior is byte-for-byte. That did not cause a failure here, but it is the most plausible source of future confusion.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::next_tag() parameter docs for $query['class_name']", + "problem": "The docs say the tag must contain the whole class name, but do not state whether matching is a whitespace-token match, whether it is substring-safe, or how case sensitivity works under the processor's compatibility mode.", + "suggestion": "Extend the class_name query docblock to say it matches a complete class token and document the exact case-sensitivity/compat-mode contract, with a short non-task-specific example such as class=\"note\" not matching class_name => \"not\"." + }, + { + "location": "WP_HTML_Tag_Processor::has_class() and class matching docs", + "problem": "The rendered docs say has_class() looks for an ASCII case-insensitive class name, while other docs/source behavior indicate no-quirks class matching is byte-for-byte and quirks mode is case-insensitive. This is easy to misapply to next_tag(... class_name ...) and remove_class().", + "suggestion": "Align has_class(), next_tag(class_name), add_class(), and remove_class() docs around one shared statement of class-name comparison semantics, including quirks vs no-quirks behavior." + }, + { + "location": "WP_HTML_Tag_Processor::remove_class() method docblock", + "problem": "The method-level section only says it removes a class and returns whether the class was set to be removed. The important contracts are elsewhere: it is safe when the class/attribute is absent, removing the final class removes the attribute, and the return value indicates the request was accepted for a matched opener, not necessarily that the class existed.", + "suggestion": "Move or repeat the key remove_class() behavioral contract in the method docblock: safe no-op for missing class, final class removes the attribute, untouched bytes are preserved as much as possible, and clarify return-value meaning." + } + ] +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-1/candidate.php b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/candidate.php new file mode 100644 index 0000000000000..62ea9dc242915 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/candidate.php @@ -0,0 +1,11 @@ +next_tag( array( 'tag_name' => 'A', 'class_name' => 'external' ) ) ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-1/execution.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/execution.json new file mode 100644 index 0000000000000..1a99f1984b4a7 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N01-remove-external-class/trial-1/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "

not a link
link", + "actual": "
not a link
link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-1/response.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/response.json new file mode 100644 index 0000000000000..6cc9cfd029f99 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: construct the processor with the input HTML, repeatedly find `A` tags matching the `external` class via `next_tag()`, remove that class with `remove_class()`, and return the modified markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-2/candidate.php b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/candidate.php new file mode 100644 index 0000000000000..3134249573796 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/candidate.php @@ -0,0 +1,10 @@ +next_tag( array( 'tag_name' => 'A', 'class_name' => 'external' ) ) ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-2/execution.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/execution.json new file mode 100644 index 0000000000000..d0704eb3458de --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N01-remove-external-class/trial-2/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "
not a link
link", + "actual": "
not a link
link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-2/response.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/response.json new file mode 100644 index 0000000000000..071da8cf36664 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for flat tag/attribute editing: it constructs a processor for the input HTML, loops with `next_tag()` restricted to `A` tags carrying the `external` class, removes that class with `remove_class()`, and returns the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-3/candidate.php b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/candidate.php new file mode 100644 index 0000000000000..afe6664da97d8 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/candidate.php @@ -0,0 +1,17 @@ +next_tag( + array( + 'tag_name' => 'A', + 'class_name' => 'external', + ) + ) + ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-3/execution.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/execution.json new file mode 100644 index 0000000000000..1e6613385a691 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N01-remove-external-class/trial-3/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "
not a link
link", + "actual": "
not a link
link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N01-remove-external-class/trial-3/response.json b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/response.json new file mode 100644 index 0000000000000..64b8e6f1c4343 --- /dev/null +++ b/doc-experiment/results/round-42/N01-remove-external-class/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: it constructs the processor with the input HTML, repeatedly finds `A` tags matching the `external` class via `next_tag()`, removes that class with `remove_class()`, and returns the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/judge.json b/doc-experiment/results/round-42/N02-collect-figure-images/judge.json new file mode 100644 index 0000000000000..2748160525945 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment(), next_tag('IMG'), get_breadcrumbs(), and get_attribute(). All methods are documented, no _doing_it_wrong records appeared, and the attribute handling correctly distinguishes null, true, empty string, and decoded string values." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Uses the same documented structural approach as trial-1 and passes all edge cases. The only deduction is the extra all-or-nothing get_last_error() check after collection: documented, but not required by the task and potentially over-applies mutation/serialization guidance to a read-only extraction function." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correct processor choice and only documented APIs: create_fragment(), next_tag(), get_tag(), is_tag_closer(), and get_attribute(). The manual FIGURE depth counter with tag_closers is documented and works here, but is less idiomatic for ancestor containment than filtering IMG matches with get_breadcrumbs() or matches_breadcrumbs()." + } + ], + "failure_analysis": "No hidden case failed in any trial; each trial passed 9/9 cases with no _doing_it_wrong records. The docs did well at steering subjects to WP_HTML_Processor for structure-aware containment: the Tag Processor overview says it has no tree awareness, and the HTML Processor supported-elements section says to choose it when document structure matters. The Breadcrumbs section and get_breadcrumbs() method docs were enough for trials 1 and 2 to solve arbitrary-depth containment. The get_attribute() docs in the Tag Processor page explicitly describe null for missing attributes, true for boolean/valueless attributes, empty string for empty values, and decoded strings, which all trials handled correctly. Near-misses: trial 2 appears to have generalized get_last_error() rejection guidance beyond mutation/serialization, and trial 3 used manual closer tracking where breadcrumbs would have expressed the contract more directly.", + "doc_gaps": [ + { + "location": "html-processor.md, Breadcrumbs / next_tag() query documentation", + "problem": "The docs explain direct breadcrumb paths well, but they do not make the arbitrary-depth descendant pattern as explicit as the direct-child breadcrumb query pattern.", + "suggestion": "Add a general note that breadcrumb queries are child-path matches, while arbitrary ancestor containment should be checked by inspecting get_breadcrumbs() or matches_breadcrumbs() after matching the target token." + }, + { + "location": "html-processor.md, get_attribute()", + "problem": "The HTML Processor get_attribute() section lists string|true|null but omits the decoded-string sentence that appears in the Tag Processor docs, even though callers using only the HTML Processor page may need that contract.", + "suggestion": "Repeat or cross-link the inherited attribute-value semantics: missing returns null, valueless boolean returns true, empty quoted value returns '', and string values are already decoded." + }, + { + "location": "html-processor.md, get_last_error() and rewrite/scan recipes", + "problem": "The docs strongly emphasize rejecting or falling back on parser errors in mutation and serialization examples, which can make read-only extraction code apply an unnecessary all-or-nothing policy.", + "suggestion": "Clarify that get_last_error() distinguishes normal exhaustion from parser abort, and that whether to return partial results, empty results, or an error is caller policy for read-only scans." + }, + { + "location": "html-processor.md, tag_closers / is_tag_closer()", + "problem": "Manual opener/closer counters are documented but the docs do not clearly warn that they are often unnecessary for simple ancestor-membership checks and require understanding virtual closers and popped breadcrumbs.", + "suggestion": "Add guidance comparing manual closer tracking with breadcrumb-based containment, recommending breadcrumbs for membership tests and reserving closer/depth tracking for bounded subtree walks or transformations." + } + ] +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/candidate.php b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/candidate.php new file mode 100644 index 0000000000000..5f0a6b08a58ce --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/candidate.php @@ -0,0 +1,26 @@ +next_tag( 'IMG' ) ) { + $breadcrumbs = $processor->get_breadcrumbs(); + + if ( ! in_array( 'FIGURE', $breadcrumbs, true ) ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + + if ( is_string( $src ) && '' !== $src ) { + $images[] = $src; + } + } + + return $images; +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/execution.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/execution.json new file mode 100644 index 0000000000000..faca23b08bd56 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/response.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/response.json new file mode 100644 index 0000000000000..2713a968fcb28 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans `IMG` openers in document order with `next_tag( 'IMG' )`. For each image, it checks `get_breadcrumbs()` to confirm a `FIGURE` ancestor exists at any depth, and reads the already-decoded `src` with `get_attribute()`, skipping missing or empty values.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/candidate.php b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/candidate.php new file mode 100644 index 0000000000000..50cf882e3624a --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/candidate.php @@ -0,0 +1,28 @@ +next_tag( 'IMG' ) ) { + if ( ! in_array( 'FIGURE', $processor->get_breadcrumbs(), true ) ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + if ( is_string( $src ) && '' !== $src ) { + $images[] = $src; + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $images; +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/execution.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/execution.json new file mode 100644 index 0000000000000..2d707d8f2f992 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/response.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/response.json new file mode 100644 index 0000000000000..9649fa382d226 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because containment is structural, then scans only `IMG` openers with `next_tag( 'IMG' )`, checks `get_breadcrumbs()` to confirm each image is somewhere inside a `FIGURE`, and reads the already-decoded `src` with `get_attribute()`, skipping missing, boolean, and empty values.", + "confidence": 93 +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/candidate.php b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/candidate.php new file mode 100644 index 0000000000000..1ad46de04e686 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/candidate.php @@ -0,0 +1,36 @@ +next_tag( array( 'tag_closers' => 'visit' ) ) ) { + if ( 'FIGURE' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + if ( $figure_depth > 0 ) { + $figure_depth--; + } + } else { + $figure_depth++; + } + + continue; + } + + if ( $figure_depth > 0 && 'IMG' === $processor->get_tag() && ! $processor->is_tag_closer() ) { + $src = $processor->get_attribute( 'src' ); + + if ( is_string( $src ) && '' !== $src ) { + $images[] = $src; + } + } + } + + return $images; +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/execution.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/execution.json new file mode 100644 index 0000000000000..685fb8b3bc7b6 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/response.json b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/response.json new file mode 100644 index 0000000000000..fac6175a5edb1 --- /dev/null +++ b/doc-experiment/results/round-42/N02-collect-figure-images/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses the structure-aware `WP_HTML_Processor::create_fragment()` and a single forward scan with `next_tag( array( 'tag_closers' => 'visit' ) )`. It tracks whether the current position is inside one or more `FIGURE` elements by incrementing/decrementing on `FIGURE` openers/closers, then collects decoded `src` values from `IMG` openers encountered while that depth is positive via `get_attribute( 'src' )`, skipping missing or empty values.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-42/N03-first-list-count/judge.json b/doc-experiment/results/round-42/N03-first-list-count/judge.json new file mode 100644 index 0000000000000..0bed0fd8f5a0c --- /dev/null +++ b/doc-experiment/results/round-42/N03-first-list-count/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a structural fragment task. Every API call is documented in the supplied markdown, including inherited Tag Processor methods. The solution follows the documented bookmark plus bounded next_token()/get_current_depth() pattern, seeks back to edit the opener, uses set_attribute() and get_updated_html(), and checks paused_at_incomplete_token() and get_last_error() before mutating." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence pattern as trial-1: HTML Processor, documented calls only, no _doing_it_wrong records, depth-aware direct-child LI counting, bookmark/seek for the opener edit, and clean-scan checks for truncation or unsupported markup." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses the correct processor and the documented structural traversal idioms. The found_list flag is redundant but harmless. All methods are present in the rendered docs, and the code handles incomplete or unsupported input before applying the queued attribute update." + } + ], + "failure_analysis": "No failed hidden cases across the trials. All three passed 11/11 cases and execution.json recorded no _doing_it_wrong notices. The docs worked well here because the WP_HTML_Processor overview explicitly says to use the HTML Processor for nested structure, the scan-a-region recipe shows bookmark -> next_token() -> depth-bound walk -> paused_at_incomplete_token()/get_last_error() -> seek -> edit, next_tag() explains that tag_name is not a list and recommends scanning any tag then branching, and get_current_depth()/next_token() explain the >= subtree boundary needed for omitted closers and nested elements. Near-misses: the unsupported-after-closed-list case depends on stopping at the completed container boundary rather than draining the rest of the document; the recipes imply this, but get_last_error() itself does not make that scope especially explicit. Also, the HTML Processor set_bookmark section contains an inherited Tag Processor example, which could steer weaker readers toward the wrong processor despite the overview guidance.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::set_bookmark() docblock / rendered HTML Processor bookmark section", + "problem": "The method section includes a WP_HTML_Tag_Processor example inside the HTML Processor docs. For structural tasks, that can conflict with the overview’s advice to use WP_HTML_Processor.", + "suggestion": "Add or replace with an HTML Processor-specific bookmark example using create_fragment(), next_token(), get_current_depth(), seek(), and get_updated_html(); label any inherited Tag Processor example as lexical-only." + }, + { + "location": "WP_HTML_Processor::get_last_error() and next_token() bounded-walk docs", + "problem": "The docs do not explicitly state that get_last_error() only reflects markup scanned so far, so callers may over-scan beyond a completed region and reject otherwise valid edits because of later unsupported markup.", + "suggestion": "Document the contract for bounded scans: after a loop exits because depth dropped below the recorded container depth, paused_at_incomplete_token() and get_last_error() validate the scanned region; callers need not scan unrelated trailing markup unless their own contract requires whole-document validation." + }, + { + "location": "WP_HTML_Processor::get_current_depth() docblock", + "problem": "The direct-child opener predicate is easy to miss because the method doc emphasizes subtree membership, while the compact direct-child checks are in the overview recipe.", + "suggestion": "Include a short direct-child element predicate in the get_current_depth() method docs: require #tag, not a closer, and current depth equal to container depth + 1, then apply the caller’s tag-name test." + } + ] +} diff --git a/doc-experiment/results/round-42/N03-first-list-count/trial-1/candidate.php b/doc-experiment/results/round-42/N03-first-list-count/trial-1/candidate.php new file mode 100644 index 0000000000000..c747f9378f3a9 --- /dev/null +++ b/doc-experiment/results/round-42/N03-first-list-count/trial-1/candidate.php @@ -0,0 +1,54 @@ +next_tag() ) { + $tag_name = $processor->get_tag(); + if ( 'UL' !== $tag_name && 'OL' !== $tag_name ) { + continue; + } + + if ( ! $processor->set_bookmark( 'first-list' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $count = 0; + + while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) { + $is_direct_child_li_opener = + '#tag' === $processor->get_token_type() && + ! $processor->is_tag_closer() && + 'LI' === $processor->get_tag() && + $processor->get_current_depth() === $list_depth + 1; + + if ( $is_direct_child_li_opener ) { + $count++; + } + } + + $scan_finished_cleanly = + ! $processor->paused_at_incomplete_token() && + null === $processor->get_last_error(); + + if ( ! $scan_finished_cleanly ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + if ( ! $processor->seek( 'first-list' ) ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $count ); + $processor->release_bookmark( 'first-list' ); + + return $processor->get_updated_html(); + } + + return $html; +} diff --git a/doc-experiment/results/round-42/N03-first-list-count/trial-1/execution.json b/doc-experiment/results/round-42/N03-first-list-count/trial-1/execution.json new file mode 100644 index 0000000000000..48941032519df --- /dev/null +++ b/doc-experiment/results/round-42/N03-first-list-count/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/N03-first-list-count/trial-1/candidate.php", + "function": "add_first_list_item_count", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "simple-ul", + "status": "pass", + "expected": "
  • A
  • B
  • C
", + "actual": "
  • A
  • B
  • C
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ol", + "status": "pass", + "expected": "
  1. A
  2. B
", + "actual": "
  1. A
  2. B
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-list", + "status": "pass", + "expected": "

No list here.

", + "actual": "

No list here.

", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-count-overwritten", + "status": "pass", + "expected": "
  • A
", + "actual": "
  • A
", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-li-closers", + "status": "pass", + "expected": "
  • one
  • two", + "actual": "
    • one
    • two", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-list-counts-direct-children", + "status": "pass", + "expected": "
        • x
      • y", + "actual": "
          • x
        • y", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-token-inside-list", + "status": "pass", + "expected": "
          • ", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T01-add-image-class/trial-2/execution.json b/doc-experiment/results/round-42/T01-add-image-class/trial-2/execution.json new file mode 100644 index 0000000000000..b3b8a3b31d772 --- /dev/null +++ b/doc-experiment/results/round-42/T01-add-image-class/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T01-add-image-class/trial-2/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            ", + "actual": "

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
            ", + "actual": "
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T01-add-image-class/trial-3/execution.json b/doc-experiment/results/round-42/T01-add-image-class/trial-3/execution.json new file mode 100644 index 0000000000000..063fd6b2d5fcc --- /dev/null +++ b/doc-experiment/results/round-42/T01-add-image-class/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T01-add-image-class/trial-3/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            ", + "actual": "

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
            ", + "actual": "
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

            Nothing here.

            ", + "actual": "

            Nothing here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

            text

            text

            get_attribute( $name ), with a warning that truthiness checks treat present-empty attributes as absent." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() and get_attribute() query/name docs", + "problem": "Case-insensitive tag and attribute-name matching is only implicit or scattered; exact-byte output tasks also care that untouched attribute casing is preserved.", + "suggestion": "State explicitly that HTML tag and attribute-name matching is ASCII case-insensitive, while untouched source bytes such as attribute casing remain preserved in get_updated_html()." + }, + { + "location": "Generated Method Index", + "problem": "Private/internal methods are listed alongside public methods, which can distract documentation-only users and invite invalid API usage despite the visibility column.", + "suggestion": "Separate private methods into an internal section or hide them in consumer-facing rendered docs, leaving public traversal, attribute, bookmark, text, and output APIs prominent." + } + ] +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-1/candidate.php b/doc-experiment/results/round-42/T02-link-targets/trial-1/candidate.php new file mode 100644 index 0000000000000..76d60f57010d4 --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-1/candidate.php @@ -0,0 +1,15 @@ +next_tag( 'a' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-1/execution.json b/doc-experiment/results/round-42/T02-link-targets/trial-1/execution.json new file mode 100644 index 0000000000000..7d919c3be0e8a --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T02-link-targets/trial-1/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-1/response.json b/doc-experiment/results/round-42/T02-link-targets/trial-1/response.json new file mode 100644 index 0000000000000..a58862fd9ef1b --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute edit where byte-preserving output matters. The function scans `A` tags with `next_tag( 'a' )`, uses `get_attribute( 'href' )` to detect whether `href` is present (including empty-string and boolean forms, since only absence returns `null`), overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-2/candidate.php b/doc-experiment/results/round-42/T02-link-targets/trial-2/candidate.php new file mode 100644 index 0000000000000..972d891e974db --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-2/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-2/execution.json b/doc-experiment/results/round-42/T02-link-targets/trial-2/execution.json new file mode 100644 index 0000000000000..6dbdb01b33429 --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T02-link-targets/trial-2/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-2/response.json b/doc-experiment/results/round-42/T02-link-targets/trial-2/response.json new file mode 100644 index 0000000000000..87e4c25a13c22 --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so present-empty (`\"\"`) and boolean (`true` for ``) both count as present while absent remains `null`, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the modified markup with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-3/candidate.php b/doc-experiment/results/round-42/T02-link-targets/trial-3/candidate.php new file mode 100644 index 0000000000000..f64b40aff45c1 --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-3/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'a' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-3/execution.json b/doc-experiment/results/round-42/T02-link-targets/trial-3/execution.json new file mode 100644 index 0000000000000..cfc62bf063766 --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T02-link-targets/trial-3/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T02-link-targets/trial-3/response.json b/doc-experiment/results/round-42/T02-link-targets/trial-3/response.json new file mode 100644 index 0000000000000..62571a0d13caa --- /dev/null +++ b/doc-experiment/results/round-42/T02-link-targets/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute rewrite: it scans all `A` tags with `next_tag( 'a' )`, checks `get_attribute( 'href' )` so present empty-string and boolean `href` attributes both count, and applies `set_attribute( 'target', '_blank' )` to overwrite or create the attribute before returning the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/judge.json b/doc-experiment/results/round-42/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..d12d050525ec5 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), found H1 with next_tag(), bounded the subtree walk by get_current_depth() with >=, collected only #text tokens via get_token_type() and get_modifiable_text(). This matches the rendered docs' subtree text recipe exactly. No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic pattern as trial-1: HTML Processor for tree-aware text extraction, depth-bounded next_token() walk, #text-only accumulation, decoded text through get_modifiable_text(). No unsupported API usage or misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correct processor and all called methods are documented. The main traversal is idiomatic, but it also opts into SCRIPT, STYLE, TEXTAREA, and TITLE opener text. That behavior is documented, but the docs' subtree text recipe says ordinary subtree text should append only #text tokens unless the caller explicitly wants special-element content. This is a plausible over-application of the special-element exception and could diverge on special-element-in-heading inputs." + } + ], + "failure_analysis": "All three trials passed all 8 hidden cases, so there are no failed hidden cases to diagnose.\n\nThe docs did well on the core path: the HTML Processor overview explicitly says to use WP_HTML_Processor when structure matters, including collecting element text and handling missing closing tags. The 'Recipe: collect DOM-style text from a subtree' gives almost the exact shape needed: create_fragment(), next_tag(), record depth, walk next_token(), append only #text via get_modifiable_text(). The get_current_depth() section explains why the guard must be >= rather than >, which prevented the common nested-markup failure. The next_token() section explains that unclosed elements still produce closing tokens, which supports the unclosed-h1 case. The get_modifiable_text() section clearly states that #text is already decoded, preventing double decoding and preserving the empty-string image-only case.\n\nThe only near-miss is trial-3. It noticed the documented special-element exception and included opener text from SCRIPT, STYLE, TEXTAREA, and TITLE. The docs do say those elements carry modifiable text on the element token, but the same recipe also says ordinary subtree text is only #text tokens unless the caller intentionally opts into another token type. The remaining ambiguity is terminology: a task or reader saying 'text content' may sound broader than the docs' 'ordinary subtree text', especially because get_modifiable_text() documents special-element text in the same area.", + "doc_gaps": [ + { + "location": "html-processor.md, 'Recipe: collect DOM-style text from a subtree' and next_token() special-element note", + "problem": "The distinction between ordinary parsed text descendants and special-element token text is present, but easy to over-apply when a caller says 'text content'.", + "suggestion": "Add a short contract note defining the default recipe as 'ordinary HTML subtree text: #text tokens only; excludes SCRIPT/STYLE raw text and TEXTAREA/TITLE opener text unless the caller explicitly says to include those elements'." + }, + { + "location": "html-processor.md, get_modifiable_text()", + "problem": "The method documents many token types that can return text, but readers may treat that as a collection rule rather than a capability list.", + "suggestion": "Add a warning near the method summary: 'This method answers what the current token can expose, not whether that token belongs in a text-extraction result; choose token types first, then call this method.'" + }, + { + "location": "html-processor.md, text extraction examples", + "problem": "The successful pattern is shown for ARTICLE and LI, but not framed as reusable for headings or other phrasing-content containers where nested inline markup is common.", + "suggestion": "Add one compact example or sentence saying the same depth-bounded #text walk applies to headings, captions, links, and list items, and returns an empty string when the element contains no #text tokens." + } + ] +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..46bfdf8965376 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,24 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..a3c832307fe93 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..da5452cc66e36 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, `next_tag( 'H1' )` to find the first heading, then a depth-bounded `next_token()` walk starting at that `H1`. During the walk it appends only `#text` token content via `get_modifiable_text()`, which the documentation says already returns decoded text, so nested markup contributes only its descendant text and non-text elements like images contribute nothing.", + "confidence": 93 +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..0e6c694c22e81 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..b8dfd5f3fc497 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..1fa6d223c7de9 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,39 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( + 'SCRIPT' === $tag_name || + 'STYLE' === $tag_name || + 'TEXTAREA' === $tag_name || + 'TITLE' === $tag_name + ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..e6c155db5c64b --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..b18fd2a458b16 --- /dev/null +++ b/doc-experiment/results/round-42/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`. It appends decoded text from `#text` tokens via `get_modifiable_text()`, and also whitelists special text-carrying element opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) because the processor documents that their text lives on the element token rather than separate `#text` children.", + "confidence": 73 +} diff --git a/doc-experiment/results/round-42/T04-build-figure/judge.json b/doc-experiment/results/round-42/T04-build-figure/judge.json new file mode 100644 index 0000000000000..e2b02a92c5b7a --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor with a literal template, predeclared src/alt attributes to preserve order, walked tokens to a #text placeholder, used set_attribute()/set_modifiable_text() with plain strings, and returned get_updated_html(). All called methods are documented and execution recorded no misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented pattern as the reference: Tag Processor construction, next_tag('img'), attribute replacement in-place, next_token() text walk, set_modifiable_text(), and get_updated_html(). No undocumented API calls or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and documented API usage throughout. The early return if the template IMG is not found is unnecessary for a fixed internal template, but it is not an API misuse and does not affect adherence." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, so there are no failed hidden cases to attribute. The docs did well in the exact areas this task required: the Tag Processor overview says it is appropriate for flat, byte-preserving edits; the 'Building markup from a template' section directly explains filling a literal template with untrusted values, including the two key rules that existing attributes preserve written order and text replacement needs a placeholder text node; set_attribute() documents that it accepts plain unescaped strings, encodes them, and preserves existing attribute positions; set_modifiable_text() documents that ordinary element text must be reached as a #text token and is encoded from plaintext; get_updated_html() is clearly identified as the correct output method after queued edits. The main near-miss is that next_token() contains a contradictory sentence saying the Tag Processor currently only supports the tag token, while surrounding examples and method docs rely on #text tokens. These subjects followed the stronger template-building guidance anyway, but that line could mislead less capable readers.", + "doc_gaps": [ + { + "location": "html-tag-processor.md, next_token() method docs", + "problem": "The text says the Tag Processor currently only supports the tag token, contradicting documented #text/comment/doctype token handling and the template-building examples that use #text.", + "suggestion": "Replace the stale limitation with an accurate list of supported token types and explicitly state that next_token() can visit #text tokens suitable for get_modifiable_text()/set_modifiable_text()." + }, + { + "location": "html-tag-processor.md, Building markup from a template", + "problem": "The example is excellent for a single text placeholder, but it does not name the failure mode if the placeholder is omitted beyond the bullet text.", + "suggestion": "Add a short note after the example: set_modifiable_text() replaces an existing text token; it does not insert a new child into an empty element, so templates intended for text replacement should include a placeholder." + }, + { + "location": "html-tag-processor.md, set_modifiable_text() examples", + "problem": "The method says to always check the return value, but examples often omit the check after matching #text, creating tension between strict guidance and common safe usage.", + "suggestion": "Clarify when checking can be omitted in examples, or show a minimal failure branch for set_modifiable_text() so readers understand the contract without overcomplicating template-fill code." + } + ] +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-1/candidate.php b/doc-experiment/results/round-42/T04-build-figure/trial-1/candidate.php new file mode 100644 index 0000000000000..0ec02ccfdbd4c --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-1/candidate.php @@ -0,0 +1,19 @@ +
            .
            ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-1/execution.json b/doc-experiment/results/round-42/T04-build-figure/trial-1/execution.json new file mode 100644 index 0000000000000..3aecf82bfcdc7 --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T04-build-figure/trial-1/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-1/response.json b/doc-experiment/results/round-42/T04-build-figure/trial-1/response.json new file mode 100644 index 0000000000000..57179040b7273 --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds the fragment from a fixed HTML template using `WP_HTML_Tag_Processor`, which the docs recommend when exact attribute order matters. It updates the existing `src` and `alt` attributes with `set_attribute()`, then walks tokens with `next_token()` to find the placeholder `#text` node inside `figcaption` and replaces it with `set_modifiable_text()`, letting the HTML API handle all required encoding.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-2/candidate.php b/doc-experiment/results/round-42/T04-build-figure/trial-2/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-2/candidate.php @@ -0,0 +1,18 @@ +
            .
            ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-2/execution.json b/doc-experiment/results/round-42/T04-build-figure/trial-2/execution.json new file mode 100644 index 0000000000000..6b4037642da3c --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T04-build-figure/trial-2/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-2/response.json b/doc-experiment/results/round-42/T04-build-figure/trial-2/response.json new file mode 100644 index 0000000000000..de18367e0b607 --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed HTML template so the output shape and attribute order stay exact. It updates the existing `src` and `alt` attributes with `set_attribute()` and replaces the placeholder figcaption text node with `set_modifiable_text()`, relying on the documented HTML API to perform the necessary escaping and encoding.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-3/candidate.php b/doc-experiment/results/round-42/T04-build-figure/trial-3/candidate.php new file mode 100644 index 0000000000000..d2c5538d13e31 --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-3/candidate.php @@ -0,0 +1,20 @@ +
            .
            ' ); + + if ( ! $processor->next_tag( 'img' ) ) { + return ''; + } + + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-3/execution.json b/doc-experiment/results/round-42/T04-build-figure/trial-3/execution.json new file mode 100644 index 0000000000000..ac197ad2d14fa --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T04-build-figure/trial-3/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
            \"A
            My dog
            ", + "actual": "
            \"A
            My dog
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
            \"Pair\"
            Fish & Chips
            ", + "actual": "
            \"Pair\"
            Fish & Chips
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
            \"The
            Caption
            ", + "actual": "
            \"The
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
            \"Alt\"
            Caption
            ", + "actual": "
            \"Alt\"
            Caption
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "actual": "
            \"Code\"
            Use <em> tags & enjoy
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
            \"Schnée
            Winter 🌨️ scene
            ", + "actual": "
            \"Schnée
            Winter 🌨️ scene
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
            \"alt\"
            <script>alert(1)</script>
            ", + "actual": "
            \"alt\"
            <script>alert(1)</script>
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T04-build-figure/trial-3/response.json b/doc-experiment/results/round-42/T04-build-figure/trial-3/response.json new file mode 100644 index 0000000000000..c1f618e5fc15d --- /dev/null +++ b/doc-experiment/results/round-42/T04-build-figure/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a literal template so the final markup shape and `img` attribute order stay exact. It updates the existing `src` and `alt` attributes with `set_attribute()`, then walks tokens with `next_token()` to find the placeholder text node inside `figcaption` and replaces it with `set_modifiable_text()`, letting the HTML API handle all required escaping.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/judge.json b/doc-experiment/results/round-42/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..25aa4e5d467eb --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), get_token_name(), is_tag_closer(), and get_modifiable_text(), all documented in the rendered files. Correctly treated text extraction as an HTML Processor token walk, whitelisted #text plus TITLE/TEXTAREA opener tokens, excluded SCRIPT/STYLE, and decoded text via get_modifiable_text(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used only documented APIs, including get_tag() for tag-name checks after confirming #tag tokens. Processor choice, token walking, special-element handling, decoded-text handling, and UTF-8 truncation were all aligned with documented guidance. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used only documented APIs and closely followed the documented pattern: create a BODY fragment processor, walk tokens, collect #text, opt into TITLE/TEXTAREA opener modifiable text, and truncate with mb_* using UTF-8. No _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The docs did well on the exact hazards this task exercises: html-processor.md's 'Recipe: collect DOM-style text from a subtree' says to use WP_HTML_Processor for tree-aware text extraction, append ordinary #text tokens, and not treat every token with modifiable text as text. Its opt-in policy explicitly says TITLE and TEXTAREA provide decoded text on opener tokens while SCRIPT and STYLE provide raw text and should not be included merely because available. The next_token() section explains that special elements produce no #text children and that malformed input still produces closing tokens. The get_modifiable_text() section states that #text, TITLE, and TEXTAREA are already decoded UTF-8 and should be measured/sliced with an explicit UTF-8 encoding. Near-misses: trial-2 used get_tag() while trials 1 and 3 used get_token_name(); both are documented and valid here, but the docs alternate between them in examples, which could confuse weaker users about which is preferred for token-walk code.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() / text extraction recipe", + "problem": "The special-element guidance is correct, but implementers still have to synthesize the include/exclude policy from several paragraphs: #text is ordinary DOM text, TITLE/TEXTAREA are decoded opt-in opener text, and SCRIPT/STYLE are raw opt-in text that many text-content callers must exclude.", + "suggestion": "Add a compact table for token text policies: token/source, whether it appears as #text child tokens, whether get_modifiable_text() is decoded or raw, and when callers should opt in." + }, + { + "location": "WP_HTML_Processor::get_token_name() and get_tag() docs", + "problem": "Examples use both get_token_name() and get_tag() for tag-name checks during token walks. Both worked in these trials, but the preferred choice is not explicit for code that first checks get_token_type() === '#tag'.", + "suggestion": "Add a short note: in token walks, use get_token_type() to distinguish token kinds; after confirming '#tag', either get_tag() or get_token_name() can identify the element name, with any semantic differences called out." + }, + { + "location": "WP_HTML_Processor::next_token() incomplete-input guidance", + "problem": "The docs mention paused_at_incomplete_token() and get_last_error(), but the contract for read-only extraction is spread across mutation/rewrite examples. It is not obvious when best-effort extraction may ignore incomplete trailing syntax versus when callers should reject it.", + "suggestion": "Add a general note for read-only token walks: next_token() only visits complete reported tokens; callers that require proof of complete input should check paused_at_incomplete_token() and get_last_error() after the walk." + } + ] +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..a88e5f8eb53a1 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,34 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + } + + if ( preg_match( '/^.{0,' . $max_codepoints . '}/us', $text, $matches ) ) { + return $matches[0]; + } + + return ''; +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..197f7c9cc8095 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..e969296d0fc42 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structure awareness, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and, per the documented special-case rule, from `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`. It intentionally excludes other modifiable-text tokens such as `SCRIPT`, `STYLE`, comments, and processing instructions, and truncates the final UTF-8 string to the requested number of Unicode code points with a Unicode-aware regex.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..86c96fb37a393 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,46 @@ +next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_tag(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + $remaining = $max_codepoints - $length; + + if ( $chunk_length <= $remaining ) { + $text .= $chunk; + $length += $chunk_length; + continue; + } + + $text .= mb_substr( $chunk, 0, $remaining, 'UTF-8' ); + return $text; + } + + return $text; +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..44f0568a98944 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..87e8b51c8b284 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the HTML fragment with tree awareness, then walks all tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and from `TITLE`/`TEXTAREA` opening tags via `get_modifiable_text()`, while excluding raw-text elements like `SCRIPT` and `STYLE`. It truncates incrementally with `mb_strlen()` and `mb_substr()` in UTF-8 so the result never splits a multi-byte character.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..f309c33c5e217 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,43 @@ + 0 && $processor->next_token() ) { + $text = ''; + $type = $processor->get_token_type(); + + if ( '#text' === $type ) { + $text = $processor->get_modifiable_text(); + } elseif ( '#tag' === $type && ! $processor->is_tag_closer() ) { + $name = $processor->get_token_name(); + if ( 'TITLE' === $name || 'TEXTAREA' === $name ) { + $text = $processor->get_modifiable_text(); + } + } + + if ( '' === $text ) { + continue; + } + + $length = mb_strlen( $text, 'UTF-8' ); + if ( $length <= $remaining ) { + $excerpt .= $text; + $remaining -= $length; + } else { + $excerpt .= mb_substr( $text, 0, $remaining, 'UTF-8' ); + break; + } + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..164e8ec148ba4 --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..43151d02cf86e --- /dev/null +++ b/doc-experiment/results/round-42/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment with tree awareness, then walks tokens via `next_token()`, appending decoded text from ordinary `#text` tokens plus the special text-bearing `TITLE` and `TEXTAREA` opening-tag tokens through `get_modifiable_text()`. It truncates incrementally with `mb_strlen()` and `mb_substr()` in UTF-8 so the result is limited by Unicode code points without splitting multibyte characters.", + "confidence": 93 +} diff --git a/doc-experiment/results/round-42/T06-collect-links/judge.json b/doc-experiment/results/round-42/T06-collect-links/judge.json new file mode 100644 index 0000000000000..f15977c2228cd --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), a single next_token() walk, get_attribute() with is_string(), #text filtering, and get_modifiable_text(); all called methods are documented and execution recorded no API misuse. Small deduction: the final paused_at_incomplete_token()/get_last_error() all-or-nothing return is too conservative for this read-only extraction task and would discard already collected links after a trailing incomplete token." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the right processor and only documented methods. The closer/depth tracking reflects the documented get_current_depth()/is_tag_closer() semantics, and text/attribute handling is idiomatic. Same small edge-policy issue as trial-1: it rejects the whole result on trailing incomplete syntax even though the task and reference allow best-effort extraction of already visited links." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), one next_token() loop, string-only href handling, #text-only text accumulation, and closer-driven flushing. All methods are documented and no misuse was recorded. It relies on a single current link rather than depth/breadcrumb state, which is acceptable for A elements under the processor's virtual-closer behavior but is a less general pattern for repeated subtree extraction." + } + ], + "failure_analysis": "No hidden cases failed: all three trials passed 8/8. The docs appear to have succeeded on the key concepts: the HTML Processor overview and create_fragment() docs pointed subjects to the structural parser; get_attribute() documented null/true/string and decoded values clearly enough that every trial used is_string(); the DOM-style text recipe and get_modifiable_text() docs led every trial to append only #text tokens and avoid comments/markup/special-element token text; and next_token() documented virtual closers well enough that unclosed links worked. Near-miss: trials 1 and 2 over-applied the clean-scan guidance from the mutation/rewrite examples. In a read-only probe, `oknext_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + if ( ! empty( $link_stack ) ) { + array_pop( $link_stack ); + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $link_stack[] = count( $links ) - 1; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) { + $current_index = $link_stack[ count( $link_stack ) - 1 ]; + $links[ $current_index ]['text'] .= $processor->get_modifiable_text(); + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-42/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..ac52d0b01f2aa --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-42/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..11afaaee500b8 --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks the document with `next_token()`. For each `A` opening tag, it reads `href` with `get_attribute()` and includes it only when the returned value is a string, and while that link is open it concatenates descendant ordinary `#text` tokens via `get_token_type()`, `get_tag()`, `is_tag_closer()`, and `get_modifiable_text()`. It rejects truncated or unsupported input by checking `paused_at_incomplete_token()` and `get_last_error()` after the scan.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-42/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..3822ef97976b8 --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,60 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + foreach ( $active as $open_link ) { + $links[ $open_link['index'] ]['text'] .= $text; + } + + continue; + } + + if ( '#tag' !== $processor->get_token_type() || 'A' !== $processor->get_tag() ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + $closer_depth = $processor->get_current_depth(); + + while ( ! empty( $active ) ) { + $last = $active[ count( $active ) - 1 ]; + if ( $last['depth'] !== $closer_depth + 1 ) { + break; + } + + array_pop( $active ); + } + + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $active[] = array( + 'depth' => $processor->get_current_depth(), + 'index' => count( $links ) - 1, + ); + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-42/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..a4d3dd8ca0d50 --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-42/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..e868944a666c2 --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` for tree-aware parsing, then walks the fragment with `next_token()`. It starts a result entry when it sees an opening `A` tag whose `get_attribute( 'href' )` returns a string, appends decoded text from `#text` tokens via `get_modifiable_text()` while that link is open, and ends tracking on the corresponding closing `A` token using `get_current_depth()` and `is_tag_closer()`.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-42/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..48eaa6eb0ed1c --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,40 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current ) { + $links[] = $current; + $current = null; + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $current = array( + 'href' => $href, + 'text' => '', + ); + } + + continue; + } + + if ( null !== $current && '#text' === $processor->get_token_type() ) { + $current['text'] .= $processor->get_modifiable_text(); + } + } + + return $links; +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-42/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..12b7d70e152d6 --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-42/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..cd7feb909322b --- /dev/null +++ b/doc-experiment/results/round-42/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then performs a single `next_token()` walk. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes (excluding missing and boolean `href`), accumulates descendant ordinary text from `#text` tokens via `get_modifiable_text()`, and flushes the collected entry when the matching `A` closer is reached.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/judge.json b/doc-experiment/results/round-42/T07-nested-lists/judge.json new file mode 100644 index 0000000000000..acfc37026ec72 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 7/7. Correctly chose WP_HTML_Processor::create_fragment() for ancestor-aware parsing; all called methods are documented in the rendered files: create_fragment, next_tag, get_tag, get_breadcrumbs, add_class, get_last_error, get_updated_html. Idiomatic single-pass tag walk, excludes the current list from its breadcrumb ancestor check, uses add_class() to preserve existing classes, and returns get_updated_html(). Minor deduction: it adds an all-or-nothing get_last_error() fallback policy that is safe but not required by the task, and it does not distinguish incomplete trailing syntax with paused_at_incomplete_token()." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 7/7. Same substantive implementation as trial-1: correct processor choice, documented API only, proper breadcrumb ancestor inspection, add_class(), and get_updated_html(). Existing classes and byte preservation are handled through the documented class mutation API. Minor deduction for the same extra get_last_error() fallback policy and no explicit incomplete-token policy." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Passed 7/7. Correct processor and all methods are documented, including inherited paused_at_incomplete_token(). The final mutation pass is sound. Deductions are for non-idiomatic redundancy: it performs a full validation scan, computes an unused $is_nested value, reparses the same HTML, and rejects incomplete trailing syntax wholesale. That policy is documented as caller-dependent, but for this task it could skip valid edits to complete list tags before a truncated tail." + } + ], + "failure_analysis": "No hidden/frozen case failed across the three trials; every execution passed 7/7 and no _doing_it_wrong records appeared. The docs did well on the central decision points: they clearly direct structural/ancestor-sensitive work to WP_HTML_Processor rather than WP_HTML_Tag_Processor, explain create_fragment() for body fragments, document that next_tag() walks openers by default, define get_breadcrumbs() as the root-to-current path including HTML/BODY/current node, and point mutation output to add_class() plus get_updated_html(). The near-misses were policy and ergonomics issues rather than failures. Trial-3 appears to have overgeneralized the incomplete-input guidance into a two-pass all-or-nothing validation flow, even though this task's decision is local to each current tag's breadcrumbs. Trials 1 and 2 also added a get_last_error() fallback after queueing edits; this is conservative, but the docs' serialization-oriented 'reject or fall back' language can be read as applying to all mutation loops, even when get_updated_html() can preserve untouched bytes and return queued edits.", + "doc_gaps": [ + { + "location": "html-processor.md > Breadcrumbs / get_breadcrumbs()", + "problem": "The docs explain direct breadcrumb paths well, but do not give a compact pattern for 'has any ancestor named X' and do not explicitly remind readers to exclude the current node when checking ancestors.", + "suggestion": "Add a general example showing arbitrary ancestor containment with get_breadcrumbs(), e.g. slice/pop the current node before in_array() checks, and contrast it with matches_breadcrumbs()/breadcrumb queries, which match paths rather than arbitrary-depth ancestors." + }, + { + "location": "html-processor.md > Usage recipes", + "problem": "The recipes emphasize scan-before-edit and bounded subtree walks. For edits whose condition is known at the current token, this can encourage unnecessary validation scans or reparsing, as in trial-3.", + "suggestion": "Add a 'single-pass structural class/attribute edit' recipe: create_fragment(), while next_tag(), inspect get_tag()/get_breadcrumbs()/get_current_depth(), mutate immediately with add_class()/set_attribute(), then return get_updated_html(). State that bookmarks or a pre-scan are only needed when the edit depends on information discovered later." + }, + { + "location": "html-processor.md > unsupported/incomplete input guidance; html-tag-processor.md > get_updated_html()", + "problem": "The docs repeatedly say to reject or fall back on get_last_error() and optionally paused_at_incomplete_token(), but the policy boundary is not clear for get_updated_html() mutation loops versus normalization/serialization loops.", + "suggestion": "Clarify that get_last_error() means the HTML Processor stopped before full structural analysis; callers may choose all-or-nothing fallback, but get_updated_html() still returns the original bytes with queued edits applied. Separately document that incomplete trailing tokens are preserved by get_updated_html(), and rejecting them is a caller policy, not a universal requirement." + }, + { + "location": "html-processor.md > inherited mutation methods such as add_class() and get_updated_html()", + "problem": "The WP_HTML_Processor page exposes inherited mutation methods, but some detailed semantics live mainly on the Tag Processor page: class preservation/no duplicate behavior and byte-preserving output are easy to miss when working from the Processor page.", + "suggestion": "On the Processor method stubs for add_class(), set_attribute(), remove_class(), and get_updated_html(), include or directly link the full inherited contract: existing class preservation, no duplicate class append, changed attributes re-emitted with double quotes, and untouched bytes preserved exactly." + } + ] +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-1/candidate.php b/doc-experiment/results/round-42/T07-nested-lists/trial-1/candidate.php new file mode 100644 index 0000000000000..f6773eccfc090 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-1/candidate.php @@ -0,0 +1,38 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $has_list_ancestor = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $has_list_ancestor = true; + break; + } + } + + if ( $has_list_ancestor ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-1/execution.json b/doc-experiment/results/round-42/T07-nested-lists/trial-1/execution.json new file mode 100644 index 0000000000000..acbe44cbc4b10 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T07-nested-lists/trial-1/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-1/response.json b/doc-experiment/results/round-42/T07-nested-lists/trial-1/response.json new file mode 100644 index 0000000000000..9a7a6324cbe9c --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with structural awareness, scans list openers with `next_tag()`, checks each matched tag\u2019s ancestor path via `get_breadcrumbs()`, and adds the `nested-list` class with `add_class()` only when a `UL` or `OL` appears above it; the final byte-preserving result is returned with `get_updated_html()`, while unsupported-parser failures fall back to the original HTML.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-2/candidate.php b/doc-experiment/results/round-42/T07-nested-lists/trial-2/candidate.php new file mode 100644 index 0000000000000..c3ecf67a2ce38 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-2/candidate.php @@ -0,0 +1,35 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $has_list_ancestor = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $has_list_ancestor = true; + break; + } + } + + if ( $has_list_ancestor ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-2/execution.json b/doc-experiment/results/round-42/T07-nested-lists/trial-2/execution.json new file mode 100644 index 0000000000000..c9ae7e8f0a093 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T07-nested-lists/trial-2/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-2/response.json b/doc-experiment/results/round-42/T07-nested-lists/trial-2/response.json new file mode 100644 index 0000000000000..3c56d48628593 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because ancestor-aware structure is required. It scans list openers with `next_tag()`, checks each matched tag\u2019s `get_breadcrumbs()` for any earlier `UL` or `OL` ancestor, adds the `nested-list` class with `add_class()`, and returns the byte-preserving edited fragment with `get_updated_html()`.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-3/candidate.php b/doc-experiment/results/round-42/T07-nested-lists/trial-3/candidate.php new file mode 100644 index 0000000000000..ff43fbf15486a --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-3/candidate.php @@ -0,0 +1,62 @@ +next_tag() ) { + $tag = $scanner->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $scanner->get_breadcrumbs(); + $is_nested = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $is_nested = true; + break; + } + } + } + + if ( $scanner->paused_at_incomplete_token() || null !== $scanner->get_last_error() ) { + return $html; + } + + $processor = WP_HTML_Processor::create_fragment( $html ); + if ( null === $processor ) { + return $html; + } + + while ( $processor->next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $is_nested = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $is_nested = true; + break; + } + } + + if ( $is_nested ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-3/execution.json b/doc-experiment/results/round-42/T07-nested-lists/trial-3/execution.json new file mode 100644 index 0000000000000..b7b9c8247b336 --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T07-nested-lists/trial-3/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
            • One
              1. Nested
            ", + "actual": "
            • One
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
            1. Top
            • Also top
            ", + "actual": "
            1. Top
            • Also top
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
            1. One
              • Nested
            ", + "actual": "
            1. One
              • Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
              1. Deep
            ", + "actual": "
              1. Deep
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
              1. Nested
            ", + "actual": "
              1. Nested
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
            • A
              1. B
                • C
            ", + "actual": "
            • A
              1. B
                • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

            intro

            • A
              1. B
            1. C
            ", + "actual": "

            intro

            • A
              1. B
            1. C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T07-nested-lists/trial-3/response.json b/doc-experiment/results/round-42/T07-nested-lists/trial-3/response.json new file mode 100644 index 0000000000000..829cbe7bf2bea --- /dev/null +++ b/doc-experiment/results/round-42/T07-nested-lists/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on ancestor structure. It scans tags with `next_tag()`, identifies `UL`/`OL` elements, checks their ancestry with `get_breadcrumbs()`, and adds the class with `add_class()`. To avoid partial edits on truncated or unsupported HTML, it first performs a validation scan and returns the original fragment unchanged if `paused_at_incomplete_token()` or `get_last_error()` indicates the structure scan was not clean; otherwise it performs the same scan again and returns the byte-preserving result via `get_updated_html()`.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-42/T08-table-extract/judge.json b/doc-experiment/results/round-42/T08-table-extract/judge.json new file mode 100644 index 0000000000000..ac4740e440f9f --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_tag(), next_token(), get_current_depth(), get_token_type(), get_tag(), is_tag_closer(), get_modifiable_text(), and get_last_error(), all documented. The solution follows the documented single-cursor, depth-bounded token walk and relies on virtual closers for omitted table markup. Minor near-miss: it also appends SCRIPT/STYLE/TEXTAREA/TITLE opener modifiable text inside cells, even though the docs' ordinary subtree-text recipe says to append only #text unless the caller explicitly wants special-element contents. It also does not check paused_at_incomplete_token()." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the correct HTML Processor and only documented APIs, including paused_at_incomplete_token(). It follows the documented single next_token() loop with explicit row/cell state and depth boundary, and handles decoded #text correctly. Minor near-miss: it includes #cdata-section and special-element opener text in cell output, which is broader than the ordinary DOM-style subtree-text recipe unless the caller explicitly asks for those token types." + }, + { + "trial_id": "trial-3", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor and only documented APIs, with a clean depth-bounded token walk and explicit row/cell state. It handles decoded text, empty cells, omitted closers, and first-table scoping well. Minor near-miss: like trial 1, it appends special-element opener modifiable text inside cells and does not check paused_at_incomplete_token()." + } + ], + "failure_analysis": "No hidden case failed: all three trials passed 8/8, with no _doing_it_wrong or trigger_error records. The docs did well on the main risk areas for this task: they clearly directed structural work to WP_HTML_Processor rather than WP_HTML_Tag_Processor; create_fragment() was visible for body fragments; next_token() documented the one-cursor rule and recommended one loop with state for repeated regions; get_current_depth() documented the >= boundary rule and virtual closers; and get_modifiable_text() documented decoded #text semantics, which prevented double-decoding of entities. The main near-miss was special-element text. All trials added SCRIPT/STYLE/TEXTAREA/TITLE opener text to cell contents, while the reference and the ordinary subtree-text recipe append only #text tokens. This likely comes from the get_modifiable_text() documentation being broad and memorable: it correctly says special elements carry modifiable text, but implementers may over-apply that fact when asked for generic text extraction. Trial 2 was slightly stronger on incomplete-token hygiene because it checked paused_at_incomplete_token(), though the frozen cases did not exercise that difference.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() and the 'Recipe: collect DOM-style text from a subtree' section", + "problem": "The docs explain that special elements carry modifiable text, but the boundary between ordinary subtree text and opt-in special-element data is still easy to over-apply. All trials included SCRIPT/STYLE/TEXTAREA/TITLE text for a generic text-extraction task.", + "suggestion": "Add a short warning and compact example in the method doc: for ordinary element text extraction, first filter to #text tokens; do not append every token with modifiable text. Show special-element handling as a separate opt-in policy." + }, + { + "location": "WP_HTML_Processor::next_token() repeated-region guidance", + "problem": "The DT example teaches the one-loop state-machine pattern, but table-like repeated structures with virtual/implied row and cell closers are a common harder case.", + "suggestion": "Add a general example for collecting repeated child regions in structured HTML, emphasizing opener/closer state, virtual closers, and a depth-bound guard without embedding any task-specific solution." + }, + { + "location": "HTML Processor method index / inherited public methods", + "problem": "paused_at_incomplete_token() is referenced from HTML Processor recipes but appears only in the Tag Processor docs, which can make it look less official on WP_HTML_Processor instances.", + "suggestion": "List inherited public methods used by HTML Processor recipes, or add an 'Inherited from WP_HTML_Tag_Processor' subsection with direct links for paused_at_incomplete_token(), get_modifiable_text(), and related token APIs." + }, + { + "location": "WP_HTML_Processor::create_fragment() / fragment-context documentation", + "problem": "The docs mention that fragment context changes table parsing, but the public factory currently supports only BODY context. This is easy to miss when handling snippets that may be table internals.", + "suggestion": "Clarify the current practical contract: body fragments containing full TABLE markup are parsed structurally, but isolated table-internal fragments need the appropriate ancestor markup until broader context support exists." + } + ] +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-42/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..473d0d86a2d24 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,71 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag_name = $processor->get_tag(); + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( 'TR' === $tag_name && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + + continue; + } + + if ( 'TR' === $tag_name ) { + $current_row = array(); + continue; + } + + if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_row ) { + $current_cell = ''; + continue; + } + + if ( null !== $current_cell && in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-42/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..ea0393fa4ed8a --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-42/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..9d9e5fbfc41ed --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the HTML fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks its subtree with `next_token()` and `get_current_depth()`. It collects row boundaries from `TR` open/close tags, cell boundaries from `TD`/`TH`, appends decoded ordinary text from `#text` tokens via `get_modifiable_text()`, and also includes documented special-element text carried on `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opener tokens.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-42/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..c126e5ebe47b6 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,85 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = ''; + $inside_cell = false; + $closed_table = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + $closed_table = true; + break; + } + + $token_type = $processor->get_token_type(); + + if ( '#tag' === $token_type ) { + $tag_name = $processor->get_tag(); + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && $inside_cell ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_row[] = $current_cell; + $current_cell = ''; + $inside_cell = false; + } elseif ( 'TR' === $tag_name && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + + continue; + } + + if ( 'TR' === $tag_name ) { + $current_row = array(); + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + $inside_cell = true; + continue; + } + + if ( + $inside_cell && + ( 'SCRIPT' === $tag_name || 'STYLE' === $tag_name || 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( $inside_cell && ( '#text' === $token_type || '#cdata-section' === $token_type ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( ! $closed_table && ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-42/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..5d75610c06cff --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-42/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..4ef88da4d2724 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so the fragment is parsed with browser-like table structure, then finds the first `TABLE` with `next_tag()`. It walks the table in one `next_token()` pass, bounded by `get_current_depth()`, opens and closes rows/cells using `get_tag()` and `is_tag_closer()`, and concatenates cell text from `#text`/`#cdata-section` tokens plus special text-carrying element openers via `get_modifiable_text()`. It also rejects unsupported or truncated parses with `get_last_error()` and `paused_at_incomplete_token()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-42/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..d76277dd42f11 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,91 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = ''; + $in_cell = false; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + if ( $in_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $token_type ) { + continue; + } + + $tag = $processor->get_tag(); + if ( null === $tag ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_row[] = $current_cell; + $current_cell = ''; + $in_cell = false; + } elseif ( 'TR' === $tag ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } + + continue; + } + + if ( 'TR' === $tag ) { + $current_row = array(); + continue; + } + + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + $in_cell = true; + continue; + } + + if ( $in_cell && ( 'SCRIPT' === $tag || 'STYLE' === $tag || 'TEXTAREA' === $tag || 'TITLE' === $tag ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + if ( $in_cell ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + return $rows; +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-42/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..c27615107fef1 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-42/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..cd111228e3263 --- /dev/null +++ b/doc-experiment/results/round-42/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, finds the first `TABLE` with `next_tag()`, then walks that table subtree with a single `next_token()` loop bounded by `get_current_depth()`. It builds rows from `TR` open/close events, collects cell contents from `TD`/`TH`, appends decoded text from `#text` tokens via `get_modifiable_text()`, and also includes special-element text carried on opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) as documented by the HTML API.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/judge.json b/doc-experiment/results/round-42/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..d5a5ee5ca686b --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), get_modifiable_text(), serialize_token(), and get_last_error(), all documented. This matches the documented token-rewrite pattern, checks only ordinary #text tokens, matches decoded text, serializes normalized tokens, and avoids comments, attributes, and special text-bearing elements." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same API shape as trial-1, with strpos() instead of str_contains(). Correct processor choice, no undocumented API calls, idiomatic token-by-token serialization, and correct decoded-text handling." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Uses the correct documented APIs and the right token-rewrite model. Minor deduction for the get_last_error() fallback to WP_HTML_Processor::normalize($html) after emitting rewritten output: normalize() is documented, but the docs warn that normalizing the original input after a rewrite discards emitted changes unless that is intentional. Hidden cases all pass." + } + ], + "failure_analysis": "No hidden cases failed across the three trials; each passed 8/8. The rendered docs did well on the central distinctions this task required: the processor-selection guidance says to use WP_HTML_Processor for normalized output and document-structure-aware text walking; the DOM-style text recipe says ordinary text is only #text tokens and warns not to treat every token with modifiable text as ordinary text; next_token() explicitly says SCRIPT, STYLE, TITLE, and TEXTAREA do not produce #text children; get_modifiable_text() states that #text is decoded; serialize_token() gives the token-by-token rewrite pattern and says this is where to emit extra markup around selected tokens. Near-misses were small: trial-3's error fallback shows the rewrite/fallback policy could be clearer, and trials 1/2 defensively checked for empty text even though the non-empty keyword makes that unnecessary.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() docs", + "problem": "The docs say callers may emit extra markup around tokens, but the concrete example only removes tokens. Wrapper insertion is an important general rewrite pattern and is easy to confuse with queued modifications plus get_updated_html().", + "suggestion": "Add a general example showing trusted literal wrapper markup emitted before and after serialize_token(), and state that the accumulated string is the output for token rewrites." + }, + { + "location": "WP_HTML_Processor::serialize_token() / get_last_error() post-loop guidance", + "problem": "The docs say to reject or fall back on get_last_error(), but do not spell out that falling back to normalize($html) after a rewrite intentionally drops all emitted transformations.", + "suggestion": "Clarify all-or-nothing rewrite policy: after parser abort, callers should return a contract-specific failure/unchanged value, or knowingly discard partial emitted changes; normalize($html) is not a way to preserve rewrite changes." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() and WP_HTML_Processor::serialize_token() cross-reference", + "problem": "The decoded read side and normalized output side are documented in separate method sections, so readers must infer the round trip: inspect decoded text, but output serialize_token() rather than rebuilding from decoded text.", + "suggestion": "Add a cross-reference note: use get_modifiable_text() for decoded comparisons or replacement decisions, and use serialize_token() for normalized serialization of the original token unless deliberately replacing the token text." + } + ] +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..465ba5ee9b590 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,27 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + if ( '' !== $text && str_contains( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..0537535d886ff --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..06a4a940ba228 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks tokens with `next_token()` and rebuilds normalized HTML with `serialize_token()`. Only ordinary `#text` tokens are considered for wrapping, and their decoded text is read via `get_modifiable_text()` so character references match the keyword correctly; special text-bearing elements like `script`, `style`, `title`, and `textarea` are not wrapped because their text is carried on the element token, not as `#text` children.", + "confidence": 74 +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..9090b2a0e0b37 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,27 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + if ( '' !== $text && false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..17e9a5193c9a1 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..00356d3586b08 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This walks the fragment with `WP_HTML_Processor::create_fragment()` and `next_token()`, rebuilding normalized HTML token by token with `serialize_token()`. When the current token is an ordinary `#text` node, it checks the decoded text via `get_modifiable_text()` and wraps that token\u2019s normalized serialization in `` if it contains the case-sensitive keyword; special text-bearing elements are naturally excluded because their contents are not exposed as `#text` child tokens by the documented API.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..dece441332973 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,27 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() && str_contains( $processor->get_modifiable_text(), $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + $normalized = WP_HTML_Processor::normalize( $html ); + return null === $normalized ? '' : $normalized; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..9e47621789c73 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

            hello world

            ", + "actual": "

            hello world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

            alpha beta

            beta gamma

            delta

            ", + "actual": "

            alpha beta

            beta gamma

            delta

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

            world peace

            ", + "actual": "

            world peace

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

            world

            ", + "actual": "

            world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

            World world

            ", + "actual": "

            World world

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
            bold world

            unclosed & markup

            ", + "actual": "
            bold world

            unclosed & markup

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..34b690991caf2 --- /dev/null +++ b/doc-experiment/results/round-42/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment structurally, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text via `get_modifiable_text()`, and rewrites the normalized output token-by-token with `serialize_token()`, inserting `` wrappers around matching text nodes so special-element text, comments, and attributes are left untouched.", + "confidence": 85 +} diff --git a/doc-experiment/results/round-42/T10-last-h2/judge.json b/doc-experiment/results/round-42/T10-last-h2/judge.json new file mode 100644 index 0000000000000..8a5a02def8aa5 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for a flat position-based class edit. Every called method is present in the rendered docs: constructor, `next_tag`, `set_bookmark`, `seek`, `add_class`, `release_bookmark`, and `get_updated_html`. The repeated single bookmark is exactly the documented last-seen pattern, and execution passed 6/6 with no `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. Correct processor, documented API only, idiomatic token walk plus moving bookmark, guarded seek, documented bookmark release, and `get_updated_html` for output. Passed all hidden cases with no misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used `WP_HTML_Tag_Processor`, `next_tag( 'H2' )`, a moving bookmark, `has_bookmark()` to guard `seek()`, `add_class()`, `release_bookmark()`, and `get_updated_html()`. All methods are documented in the supplied markdown. Passed all hidden cases with no `_doing_it_wrong` records." + } + ], + "failure_analysis": "No failed hidden cases occurred in any trial. The docs did well on the decisive concepts: the Tag Processor overview says it is for flat, position-based tag/class edits with byte-preserving output; `next_tag()` documents real tag matching and comment/raw-text non-matching; `set_bookmark()` explicitly describes re-setting one bookmark to remember the last matching tag; `add_class()` explains appending to existing classes; and `get_updated_html()` is clearly identified as the way to retrieve edits. Near-misses were limited: none of the trials needed text decoding or attribute null/true/empty-string semantics, and none had to choose a policy for truncated trailing input. The docs mention incomplete-token pauses, but a future subject could still miss the need to distinguish clean exhaustion from truncation when that matters.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::seek() docblock / rendered `seek()` section", + "problem": "The rendered docs say `seek()` returns false if it cannot move, but they do not explicitly warn that seeking an unknown bookmark records `_doing_it_wrong`. This matters when absence of a bookmark is an expected branch, such as no matches found.", + "suggestion": "Document that callers should guard normal-absence cases with `has_bookmark()` or tracked successful `set_bookmark()` before calling `seek()`, because an unknown bookmark is API misuse and triggers `_doing_it_wrong`." + }, + { + "location": "WP_HTML_Tag_Processor::set_bookmark() docblock / bookmark examples", + "problem": "The docs state the last-seen bookmark idiom, but the main example is comparatively complex and mixes nesting, closers, resume bookmarks, and mutation. The simple post-scan guard pattern is easy to miss.", + "suggestion": "Add a short tag-neutral snippet showing one literal bookmark being re-set during a forward scan, checked after the loop, sought once, used, and released. Keep it generic rather than tied to a specific task outcome." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() and `paused_at_incomplete_token()` docblocks", + "problem": "The docs explain that `next_tag()` can return false for both no match and incomplete trailing syntax, but the scan-all pattern does not clearly state how that affects later edits to already-seen complete tags.", + "suggestion": "Add guidance that after draining a scan, callers that require a complete input must check `paused_at_incomplete_token()`, while callers doing best-effort edits may still use bookmarks pointing at complete tokens already visited." + }, + { + "location": "WP_HTML_Tag_Processor::add_class() docblock", + "problem": "The contract says existing classes are preserved and the new class is appended, but there is no compact before/after example demonstrating existing-class behavior and duplicate no-op behavior.", + "suggestion": "Add a small generic before/after example showing `add_class()` preserving existing class text order, appending a new class, and not adding a duplicate." + } + ] +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-1/candidate.php b/doc-experiment/results/round-42/T10-last-h2/trial-1/candidate.php new file mode 100644 index 0000000000000..3d84be60774d6 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-1/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + $found_h2 = true; + } + + if ( ! $found_h2 ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-1/execution.json b/doc-experiment/results/round-42/T10-last-h2/trial-1/execution.json new file mode 100644 index 0000000000000..8710dbd1c039d --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-1/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T10-last-h2/trial-1/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-1/response.json b/doc-experiment/results/round-42/T10-last-h2/trial-1/response.json new file mode 100644 index 0000000000000..2fc32f92601d4 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat tag-position edit, not a tree-structure query. It scans forward with `next_tag( 'H2' )`, repeatedly moves a single bookmark with `set_bookmark()` to remember the last matching opener, then `seek()`s back to that bookmarked tag, adds the class with `add_class()`, and returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-2/candidate.php b/doc-experiment/results/round-42/T10-last-h2/trial-2/candidate.php new file mode 100644 index 0000000000000..3d84be60774d6 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-2/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + $found_h2 = true; + } + + if ( ! $found_h2 ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-2/execution.json b/doc-experiment/results/round-42/T10-last-h2/trial-2/execution.json new file mode 100644 index 0000000000000..7406ad158a350 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-2/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T10-last-h2/trial-2/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-2/response.json b/doc-experiment/results/round-42/T10-last-h2/trial-2/response.json new file mode 100644 index 0000000000000..0d53998d06680 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, reuses a single literal bookmark via `set_bookmark()` to remember the last matched `H2`, then `seek()`s back to that opener and adds the class with `add_class()`, finally returning the result with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-3/candidate.php b/doc-experiment/results/round-42/T10-last-h2/trial-3/candidate.php new file mode 100644 index 0000000000000..f9c9cd1d41b97 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-3/candidate.php @@ -0,0 +1,20 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-3/execution.json b/doc-experiment/results/round-42/T10-last-h2/trial-3/execution.json new file mode 100644 index 0000000000000..ab023281c6b90 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-3/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T10-last-h2/trial-3/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

            One

            a

            Two

            b

            ", + "actual": "

            One

            a

            Two

            b

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

            Only

            ", + "actual": "

            Only

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

            nothing

            ", + "actual": "

            nothing

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "actual": "

            1

            2

            3

            4

            5

            6

            7

            8

            9

            10

            11

            12

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

            Real

            ", + "actual": "

            Real

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

            A

            B

            ", + "actual": "

            A

            B

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T10-last-h2/trial-3/response.json b/doc-experiment/results/round-42/T10-last-h2/trial-3/response.json new file mode 100644 index 0000000000000..a3f1a6c084347 --- /dev/null +++ b/doc-experiment/results/round-42/T10-last-h2/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of the markup, since the task only needs the last `H2` opener in document order. It repeatedly calls `next_tag( 'H2' )`, moves a single bookmark with `set_bookmark()` to remember the most recent match, then `seek()`s back to that bookmarked tag and applies `add_class( 'final-section' )` before returning the result with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/judge.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/judge.json new file mode 100644 index 0000000000000..57b540fab53e4 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Tag_Processor for a flat attribute rewrite; all called APIs are documented: constructor, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(). The token-walking pattern and byte-preserving output method are idiomatic, and no _doing_it_wrong records appeared." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct implementation pattern as the reference. The response's case-insensitive prefix claim is supported by get_attribute_names_with_prefix() docs. It avoids structural HTML Processor features because no tree awareness is needed." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice, documented method usage only, idiomatic while-next_tag loop, safe removal of matched attributes, and correct get_updated_html() return path. No misuse or undocumented API calls found." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, so there were no failed hidden cases to diagnose. The rendered docs did well in three places: the Tag Processor overview explicitly says to use it for flat attribute/class edits and byte-precise preservation; the Usage section gives the construct -> next_tag() -> modify attributes pattern; and get_attribute_names_with_prefix() documents lowercase returned names plus case-insensitive matching, which led subjects to preserve data-track and data-tracker while removing only data-track-* attributes. Near-misses: remove_attribute() itself does not locally state that attribute-name matching is ASCII case-insensitive, so the uppercase-source-attribute case relied on connecting the prefix helper's lowercase result to removal behavior. Also, get_attribute_names_with_prefix() says null means no tag opener is matched, but does not explicitly contrast that with an empty array for a matched tag with no prefix matches; the candidates handled this naturally, but weaker implementations could misread null as the no-match-on-current-tag value.", + "doc_gaps": [ + { + "location": "html-tag-processor.md#get_attribute_names_with_prefix", + "problem": "The return contract does not explicitly distinguish a matched tag with no matching attributes from no currently matched tag.", + "suggestion": "State that the method returns an empty array when a tag opener is matched but no attributes match the prefix, and returns null only when no tag opener is currently matched." + }, + { + "location": "html-tag-processor.md#remove_attribute", + "problem": "The method doc does not locally explain case-insensitive attribute-name matching or that normalized lowercase names can be passed back to remove source-cased attributes.", + "suggestion": "Add a sentence that attribute names are matched ASCII case-insensitively, so names returned by get_attribute_names_with_prefix() are safe to pass to remove_attribute() even when the source used different casing." + }, + { + "location": "html-tag-processor.md attribute examples", + "problem": "The docs document prefix discovery and attribute removal separately, but do not show the general bulk-edit pattern of collecting attribute names from the current token before mutating it.", + "suggestion": "Add a generic example showing a next_tag() loop that gets a list of attribute names by prefix and then removes or updates each returned name, emphasizing that get_updated_html() preserves untouched bytes." + } + ] +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/candidate.php b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/execution.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/execution.json new file mode 100644 index 0000000000000..12d01a5f2cfc9 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/response.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/response.json new file mode 100644 index 0000000000000..35a977a50b57e --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute rewrite on every tag. It scans all opening tags with `next_tag()`, finds matching attributes with the documented `get_attribute_names_with_prefix( 'data-track-' )`, removes each one via `remove_attribute()`, and returns the modified markup with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/candidate.php b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/candidate.php new file mode 100644 index 0000000000000..11042f4367401 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attribute_names = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attribute_names ) { + continue; + } + + foreach ( $attribute_names as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/execution.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/execution.json new file mode 100644 index 0000000000000..1b33393e8e05a --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/response.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/response.json new file mode 100644 index 0000000000000..7f07d0b7cc055 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan over every tag with `next_tag()`, collects matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`. That prefix match is case-insensitive and only targets names beginning with `data-track-`, so similar names like `data-track` and `data-tracker` are preserved.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/candidate.php b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/execution.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/execution.json new file mode 100644 index 0000000000000..be3fb9c16e675 --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

            Text

            ", + "actual": "

            Text

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/response.json b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/response.json new file mode 100644 index 0000000000000..5997c0862fd7e --- /dev/null +++ b/doc-experiment/results/round-42/T11-strip-tracking-attributes/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit pass: it scans every tag with `next_tag()`, finds matching attributes via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/judge.json b/doc-experiment/results/round-42/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..72d8f0177023f --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor fragment parser, walked with next_token(), skipped SPAN tokens, and accumulated serialize_token() output. All called methods are documented. Minor deduction: the final get_last_error() fallback returns an empty string, which is a policy choice not specified by the task, though it follows the docs' warning not to trust output after unsupported markup." + }, + { + "trial_id": "trial-2", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Same strong documented pattern as trial-1: create_fragment(), next_token(), get_tag(), serialize_token(), get_last_error(). Minor additional deduction because fallback to the original input on create/parse failure would not be normalized and may retain spans, so the edge policy is less aligned with the task contract." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Closest to the reference. It uses WP_HTML_Processor::create_fragment(), a token walk, explicit #tag filtering, get_tag(), and serialize_token(). All methods are documented, and there were no _doing_it_wrong records. The only small deduction is the unspecified empty-string fallback on parser error." + } + ], + "failure_analysis": "All trials passed all 7 frozen cases. The docs did well on the key concepts needed here: the processor-choice sections distinguish Tag Processor byte-preserving attribute edits from HTML Processor structure/normalization; next_token() documents that closers, implied closers, and end-of-input closers are visited; serialize_token() explicitly presents token-by-token rewriting where selected element tokens are skipped while contents remain; and create_fragment() says BODY-fragment parsing is the right default for rendered content fragments. Near-misses were around fallback policy: every trial added get_last_error() handling, but chose either '' or the original input. That did not affect these cases, but it reflects that the docs explain parser-abort detection better than they explain how a string-returning caller should choose a contract-specific fallback. There were no hallucinated API calls and no hidden-case failures to attribute to documentation gaps.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() docblock / rewrite examples", + "problem": "The docs say to reject or fall back on get_last_error(), but examples often return null. For callers whose contract requires string output, subjects guessed inconsistent fallbacks such as '' or the original unnormalized input.", + "suggestion": "Add a short contract note: after token-serialization rewrites, get_last_error() means the accumulated output may be partial; choose a caller-specific fallback, and returning original input preserves bytes but is not normalized and does not include token-loop edits." + }, + { + "location": "WP_HTML_Processor::get_tag() docblock", + "problem": "The method says it returns the matched tag name, but the important token-walking detail that it matches both openers and closers is only made obvious indirectly by the serialize_token() example.", + "suggestion": "State explicitly that during next_token() walks, get_tag() returns the element name for both opening and closing tag tokens, and null for non-tag tokens." + }, + { + "location": "WP_HTML_Processor::next_token() / serialize_token() incomplete-input notes", + "problem": "The docs discuss virtual closers and incomplete trailing syntax in separate places, which can leave readers unsure whether an unclosed element is an error or a normal virtual close.", + "suggestion": "Add a concise contrast: missing optional/end tags still produce structural closing tokens, while an incomplete syntax token is not visited and must be detected with paused_at_incomplete_token() when the caller requires complete source bytes." + } + ] +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/candidate.php b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/candidate.php new file mode 100644 index 0000000000000..6e7cc87e73eed --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..c9677bff06cbb --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..fd95e95475af1 --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with full structural awareness, then walks every token with `next_token()`, skips both opening and closing `SPAN` tokens by checking `get_tag()`, and appends every other token\u2019s normalized serialization via `serialize_token()`. That produces normalized HTML while leaving span contents in place.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..c7a5d216e97d1 --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,22 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..a6afe93bbc908 --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..f3bd0b7342aca --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, walks every token with `next_token()`, skips any token whose tag is `SPAN` so both span openers and closers disappear while their child tokens still serialize, and appends all other tokens with `serialize_token()` to produce normalized HTML.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..29c7f2ee742e7 --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..a2b446f684c60 --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

            a b c d

            ", + "actual": "

            a b c d

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

            outer inner tail

            ", + "actual": "

            outer inner tail

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

            plain & simple

            ", + "actual": "

            plain & simple

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

            ab

            ", + "actual": "

            ab

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
            before after
            ", + "actual": "
            before after
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

            runs to end

            ", + "actual": "

            runs to end

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..17460d7aaf97d --- /dev/null +++ b/doc-experiment/results/round-42/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks every token with `next_token()`. It removes span wrappers by skipping any `#tag` token whose tag name is `SPAN` (both openers and closers) and appends every other token\u2019s normalized serialization with `serialize_token()`, producing normalized HTML while preserving the spans\u2019 contents in place.", + "confidence": 78 +} diff --git a/doc-experiment/results/round-42/codex-judges-output.json b/doc-experiment/results/round-42/codex-judges-output.json new file mode 100644 index 0000000000000..c13811ab8c63a --- /dev/null +++ b/doc-experiment/results/round-42/codex-judges-output.json @@ -0,0 +1,861 @@ +{ + "result": [ + { + "id": "H04-remove-empty-paragraphs", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 88, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), used a single next_token() walk, documented structural calls, serialize_token() for most output, and checked both paused_at_incomplete_token() and get_last_error(). All API methods used are documented and execution recorded no _doing_it_wrong calls. Main adherence weakness: when a pending P proves non-empty it emits a literal

            instead of the stored serialize_token() result, so the implementation is not fully following the documented token-serialization pattern and would drop attributes in broader cases." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Strong adherence. It uses the HTML Processor, buffers the serialized opener with serialize_token(), walks tokens once, identifies the closing P with documented is_tag_closer() and get_current_depth() semantics, and falls back on incomplete or unsupported input. No undocumented API calls or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Strong adherence. It uses the HTML Processor, next_token(), serialize_token(), documented token/type/depth APIs, and the correct incomplete/error checks. The paragraph stack is more complex than necessary for HTML P parsing, but it remains within documented token-walking patterns and did not misuse the API." + } + ], + "failure_analysis": "All trials passed all 11 frozen cases, with no _doing_it_wrong records. The docs appear to have succeeded on the major points: the processor-choice guidance clearly directs structure-sensitive and normalized-output work to WP_HTML_Processor; the rewrite recipe for serialize_token() maps directly to dropping selected tokens while concatenating the rest; get_current_depth() explains closer-depth semantics well enough for the candidates to handle implicit paragraph closes; and the incomplete/error guidance led all trials to return the original input for truncated or unsupported markup. The main near-miss was trial-1's hand-built

            emission after delaying a paragraph opener. That passed because the tests used un-attributed paragraphs, but a broader case with attributes would lose normalized opener details. This suggests the serialization docs are good but could be more explicit about storing serialized tokens when emission is deferred.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() docs and rewrite recipe", + "problem": "The docs say token-by-token rewriting can skip or emit tokens, but they do not explicitly warn that delayed emission should keep the exact serialize_token() result. A model hand-emitted

            , which would drop attributes and other normalized opener details.", + "suggestion": "Add a short note and example: when buffering a token for possible later output, store `$serialized = $processor->serialize_token()` and emit that string later; do not reconstruct the tag name manually unless intentionally creating new markup." + }, + { + "location": "WP_HTML_Processor::get_current_depth() / is_tag_closer() docs", + "problem": "The closer-depth explanation is strong, but readers still have to derive the common predicate for identifying the closing token corresponding to a previously recorded opener.", + "suggestion": "Add a compact recipe for matching an element's own closer after recording opener depth: same tag name, is_tag_closer(), and depth below the opener depth, with a note that child closers can report the opener depth and must not end the subtree walk." + }, + { + "location": "WP_HTML_Processor overview or rewrite recipe", + "problem": "The docs discuss rejecting incomplete or unsupported input after a rewrite, but examples often return null rather than showing the common all-or-nothing filter policy of returning the original HTML unchanged.", + "suggestion": "Add a generic all-or-nothing rewrite skeleton that accumulates serialize_token() output and then returns the original input when paused_at_incomplete_token() is true or get_last_error() is non-null." + }, + { + "location": "WP_HTML_Processor::get_namespace() and tag-matching examples", + "problem": "The reference implementation guards P matching with get_namespace(), but the candidates matched only get_tag(). The docs list get_namespace(), yet examples of semantic tag matching rarely show a namespace guard.", + "suggestion": "In examples that transform HTML element semantics by tag name, include `html === $processor->get_namespace()` or a note explaining when tag-name checks should also verify namespace, especially around SVG and MathML content." + } + ] + } + }, + { + "id": "N01-remove-external-class", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Tag_Processor for a flat class edit. All called APIs and query keys are documented: constructor/new usage, next_tag(), tag_name, class_name, remove_class(), and get_updated_html(). The loop and final readback match documented patterns, and execution passed 7/7 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor, documented combined tag/class query, documented class-removal helper, and documented get_updated_html() output path. Execution passed 7/7 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1 with only formatting differences. API usage is fully documented and idiomatic for this task. Execution passed 7/7 with no _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden cases failed across the three trials. The docs worked well for this task: the Tag Processor overview explicitly says to use it for flat tag/class/attribute edits; the Finding tags table documents next_tag() with both tag_name and class_name; the CSS class section says removing the only class removes the whole class attribute; and get_updated_html() is documented as the readback path after queued class changes. The main near-miss is class-name case semantics: the candidates happened to get the case-sensitive EXTERNAL case right, but next_tag()'s class_name parameter does not state the case/compat-mode behavior at the point of use, and has_class() documentation says ASCII case-insensitive even though default no-quirks behavior is byte-for-byte. That did not cause a failure here, but it is the most plausible source of future confusion.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::next_tag() parameter docs for $query['class_name']", + "problem": "The docs say the tag must contain the whole class name, but do not state whether matching is a whitespace-token match, whether it is substring-safe, or how case sensitivity works under the processor's compatibility mode.", + "suggestion": "Extend the class_name query docblock to say it matches a complete class token and document the exact case-sensitivity/compat-mode contract, with a short non-task-specific example such as class=\"note\" not matching class_name => \"not\"." + }, + { + "location": "WP_HTML_Tag_Processor::has_class() and class matching docs", + "problem": "The rendered docs say has_class() looks for an ASCII case-insensitive class name, while other docs/source behavior indicate no-quirks class matching is byte-for-byte and quirks mode is case-insensitive. This is easy to misapply to next_tag(... class_name ...) and remove_class().", + "suggestion": "Align has_class(), next_tag(class_name), add_class(), and remove_class() docs around one shared statement of class-name comparison semantics, including quirks vs no-quirks behavior." + }, + { + "location": "WP_HTML_Tag_Processor::remove_class() method docblock", + "problem": "The method-level section only says it removes a class and returns whether the class was set to be removed. The important contracts are elsewhere: it is safe when the class/attribute is absent, removing the final class removes the attribute, and the return value indicates the request was accepted for a matched opener, not necessarily that the class existed.", + "suggestion": "Move or repeat the key remove_class() behavioral contract in the method docblock: safe no-op for missing class, final class removes the attribute, untouched bytes are preserved as much as possible, and clarify return-value meaning." + } + ] + } + }, + { + "id": "N02-collect-figure-images", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment(), next_tag('IMG'), get_breadcrumbs(), and get_attribute(). All methods are documented, no _doing_it_wrong records appeared, and the attribute handling correctly distinguishes null, true, empty string, and decoded string values." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Uses the same documented structural approach as trial-1 and passes all edge cases. The only deduction is the extra all-or-nothing get_last_error() check after collection: documented, but not required by the task and potentially over-applies mutation/serialization guidance to a read-only extraction function." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correct processor choice and only documented APIs: create_fragment(), next_tag(), get_tag(), is_tag_closer(), and get_attribute(). The manual FIGURE depth counter with tag_closers is documented and works here, but is less idiomatic for ancestor containment than filtering IMG matches with get_breadcrumbs() or matches_breadcrumbs()." + } + ], + "failure_analysis": "No hidden case failed in any trial; each trial passed 9/9 cases with no _doing_it_wrong records. The docs did well at steering subjects to WP_HTML_Processor for structure-aware containment: the Tag Processor overview says it has no tree awareness, and the HTML Processor supported-elements section says to choose it when document structure matters. The Breadcrumbs section and get_breadcrumbs() method docs were enough for trials 1 and 2 to solve arbitrary-depth containment. The get_attribute() docs in the Tag Processor page explicitly describe null for missing attributes, true for boolean/valueless attributes, empty string for empty values, and decoded strings, which all trials handled correctly. Near-misses: trial 2 appears to have generalized get_last_error() rejection guidance beyond mutation/serialization, and trial 3 used manual closer tracking where breadcrumbs would have expressed the contract more directly.", + "doc_gaps": [ + { + "location": "html-processor.md, Breadcrumbs / next_tag() query documentation", + "problem": "The docs explain direct breadcrumb paths well, but they do not make the arbitrary-depth descendant pattern as explicit as the direct-child breadcrumb query pattern.", + "suggestion": "Add a general note that breadcrumb queries are child-path matches, while arbitrary ancestor containment should be checked by inspecting get_breadcrumbs() or matches_breadcrumbs() after matching the target token." + }, + { + "location": "html-processor.md, get_attribute()", + "problem": "The HTML Processor get_attribute() section lists string|true|null but omits the decoded-string sentence that appears in the Tag Processor docs, even though callers using only the HTML Processor page may need that contract.", + "suggestion": "Repeat or cross-link the inherited attribute-value semantics: missing returns null, valueless boolean returns true, empty quoted value returns '', and string values are already decoded." + }, + { + "location": "html-processor.md, get_last_error() and rewrite/scan recipes", + "problem": "The docs strongly emphasize rejecting or falling back on parser errors in mutation and serialization examples, which can make read-only extraction code apply an unnecessary all-or-nothing policy.", + "suggestion": "Clarify that get_last_error() distinguishes normal exhaustion from parser abort, and that whether to return partial results, empty results, or an error is caller policy for read-only scans." + }, + { + "location": "html-processor.md, tag_closers / is_tag_closer()", + "problem": "Manual opener/closer counters are documented but the docs do not clearly warn that they are often unnecessary for simple ancestor-membership checks and require understanding virtual closers and popped breadcrumbs.", + "suggestion": "Add guidance comparing manual closer tracking with breadcrumb-based containment, recommending breadcrumbs for membership tests and reserving closer/depth tracking for bounded subtree walks or transformations." + } + ] + } + }, + { + "id": "N03-first-list-count", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for a structural fragment task. Every API call is documented in the supplied markdown, including inherited Tag Processor methods. The solution follows the documented bookmark plus bounded next_token()/get_current_depth() pattern, seeks back to edit the opener, uses set_attribute() and get_updated_html(), and checks paused_at_incomplete_token() and get_last_error() before mutating." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same high-adherence pattern as trial-1: HTML Processor, documented calls only, no _doing_it_wrong records, depth-aware direct-child LI counting, bookmark/seek for the opener edit, and clean-scan checks for truncation or unsupported markup." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses the correct processor and the documented structural traversal idioms. The found_list flag is redundant but harmless. All methods are present in the rendered docs, and the code handles incomplete or unsupported input before applying the queued attribute update." + } + ], + "failure_analysis": "No failed hidden cases across the trials. All three passed 11/11 cases and execution.json recorded no _doing_it_wrong notices. The docs worked well here because the WP_HTML_Processor overview explicitly says to use the HTML Processor for nested structure, the scan-a-region recipe shows bookmark -> next_token() -> depth-bound walk -> paused_at_incomplete_token()/get_last_error() -> seek -> edit, next_tag() explains that tag_name is not a list and recommends scanning any tag then branching, and get_current_depth()/next_token() explain the >= subtree boundary needed for omitted closers and nested elements. Near-misses: the unsupported-after-closed-list case depends on stopping at the completed container boundary rather than draining the rest of the document; the recipes imply this, but get_last_error() itself does not make that scope especially explicit. Also, the HTML Processor set_bookmark section contains an inherited Tag Processor example, which could steer weaker readers toward the wrong processor despite the overview guidance.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::set_bookmark() docblock / rendered HTML Processor bookmark section", + "problem": "The method section includes a WP_HTML_Tag_Processor example inside the HTML Processor docs. For structural tasks, that can conflict with the overview’s advice to use WP_HTML_Processor.", + "suggestion": "Add or replace with an HTML Processor-specific bookmark example using create_fragment(), next_token(), get_current_depth(), seek(), and get_updated_html(); label any inherited Tag Processor example as lexical-only." + }, + { + "location": "WP_HTML_Processor::get_last_error() and next_token() bounded-walk docs", + "problem": "The docs do not explicitly state that get_last_error() only reflects markup scanned so far, so callers may over-scan beyond a completed region and reject otherwise valid edits because of later unsupported markup.", + "suggestion": "Document the contract for bounded scans: after a loop exits because depth dropped below the recorded container depth, paused_at_incomplete_token() and get_last_error() validate the scanned region; callers need not scan unrelated trailing markup unless their own contract requires whole-document validation." + }, + { + "location": "WP_HTML_Processor::get_current_depth() docblock", + "problem": "The direct-child opener predicate is easy to miss because the method doc emphasizes subtree membership, while the compact direct-child checks are in the overview recipe.", + "suggestion": "Include a short direct-child element predicate in the get_current_depth() method docs: require #tag, not a closer, and current depth equal to container depth + 1, then apply the caller’s tag-name test." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses the documented `WP_HTML_Processor::normalize()` static method, the correct processor for normalized BODY-fragment serialization. It checks `null` strictly, so unsupported markup falls back while an empty normalized string remains valid. No `_doing_it_wrong` records; the captured `WP_HTML_Processor::serialize` warnings are the documented null-return unsupported path bubbling from `normalize()` internals." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct implementation as the reference: documented HTML Processor normalization, strict `null` handling, and no undocumented API calls. It relies on the documented normalization contract rather than hand-walking tokens, which is idiomatic for this task." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly uses only `WP_HTML_Processor::normalize()`, documented in the rendered HTML Processor docs. The ternary preserves `''` for empty fragments and falls back only for `null`, matching the documented `string|null` contract." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs did well on the core decision points: the Tag Processor overview says to use the HTML Processor for producing normalized output; the HTML Processor supported-elements section says unsupported markup aborts and output methods such as `serialize()` and `normalize()` return `null`; and the `normalize()` docblock gives the exact signature, BODY-fragment context, normalization effects, and `string|null` return. The successful table, unclosed-tag, attribute-quoting, entity, unsupported-misnesting, and empty-fragment cases all follow directly from those passages. Near misses: the docs imply strict null handling via `string|null`, but they do not explicitly warn that `''` is a valid normalized result; and unsupported inputs emit warnings from internal `serialize()` even though the high-level contract is a `null` return, which could surprise harnesses or callers that treat warnings as failures.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::normalize()` return-value docblock", + "problem": "The `string|null` return type is correct, but the docs do not explicitly state that an empty fragment normalizes to the empty string and only `null` means failure.", + "suggestion": "Add a sentence recommending strict `null === $normalized` checks when distinguishing failure from valid empty output." + }, + { + "location": "`WP_HTML_Processor::normalize()` examples", + "problem": "All examples show successful normalization. The null-on-unsupported contract is stated elsewhere, but not demonstrated where callers learn the convenience API.", + "suggestion": "Add a small generic example showing that unsupported input returns `null`, without prescribing any task-specific fallback markup." + }, + { + "location": "`WP_HTML_Processor::normalize()` / `serialize()` unsupported-output notes", + "problem": "Unsupported normalization returns `null` but can also trigger a warning from `WP_HTML_Processor::serialize`; the rendered docs do not make that side effect clear.", + "suggestion": "Document whether callers should expect a warning when serialization fails because the parser aborted, and clarify that the programmatic failure signal remains `null`." + } + ] + } + }, + { + "id": "N05-document-title", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Used the intended WP_HTML_Processor::create_full_parser(), checked null creation, used documented next_tag('TITLE') and get_modifiable_text(). Correctly relies on decoded TITLE modifiable text and preserves empty string versus null. Small deduction: it does not check get_namespace() or structural location, so a preceding SVG/MathML TITLE could be mistaken for the document title." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Same strong API use as trial-1: full parser, documented cursor walk, documented decoded TITLE text. No _doing_it_wrong records. The while loop does not actually filter anything, so it still has the same namespace/structure near-miss as trial-1." + }, + { + "trial_id": "trial-3", + "adherence": 74, + "hallucinated_methods": [], + "notes": "All called APIs are documented: WP_HTML_Tag_Processor constructor, next_tag(), and get_modifiable_text(). It passes because TITLE is documented as a special element with decoded modifiable text. Major deduction: the task is complete-document/document-title work, and the rendered docs specifically steer TITLE-in-HEAD/full-document parsing to WP_HTML_Processor::create_full_parser(); the Tag Processor is only lexical and lacks structural/namespace awareness." + } + ], + "failure_analysis": "All trials passed the frozen hidden cases, with no _doing_it_wrong records. The docs did well on the core contract: create_full_parser() is documented for complete documents, next_tag() is documented as a forward cursor search, and get_modifiable_text() explicitly says TITLE/TEXTAREA text is decoded and carried on the opening element token, which led all subjects to preserve decoded entities and empty titles. Near-misses: trials 1 and 2 omit the reference implementation's get_namespace() guard, and trial 3 chose the lexical Tag Processor. The likely documentation cause is that namespace collisions are not called out near the TITLE/get_modifiable_text examples, while the Tag Processor page contains a token-walking example that extracts TITLE text and can look suitable despite later reminders that complete-document TITLE-in-HEAD parsing belongs to the HTML Processor.", + "doc_gaps": [ + { + "location": "html-processor.md#get_modifiable_text", + "problem": "The TITLE example shows how to read special-element text but does not warn that tag-name searches can encounter same-named foreign-content elements.", + "suggestion": "Add a general note that when selecting HTML elements by name in full documents with SVG/MathML, callers should check get_namespace() === 'html' or otherwise constrain by structure." + }, + { + "location": "html-processor.md#next_tag", + "problem": "The tag_name query docs do not make namespace matching behavior explicit.", + "suggestion": "Clarify whether next_tag('NAME') matches by local name across namespaces, and show the paired namespace-check pattern for names that exist in HTML and foreign content." + }, + { + "location": "html-tag-processor.md#Tokens and finer-grained processing", + "problem": "The lexical token example extracts TITLE text, which can encourage Tag Processor use for document metadata even though it lacks document-tree semantics.", + "suggestion": "Label that example as lexical extraction only, and cross-link to the HTML Processor full-parser pattern for document-level metadata or HEAD-sensitive reads." + }, + { + "location": "html-tag-processor.md#get_modifiable_text", + "problem": "The reminder about complete-document TITLE-in-HEAD parsing is useful but buried after the generic decoded-text explanation.", + "suggestion": "Move or duplicate that reminder near the TITLE special-element discussion so users choosing between processors see it before copying Tag Processor patterns." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), a single next_token() pass, documented token/type/name checks, closer handling, and guarded get_modifiable_text(). Strong fit for fragment text extraction, including decoded text and a documented special-element opt-in. No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correct processor and all API calls are documented. The single-pass closer-driven accumulator is explicitly supported by the next_token() docs and handled virtual heading closers. Main near-miss: it only accumulates #text tokens, so documented text-carrying special element openers such as TEXTAREA/TITLE inside a collected subtree would be missed." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct processor and documented APIs throughout. The depth-bounded subtree walk matches the get_current_depth()/next_token() recipe and uses >= correctly, plus a special-element opt-in. Slight idiom caveat: it nests next_token() loops for repeated regions, which the docs warn can skip boundaries in less constrained cases, though this implementation is safe for the tested heading traversal." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases with no _doing_it_wrong or trigger_error records. The docs did well on the key decision points: they clearly steer tree-aware text extraction toward WP_HTML_Processor rather than WP_HTML_Tag_Processor; next_token() documents virtual/implied/end-of-input closers, which is what made the implied-heading-close case work; get_modifiable_text() documents decoded #text output, which made the entity case work; and get_current_depth() explains the >= subtree guard used by trial-3. Near-misses were outside the hidden cases: trial-2 missed the documented exception that SCRIPT/STYLE/TITLE/TEXTAREA carry text on the opener rather than #text children, and trial-3 followed the depth-bounded recipe but in the nested-loop shape that another passage warns against for repeated regions.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_tag() docblock/rendered section", + "problem": "In the HTML Processor docs, the inherited get_tag() example constructs WP_HTML_Tag_Processor, which weakens the distinction the overview is trying to teach.", + "suggestion": "Use WP_HTML_Processor::create_fragment() in the HTML Processor rendering and add one sentence clarifying get_tag() vs get_token_name() on tag tokens, including virtual closers." + }, + { + "location": "WP_HTML_Processor::next_token() and get_current_depth() recipes", + "problem": "The docs both show a depth-bounded inner walk and warn against nested next_token() loops for repeated regions; the boundary between safe and risky nested walks is not explicit.", + "suggestion": "Add a short note explaining resumption semantics: a bounded subtree walk exits while matched on the boundary token, and a single-loop state machine is preferred when the caller must process every sibling boundary as its own region." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() / collect DOM-style text recipe", + "problem": "The ordinary #text recipe and special-element exception are documented, but there is no compact pattern for callers whose contract wants textContent-like extraction including special elements.", + "suggestion": "Add a general example that collects #text tokens and, only by explicit policy, whitelisted special-element opener text; state which returned text is decoded and which remains raw." + }, + { + "location": "HTML Processor supported markup section", + "problem": "The heading implied-close example is terse and uses a mismatched end tag; it does not clearly show that a following heading opener closes the previous heading in the parsed tree.", + "suggestion": "Add a general supported-markup note that opening one heading while another heading is open produces a closer for the previous heading, visible during next_token() traversal." + }, + { + "location": "paused_at_incomplete_token() guidance in WP_HTML_Processor text-walk docs", + "problem": "The docs explain checking truncation for mutations or rejection, but do not spell out the read-only extraction policy choice.", + "suggestion": "Add a sentence distinguishing best-effort extraction, which may return visited text plus virtual closers, from strict extraction, which should drain the processor and inspect paused_at_incomplete_token() and get_last_error()." + } + ] + } + }, + { + "id": "T01-add-image-class", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, the documented choice for flat byte-preserving tag/class edits. Calls only documented APIs: next_tag(), add_class(), and get_updated_html(). The while-loop scan and add_class() helper match the docs, and documented next_tag()/get_updated_html() behavior covers comments, case-insensitive tag matching, untouched bytes, unquoted attributes, and incomplete trailing tags." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor, no undocumented methods, idiomatic linear scan over IMG tags, add_class() for class merging, and get_updated_html() for byte-preserving output. Execution had no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correctly followed the documented Tag Processor pattern for all matching tags and relied on documented add_class() semantics instead of manually parsing attributes or classes." + } + ], + "failure_analysis": "No failed hidden cases across trials: all three passed 8/8, including existing classes, uppercase tag names, comment-contained tag-like text, unquoted attributes, and incomplete trailing input. The docs worked well here. The Tag Processor overview, especially 'Which processor should I use?', directly says to use WP_HTML_Tag_Processor for flat attribute/class edits and byte-precise preservation. The next_tag() method docs explicitly state ASCII case-insensitive tag-name matching, that comments/raw-text contents are not matched as tags, and that truncated tags are not matched. The add_class() docs state that missing class attributes are created and existing classes are appended without removal or reordering. The get_updated_html() docs clearly identify it as the way to read queued edits while preserving every untouched byte. Near-misses are small: the high-level Usage section stops at requesting changes and does not make returning get_updated_html() part of the main three-step recipe, and add_class() does not locally restate where a newly-created class attribute is inserted, even though the broader set_attribute/get_updated_html docs explain new attribute placement and output quoting.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor overview / Usage", + "problem": "The main three-step usage recipe covers construction, finding tags, and requesting changes, but the final readback step is only documented later under get_updated_html().", + "suggestion": "Make the top-level recipe include a fourth step: return or otherwise read the modified document with get_updated_html() after queued attribute/class/text edits." + }, + { + "location": "WP_HTML_Tag_Processor::add_class()", + "problem": "The method explains append/no-reorder/no-duplicate behavior, but it does not locally state the placement and quoting behavior when it creates a missing class attribute.", + "suggestion": "Add one sentence that newly-created class attributes follow the normal new-attribute insertion contract: inserted immediately after the tag name and emitted as a double-quoted attribute value." + }, + { + "location": "WP_HTML_Tag_Processor Finding tags examples", + "problem": "The examples show finding one tag and a custom loop, but there is no compact general recipe for applying one edit to every tag matching a simple query.", + "suggestion": "Add a general 'apply an edit to every matching tag' pattern using while ( $processor->next_tag( $query ) ) { ... } followed by get_updated_html(), without tying it to any specific task." + } + ] + } + }, + { + "id": "T02-link-targets", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, next_tag('a'), get_attribute('href') with a strict null absence check, set_attribute('target','_blank'), and get_updated_html(). All methods are documented and the implementation follows the byte-preserving attribute-edit pattern." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical Tag Processor solution, using next_tag('A') and strict null semantics for href presence. No undocumented calls or _doing_it_wrong records; passed all 8 cases." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical Tag Processor solution, using documented methods only and the correct get_updated_html retrieval path. Handles empty and valueless href by avoiding truthiness checks; passed all 8 cases." + } + ], + "failure_analysis": "No failed hidden cases across trials: each trial passed simple, no-href-skipped, empty-href-counts, valueless-href-counts, existing-target-overwritten, uppercase-attribute, inside-comment-ignored, and nested-markup-in-link. The docs did well in the Tag Processor 'Which processor should I use?' section, which explicitly points flat byte-precise attribute edits to WP_HTML_Tag_Processor; the 'Usage' and 'Finding tags' sections show construction and next_tag(); the 'Custom queries' passage states get_attribute() returns null for absence, empty string for present-empty, and true for valueless boolean attributes; 'Modifying HTML attributes' says set_attribute() overwrites existing attributes; and get_updated_html() is documented as the way to return queued byte-preserving edits. Near miss: the correct presence-check idiom is present in prose but not highlighted as a named recipe, so weaker subjects could still have written a truthiness check and skipped href=\"\".", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute() / attribute-reading docs", + "problem": "The null, empty-string, and true semantics are documented, but the common 'attribute presence' idiom is not emphasized near the method signature.", + "suggestion": "Add a short presence-check example using null !== $processor->get_attribute( $name ), with a warning that truthiness checks treat present-empty attributes as absent." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() and get_attribute() query/name docs", + "problem": "Case-insensitive tag and attribute-name matching is only implicit or scattered; exact-byte output tasks also care that untouched attribute casing is preserved.", + "suggestion": "State explicitly that HTML tag and attribute-name matching is ASCII case-insensitive, while untouched source bytes such as attribute casing remain preserved in get_updated_html()." + }, + { + "location": "Generated Method Index", + "problem": "Private/internal methods are listed alongside public methods, which can distract documentation-only users and invite invalid API usage despite the visibility column.", + "suggestion": "Separate private methods into an internal section or hide them in consumer-facing rendered docs, leaving public traversal, attribute, bookmark, text, and output APIs prominent." + } + ] + } + }, + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), found H1 with next_tag(), bounded the subtree walk by get_current_depth() with >=, collected only #text tokens via get_token_type() and get_modifiable_text(). This matches the rendered docs' subtree text recipe exactly. No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic pattern as trial-1: HTML Processor for tree-aware text extraction, depth-bounded next_token() walk, #text-only accumulation, decoded text through get_modifiable_text(). No unsupported API usage or misuse records." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correct processor and all called methods are documented. The main traversal is idiomatic, but it also opts into SCRIPT, STYLE, TEXTAREA, and TITLE opener text. That behavior is documented, but the docs' subtree text recipe says ordinary subtree text should append only #text tokens unless the caller explicitly wants special-element content. This is a plausible over-application of the special-element exception and could diverge on special-element-in-heading inputs." + } + ], + "failure_analysis": "All three trials passed all 8 hidden cases, so there are no failed hidden cases to diagnose.\n\nThe docs did well on the core path: the HTML Processor overview explicitly says to use WP_HTML_Processor when structure matters, including collecting element text and handling missing closing tags. The 'Recipe: collect DOM-style text from a subtree' gives almost the exact shape needed: create_fragment(), next_tag(), record depth, walk next_token(), append only #text via get_modifiable_text(). The get_current_depth() section explains why the guard must be >= rather than >, which prevented the common nested-markup failure. The next_token() section explains that unclosed elements still produce closing tokens, which supports the unclosed-h1 case. The get_modifiable_text() section clearly states that #text is already decoded, preventing double decoding and preserving the empty-string image-only case.\n\nThe only near-miss is trial-3. It noticed the documented special-element exception and included opener text from SCRIPT, STYLE, TEXTAREA, and TITLE. The docs do say those elements carry modifiable text on the element token, but the same recipe also says ordinary subtree text is only #text tokens unless the caller intentionally opts into another token type. The remaining ambiguity is terminology: a task or reader saying 'text content' may sound broader than the docs' 'ordinary subtree text', especially because get_modifiable_text() documents special-element text in the same area.", + "doc_gaps": [ + { + "location": "html-processor.md, 'Recipe: collect DOM-style text from a subtree' and next_token() special-element note", + "problem": "The distinction between ordinary parsed text descendants and special-element token text is present, but easy to over-apply when a caller says 'text content'.", + "suggestion": "Add a short contract note defining the default recipe as 'ordinary HTML subtree text: #text tokens only; excludes SCRIPT/STYLE raw text and TEXTAREA/TITLE opener text unless the caller explicitly says to include those elements'." + }, + { + "location": "html-processor.md, get_modifiable_text()", + "problem": "The method documents many token types that can return text, but readers may treat that as a collection rule rather than a capability list.", + "suggestion": "Add a warning near the method summary: 'This method answers what the current token can expose, not whether that token belongs in a text-extraction result; choose token types first, then call this method.'" + }, + { + "location": "html-processor.md, text extraction examples", + "problem": "The successful pattern is shown for ARTICLE and LI, but not framed as reusable for headings or other phrasing-content containers where nested inline markup is common.", + "suggestion": "Add one compact example or sentence saying the same depth-bounded #text walk applies to headings, captions, links, and list items, and returns an empty string when the element contains no #text tokens." + } + ] + } + }, + { + "id": "T04-build-figure", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor with a literal template, predeclared src/alt attributes to preserve order, walked tokens to a #text placeholder, used set_attribute()/set_modifiable_text() with plain strings, and returned get_updated_html(). All called methods are documented and execution recorded no misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented pattern as the reference: Tag Processor construction, next_tag('img'), attribute replacement in-place, next_token() text walk, set_modifiable_text(), and get_updated_html(). No undocumented API calls or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and documented API usage throughout. The early return if the template IMG is not found is unnecessary for a fixed internal template, but it is not an API misuse and does not affect adherence." + } + ], + "failure_analysis": "All three trials passed all 7 hidden cases, so there are no failed hidden cases to attribute. The docs did well in the exact areas this task required: the Tag Processor overview says it is appropriate for flat, byte-preserving edits; the 'Building markup from a template' section directly explains filling a literal template with untrusted values, including the two key rules that existing attributes preserve written order and text replacement needs a placeholder text node; set_attribute() documents that it accepts plain unescaped strings, encodes them, and preserves existing attribute positions; set_modifiable_text() documents that ordinary element text must be reached as a #text token and is encoded from plaintext; get_updated_html() is clearly identified as the correct output method after queued edits. The main near-miss is that next_token() contains a contradictory sentence saying the Tag Processor currently only supports the tag token, while surrounding examples and method docs rely on #text tokens. These subjects followed the stronger template-building guidance anyway, but that line could mislead less capable readers.", + "doc_gaps": [ + { + "location": "html-tag-processor.md, next_token() method docs", + "problem": "The text says the Tag Processor currently only supports the tag token, contradicting documented #text/comment/doctype token handling and the template-building examples that use #text.", + "suggestion": "Replace the stale limitation with an accurate list of supported token types and explicitly state that next_token() can visit #text tokens suitable for get_modifiable_text()/set_modifiable_text()." + }, + { + "location": "html-tag-processor.md, Building markup from a template", + "problem": "The example is excellent for a single text placeholder, but it does not name the failure mode if the placeholder is omitted beyond the bullet text.", + "suggestion": "Add a short note after the example: set_modifiable_text() replaces an existing text token; it does not insert a new child into an empty element, so templates intended for text replacement should include a placeholder." + }, + { + "location": "html-tag-processor.md, set_modifiable_text() examples", + "problem": "The method says to always check the return value, but examples often omit the check after matching #text, creating tension between strict guidance and common safe usage.", + "suggestion": "Clarify when checking can be omitted in examples, or show a minimal failure branch for set_modifiable_text() so readers understand the contract without overcomplicating template-fill code." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_token(), get_token_type(), get_token_name(), is_tag_closer(), and get_modifiable_text(), all documented in the rendered files. Correctly treated text extraction as an HTML Processor token walk, whitelisted #text plus TITLE/TEXTAREA opener tokens, excluded SCRIPT/STYLE, and decoded text via get_modifiable_text(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used only documented APIs, including get_tag() for tag-name checks after confirming #tag tokens. Processor choice, token walking, special-element handling, decoded-text handling, and UTF-8 truncation were all aligned with documented guidance. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used only documented APIs and closely followed the documented pattern: create a BODY fragment processor, walk tokens, collect #text, opt into TITLE/TEXTAREA opener modifiable text, and truncate with mb_* using UTF-8. No _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The docs did well on the exact hazards this task exercises: html-processor.md's 'Recipe: collect DOM-style text from a subtree' says to use WP_HTML_Processor for tree-aware text extraction, append ordinary #text tokens, and not treat every token with modifiable text as text. Its opt-in policy explicitly says TITLE and TEXTAREA provide decoded text on opener tokens while SCRIPT and STYLE provide raw text and should not be included merely because available. The next_token() section explains that special elements produce no #text children and that malformed input still produces closing tokens. The get_modifiable_text() section states that #text, TITLE, and TEXTAREA are already decoded UTF-8 and should be measured/sliced with an explicit UTF-8 encoding. Near-misses: trial-2 used get_tag() while trials 1 and 3 used get_token_name(); both are documented and valid here, but the docs alternate between them in examples, which could confuse weaker users about which is preferred for token-walk code.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() / text extraction recipe", + "problem": "The special-element guidance is correct, but implementers still have to synthesize the include/exclude policy from several paragraphs: #text is ordinary DOM text, TITLE/TEXTAREA are decoded opt-in opener text, and SCRIPT/STYLE are raw opt-in text that many text-content callers must exclude.", + "suggestion": "Add a compact table for token text policies: token/source, whether it appears as #text child tokens, whether get_modifiable_text() is decoded or raw, and when callers should opt in." + }, + { + "location": "WP_HTML_Processor::get_token_name() and get_tag() docs", + "problem": "Examples use both get_token_name() and get_tag() for tag-name checks during token walks. Both worked in these trials, but the preferred choice is not explicit for code that first checks get_token_type() === '#tag'.", + "suggestion": "Add a short note: in token walks, use get_token_type() to distinguish token kinds; after confirming '#tag', either get_tag() or get_token_name() can identify the element name, with any semantic differences called out." + }, + { + "location": "WP_HTML_Processor::next_token() incomplete-input guidance", + "problem": "The docs mention paused_at_incomplete_token() and get_last_error(), but the contract for read-only extraction is spread across mutation/rewrite examples. It is not obvious when best-effort extraction may ignore incomplete trailing syntax versus when callers should reject it.", + "suggestion": "Add a general note for read-only token walks: next_token() only visits complete reported tokens; callers that require proof of complete input should check paused_at_incomplete_token() and get_last_error() after the walk." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), a single next_token() walk, get_attribute() with is_string(), #text filtering, and get_modifiable_text(); all called methods are documented and execution recorded no API misuse. Small deduction: the final paused_at_incomplete_token()/get_last_error() all-or-nothing return is too conservative for this read-only extraction task and would discard already collected links after a trailing incomplete token." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the right processor and only documented methods. The closer/depth tracking reflects the documented get_current_depth()/is_tag_closer() semantics, and text/attribute handling is idiomatic. Same small edge-policy issue as trial-1: it rejects the whole result on trailing incomplete syntax even though the task and reference allow best-effort extraction of already visited links." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), one next_token() loop, string-only href handling, #text-only text accumulation, and closer-driven flushing. All methods are documented and no misuse was recorded. It relies on a single current link rather than depth/breadcrumb state, which is acceptable for A elements under the processor's virtual-closer behavior but is a less general pattern for repeated subtree extraction." + } + ], + "failure_analysis": "No hidden cases failed: all three trials passed 8/8. The docs appear to have succeeded on the key concepts: the HTML Processor overview and create_fragment() docs pointed subjects to the structural parser; get_attribute() documented null/true/string and decoded values clearly enough that every trial used is_string(); the DOM-style text recipe and get_modifiable_text() docs led every trial to append only #text tokens and avoid comments/markup/special-element token text; and next_token() documented virtual closers well enough that unclosed links worked. Near-miss: trials 1 and 2 over-applied the clean-scan guidance from the mutation/rewrite examples. In a read-only probe, `ok

            ';\n $pending_empty_p = false;\n $pending_p_depth = null;\n }\n\n if ( '#tag' === $token_type && 'P' === $tag_name && ! $processor->is_tag_closer() ) {\n $pending_empty_p = true;\n $pending_p_depth = $depth;\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return $html;\n }\n\n if ( $pending_empty_p ) {\n $output .= '

            ';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk, building normalized output with `serialize_token()`. A `P` opener is held back until the next token proves whether it is immediately followed by its own closer; if so, both tokens are skipped as an empty paragraph, otherwise the opener is emitted and processing continues. Per the docs, the function falls back to the original HTML when parsing cannot complete cleanly by checking `paused_at_incomplete_token()` and `get_last_error()`.", + "confidence": 82 + }, + { + "id": "H04-remove-empty-paragraphs", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( null !== $pending_p ) {\n $is_pending_p_closer =\n '#tag' === $processor->get_token_type() &&\n 'P' === $processor->get_tag() &&\n $processor->is_tag_closer() &&\n $processor->get_current_depth() < $pending_p['depth'];\n\n if ( $is_pending_p_closer ) {\n $pending_p = null;\n continue;\n }\n\n $output .= $pending_p['opener'];\n $pending_p = null;\n }\n\n if (\n '#tag' === $processor->get_token_type() &&\n 'P' === $processor->get_tag() &&\n ! $processor->is_tag_closer()\n ) {\n $pending_p = array(\n 'depth' => $processor->get_current_depth(),\n 'opener' => $processor->serialize_token(),\n );\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $pending_p || $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks tokens with `next_token()` and rebuilds a normalized serialization via `serialize_token()`. A `P` opener is held back until the next token proves whether the paragraph is truly empty; if the next relevant token is that paragraph’s closer, both are omitted, otherwise the opener is emitted and processing continues. If parsing aborts or ends on incomplete syntax, checked with `get_last_error()` and `paused_at_incomplete_token()`, the original HTML is returned unchanged.", + "confidence": 92 + }, + { + "id": "H04-remove-empty-paragraphs", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n $is_p_tag = '#tag' === $processor->get_token_type() && 'P' === $processor->get_tag();\n\n if ( $is_p_tag && ! $processor->is_tag_closer() ) {\n $count = count( $paragraph_stack );\n for ( $i = 0; $i < $count; $i++ ) {\n $paragraph_stack[ $i ]['has_content'] = true;\n if ( ! $paragraph_stack[ $i ]['emitted'] ) {\n $output .= $paragraph_stack[ $i ]['opener'];\n $paragraph_stack[ $i ]['emitted'] = true;\n }\n }\n\n $paragraph_stack[] = array(\n 'opener' => $processor->serialize_token(),\n 'has_content' => false,\n 'emitted' => false,\n );\n continue;\n }\n\n if ( $is_p_tag && $processor->is_tag_closer() ) {\n if ( 0 === count( $paragraph_stack ) ) {\n $output .= $processor->serialize_token();\n continue;\n }\n\n $paragraph = array_pop( $paragraph_stack );\n if ( $paragraph['has_content'] ) {\n if ( ! $paragraph['emitted'] ) {\n $output .= $paragraph['opener'];\n }\n $output .= $processor->serialize_token();\n }\n continue;\n }\n\n $count = count( $paragraph_stack );\n for ( $i = 0; $i < $count; $i++ ) {\n $paragraph_stack[ $i ]['has_content'] = true;\n if ( ! $paragraph_stack[ $i ]['emitted'] ) {\n $output .= $paragraph_stack[ $i ]['opener'];\n $paragraph_stack[ $i ]['emitted'] = true;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` rewrite loop with `serialize_token()` to produce normalized output while selectively skipping empty `

            ` elements. Each paragraph opener is buffered until a child token is seen; if no token appears before its closing `P`, both opener and closer are omitted, and if processing aborts or pauses on incomplete input via `get_last_error()` or `paused_at_incomplete_token()`, the original HTML is returned unchanged.", + "confidence": 84 + }, + { + "id": "N01-remove-external-class", + "trial": 1, + "ok": true, + "code": "next_tag( array( 'tag_name' => 'A', 'class_name' => 'external' ) ) ) {\n $processor->remove_class( 'external' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: construct the processor with the input HTML, repeatedly find `A` tags matching the `external` class via `next_tag()`, remove that class with `remove_class()`, and return the modified markup with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "N01-remove-external-class", + "trial": 2, + "ok": true, + "code": "next_tag( array( 'tag_name' => 'A', 'class_name' => 'external' ) ) ) {\n $processor->remove_class( 'external' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for flat tag/attribute editing: it constructs a processor for the input HTML, loops with `next_tag()` restricted to `A` tags carrying the `external` class, removes that class with `remove_class()`, and returns the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "N01-remove-external-class", + "trial": 3, + "ok": true, + "code": "next_tag(\n array(\n 'tag_name' => 'A',\n 'class_name' => 'external',\n )\n )\n ) {\n $processor->remove_class( 'external' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: it constructs the processor with the input HTML, repeatedly finds `A` tags matching the `external` class via `next_tag()`, removes that class with `remove_class()`, and returns the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "N02-collect-figure-images", + "trial": 1, + "ok": true, + "code": "next_tag( 'IMG' ) ) {\n $breadcrumbs = $processor->get_breadcrumbs();\n\n if ( ! in_array( 'FIGURE', $breadcrumbs, true ) ) {\n continue;\n }\n\n $src = $processor->get_attribute( 'src' );\n\n if ( is_string( $src ) && '' !== $src ) {\n $images[] = $src;\n }\n }\n\n return $images;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans `IMG` openers in document order with `next_tag( 'IMG' )`. For each image, it checks `get_breadcrumbs()` to confirm a `FIGURE` ancestor exists at any depth, and reads the already-decoded `src` with `get_attribute()`, skipping missing or empty values.", + "confidence": 95 + }, + { + "id": "N02-collect-figure-images", + "trial": 2, + "ok": true, + "code": "next_tag( 'IMG' ) ) {\n if ( ! in_array( 'FIGURE', $processor->get_breadcrumbs(), true ) ) {\n continue;\n }\n\n $src = $processor->get_attribute( 'src' );\n if ( is_string( $src ) && '' !== $src ) {\n $images[] = $src;\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $images;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because containment is structural, then scans only `IMG` openers with `next_tag( 'IMG' )`, checks `get_breadcrumbs()` to confirm each image is somewhere inside a `FIGURE`, and reads the already-decoded `src` with `get_attribute()`, skipping missing, boolean, and empty values.", + "confidence": 93 + }, + { + "id": "N02-collect-figure-images", + "trial": 3, + "ok": true, + "code": "next_tag( array( 'tag_closers' => 'visit' ) ) ) {\n if ( 'FIGURE' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n if ( $figure_depth > 0 ) {\n $figure_depth--;\n }\n } else {\n $figure_depth++;\n }\n\n continue;\n }\n\n if ( $figure_depth > 0 && 'IMG' === $processor->get_tag() && ! $processor->is_tag_closer() ) {\n $src = $processor->get_attribute( 'src' );\n\n if ( is_string( $src ) && '' !== $src ) {\n $images[] = $src;\n }\n }\n }\n\n return $images;\n}\n", + "explanation": "This uses the structure-aware `WP_HTML_Processor::create_fragment()` and a single forward scan with `next_tag( array( 'tag_closers' => 'visit' ) )`. It tracks whether the current position is inside one or more `FIGURE` elements by incrementing/decrementing on `FIGURE` openers/closers, then collects decoded `src` values from `IMG` openers encountered while that depth is positive via `get_attribute( 'src' )`, skipping missing or empty values.", + "confidence": 94 + }, + { + "id": "N03-first-list-count", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag_name = $processor->get_tag();\n if ( 'UL' !== $tag_name && 'OL' !== $tag_name ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n $is_direct_child_li_opener =\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1;\n\n if ( $is_direct_child_li_opener ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It finds the first `UL` or `OL` with `next_tag()`, bookmarks the opener, records its depth with `get_current_depth()`, then walks the subtree with `next_token()` and counts only direct child `LI` opening tags by checking `get_token_type()`, `is_tag_closer()`, `get_tag()`, and depth equality. It rejects changes if scanning ended on incomplete input or unsupported markup via `paused_at_incomplete_token()` and `get_last_error()`, then seeks back to the bookmarked list and writes `data-item-count` with `set_attribute()` before returning `get_updated_html()`.", + "confidence": 93 + }, + { + "id": "N03-first-list-count", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans forward with `next_tag()` to find the first `UL` or `OL`, bookmarks that opener, records its depth with `get_current_depth()`, then walks the subtree with `next_token()` and counts only direct child `LI` openers by checking `get_token_type()`, `is_tag_closer()`, `get_tag()`, and depth equality. If the bounded scan ends with truncation or unsupported markup (`paused_at_incomplete_token()` or `get_last_error()`), it returns the original HTML unchanged; otherwise it seeks back to the bookmarked list opener, sets `data-item-count` with `set_attribute()`, and returns `get_updated_html()`.", + "confidence": 91 + }, + { + "id": "N03-first-list-count", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $found_list = true;\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n if ( ! $found_list ) {\n return $html;\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans forward with `next_tag()` to find the first `UL` or `OL`, bookmarks that opener with `set_bookmark()`, walks its subtree with `next_token()` and `get_current_depth()` to count only direct-child `LI` openers, verifies the scan completed with `paused_at_incomplete_token()` and `get_last_error()`, then seeks back and applies `set_attribute()` before returning `get_updated_html()`.", + "confidence": 93 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 1, + "ok": true, + "code": "Unsupported HTML

            ';\n\t}\n\n\treturn $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported input. The function therefore returns the normalized serialization on success and the exact fallback placeholder when normalization fails.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 2, + "ok": true, + "code": "Unsupported HTML

            ';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which is specifically meant to normalize an HTML fragment in BODY context and returns `null` when the HTML Processor cannot normalize unsupported input. The function therefore returns the normalized serialization when available and otherwise falls back to the exact placeholder HTML required.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 3, + "ok": true, + "code": "Unsupported HTML

            '\n : $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes HTML fragments in `BODY` context and returns `null` when the HTML Processor cannot safely normalize unsupported input. The function returns that normalized serialization when available, otherwise it falls back to the exact required placeholder HTML.", + "confidence": 99 + }, + { + "id": "N05-document-title", + "trial": 1, + "ok": true, + "code": "next_tag( 'TITLE' ) ) {\n return $processor->get_modifiable_text();\n }\n\n return null;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_full_parser()` because the input is a complete HTML document, then searches for the first `TITLE` element with `next_tag( 'TITLE' )`. For `TITLE`, the documented `get_modifiable_text()` API returns its decoded text content, including the empty string for an empty ``, so the function can distinguish that case from a missing title.", + "confidence": 94 + }, + { + "id": "N05-document-title", + "trial": 2, + "ok": true, + "code": "next_tag( 'TITLE' ) ) {\n return $processor->get_modifiable_text();\n }\n\n return null;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_full_parser()` because the input is a complete HTML document, then scans forward with `next_tag( 'TITLE' )` and returns the TITLE opener token’s `get_modifiable_text()`, which the documentation says provides decoded TITLE text and correctly returns an empty string for an empty ``.", + "confidence": 94 + }, + { + "id": "N05-document-title", + "trial": 3, + "ok": true, + "code": "next_tag( 'title' ) ) {\n return null;\n }\n\n return $processor->get_modifiable_text();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` to scan the document for the first `` tag with `next_tag( 'title' )`. Because `TITLE` is documented as a special element whose contents are exposed as decoded modifiable text on the matched opener token, `get_modifiable_text()` returns the title text, including the empty string for an empty `<title>`, while a missing title returns `null`.", + "confidence": 95 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": " true,\n 'STYLE' => true,\n 'TEXTAREA' => true,\n 'TITLE' => true,\n );\n\n while ( $processor->next_token() ) {\n $token_type = $processor->get_token_type();\n $token_name = $processor->get_token_name();\n\n if ( '#tag' === $token_type && null !== $token_name ) {\n if ( ! $processor->is_tag_closer() && preg_match( '/^H([1-6])$/', $token_name, $matches ) ) {\n $toc[] = array(\n 'level' => (int) $matches[1],\n 'text' => '',\n );\n $current_heading = count( $toc ) - 1;\n continue;\n }\n\n if ( null !== $current_heading ) {\n if ( $processor->is_tag_closer() && $token_name === 'H' . $toc[ $current_heading ]['level'] ) {\n $current_heading = null;\n continue;\n }\n\n if ( ! $processor->is_tag_closer() && isset( $special_text_elements[ $token_name ] ) ) {\n $toc[ $current_heading ]['text'] .= $processor->get_modifiable_text();\n }\n }\n\n continue;\n }\n\n if ( null !== $current_heading && '#text' === $token_type ) {\n $toc[ $current_heading ]['text'] .= $processor->get_modifiable_text();\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with structural awareness, then walks the document once with `next_token()`. When it sees an `H1` through `H6` opening tag it starts a TOC entry, appends descendant `#text` token content via `get_modifiable_text()`, includes text carried on special text-only element openers inside headings, and stops collecting when the matching heading closer is reached with `is_tag_closer()`.", + "confidence": 82 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": " 1,\n 'H2' => 2,\n 'H3' => 3,\n 'H4' => 4,\n 'H5' => 5,\n 'H6' => 6,\n );\n\n while ( $processor->next_token() ) {\n if ( '#tag' === $processor->get_token_type() ) {\n $tag = $processor->get_tag();\n\n if ( null !== $current_tag && $processor->is_tag_closer() && $tag === $current_tag ) {\n $toc[] = array(\n 'level' => $current_level,\n 'text' => $current_text,\n );\n\n $current_level = null;\n $current_tag = null;\n $current_text = '';\n continue;\n }\n\n if ( ! $processor->is_tag_closer() && null === $current_tag && isset( $heading_tag_set[ $tag ] ) ) {\n $current_tag = $tag;\n $current_level = $heading_tag_set[ $tag ];\n $current_text = '';\n }\n\n continue;\n }\n\n if ( null !== $current_tag && '#text' === $processor->get_token_type() ) {\n $current_text .= $processor->get_modifiable_text();\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with full HTML structure awareness, then performs a single `next_token()` pass. It starts a TOC entry on each `H1`-`H6` opening tag, appends decoded text from `#text` tokens via `get_modifiable_text()`, and finalizes the entry when the matching heading closer is visited.", + "confidence": 90 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n if ( ! in_array( $tag, array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ), true ) ) {\n continue;\n }\n\n $heading_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $token_name = $processor->get_token_name();\n if ( in_array( $token_name, array( 'SCRIPT', 'STYLE', 'TITLE', 'TEXTAREA' ), true ) ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n }\n\n $toc[] = array(\n 'level' => (int) substr( $tag, 1 ),\n 'text' => $text,\n );\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, scans tokens in document order with `next_token()`, detects heading openers via `get_token_type()`, `is_tag_closer()`, and `get_tag()`, then walks each heading subtree bounded by `get_current_depth()` and concatenates descendant text from `#text` tokens with `get_modifiable_text()`, also opting in to documented special text-carrying element tokens when they appear inside a heading.", + "confidence": 87 + }, + { + "id": "T01-add-image-class", + "trial": 1, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor`, which is the documented API for byte-preserving attribute/class edits on matching tags. It linearly scans with `next_tag( 'img' )` (case-insensitive and limited to real tags, so comment contents are ignored), calls `add_class( 'wp-image' )` for each match, and returns the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T01-add-image-class", + "trial": 2, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat tag/attribute edit and requires preserving all untouched bytes exactly. The function scans forward with `next_tag( 'img' )`, which matches `IMG` tags case-insensitively and does not match comment contents, then calls `add_class( 'wp-image' )` on each matched tag before returning the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T01-add-image-class", + "trial": 3, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute/class edit. The function scans forward with `next_tag( 'img' )`, which matches `IMG` case-insensitively and ignores comment contents and other non-tag text, then calls the documented `add_class( 'wp-image' )` on each matched image before returning the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T02-link-targets", + "trial": 1, + "ok": true, + "code": "next_tag( 'a' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute edit where byte-preserving output matters. The function scans `A` tags with `next_tag( 'a' )`, uses `get_attribute( 'href' )` to detect whether `href` is present (including empty-string and boolean forms, since only absence returns `null`), overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T02-link-targets", + "trial": 2, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so present-empty (`\"\"`) and boolean (`true` for ``) both count as present while absent remains `null`, then overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the modified markup with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T02-link-targets", + "trial": 3, + "ok": true, + "code": "next_tag( 'a' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute rewrite: it scans all `A` tags with `next_tag( 'a' )`, checks `get_attribute( 'href' )` so present empty-string and boolean `href` attributes both count, and applies `set_attribute( 'target', '_blank' )` to overwrite or create the attribute before returning the result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, `next_tag( 'H1' )` to find the first heading, then a depth-bounded `next_token()` walk starting at that `H1`. During the walk it appends only `#text` token content via `get_modifiable_text()`, which the documentation says already returns decoded text, so nested markup contributes only its descendant text and non-text elements like images contribute nothing.", + "confidence": 93 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 91 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_token_name();\n if (\n 'SCRIPT' === $tag_name ||\n 'STYLE' === $tag_name ||\n 'TEXTAREA' === $tag_name ||\n 'TITLE' === $tag_name\n ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`. It appends decoded text from `#text` tokens via `get_modifiable_text()`, and also whitelists special text-carrying element opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) because the processor documents that their text lives on the element token rather than separate `#text` children.", + "confidence": 73 + }, + { + "id": "T04-build-figure", + "trial": 1, + "ok": true, + "code": "\"\"
            .
            ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This builds the fragment from a fixed HTML template using `WP_HTML_Tag_Processor`, which the docs recommend when exact attribute order matters. It updates the existing `src` and `alt` attributes with `set_attribute()`, then walks tokens with `next_token()` to find the placeholder `#text` node inside `figcaption` and replaces it with `set_modifiable_text()`, letting the HTML API handle all required encoding.", + "confidence": 95 + }, + { + "id": "T04-build-figure", + "trial": 2, + "ok": true, + "code": "\"\"
            .
            ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed HTML template so the output shape and attribute order stay exact. It updates the existing `src` and `alt` attributes with `set_attribute()` and replaces the placeholder figcaption text node with `set_modifiable_text()`, relying on the documented HTML API to perform the necessary escaping and encoding.", + "confidence": 96 + }, + { + "id": "T04-build-figure", + "trial": 3, + "ok": true, + "code": "\"\"
            .
            ' );\n\n if ( ! $processor->next_tag( 'img' ) ) {\n return '';\n }\n\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` with a literal template so the final markup shape and `img` attribute order stay exact. It updates the existing `src` and `alt` attributes with `set_attribute()`, then walks tokens with `next_token()` to find the placeholder text node inside `figcaption` and replaces it with `set_modifiable_text()`, letting the HTML API handle all required escaping.", + "confidence": 94 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $tag_name = $processor->get_token_name();\n if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n }\n\n if ( preg_match( '/^.{0,' . $max_codepoints . '}/us', $text, $matches ) ) {\n return $matches[0];\n }\n\n return '';\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structure awareness, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and, per the documented special-case rule, from `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`. It intentionally excludes other modifiable-text tokens such as `SCRIPT`, `STYLE`, comments, and processing instructions, and truncates the final UTF-8 string to the requested number of Unicode code points with a Unicode-aware regex.", + "confidence": 90 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n $chunk = '';\n\n if ( '#text' === $processor->get_token_type() ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $tag_name = $processor->get_tag();\n if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n $chunk_length = mb_strlen( $chunk, 'UTF-8' );\n $remaining = $max_codepoints - $length;\n\n if ( $chunk_length <= $remaining ) {\n $text .= $chunk;\n $length += $chunk_length;\n continue;\n }\n\n $text .= mb_substr( $chunk, 0, $remaining, 'UTF-8' );\n return $text;\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the HTML fragment with tree awareness, then walks all tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and from `TITLE`/`TEXTAREA` opening tags via `get_modifiable_text()`, while excluding raw-text elements like `SCRIPT` and `STYLE`. It truncates incrementally with `mb_strlen()` and `mb_substr()` in UTF-8 so the result never splits a multi-byte character.", + "confidence": 91 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": " 0 && $processor->next_token() ) {\n $text = '';\n $type = $processor->get_token_type();\n\n if ( '#text' === $type ) {\n $text = $processor->get_modifiable_text();\n } elseif ( '#tag' === $type && ! $processor->is_tag_closer() ) {\n $name = $processor->get_token_name();\n if ( 'TITLE' === $name || 'TEXTAREA' === $name ) {\n $text = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $text ) {\n continue;\n }\n\n $length = mb_strlen( $text, 'UTF-8' );\n if ( $length <= $remaining ) {\n $excerpt .= $text;\n $remaining -= $length;\n } else {\n $excerpt .= mb_substr( $text, 0, $remaining, 'UTF-8' );\n break;\n }\n }\n\n return $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment with tree awareness, then walks tokens via `next_token()`, appending decoded text from ordinary `#text` tokens plus the special text-bearing `TITLE` and `TEXTAREA` opening-tag tokens through `get_modifiable_text()`. It truncates incrementally with `mb_strlen()` and `mb_substr()` in UTF-8 so the result is limited by Unicode code points without splitting multibyte characters.", + "confidence": 93 + }, + { + "id": "T06-collect-links", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n if ( ! empty( $link_stack ) ) {\n array_pop( $link_stack );\n }\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $link_stack[] = count( $links ) - 1;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) {\n $current_index = $link_stack[ count( $link_stack ) - 1 ];\n $links[ $current_index ]['text'] .= $processor->get_modifiable_text();\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks the document with `next_token()`. For each `A` opening tag, it reads `href` with `get_attribute()` and includes it only when the returned value is a string, and while that link is open it concatenates descendant ordinary `#text` tokens via `get_token_type()`, `get_tag()`, `is_tag_closer()`, and `get_modifiable_text()`. It rejects truncated or unsupported input by checking `paused_at_incomplete_token()` and `get_last_error()` after the scan.", + "confidence": 84 + }, + { + "id": "T06-collect-links", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $active as $open_link ) {\n $links[ $open_link['index'] ]['text'] .= $text;\n }\n\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || 'A' !== $processor->get_tag() ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n $closer_depth = $processor->get_current_depth();\n\n while ( ! empty( $active ) ) {\n $last = $active[ count( $active ) - 1 ];\n if ( $last['depth'] !== $closer_depth + 1 ) {\n break;\n }\n\n array_pop( $active );\n }\n\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $active[] = array(\n 'depth' => $processor->get_current_depth(),\n 'index' => count( $links ) - 1,\n );\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` for tree-aware parsing, then walks the fragment with `next_token()`. It starts a result entry when it sees an opening `A` tag whose `get_attribute( 'href' )` returns a string, appends decoded text from `#text` tokens via `get_modifiable_text()` while that link is open, and ends tracking on the corresponding closing `A` token using `get_current_depth()` and `is_tag_closer()`.", + "confidence": 87 + }, + { + "id": "T06-collect-links", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current ) {\n $links[] = $current;\n $current = null;\n }\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $current = array(\n 'href' => $href,\n 'text' => '',\n );\n }\n\n continue;\n }\n\n if ( null !== $current && '#text' === $processor->get_token_type() ) {\n $current['text'] .= $processor->get_modifiable_text();\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then performs a single `next_token()` walk. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes (excluding missing and boolean `href`), accumulates descendant ordinary text from `#text` tokens via `get_modifiable_text()`, and flushes the collected entry when the matching `A` closer is reached.", + "confidence": 86 + }, + { + "id": "T07-nested-lists", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n $has_list_ancestor = false;\n $last_index = count( $breadcrumbs ) - 1;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $has_list_ancestor = true;\n break;\n }\n }\n\n if ( $has_list_ancestor ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with structural awareness, scans list openers with `next_tag()`, checks each matched tag’s ancestor path via `get_breadcrumbs()`, and adds the `nested-list` class with `add_class()` only when a `UL` or `OL` appears above it; the final byte-preserving result is returned with `get_updated_html()`, while unsupported-parser failures fall back to the original HTML.", + "confidence": 91 + }, + { + "id": "T07-nested-lists", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n $has_list_ancestor = false;\n $last_index = count( $breadcrumbs ) - 1;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $has_list_ancestor = true;\n break;\n }\n }\n\n if ( $has_list_ancestor ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because ancestor-aware structure is required. It scans list openers with `next_tag()`, checks each matched tag’s `get_breadcrumbs()` for any earlier `UL` or `OL` ancestor, adds the `nested-list` class with `add_class()`, and returns the byte-preserving edited fragment with `get_updated_html()`.", + "confidence": 90 + }, + { + "id": "T07-nested-lists", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $scanner->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $scanner->get_breadcrumbs();\n $is_nested = false;\n $last_index = count( $breadcrumbs ) - 1;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $is_nested = true;\n break;\n }\n }\n }\n\n if ( $scanner->paused_at_incomplete_token() || null !== $scanner->get_last_error() ) {\n return $html;\n }\n\n $processor = WP_HTML_Processor::create_fragment( $html );\n if ( null === $processor ) {\n return $html;\n }\n\n while ( $processor->next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n $is_nested = false;\n $last_index = count( $breadcrumbs ) - 1;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $is_nested = true;\n break;\n }\n }\n\n if ( $is_nested ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on ancestor structure. It scans tags with `next_tag()`, identifies `UL`/`OL` elements, checks their ancestry with `get_breadcrumbs()`, and adds the class with `add_class()`. To avoid partial edits on truncated or unsupported HTML, it first performs a validation scan and returns the original fragment unchanged if `paused_at_incomplete_token()` or `get_last_error()` indicates the structure scan was not clean; otherwise it performs the same scan again and returns the byte-preserving result via `get_updated_html()`.", + "confidence": 89 + }, + { + "id": "T08-table-extract", + "trial": 1, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() ) {\n continue;\n }\n\n $tag_name = $processor->get_tag();\n\n if ( $processor->is_tag_closer() ) {\n if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n } elseif ( 'TR' === $tag_name && null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n\n continue;\n }\n\n if ( 'TR' === $tag_name ) {\n $current_row = array();\n continue;\n }\n\n if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_row ) {\n $current_cell = '';\n continue;\n }\n\n if ( null !== $current_cell && in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the HTML fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks its subtree with `next_token()` and `get_current_depth()`. It collects row boundaries from `TR` open/close tags, cell boundaries from `TD`/`TH`, appends decoded ordinary text from `#text` tokens via `get_modifiable_text()`, and also includes documented special-element text carried on `SCRIPT`, `STYLE`, `TEXTAREA`, and `TITLE` opener tokens.", + "confidence": 87 + }, + { + "id": "T08-table-extract", + "trial": 2, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = '';\n $inside_cell = false;\n $closed_table = false;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $table_depth ) {\n $closed_table = true;\n break;\n }\n\n $token_type = $processor->get_token_type();\n\n if ( '#tag' === $token_type ) {\n $tag_name = $processor->get_tag();\n\n if ( $processor->is_tag_closer() ) {\n if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && $inside_cell ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_row[] = $current_cell;\n $current_cell = '';\n $inside_cell = false;\n } elseif ( 'TR' === $tag_name && null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n\n continue;\n }\n\n if ( 'TR' === $tag_name ) {\n $current_row = array();\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_cell = '';\n $inside_cell = true;\n continue;\n }\n\n if (\n $inside_cell &&\n ( 'SCRIPT' === $tag_name || 'STYLE' === $tag_name || 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n\n continue;\n }\n\n if ( $inside_cell && ( '#text' === $token_type || '#cdata-section' === $token_type ) ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( ! $closed_table && ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so the fragment is parsed with browser-like table structure, then finds the first `TABLE` with `next_tag()`. It walks the table in one `next_token()` pass, bounded by `get_current_depth()`, opens and closes rows/cells using `get_tag()` and `is_tag_closer()`, and concatenates cell text from `#text`/`#cdata-section` tokens plus special text-carrying element openers via `get_modifiable_text()`. It also rejects unsupported or truncated parses with `get_last_error()` and `paused_at_incomplete_token()`.", + "confidence": 88 + }, + { + "id": "T08-table-extract", + "trial": 3, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = '';\n $in_cell = false;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n $token_type = $processor->get_token_type();\n\n if ( '#text' === $token_type ) {\n if ( $in_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $token_type ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n if ( null === $tag ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( 'TD' === $tag || 'TH' === $tag ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_row[] = $current_cell;\n $current_cell = '';\n $in_cell = false;\n } elseif ( 'TR' === $tag ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n }\n\n continue;\n }\n\n if ( 'TR' === $tag ) {\n $current_row = array();\n continue;\n }\n\n if ( 'TD' === $tag || 'TH' === $tag ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_cell = '';\n $in_cell = true;\n continue;\n }\n\n if ( $in_cell && ( 'SCRIPT' === $tag || 'STYLE' === $tag || 'TEXTAREA' === $tag || 'TITLE' === $tag ) ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n if ( $in_cell ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, finds the first `TABLE` with `next_tag()`, then walks that table subtree with a single `next_token()` loop bounded by `get_current_depth()`. It builds rows from `TR` open/close events, collects cell contents from `TD`/`TH`, appends decoded text from `#text` tokens via `get_modifiable_text()`, and also includes special-element text carried on opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) as documented by the HTML API.", + "confidence": 87 + }, + { + "id": "T09-mark-keyword", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n if ( '' !== $text && str_contains( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks tokens with `next_token()` and rebuilds normalized HTML with `serialize_token()`. Only ordinary `#text` tokens are considered for wrapping, and their decoded text is read via `get_modifiable_text()` so character references match the keyword correctly; special text-bearing elements like `script`, `style`, `title`, and `textarea` are not wrapped because their text is carried on the element token, not as `#text` children.", + "confidence": 74 + }, + { + "id": "T09-mark-keyword", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n if ( '' !== $text && false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This walks the fragment with `WP_HTML_Processor::create_fragment()` and `next_token()`, rebuilding normalized HTML token by token with `serialize_token()`. When the current token is an ordinary `#text` node, it checks the decoded text via `get_modifiable_text()` and wraps that token’s normalized serialization in `` if it contains the case-sensitive keyword; special text-bearing elements are naturally excluded because their contents are not exposed as `#text` child tokens by the documented API.", + "confidence": 86 + }, + { + "id": "T09-mark-keyword", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() && str_contains( $processor->get_modifiable_text(), $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n $normalized = WP_HTML_Processor::normalize( $html );\n return null === $normalized ? '' : $normalized;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment structurally, walks every token with `next_token()`, checks only ordinary `#text` tokens via `get_token_type()` and decoded text via `get_modifiable_text()`, and rewrites the normalized output token-by-token with `serialize_token()`, inserting `` wrappers around matching text nodes so special-element text, comments, and attributes are left untouched.", + "confidence": 85 + }, + { + "id": "T10-last-h2", + "trial": 1, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n $found_h2 = true;\n }\n\n if ( ! $found_h2 ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat tag-position edit, not a tree-structure query. It scans forward with `next_tag( 'H2' )`, repeatedly moves a single bookmark with `set_bookmark()` to remember the last matching opener, then `seek()`s back to that bookmarked tag, adds the class with `add_class()`, and returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T10-last-h2", + "trial": 2, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n $found_h2 = true;\n }\n\n if ( ! $found_h2 ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, reuses a single literal bookmark via `set_bookmark()` to remember the last matched `H2`, then `seek()`s back to that opener and adds the class with `add_class()`, finally returning the result with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T10-last-h2", + "trial": 3, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n }\n\n if ( ! $processor->has_bookmark( 'last-h2' ) ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of the markup, since the task only needs the last `H2` opener in document order. It repeatedly calls `next_tag( 'H2' )`, moves a single bookmark with `set_bookmark()` to remember the most recent match, then `seek()`s back to that bookmarked tag and applies `add_class( 'final-section' )` before returning the result with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute rewrite on every tag. It scans all opening tags with `next_tag()`, finds matching attributes with the documented `get_attribute_names_with_prefix( 'data-track-' )`, removes each one via `remove_attribute()`, and returns the modified markup with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 98 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $attribute_names = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attribute_names ) {\n continue;\n }\n\n foreach ( $attribute_names as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan over every tag with `next_tag()`, collects matching attribute names via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the edited markup with `get_updated_html()`. That prefix match is case-insensitive and only targets names beginning with `data-track-`, so similar names like `data-track` and `data-tracker` are preserved.", + "confidence": 98 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit pass: it scans every tag with `next_tag()`, finds matching attributes via `get_attribute_names_with_prefix( 'data-track-' )`, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T12-unwrap-spans", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with full structural awareness, then walks every token with `next_token()`, skips both opening and closing `SPAN` tokens by checking `get_tag()`, and appends every other token’s normalized serialization via `serialize_token()`. That produces normalized HTML while leaving span contents in place.", + "confidence": 88 + }, + { + "id": "T12-unwrap-spans", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, walks every token with `next_token()`, skips any token whose tag is `SPAN` so both span openers and closers disappear while their child tokens still serialize, and appends all other tokens with `serialize_token()` to produce normalized HTML.", + "confidence": 87 + }, + { + "id": "T12-unwrap-spans", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks every token with `next_token()`. It removes span wrappers by skipping any `#tag` token whose tag name is `SPAN` (both openers and closers) and appends every other token’s normalized serialization with `serialize_token()`, producing normalized HTML while preserving the spans’ contents in place.", + "confidence": 78 + } + ] +} diff --git a/doc-experiment/results/round-42/round-metadata.json b/doc-experiment/results/round-42/round-metadata.json new file mode 100644 index 0000000000000..7c28e49a6f161 --- /dev/null +++ b/doc-experiment/results/round-42/round-metadata.json @@ -0,0 +1,403 @@ +{ + "round": "round-42", + "mode": "checkpoint", + "task_ids": [ + "H04-remove-empty-paragraphs", + "N01-remove-external-class", + "N02-collect-figure-images", + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N05-document-title", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 19, + "splits": { + "holdout": 4, + "train": 15 + }, + "concepts": { + "attributes": 3, + "classes": 2, + "full-document": 1, + "normalization": 1, + "serialization": 3, + "text": 3, + "traversal": 6 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "babc0b1dfcf1dcacf0ffb53b7366e31fcd3a2450", + "git_status_short": "", + "source_file_digests": { + "ref": "babc0b1dfcf1dcacf0ffb53b7366e31fcd3a2450", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "b115e956af65f69b4e07c7e761ccc9a49464ba3caf1f66944ed8eb3794dce472", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "babc0b1dfcf1dcacf0ffb53b7366e31fcd3a2450", + "algorithm": "sha256", + "tasks": { + "H04-remove-empty-paragraphs": { + "labels": { + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/H04-remove-empty-paragraphs/task.md": "e867539d336b3157a2d010daa13a02c935409df5fa94f18e8fe31e557f9bfe36", + "doc-experiment/corpus/H04-remove-empty-paragraphs/reference.php": "5bb229b691cc6be5fe1581b452d3f2fbda159e53c35851d60f908e139f5b5fd2", + "doc-experiment/corpus/H04-remove-empty-paragraphs/tests.json": "b412fc02bd9d6727e76b891adf72ed0f821707fffe5cbb5117c0f9bd65bb3275" + } + }, + "N01-remove-external-class": { + "labels": { + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "classes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/N01-remove-external-class/task.md": "629be59c48a4540d2a71c3f546585d4c893d1d0a2f38252de3357c032f8ff13d", + "doc-experiment/corpus/N01-remove-external-class/reference.php": "8906e16e332a860e42a849f907cabc7a52f9c669249d1a2d811bc737926aa4b0", + "doc-experiment/corpus/N01-remove-external-class/tests.json": "a8eda184edf4994ad41d32103d5d46534a6c48ce50fa86a312fa91287cc6b38c" + } + }, + "N02-collect-figure-images": { + "labels": { + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N02-collect-figure-images/task.md": "5680a2b952783fb0aac731ac5a6d9f3fdfb5ae405729c03e830d2e5261be685f", + "doc-experiment/corpus/N02-collect-figure-images/reference.php": "c99770d66e431924e7866e46326b6efbf508f60d820bbdd86cd7acf9431e2dc2", + "doc-experiment/corpus/N02-collect-figure-images/tests.json": "1fcf068cf48b1db68df40a910b686e1a6ef426eb3183aa11d6720fb3614c3769" + } + }, + "N03-first-list-count": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N03-first-list-count/task.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "doc-experiment/corpus/N03-first-list-count/reference.php": "588468ac5c4009fd1a1a27a530efa208e1ce08025d34a0e4353ea5b976d705ba", + "doc-experiment/corpus/N03-first-list-count/tests.json": "e0467ccc5475d2c87cc11eb54b4f2cb40fe4d44ef82c761f745d1cca91101314" + } + }, + "N04-normalize-or-placeholder": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N04-normalize-or-placeholder/task.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "doc-experiment/corpus/N04-normalize-or-placeholder/reference.php": "e47feb6d3be887e1ee5df77e39160ffe812cb8322ca8600898de2a34e56ddeed", + "doc-experiment/corpus/N04-normalize-or-placeholder/tests.json": "417d2f770b341fc46bb8b7b46df899f1fd1e65d343aa8cc6c4e39307afcb4d18" + } + }, + "N05-document-title": { + "labels": { + "split": "holdout", + "role": "core", + "commonness": "high", + "concept": "full-document", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N05-document-title/task.md": "a450916a3cf8d517a798e540bb580055b8f14ee3d95e13165e5ee872163f81b4", + "doc-experiment/corpus/N05-document-title/reference.php": "d8912a4752f0bb299c4ba6021e6a78514238c9c39f2b5d69f89ddb6017d408c7", + "doc-experiment/corpus/N05-document-title/tests.json": "c025fba051e1b866bef00afa9d2ec4f31d58510108235935c3755dc9bdbc6667" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + }, + "T01-add-image-class": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T01-add-image-class/task.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "doc-experiment/corpus/T01-add-image-class/reference.php": "8c9cf2ac9194dfce36a45c2f987d7bbfcdb334020aa94a86fb388bb1ce28171f", + "doc-experiment/corpus/T01-add-image-class/tests.json": "0ab7c7527432604d31e5e8d57e810a0167730a111c018d8a05a1bbad24617787" + } + }, + "T02-link-targets": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T02-link-targets/task.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "doc-experiment/corpus/T02-link-targets/reference.php": "8942a4ef55241d6d1deda51046430c3d8cd7c4ca716c8bc79bdf9aa723fdb6f6", + "doc-experiment/corpus/T02-link-targets/tests.json": "c21515f7d43962b86975ccee1b3a39747170edc75ec1af26416494d312870a4a" + } + }, + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "T04-build-figure": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T04-build-figure/task.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "doc-experiment/corpus/T04-build-figure/reference.php": "973bcd94265496fe52fe7f9b895c266b4ae91c99460abbfd9088f1931a5d590e", + "doc-experiment/corpus/T04-build-figure/tests.json": "45cea68d9b4606600a7d57cd46e84348edb5599ec43a1fec44002424a0820d3a" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + }, + "T06-collect-links": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T06-collect-links/task.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "doc-experiment/corpus/T06-collect-links/reference.php": "c0e5ae9ec5d4448a525962f1e27250a7243541d613209e3bbdc86913ff7e7a81", + "doc-experiment/corpus/T06-collect-links/tests.json": "34460376a147fa18739e7743d341479489f46a88b58b9f5aa5fe0d901f2b3140" + } + }, + "T07-nested-lists": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T07-nested-lists/task.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "doc-experiment/corpus/T07-nested-lists/reference.php": "f082fca4f2419435d0d8fe275068a0d486bda2359a668dc969b6cddf78100a61", + "doc-experiment/corpus/T07-nested-lists/tests.json": "79632d22e7c44696e5779df8ea2351c36e335c20aa316437a8edc885d20c40cd" + } + }, + "T08-table-extract": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T08-table-extract/task.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "doc-experiment/corpus/T08-table-extract/reference.php": "289b8c6d1b69d4ef4dd7810dbdcba35445c30a7f26b631a00835ef737b78709e", + "doc-experiment/corpus/T08-table-extract/tests.json": "cdec7559d63ee541f1b0dad34a4f9f0982ce66e58ddfea98ec482c425e1b2638" + } + }, + "T09-mark-keyword": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T09-mark-keyword/task.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "doc-experiment/corpus/T09-mark-keyword/reference.php": "0141ececbe459f50729e657f59dd75e8b74911cd1d70940b8fd5072d2f1efb60", + "doc-experiment/corpus/T09-mark-keyword/tests.json": "946bb96c922fea63cda3c31b94840b05ffded9f2e1dfa4b1547c755b99cda0c5" + } + }, + "T10-last-h2": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T10-last-h2/task.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "doc-experiment/corpus/T10-last-h2/reference.php": "dabda32a9cf4ffb8a1b6ca613feb0a87791619be3bdfd2502b93f5972ba454a5", + "doc-experiment/corpus/T10-last-h2/tests.json": "4b3be0b61681e587c88a2d2ae0b9dec88306dcdad5b0f41a9b3a6a5883aceb07" + } + }, + "T11-strip-tracking-attributes": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T11-strip-tracking-attributes/task.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "doc-experiment/corpus/T11-strip-tracking-attributes/reference.php": "6ab0dceded92febd085f447eafd1ded777bf7ad6d184c6dc05b265ec41f420f0", + "doc-experiment/corpus/T11-strip-tracking-attributes/tests.json": "65032bd058bfa4819f040c3d8f9590224684a6cfff28434ad6708ab1f6ce94fc" + } + }, + "T12-unwrap-spans": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T12-unwrap-spans/task.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b", + "doc-experiment/corpus/T12-unwrap-spans/reference.php": "93d6828161817ec9a0b90765d91e7dae1c48f3b3aac35f6cad6f785b1c715797", + "doc-experiment/corpus/T12-unwrap-spans/tests.json": "fcade4b9f46bbb824b0095f58b99d4c71c8b3e696093ab0d0ae25d666aae5f53" + } + } + } + }, + "created_at_utc": "2026-06-13T15:14:24+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-42", + "staged_task_files": [ + "tasks/H04-remove-empty-paragraphs.md", + "tasks/N01-remove-external-class.md", + "tasks/N02-collect-figure-images.md", + "tasks/N03-first-list-count.md", + "tasks/N04-normalize-or-placeholder.md", + "tasks/N05-document-title.md", + "tasks/N06-extract-toc.md", + "tasks/T01-add-image-class.md", + "tasks/T02-link-targets.md", + "tasks/T03-first-h1-text.md", + "tasks/T04-build-figure.md", + "tasks/T05-text-excerpt.md", + "tasks/T06-collect-links.md", + "tasks/T07-nested-lists.md", + "tasks/T08-table-extract.md", + "tasks/T09-mark-keyword.md", + "tasks/T10-last-h2.md", + "tasks/T11-strip-tracking-attributes.md", + "tasks/T12-unwrap-spans.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-42 exposes 2 docs and 19 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "4a4e64bbb3c43c248cb948ca752a01674a3dedc4eb77843d6fb7e63ea0a1f6ea", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/H04-remove-empty-paragraphs.md": "e867539d336b3157a2d010daa13a02c935409df5fa94f18e8fe31e557f9bfe36", + "tasks/N01-remove-external-class.md": "629be59c48a4540d2a71c3f546585d4c893d1d0a2f38252de3357c032f8ff13d", + "tasks/N02-collect-figure-images.md": "5680a2b952783fb0aac731ac5a6d9f3fdfb5ae405729c03e830d2e5261be685f", + "tasks/N03-first-list-count.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "tasks/N04-normalize-or-placeholder.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "tasks/N05-document-title.md": "a450916a3cf8d517a798e540bb580055b8f14ee3d95e13165e5ee872163f81b4", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T01-add-image-class.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "tasks/T02-link-targets.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T04-build-figure.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "tasks/T06-collect-links.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "tasks/T07-nested-lists.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "tasks/T08-table-extract.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "tasks/T09-mark-keyword.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "tasks/T10-last-h2.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "tasks/T11-strip-tracking-attributes.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "tasks/T12-unwrap-spans.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b" + } +} diff --git a/doc-experiment/results/round-42/round-summary.json b/doc-experiment/results/round-42/round-summary.json new file mode 100644 index 0000000000000..36204eb33bac7 --- /dev/null +++ b/doc-experiment/results/round-42/round-summary.json @@ -0,0 +1,704 @@ +{ + "round_score": 99.29, + "core_score": 99.21, + "by_split": { + "holdout": 98.38, + "train": 99.54 + }, + "by_concept": { + "attributes": 100.0, + "classes": 100.0, + "full-document": 96.4, + "normalization": 100.0, + "serialization": 98.93, + "text": 99.33, + "traversal": 99.23 + }, + "tasks": { + "H04-remove-empty-paragraphs": { + "score": 98.2, + "trials": [ + { + "trial": "trial-1", + "passed": 11, + "total": 11, + "adherence": 88, + "score": 96.4 + }, + { + "trial": "trial-2", + "passed": 11, + "total": 11, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 11, + "total": 11, + "adherence": 96, + "score": 98.8 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "serialization", + "processor": "html", + "split": "holdout" + } + }, + "N01-remove-external-class": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "split": "holdout" + } + }, + "N02-collect-figure-images": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 9, + "total": 9, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 9, + "total": 9, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-3", + "passed": 9, + "total": 9, + "adherence": 93, + "score": 97.9 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "holdout" + } + }, + "N03-first-list-count": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "N04-normalize-or-placeholder": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "split": "train" + } + }, + "N05-document-title": { + "score": 96.4, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 74, + "score": 92.2 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "full-document", + "processor": "html", + "split": "holdout" + } + }, + "N06-extract-toc": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 94, + "score": 98.2 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 97, + "score": 99.1 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T01-add-image-class": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "split": "train" + } + }, + "T02-link-targets": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T03-first-h1-text": { + "score": 99.3, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 93, + "score": 97.9 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T04-build-figure": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T06-collect-links": { + "score": 98.7, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T07-nested-lists": { + "score": 99.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 94, + "score": 98.2 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T08-table-extract": { + "score": 98.6, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 95, + "score": 98.5 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T09-mark-keyword": { + "score": 99.8, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "T10-last-h2": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag", + "split": "train" + } + }, + "T11-strip-tracking-attributes": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T12-unwrap-spans": { + "score": 98.8, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 94, + "score": 98.2 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-42", + "mode": "checkpoint", + "task_ids": [ + "H04-remove-empty-paragraphs", + "N01-remove-external-class", + "N02-collect-figure-images", + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N05-document-title", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 19, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "babc0b1dfcf1dcacf0ffb53b7366e31fcd3a2450", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-42/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-42/subject-isolation.json b/doc-experiment/results/round-42/subject-isolation.json new file mode 100644 index 0000000000000..8659a3370ed48 --- /dev/null +++ b/doc-experiment/results/round-42/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-42/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-43/N03-first-list-count/judge.json b/doc-experiment/results/round-43/N03-first-list-count/judge.json new file mode 100644 index 0000000000000..aacd3d72fddc9 --- /dev/null +++ b/doc-experiment/results/round-43/N03-first-list-count/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), documented structural depth APIs, next_token(), bookmarks/seek, set_attribute(), get_updated_html(), paused_at_incomplete_token(), and get_last_error(). No _doing_it_wrong records. The extra finished_scan guard is consistent with the documented bounded subtree scan pattern." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and fully documented API surface. The depth-bounded next_token() loop, direct-child opener checks, bookmark/seek edit, and clean-scan checks match the docs' recipes. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation quality as trial-2: correct fragment processor, no undocumented methods, idiomatic bookmark plus depth-bounded token walk, and appropriate incomplete/unsupported fallback checks. No _doing_it_wrong records." + } + ], + "failure_analysis": "All trials passed all 11 hidden cases, so there were no failed cases to attribute to documentation gaps. The docs did unusually well for this task: the HTML Processor overview explicitly distinguishes it from the Tag Processor for structure-aware work; create_fragment() explains BODY-fragment parsing and null returns; next_tag() explains scanning for the first of multiple tag names; the 'scan a region before editing its opener' and 'test subtree membership and direct children' recipes map directly to bookmark, next_token(), depth, is_tag_closer(), get_token_type(), seek(), and clean-scan checks; get_current_depth() explains why the guard must be >= and why direct child counting must ignore closers; get_last_error() and paused_at_incomplete_token() cover unsupported markup and truncation. The only near-miss is that the correct scoped completeness policy requires combining several passages: after a bounded subtree walk, reject truncation or unsupported markup inside the region, but do not keep scanning unrelated trailing input if the target element was already closed.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() and WP_HTML_Processor::get_current_depth() docblocks", + "problem": "The scoped completeness rule is spread across multiple sections, while paused_at_incomplete_token() elsewhere says to drain all tokens for whole-document checks. This can confuse callers whose contract only depends on a completed subtree.", + "suggestion": "Add a short bounded-subtree note: once depth drops below the recorded opener depth, the walk has left that subtree; check paused_at_incomplete_token() and get_last_error() before mutating, and only drain to EOF if the caller's contract also depends on the trailing document." + }, + { + "location": "WP_HTML_Processor::get_current_depth() docblock", + "problem": "The method explains closer depth, but the direct-child element test is easier to find in the overview recipe than at the depth API itself.", + "suggestion": "Add a compact direct-child opener formula near the depth examples: require #tag, not is_tag_closer(), and current depth equal to container depth + 1." + }, + { + "location": "WP_HTML_Processor::set_attribute() docblock", + "problem": "Mutation output retrieval is documented elsewhere, but callers using HTML Processor may still reach for serialize() after set_attribute().", + "suggestion": "Add a one-line post-mutation example showing set_attribute() followed by get_updated_html(), with a cross-reference that serialize()/serialize_token() are for normalized serialization workflows, not queued attribute updates." + } + ] +} diff --git a/doc-experiment/results/round-43/N03-first-list-count/trial-1/candidate.php b/doc-experiment/results/round-43/N03-first-list-count/trial-1/candidate.php new file mode 100644 index 0000000000000..9438c424d6b6e --- /dev/null +++ b/doc-experiment/results/round-43/N03-first-list-count/trial-1/candidate.php @@ -0,0 +1,56 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + if ( ! $processor->set_bookmark( 'first-list' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $item_count = 0; + $finished_scan = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $list_depth ) { + $finished_scan = true; + break; + } + + if ( + '#tag' === $processor->get_token_type() && + ! $processor->is_tag_closer() && + 'LI' === $processor->get_tag() && + $processor->get_current_depth() === $list_depth + 1 + ) { + $item_count++; + } + } + + if ( ! $finished_scan || $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + if ( ! $processor->seek( 'first-list' ) ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $item_count ); + $processor->release_bookmark( 'first-list' ); + + return $processor->get_updated_html(); + } + + return $html; +} diff --git a/doc-experiment/results/round-43/N03-first-list-count/trial-1/execution.json b/doc-experiment/results/round-43/N03-first-list-count/trial-1/execution.json new file mode 100644 index 0000000000000..68d87cfcd96ee --- /dev/null +++ b/doc-experiment/results/round-43/N03-first-list-count/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/N03-first-list-count/trial-1/candidate.php", + "function": "add_first_list_item_count", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "simple-ul", + "status": "pass", + "expected": "
            • A
            • B
            • C
            ", + "actual": "
            • A
            • B
            • C
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ol", + "status": "pass", + "expected": "
            1. A
            2. B
            ", + "actual": "
            1. A
            2. B
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-list", + "status": "pass", + "expected": "

            No list here.

            ", + "actual": "

            No list here.

            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-count-overwritten", + "status": "pass", + "expected": "
            • A
            ", + "actual": "
            • A
            ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-li-closers", + "status": "pass", + "expected": "
            • one
            • two", + "actual": "
              • one
              • two", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-list-counts-direct-children", + "status": "pass", + "expected": "
                  • x
                • y", + "actual": "
                    • x
                  • y", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-token-inside-list", + "status": "pass", + "expected": "
                    • ", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T01-add-image-class/trial-2/execution.json b/doc-experiment/results/round-43/T01-add-image-class/trial-2/execution.json new file mode 100644 index 0000000000000..cec8628235cb3 --- /dev/null +++ b/doc-experiment/results/round-43/T01-add-image-class/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T01-add-image-class/trial-2/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                      ", + "actual": "
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T01-add-image-class/trial-3/execution.json b/doc-experiment/results/round-43/T01-add-image-class/trial-3/execution.json new file mode 100644 index 0000000000000..6ece4bb4d6519 --- /dev/null +++ b/doc-experiment/results/round-43/T01-add-image-class/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T01-add-image-class/trial-3/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                      ", + "actual": "
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                      Nothing here.

                      ", + "actual": "

                      Nothing here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                      text

                      text

                      next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-1/execution.json b/doc-experiment/results/round-43/T02-link-targets/trial-1/execution.json new file mode 100644 index 0000000000000..29307dd1a9a1b --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T02-link-targets/trial-1/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "
                      go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-1/response.json b/doc-experiment/results/round-43/T02-link-targets/trial-1/response.json new file mode 100644 index 0000000000000..898f43a238724 --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit pass. It scans each `A` tag with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-2/candidate.php b/doc-experiment/results/round-43/T02-link-targets/trial-2/candidate.php new file mode 100644 index 0000000000000..972d891e974db --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-2/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-2/execution.json b/doc-experiment/results/round-43/T02-link-targets/trial-2/execution.json new file mode 100644 index 0000000000000..d6a027a3bb968 --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T02-link-targets/trial-2/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-2/response.json b/doc-experiment/results/round-43/T02-link-targets/trial-2/response.json new file mode 100644 index 0000000000000..4f46e80ceb0e8 --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, overwrites or inserts `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-3/candidate.php b/doc-experiment/results/round-43/T02-link-targets/trial-3/candidate.php new file mode 100644 index 0000000000000..f64b40aff45c1 --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-3/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'a' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-3/execution.json b/doc-experiment/results/round-43/T02-link-targets/trial-3/execution.json new file mode 100644 index 0000000000000..8a7f12fad147b --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T02-link-targets/trial-3/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T02-link-targets/trial-3/response.json b/doc-experiment/results/round-43/T02-link-targets/trial-3/response.json new file mode 100644 index 0000000000000..8a1f7a50916a1 --- /dev/null +++ b/doc-experiment/results/round-43/T02-link-targets/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans forward with `next_tag( 'a' )`, checks `get_attribute( 'href' ) !== null` so `href=\"\"` and boolean `href` both count as present, then overwrites or inserts `target` via `set_attribute( 'target', '_blank' )`, and finally returns the result with `get_updated_html()`.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/judge.json b/doc-experiment/results/round-43/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..f7d3ae4dcf053 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware `WP_HTML_Processor::create_fragment()` path, depth-bounded `next_token()` walk, `#text` guard, and decoded `get_modifiable_text()`. All called API methods are present in the supplied markdown and execution recorded no `_doing_it_wrong`. Small adherence penalty: it opted into special-element opener text for SCRIPT/STYLE/TEXTAREA/TITLE/NOEMBED/NOFRAMES/XMP, which is documented but broader than the task's plain text-node contract and could include raw non-heading text in untested inputs." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct processor choice and essentially the documented subtree text recipe. `create_fragment`, `next_tag`, `get_current_depth`, `next_token`, `get_token_type`, `get_modifiable_text`, `is_tag_closer`, and `get_tag` are all documented; no `_doing_it_wrong` records. Minor penalty for the same unnecessary special-element branch, though this one limits itself to the four elements explicitly called out in the HTML Processor docs." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Matches the canonical documented pattern: create an HTML Processor fragment, find `H1`, record opener depth, walk tokens while depth remains in the subtree, append only `#text` token `get_modifiable_text()`. Handles decoded text, empty headings, no H1, nested markup, and end-of-input virtual closers without undocumented API use." + } + ], + "failure_analysis": "All trials passed all frozen cases, 8/8 each, and none produced `_doing_it_wrong` records. The docs did well on the core path: the 'Which processor should I use?' guidance points text/subtree work to `WP_HTML_Processor`; the 'Recipe: collect DOM-style text from a subtree' example is almost exactly this task; `get_current_depth()` explains why the guard must be `>=`; `next_token()` explains virtual closers for malformed or unclosed input; and `get_modifiable_text()` clearly says returned `#text` content is already decoded. The main near-miss is special elements. Trials 1 and 2 inferred that special element opener text should be included inside the H1 because the docs explain that SCRIPT/STYLE/TITLE/TEXTAREA carry text on the opener token. That behavior is documented, but the broader docs also say ordinary subtree text should append only `#text` tokens unless the caller explicitly opts into special-element content. The hidden cases did not exercise this distinction, so it did not become a functional failure.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor` overview, 'Recipe: collect DOM-style text from a subtree' plus `next_token()` special-element note", + "problem": "The docs contain both the correct ordinary subtree-text recipe and a nearby special-element exception. Test subjects over-applied the exception for a generic heading-text task.", + "suggestion": "Add a short decision table distinguishing ordinary text-node extraction, DOM-like textContent, and special-element content extraction. State which token types to include for each policy and when SCRIPT/STYLE raw text should be excluded." + }, + { + "location": "`WP_HTML_Processor::get_modifiable_text()`", + "problem": "`get_modifiable_text()` is easy to read as 'text content' for any token, even though comments and special element openers are not ordinary text nodes.", + "suggestion": "Repeat in the method contract that non-`#text` modifiable text is opt-in data, not a text-node match. Recommend checking `get_token_type() === '#text'` for ordinary extracted text, with explicit tag whitelists only for caller-requested special content." + }, + { + "location": "Special self-contained elements docs across Tag Processor and HTML Processor", + "problem": "The exact special-element set is split across sections, and candidates differed on whether to include deprecated rawtext elements such as NOEMBED/NOFRAMES/XMP.", + "suggestion": "Centralize the list of tokens whose text is carried on opener tokens for HTML Processor walks, including whether each returns decoded or raw text, and link to it from both `next_token()` and `get_modifiable_text()`." + } + ] +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..67416ded23550 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,32 @@ +next_tag( 'H1' ) ) { + return null; + } + + $text = ''; + $h1_depth = $processor->get_current_depth(); + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'NOEMBED', 'NOFRAMES', 'XMP' ), true ) ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..420efc0306b77 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..2fa96d79bc8a0 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records that element\u2019s depth with `get_current_depth()`, then walks its subtree with `next_token()`, appending decoded text from `#text` tokens via `get_modifiable_text()`. It also includes text carried on special text-containing element opener tokens, since those do not produce separate `#text` child tokens in the HTML Processor.", + "confidence": 85 +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..0a55cafcd9345 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,39 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_tag(); + if ( + 'SCRIPT' === $tag_name || + 'STYLE' === $tag_name || + 'TEXTAREA' === $tag_name || + 'TITLE' === $tag_name + ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..a1dccca282841 --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..f7724c5d2ebff --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`, concatenating decoded `#text` token content via `get_modifiable_text()`. It also includes text carried on special element opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`), since those elements do not expose separate `#text` child tokens in this API.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..3d7302e5ef40f --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..4f7c2ad9ce2bc --- /dev/null +++ b/doc-experiment/results/round-43/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-43/T04-build-figure/judge.json b/doc-experiment/results/round-43/T04-build-figure/judge.json new file mode 100644 index 0000000000000..6aee5b733727a --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, which is the documented fit for filling a known literal template while preserving bytes and attribute order. All called APIs are present in the rendered docs: constructor, next_tag, set_attribute, next_token, get_token_type, set_modifiable_text, and get_updated_html. The solution follows the documented template-building recipe and correctly relies on plain-string input encoding for attributes and #text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. It uses only documented APIs, chooses the lighter Tag Processor appropriately, predeclares src and alt in template order, walks tokens to the figcaption #text placeholder, and returns get_updated_html(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. It cleanly follows the docs' Building markup from a template example: existing attributes preserve order, placeholder text enables set_modifiable_text(), and all output is read through get_updated_html(). No undocumented calls or misuse." + } + ], + "failure_analysis": "All trials passed all seven hidden cases. The docs did especially well in the Tag Processor page under \"Which processor should I use?\", which distinguishes flat byte-preserving mutation from tree-aware parsing, and under \"Building markup from a template\", which directly explains the winning pattern: start with a literal shape, include attributes in the desired order, include placeholder text, update with set_attribute()/set_modifiable_text(), then call get_updated_html(). The set_attribute section also clearly explains that plain unescaped values are accepted and encoded, and that newly added attributes sort by name rather than call order. The get_modifiable_text/set_modifiable_text sections clarify decoded/plain text handling, preventing the common mistake of manually escaping captions or trying to parse caption HTML as markup. Near miss: the template recipe calls set_modifiable_text() without checking its return value, while the method-level docs say to always check it. In this literal-template case the invariant is strong enough, but the example slightly undercuts the defensive contract.", + "doc_gaps": [ + { + "location": "html-tag-processor.md, \"Building markup from a template\" recipe", + "problem": "The example demonstrates the exact successful pattern but does not check return values from next_tag(), set_attribute(), or set_modifiable_text(), even though set_modifiable_text() later says to always check its return value.", + "suggestion": "Either make the recipe explicitly state that the literal template guarantees these calls in the example, or show a production-safe variant that checks the cursor move and text update before returning get_updated_html()." + }, + { + "location": "html-tag-processor.md, \"Building markup from a template\" recipe", + "problem": "The recipe says the API handles necessary encoding, but the concrete examples of dangerous input are only spread across later method sections.", + "suggestion": "Add one short sentence or example line near the recipe stating that callers should pass plain decoded strings, including strings containing &, <, >, and quotes; set_attribute() and set_modifiable_text() perform the appropriate HTML encoding." + }, + { + "location": "html-tag-processor.md, set_attribute() attribute ordering notes", + "problem": "The ordering rule is documented well, but it lives primarily in set_attribute(); template construction readers may miss why empty attributes should be predeclared.", + "suggestion": "Cross-link the template recipe and set_attribute ordering note both ways, emphasizing the general contract: update existing attributes to preserve written order; newly created attributes are inserted/sorted by the processor." + } + ] +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-1/candidate.php b/doc-experiment/results/round-43/T04-build-figure/trial-1/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-1/candidate.php @@ -0,0 +1,18 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-1/execution.json b/doc-experiment/results/round-43/T04-build-figure/trial-1/execution.json new file mode 100644 index 0000000000000..c87922bac8a02 --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T04-build-figure/trial-1/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-1/response.json b/doc-experiment/results/round-43/T04-build-figure/trial-1/response.json new file mode 100644 index 0000000000000..5232f4395bc7a --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a literal template so the `img` attributes already exist in `src` then `alt` order, then updates them with `set_attribute()` so the API performs the required HTML encoding. It then scans tokens with `next_token()` to find the placeholder `#text` node inside `figcaption` and replaces it with `set_modifiable_text()`, finally returning the result with `get_updated_html()`.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-2/candidate.php b/doc-experiment/results/round-43/T04-build-figure/trial-2/candidate.php new file mode 100644 index 0000000000000..0ec02ccfdbd4c --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-2/candidate.php @@ -0,0 +1,19 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-2/execution.json b/doc-experiment/results/round-43/T04-build-figure/trial-2/execution.json new file mode 100644 index 0000000000000..4f93f1dd4ce95 --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T04-build-figure/trial-2/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-2/response.json b/doc-experiment/results/round-43/T04-build-figure/trial-2/response.json new file mode 100644 index 0000000000000..dc7cb881b684e --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds a fixed HTML template with the required tag structure and predeclared `src` and `alt` attributes so their order stays `src` then `alt`, then uses `WP_HTML_Tag_Processor::next_tag()` and `set_attribute()` to fill those attributes and `next_token()`, `get_token_type()`, and `set_modifiable_text()` to replace a placeholder text node inside `figcaption`, relying on the HTML API to perform all necessary escaping.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-3/candidate.php b/doc-experiment/results/round-43/T04-build-figure/trial-3/candidate.php new file mode 100644 index 0000000000000..bb347f67e790b --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-3/candidate.php @@ -0,0 +1,18 @@ +
                      .
                      ' ); + + if ( $processor->next_tag( 'img' ) ) { + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + } + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-3/execution.json b/doc-experiment/results/round-43/T04-build-figure/trial-3/execution.json new file mode 100644 index 0000000000000..87186c945a4d2 --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T04-build-figure/trial-3/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                      \"A
                      My dog
                      ", + "actual": "
                      \"A
                      My dog
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                      \"Pair\"
                      Fish & Chips
                      ", + "actual": "
                      \"Pair\"
                      Fish & Chips
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                      \"The
                      Caption
                      ", + "actual": "
                      \"The
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                      \"Alt\"
                      Caption
                      ", + "actual": "
                      \"Alt\"
                      Caption
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "actual": "
                      \"Code\"
                      Use <em> tags & enjoy
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "actual": "
                      \"Schnée
                      Winter 🌨️ scene
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "actual": "
                      \"alt\"
                      <script>alert(1)</script>
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T04-build-figure/trial-3/response.json b/doc-experiment/results/round-43/T04-build-figure/trial-3/response.json new file mode 100644 index 0000000000000..d412e298c6172 --- /dev/null +++ b/doc-experiment/results/round-43/T04-build-figure/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds from a fixed HTML template so the `img` attributes already exist in the required `src` then `alt` order, then uses `WP_HTML_Tag_Processor::next_tag()` and `set_attribute()` to safely encode the attribute values, `next_token()` plus `get_token_type()` to find the placeholder text node inside `figcaption`, and `set_modifiable_text()` to safely encode the caption before returning `get_updated_html()`.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/judge.json b/doc-experiment/results/round-43/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..142c2b906590f --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/judge.json @@ -0,0 +1,35 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked tokens with `next_token()`, read only `#text` plus whitelisted `TITLE`/`TEXTAREA` opener tokens, and relied on documented decoded `get_modifiable_text()` behavior. No `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 90, + "hallucinated_methods": [], + "notes": "HTML API usage is mostly sound and all called processor methods are documented: `create_fragment`, `next_token`, `get_token_type`, `get_modifiable_text`, `is_tag_closer`, and `get_tag`. The 2/10 functional result comes from a PHP bug: `preg_match_all()` returns the number of matches, so the candidate skipped every text chunk longer than one code point. That is not an HTML API misuse." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the documented processor, token walk, token-type checks, special-element whitelist, decoded text access, and UTF-8 `mb_*` truncation. No undocumented calls or misuse records." + } + ], + "failure_analysis": "Only trial-2 failed hidden cases. The failures in `no-truncation-needed`, `truncate-mid-link`, `entities-count-decoded`, `multibyte-emoji`, `accented`, `script-excluded`, `textarea-title-counts-script-style-excluded`, and `malformed-nesting` all share the same misconception: the candidate treated `preg_match_all('/./us', $chunk, $matches)` as if success should return `1`. In PHP it returns the number of matches, so text chunks like `Just `, `Fish & Chips`, `before`, `form & field`, and `one` were discarded; only a one-codepoint whitespace chunk survived in the link/whitespace cases. The relevant HTML API docs were adequate: `WP_HTML_Processor::create_fragment()` says body fragments should use the fragment parser; `next_token()` says to use token walking when text matters and that special elements have no `#text` children; `get_modifiable_text()` says `#text`, `TITLE`, and `TEXTAREA` text is decoded UTF-8 and should be measured/sliced with an explicit encoding. This was not caused by an undocumented HTML API behavior.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_modifiable_text()` inherited docs", + "problem": "The docs mention UTF-8 slicing but only show a minimal `mb_substr()` example in this rendered file; a model still reached for ad hoc regex counting.", + "suggestion": "Show paired examples for measuring and slicing decoded modifiable text with `mb_strlen( $text, 'UTF-8' )` and `mb_substr( $text, 0, $limit, 'UTF-8' )`, without making it specific to excerpts." + }, + { + "location": "`WP_HTML_Processor::next_token()` text-walking recipe", + "problem": "The docs explain ordinary `#text` collection and special-element exceptions, but the guidance is split across sections.", + "suggestion": "Add a compact cross-reference in the text-walking recipe: for mixed token loops, use `get_token_type()` to select ordinary text, and opt into `TITLE`/`TEXTAREA` opener text with `get_token_name()` plus `! is_tag_closer()` when the caller wants those special contents." + } + ] +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..6e8de2f427193 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,38 @@ +next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $token_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $text .= $chunk; + + if ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) { + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..73360984ddcf7 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..dd233972b666f --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, walks tokens with `next_token()`, appends decoded text from ordinary `#text` tokens via `get_modifiable_text()`, and also explicitly includes decoded opener text for `TITLE` and `TEXTAREA` while excluding other special elements like `SCRIPT` and `STYLE`. Truncation is done with `mb_strlen()` and `mb_substr()` using explicit `UTF-8` so the result is limited by Unicode code points without splitting multibyte characters.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..e1a60c0c10224 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,49 @@ + 0 && $processor->next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_tag(); + if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + if ( 1 !== preg_match_all( '/./us', $chunk, $matches ) ) { + continue; + } + + $codepoints = $matches[0]; + $count = count( $codepoints ); + + if ( $count <= $remaining ) { + $excerpt .= $chunk; + $remaining -= $count; + continue; + } + + $excerpt .= implode( '', array_slice( $codepoints, 0, $remaining ) ); + break; + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..f6f579306e402 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 2, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "fail", + "expected": "Just a link to content.", + "actual": " ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "fail", + "expected": "Just a l", + "actual": " ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "fail", + "expected": "Fish &", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "fail", + "expected": "ab🌨️", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "fail", + "expected": "cafés", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "fail", + "expected": "beforeafter", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "fail", + "expected": "form & fieldDoc & TitleBody", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "fail", + "expected": "onetwotail", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..0ed861ba372b6 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-fragment structurally, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens plus the special text-bearing `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. Truncation is done by counting Unicode code points with a UTF-8 regex so multi-byte characters are never split.", + "confidence": 83 +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..e0143302eff99 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,35 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..dc2f433f0abb1 --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..fdbf5be9acb0c --- /dev/null +++ b/doc-experiment/results/round-43/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structural awareness, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and, per the documented special-case behavior, from `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`. It excludes `SCRIPT` and `STYLE` by not whitelisting them, preserves whitespace exactly as reported by the parser, and truncates by Unicode code points with `mb_strlen()`/`mb_substr()` in UTF-8.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-43/T06-collect-links/judge.json b/doc-experiment/results/round-43/T06-collect-links/judge.json new file mode 100644 index 0000000000000..5be34dc4d9f3e --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), token walking, get_attribute() with is_string(), and #text + get_modifiable_text() correctly. All called APIs are documented and execution recorded no misuse. Slightly less canonical than the reference because it tracks A state manually rather than using a depth-bounded subtree walk, but this matches the docs' single-cursor/state guidance for repeated regions." + }, + { + "trial_id": "trial-2", + "adherence": 90, + "hallucinated_methods": [], + "notes": "Correct processor and documented APIs throughout. The main adherence issue is the final paused_at_incomplete_token() policy: for a read-only extraction task, returning an empty result on any trailing incomplete syntax can discard links already parsed. The docs describe that as a caller policy choice, not a default for extraction. Otherwise handles decoded href/text and valueless href correctly." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API calls. Uses a documented one-pass next_token() state-machine pattern and the right string-valued href check. The final get_last_error() rejection is defensible for unsupported markup, though the docs could better distinguish strict-abort extraction from best-effort partial extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen cases. The docs did well on the essentials: 'Which processor should I use?' and create_fragment() pointed subjects to WP_HTML_Processor for BODY fragments; get_attribute() documented string|true|null, which led all trials to exclude missing and valueless hrefs with is_string(); get_modifiable_text() documented decoded #text behavior; and next_token() documented one shared cursor, virtual closers, and explicit state, which the candidates followed. Near-misses: trial-2 appears to overgeneralize the incomplete-input guidance from next_token()/paused_at_incomplete_token(), treating any trailing incomplete syntax as grounds to erase collected results. The relevant docs say this depends on caller policy, but the examples are mostly mutation/rewrite-oriented, making strict rejection look like a default. Trials also rely on closer-driven A stack state; the is_tag_closer() docs imply this works, but they do not explicitly say get_tag() still names the element being closed on real and virtual closers.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / Recipe: collect DOM-style text from a subtree", + "problem": "The docs show single-subtree text extraction and a DT state-machine example, but not a general repeated-element extraction pattern that combines opener attributes, text accumulation, and closer finalization.", + "suggestion": "Add a generalized example for collecting data from repeated elements in one pass: record state on an opener, append only #text token get_modifiable_text(), finalize on the element closer, and explain when a depth-bounded inner walk is appropriate instead." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and WP_HTML_Processor::next_token() incomplete-input notes", + "problem": "The distinction between an unclosed element, which still gets a virtual closer, and an incomplete trailing syntax token, which sets paused_at_incomplete_token(), is easy to blur.", + "suggestion": "State explicitly that unclosed elements at EOF are structurally closed by the processor and are not necessarily 'incomplete tokens'; checking paused_at_incomplete_token() is a strict-source-completeness policy that may discard otherwise valid visited data." + }, + { + "location": "WP_HTML_Processor::get_last_error()", + "problem": "The docs explain how to detect unsupported markup, but mostly frame the response around output-producing methods like serialize()/normalize(). Extraction callers need clearer guidance on partial results.", + "suggestion": "Document that tokens visited before get_last_error() became non-null were parsed, but the traversal is incomplete; callers should choose and document a policy such as reject all, return partial results with a flag, or fall back." + }, + { + "location": "WP_HTML_Processor::is_tag_closer() / get_tag()", + "problem": "Closer-driven state machines depend on get_tag() returning the closed element name on closer tokens, including virtual closers. The docs imply this through examples but do not state the contract directly.", + "suggestion": "Add one sentence and a tiny example showing that when matched on a closer, is_tag_closer() is true, get_tag() returns the element being closed, while breadcrumbs/depth already reflect the parent context." + } + ] +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-1/candidate.php b/doc-experiment/results/round-43/T06-collect-links/trial-1/candidate.php new file mode 100644 index 0000000000000..633ddd44bc0e1 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-1/candidate.php @@ -0,0 +1,46 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + array_pop( $anchor_map ); + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $anchor_map[] = count( $links ) - 1; + } else { + $anchor_map[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + foreach ( $anchor_map as $link_index ) { + if ( null !== $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + } + + return $links; +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-43/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..a9eb42b7f46ee --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-43/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..c114e261380bc --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-style fragment, then walks tokens once with `next_token()`. On each opening `A` tag it reads `href` with `get_attribute()` and includes only string-valued attributes, tracks open anchors with a stack keyed to result indexes, appends decoded text from `#text` tokens via `get_modifiable_text()`, and pops tracked anchors on `A` closing tags using `is_tag_closer()`.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-43/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..6970bf50cdbf4 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,51 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + if ( '' !== $text ) { + foreach ( $open_links as $link_index ) { + if ( null !== $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() || 'A' !== $processor->get_tag() ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + array_pop( $open_links ); + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $open_links[] = count( $links ) - 1; + } else { + $open_links[] = null; + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-43/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..6050f7262db66 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-43/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..e7b85498a2001 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then walks the document once with `next_token()`. On each `A` opener it reads `href` via `get_attribute()` and records only string-valued attributes; on `#text` tokens it appends decoded text from `get_modifiable_text()` to every currently open collected link, and on `A` closers it pops the corresponding open link state. It returns an empty array if fragment parsing cannot be created, pauses on incomplete input, or aborts on unsupported markup.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-43/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..ad03d091c7b03 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,50 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + array_pop( $link_stack ); + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $link_stack[] = count( $links ) - 1; + } else { + $link_stack[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) { + $text = $processor->get_modifiable_text(); + + foreach ( $link_stack as $link_index ) { + if ( null !== $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-43/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..6045dc112196d --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-43/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..db726b30f5216 --- /dev/null +++ b/doc-experiment/results/round-43/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with tree awareness, then walks the document once with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes, and on each `#text` token it appends decoded text from `get_modifiable_text()` to the currently open qualifying link entries until their closing `A` tags are reached.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/judge.json b/doc-experiment/results/round-43/T07-nested-lists/judge.json new file mode 100644 index 0000000000000..faf21eae2b685 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for structure-aware parsing. All called methods are documented in the rendered files. The implementation uses the intended token walk, get_tag(), get_breadcrumbs(), add_class(), and get_updated_html() pattern, excludes the current node from ancestor checks, handles null factory return, and checks get_last_error(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API usage. This is idiomatic for the task: scan openers with next_tag(), inspect breadcrumbs for ancestors, add the class with add_class(), and return get_updated_html(). It also explicitly checks paused_at_incomplete_token() and get_last_error(), which is conservative but documented. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose the HTML Processor and used only documented methods. The breadcrumb handling is clean: array_pop() removes the current list before testing ancestors. Uses add_class() and get_updated_html() appropriately, handles null factory return and unsupported parser aborts via get_last_error(). No _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases, and none produced _doing_it_wrong records. The docs succeeded on the main decision points: the Tag Processor page explicitly says it has no tree awareness and points structural work to WP_HTML_Processor; the HTML Processor page documents create_fragment() for body fragments; next_tag() documents opener-only walking by default; get_breadcrumbs() documents the current-node path including implicit HTML/BODY; add_class() documents class merging; and get_updated_html() documents byte-preserving output after queued edits. The only near-miss is incomplete-input policy: trial-2 rejects any paused incomplete token, while trials 1 and 3 do not. The docs describe both policies as caller-dependent, so this was not an adherence failure for this task, but it is an area where examples could make the choice more explicit for simple mutation loops.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_breadcrumbs() / Breadcrumbs section", + "problem": "The docs state that breadcrumbs include the current matched node, but they do not show the common ancestor-only idiom. This can lead models to accidentally count the current element as its own ancestor.", + "suggestion": "Add a short general note and example showing that ancestor checks should use the breadcrumb array without its last element, because the last item is the current token." + }, + { + "location": "WP_HTML_Processor::next_tag() breadcrumb query docs", + "problem": "The docs explain fixed breadcrumb sub-path matching, but do not clearly distinguish that from arbitrary ancestor membership checks or disjunctions across ancestor names.", + "suggestion": "Clarify that breadcrumb queries match a specified path shape; for conditions like 'has any ancestor matching X' or 'has one of several possible ancestors', scan matching tags and inspect get_breadcrumbs()." + }, + { + "location": "WP_HTML_Processor simple mutation examples / inherited get_updated_html() guidance", + "problem": "Incomplete-token and get_last_error() policy is documented, but mostly in region-scan and serialization contexts. For simple class/attribute mutation loops, it is less obvious whether to return updated HTML, original HTML, or null after a paused incomplete token.", + "suggestion": "Add a brief post-loop policy note for mutation examples: get_updated_html() returns queued byte-preserving edits; check get_last_error() after scanning, and check paused_at_incomplete_token() only when the caller requires complete input rather than best-effort edits to complete tokens." + } + ] +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-1/candidate.php b/doc-experiment/results/round-43/T07-nested-lists/trial-1/candidate.php new file mode 100644 index 0000000000000..da885084813d0 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-1/candidate.php @@ -0,0 +1,37 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $last_index = count( $breadcrumbs ) - 1; + $is_nested = false; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $is_nested = true; + break; + } + } + + if ( $is_nested ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-1/execution.json b/doc-experiment/results/round-43/T07-nested-lists/trial-1/execution.json new file mode 100644 index 0000000000000..7d5510bbc830b --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T07-nested-lists/trial-1/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-1/response.json b/doc-experiment/results/round-43/T07-nested-lists/trial-1/response.json new file mode 100644 index 0000000000000..f88d97cda7908 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It walks list openers with `next_tag()`, checks each matched `UL` or `OL` element\u2019s `get_breadcrumbs()` path for any earlier `UL` or `OL` ancestor, and applies `add_class( 'nested-list' )` only to those nested lists. The final HTML is returned with `get_updated_html()`, and unsupported-parser failure is handled by returning the original input unchanged.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-2/candidate.php b/doc-experiment/results/round-43/T07-nested-lists/trial-2/candidate.php new file mode 100644 index 0000000000000..60a5e456ac67f --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-2/candidate.php @@ -0,0 +1,37 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $has_list_ancestor = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $has_list_ancestor = true; + break; + } + } + + if ( $has_list_ancestor ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-2/execution.json b/doc-experiment/results/round-43/T07-nested-lists/trial-2/execution.json new file mode 100644 index 0000000000000..840c7cfd9f50f --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T07-nested-lists/trial-2/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-2/response.json b/doc-experiment/results/round-43/T07-nested-lists/trial-2/response.json new file mode 100644 index 0000000000000..ee0a379fbd39d --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then walks list openers with `next_tag()`, checks `get_breadcrumbs()` for any `UL` or `OL` ancestor above the current list, and applies `add_class( 'nested-list' )` only to those nested lists. It returns `get_updated_html()` for byte-preserving edits, but falls back to the original HTML if parsing was incomplete or aborted by checking `paused_at_incomplete_token()` and `get_last_error()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-3/candidate.php b/doc-experiment/results/round-43/T07-nested-lists/trial-3/candidate.php new file mode 100644 index 0000000000000..b52ae478b1487 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-3/candidate.php @@ -0,0 +1,32 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + array_pop( $breadcrumbs ); + + foreach ( $breadcrumbs as $ancestor_tag ) { + if ( 'UL' === $ancestor_tag || 'OL' === $ancestor_tag ) { + $processor->add_class( 'nested-list' ); + break; + } + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-3/execution.json b/doc-experiment/results/round-43/T07-nested-lists/trial-3/execution.json new file mode 100644 index 0000000000000..55c4ff9aaa815 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T07-nested-lists/trial-3/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                      • One
                        1. Nested
                      ", + "actual": "
                      • One
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                      1. Top
                      • Also top
                      ", + "actual": "
                      1. Top
                      • Also top
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                      1. One
                        • Nested
                      ", + "actual": "
                      1. One
                        • Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                        1. Deep
                      ", + "actual": "
                        1. Deep
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                        1. Nested
                      ", + "actual": "
                        1. Nested
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                      • A
                        1. B
                          • C
                      ", + "actual": "
                      • A
                        1. B
                          • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "actual": "

                      intro

                      • A
                        1. B
                      1. C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T07-nested-lists/trial-3/response.json b/doc-experiment/results/round-43/T07-nested-lists/trial-3/response.json new file mode 100644 index 0000000000000..64d5b5f064bd3 --- /dev/null +++ b/doc-experiment/results/round-43/T07-nested-lists/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans list openers with `next_tag()`, checks each matched tag name with `get_tag()`, inspects ancestor structure via `get_breadcrumbs()`, adds `nested-list` with `add_class()` when any ancestor is `UL` or `OL`, and returns the byte-preserving edited result with `get_updated_html()`. If the HTML Processor aborts on unsupported markup, it falls back to the original input by checking `get_last_error()`.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-43/T08-table-extract/judge.json b/doc-experiment/results/round-43/T08-table-extract/judge.json new file mode 100644 index 0000000000000..010ea3566902e --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), used a single depth-bounded next_token() walk, and all called HTML API methods are documented. Slight loss for adding special-element opener modifiable text inside cells; that is documented API behavior, but the docs' ordinary subtree-text recipe says to append only #text tokens unless the caller explicitly opts in. No _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Best adherence. Correct processor choice, documented methods only, #text-only extraction with get_modifiable_text(), single cursor/state-machine traversal, depth boundary, null processor handling, and get_last_error handling. Minor loss only for not making an explicit paused_at_incomplete_token policy; passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Correct processor choice and documented token-walking methods, with the right depth-bounded single-loop shape. Loses points for not checking get_last_error after a structural scan and for the same special-element opener-text over-inclusion risk as trial-1. No hallucinated methods or _doing_it_wrong records; passed 8/8." + } + ], + "failure_analysis": "No hidden case failed in execution.json: all three trials passed all 8 cases, and none recorded _doing_it_wrong. The docs did well on the core decision path: the HTML Processor overview says to choose WP_HTML_Processor when structure, containment, subtree text, implied tags, and virtual closers matter; create_fragment() covers body fragments and null returns; next_token() explains virtual closers, inserted TBODY, single-cursor traversal, and avoiding nested loops for repeated regions; get_current_depth() explicitly teaches the >= subtree guard; and the DOM-style text recipe plus get_modifiable_text() led candidates to decoded #text extraction for markup and entities. The main near-miss is special-element text. Trials 1 and 3 whitelisted SCRIPT/STYLE/TEXTAREA/TITLE opener text, and trial 1 guessed additional special tags. The relevant passages document that special elements carry modifiable text on opener tokens, while the ordinary subtree-text recipe says not to include special opener text unless the caller opts in. Those facts are present, but split enough that a reader can over-apply get_modifiable_text() when a task says text content. A hidden case with special elements inside cells would diverge from the canonical #text-only interpretation, especially because SCRIPT/STYLE-like content is raw rather than decoded. A secondary near-miss is error policy: trials 1 and 2 discard accumulated rows when get_last_error() is non-null, while the reference is best-effort for already-visited tokens. The docs correctly say unsupported markup stops the parser, but they do not make partial read-only extraction policy as explicit as mutation/serialization policy.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() and WP_HTML_Tag_Processor::get_modifiable_text() docs", + "problem": "The method docs emphasize that special elements expose modifiable text, but the warning that generic subtree text should usually read only #text tokens is easier to miss because it lives mostly in the overview recipe.", + "suggestion": "Add an immediate cross-reference and warning in the method docblock: for ordinary subtree text extraction, first require get_token_type() === '#text'; special-element opener text is an explicit opt-in and may be raw or decoded depending on the element." + }, + { + "location": "WP_HTML_Processor::next_token() special-elements paragraph", + "problem": "The paragraph says to read SCRIPT/STYLE/TITLE/TEXTAREA text from the opener token, but does not state the decision boundary between ordinary DOM-style text extraction and an intentionally inclusive special-element policy.", + "suggestion": "Add a compact decision table for token text: #text is ordinary decoded subtree text; TITLE/TEXTAREA opener text is opt-in decoded special text; SCRIPT/STYLE and similar opener text is opt-in raw text; comments and processing instructions are not DOM subtree text." + }, + { + "location": "Special atomic element lists in html-tag-processor.md and html-processor.md", + "problem": "The documented special-element set is not fully consistent or authoritative; candidates guessed extra tag names such as XMP/NOFRAMES after seeing broad wording like 'any other section'.", + "suggestion": "Make the special atomic element list authoritative and consistent across both processor docs, including exact tag names and raw-vs-decoded behavior, or link both docs to one shared list." + }, + { + "location": "WP_HTML_Processor::get_last_error(), create_fragment(), and next_token() docs", + "problem": "The docs say to check get_last_error() after scans, but partial read-only extraction policy is underspecified. Readers may discard already-collected data even when their caller contract would allow best-effort results, or keep partial data without realizing traversal aborted early.", + "suggestion": "Document that already-visited tokens remain usable but the tree was not fully traversed; show the two general policies: fail closed for mutations/normalization or strict completeness, and return accumulated data only when the caller explicitly accepts best-effort extraction." + } + ] +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-43/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..e93fd3a74502d --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,70 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $special_text_tags = array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'IFRAME', 'NOSCRIPT', 'NOEMBED', 'NOFRAMES', 'XMP' ); + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + $tag_name = $processor->get_tag(); + + if ( '#tag' === $token_type ) { + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + } else { + $current_cell = ''; + } + + continue; + } + + if ( + null !== $current_cell && + ! $processor->is_tag_closer() && + in_array( $tag_name, $special_text_tags, true ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( null !== $current_cell && '#text' === $token_type ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-43/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..2b8b98ad60e2c --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-43/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..c70e4fe8fc5fd --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, finds the first `TABLE` with `next_tag()`, then walks tokens with `next_token()` while staying inside that table by comparing `get_current_depth()`. It builds rows on `TR` open/close tokens, builds cells on `TD`/`TH` open/close tokens, and concatenates cell text from `#text` tokens via `get_modifiable_text()`, also including documented special-element text carried on opener tokens such as `TEXTAREA` and `SCRIPT`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-43/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..f04b26d9831d8 --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,82 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + $token_type = $processor->get_token_type(); + $token_name = $processor->get_token_name(); + + if ( '#tag' === $token_type ) { + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + $current_cell = null; + } + + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( null === $current_row ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + } else { + $current_cell = ''; + } + + continue; + } + } + + if ( '#text' === $token_type && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-43/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..a470f564db11f --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-43/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..59e58abdc3f3d --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, `next_tag('TABLE')` to find the first table, then a single `next_token()` walk bounded by the table depth from `get_current_depth()`. It starts rows on `TR` openers, starts cells on `TD`/`TH` openers, and concatenates only `#text` token contents via `get_modifiable_text()`, which the docs say already returns decoded text.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-43/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..a14f7e79f368e --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,67 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag_name = $processor->get_tag(); + if ( null === $tag_name ) { + continue; + } + + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( 'TR' === $tag_name && null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + continue; + } + + if ( 'TR' === $tag_name ) { + $current_row = array(); + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + continue; + } + + if ( + null !== $current_cell && + ( 'SCRIPT' === $tag_name || 'STYLE' === $tag_name || 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-43/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..9550bae9858aa --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-43/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..f71e18d581616 --- /dev/null +++ b/doc-experiment/results/round-43/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks its subtree with `next_token()` bounded by `get_current_depth()`. It builds rows on `TR` open/close, cells on `TD`/`TH` open/close, appends decoded `#text` token content via `get_modifiable_text()`, and also includes text carried on special raw-text element openers inside cells.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/judge.json b/doc-experiment/results/round-43/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..5011385e2d3ea --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor fragment parser and the documented token-rewrite pattern: next_token(), #text guard, get_modifiable_text() for decoded matching, and serialize_token() for normalized output. All called HTML API methods are documented. Minor deduction: on get_last_error() it returns the original input, which the serialize_token docs explicitly warn is not normalized and discards the rewrite; no frozen case triggered that path." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. Processor choice, decoded text handling, comment/attribute avoidance, split text-node behavior, special element avoidance, and normalized serialization are all aligned with the docs. Minor deduction for raw-input fallback after parser abort." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. No undocumented API calls or _doing_it_wrong records. It follows the documented serialize-token rewrite recipe closely. Minor deduction for returning unnormalized raw input on unsupported parser errors." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases, so there are no failed hidden cases to diagnose. The docs did well on this task: 'Which processor should I use?' points readers to WP_HTML_Processor when structure, implied closing tags, and normalized output matter; 'collect DOM-style text from a subtree' says to append only ordinary #text tokens and not use get_modifiable_text() as the text-node test; get_modifiable_text() clearly states decoded text semantics for #text/TITLE/TEXTAREA and raw semantics for SCRIPT/STYLE/comments; and serialize_token() explicitly describes token-by-token rewrites with added wrappers. The main near-miss is that every candidate copied a conservative get_last_error() fallback returning the original HTML. That is documented as preserving source bytes but not normalized output, so it would be wrong for an unsupported-markup case if the function contract still required normalized serialization. No provided test exercised unsupported-parser aborts.", + "doc_gaps": [ + { + "location": "html-processor.md / Recipe: rewrite while serializing tokens and serialize_token()", + "problem": "The docs correctly warn that returning original input discards the rewrite, but examples with string-returning functions can still lead models to choose raw-input fallback after get_last_error().", + "suggestion": "Add a short fallback policy table contrasting accumulated best-effort output, null/error sentinel, empty string, and original input, with explicit notes about which choices preserve normalization and which preserve source bytes only." + }, + { + "location": "html-processor.md / create_fragment()", + "problem": "The null-return guidance says to check before walking, but does not clarify how rare/null-producing conditions relate to the default BODY/UTF-8 path or normalized-output contracts.", + "suggestion": "Clarify that callers should choose a fallback consistent with their contract, and that returning raw input from a normalizer is not a normalized result." + }, + { + "location": "html-tag-processor.md / get_modifiable_text() and html-processor.md / serialize_token()", + "problem": "The decoded-text-read path and normalized-token-output path are documented separately; this task depended on combining them correctly.", + "suggestion": "Cross-reference the common pattern: inspect decoded get_modifiable_text() for #text matching, but emit serialize_token() when preserving normalized markup rather than rebuilding output from the decoded string." + } + ] +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-1/candidate.php b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/candidate.php new file mode 100644 index 0000000000000..0536af0962689 --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-1/execution.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/execution.json new file mode 100644 index 0000000000000..16b8f1c35c2c0 --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T09-mark-keyword/trial-1/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-1/response.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/response.json new file mode 100644 index 0000000000000..663c0791055bd --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with full structure awareness, then walks tokens with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text from `get_modifiable_text()` with `strpos()` and, when matched, wraps that token\u2019s normalized serialization from `serialize_token()` in `...`; all other tokens are emitted unchanged via `serialize_token()`, which yields normalized HTML and automatically avoids wrapping special text-bearing elements because their contents are not exposed as `#text` child tokens.", + "confidence": 87 +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-2/candidate.php b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/candidate.php new file mode 100644 index 0000000000000..537b55a6cb09c --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/candidate.php @@ -0,0 +1,30 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-2/execution.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/execution.json new file mode 100644 index 0000000000000..29b81b82f920d --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T09-mark-keyword/trial-2/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-2/response.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/response.json new file mode 100644 index 0000000000000..5d133917f43ab --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks tokens with `next_token()`. For each ordinary `#text` token, it checks the decoded text from `get_modifiable_text()` for the case-sensitive keyword and, when matched, emits `` around that token\u2019s normalized serialization from `serialize_token()`. All other tokens are emitted unchanged via `serialize_token()`, producing normalized HTML while naturally excluding special text-bearing elements like `script`, `style`, `title`, and `textarea`, whose text is not exposed as ordinary `#text` tokens.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-3/candidate.php b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/candidate.php new file mode 100644 index 0000000000000..c36e6e7edf7b2 --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/candidate.php @@ -0,0 +1,29 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text = $processor->get_modifiable_text(); + + if ( false !== strpos( $text, $keyword ) ) { + $output .= '' . $processor->serialize_token() . ''; + continue; + } + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-3/execution.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/execution.json new file mode 100644 index 0000000000000..0fe788fe1fcc6 --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T09-mark-keyword/trial-3/candidate.php", + "function": "mark_keyword", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple-unclosed", + "status": "pass", + "expected": "

                      hello world

                      ", + "actual": "

                      hello world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-text-nodes", + "status": "pass", + "expected": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "actual": "

                      alpha beta

                      beta gamma

                      delta

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-attribute-not-wrapped", + "status": "pass", + "expected": "somewhere world", + "actual": "somewhere world", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-encoded-keyword-matches", + "status": "pass", + "expected": "

                      world peace

                      ", + "actual": "

                      world peace

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "split-across-elements-no-match", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "keyword-in-comment-not-wrapped", + "status": "pass", + "expected": "

                      world

                      ", + "actual": "

                      world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive", + "status": "pass", + "expected": "

                      World world

                      ", + "actual": "

                      World world

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "normalization-side-effects", + "status": "pass", + "expected": "
                      bold world

                      unclosed & markup

                      ", + "actual": "
                      bold world

                      unclosed & markup

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T09-mark-keyword/trial-3/response.json b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/response.json new file mode 100644 index 0000000000000..ba9a75eef81ab --- /dev/null +++ b/doc-experiment/results/round-43/T09-mark-keyword/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a token-by-token rewrite with `next_token()`, `get_token_type()`, `get_modifiable_text()`, and `serialize_token()`. Ordinary `#text` tokens are matched against the decoded text and wrapped with literal `` tags, while every other token is emitted via `serialize_token()`, which preserves the HTML Processor\u2019s normalized serialization and automatically excludes special text-bearing elements because their contents are not exposed as ordinary `#text` child tokens.", + "confidence": 82 +} diff --git a/doc-experiment/results/round-43/T10-last-h2/judge.json b/doc-experiment/results/round-43/T10-last-h2/judge.json new file mode 100644 index 0000000000000..d1000fd050c25 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor for a flat class edit. All called APIs are documented: constructor, next_tag, set_bookmark, seek, add_class, release_bookmark, get_updated_html. The repeated single bookmark is idiomatic and all 6 hidden cases passed with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API usage: constructor, next_tag, set_bookmark, has_bookmark, seek, add_class, release_bookmark, get_updated_html. This closely matches the documented bookmark pattern for remembering the last matched tag. All 6 hidden cases passed." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor and only documented methods. The has_bookmark/seek/add_class/get_updated_html flow is idiomatic, preserves existing classes via add_class, and handles the no-H2 case unchanged. All 6 hidden cases passed." + } + ], + "failure_analysis": "All trials passed every hidden case, so there were no failed-case misconceptions to attribute. The docs did especially well in three places: the WP_HTML_Tag_Processor introduction says this class is appropriate for flat attribute/class edits and is constructed with new WP_HTML_Tag_Processor($html); next_tag() documents forward-only token walking and case-insensitive tag-name queries; and set_bookmark() explicitly describes the common use of re-setting one named bookmark to remember the last matching tag before seeking back to edit it. The add_class() section also covered the existing-class case by stating that it creates class when absent and appends without removing or reordering existing classes. A near-miss is that candidates generally did not check set_bookmark()'s return value, but because they used one literal bookmark name this stayed within the documented safe idiom and caused no misuse.", + "doc_gaps": [ + { + "location": "html-tag-processor.md / set_bookmark()", + "problem": "The return value is documented, but examples that rely on one literal bookmark name do not show whether callers should check set_bookmark() failure in ordinary single-bookmark loops.", + "suggestion": "Clarify that reusing one literal bookmark name is expected to succeed unless the processor cannot allocate/bookmark the current token, and show a compact pattern either checking the boolean or using has_bookmark() after the scan." + }, + { + "location": "html-tag-processor.md / next_tag()", + "problem": "The docs explain incomplete-token behavior and that comments/text are not tags, but this is spread across several sections.", + "suggestion": "Add a short note near the string-query examples that next_tag('H2') matches real H2 tag openers only, not text inside comments or incomplete trailing syntax." + }, + { + "location": "html-tag-processor.md / add_class()", + "problem": "The behavior for existing classes is well described in prose, but the examples could make the append-preserve contract more visible.", + "suggestion": "Add a minimal before/after example showing add_class() on an element with an existing class attribute, emphasizing that existing class order is preserved and the new class is appended." + } + ] +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-1/candidate.php b/doc-experiment/results/round-43/T10-last-h2/trial-1/candidate.php new file mode 100644 index 0000000000000..debc05e576b84 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-1/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + $found_h2 = true; + } + + if ( ! $found_h2 ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-1/execution.json b/doc-experiment/results/round-43/T10-last-h2/trial-1/execution.json new file mode 100644 index 0000000000000..034c47a09c3ce --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-1/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T10-last-h2/trial-1/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-1/response.json b/doc-experiment/results/round-43/T10-last-h2/trial-1/response.json new file mode 100644 index 0000000000000..15f5bb1fa7a50 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat position-based edit: it scans forward with `next_tag( 'H2' )`, reuses a single bookmark via `set_bookmark()` to remember the last matching opener, seeks back to that bookmark with `seek()`, adds the class with `add_class()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-2/candidate.php b/doc-experiment/results/round-43/T10-last-h2/trial-2/candidate.php new file mode 100644 index 0000000000000..a71b3f81171df --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-2/candidate.php @@ -0,0 +1,17 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( $processor->has_bookmark( 'last-h2' ) && $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); + } + + return $html; +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-2/execution.json b/doc-experiment/results/round-43/T10-last-h2/trial-2/execution.json new file mode 100644 index 0000000000000..f7318a4d2f282 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-2/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T10-last-h2/trial-2/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-2/response.json b/doc-experiment/results/round-43/T10-last-h2/trial-2/response.json new file mode 100644 index 0000000000000..a08c0cbaeb06e --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat position-based edit: scan forward with `next_tag( 'H2' )`, move a single bookmark named `last-h2` each time to remember the latest matching opener, then `seek()` back to that bookmark and call `add_class( 'final-section' )`. If no `H2` bookmark was ever set, the original HTML is returned unchanged.", + "confidence": 97 +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-3/candidate.php b/doc-experiment/results/round-43/T10-last-h2/trial-3/candidate.php new file mode 100644 index 0000000000000..f9c9cd1d41b97 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-3/candidate.php @@ -0,0 +1,20 @@ +next_tag( 'H2' ) ) { + $processor->set_bookmark( 'last-h2' ); + } + + if ( ! $processor->has_bookmark( 'last-h2' ) ) { + return $html; + } + + if ( $processor->seek( 'last-h2' ) ) { + $processor->add_class( 'final-section' ); + } + + $processor->release_bookmark( 'last-h2' ); + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-3/execution.json b/doc-experiment/results/round-43/T10-last-h2/trial-3/execution.json new file mode 100644 index 0000000000000..f7df79c5886c8 --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-3/execution.json @@ -0,0 +1,62 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T10-last-h2/trial-3/candidate.php", + "function": "mark_last_h2", + "passed": 6, + "total": 6, + "cases": [ + { + "id": "two-headings", + "status": "pass", + "expected": "

                      One

                      a

                      Two

                      b

                      ", + "actual": "

                      One

                      a

                      Two

                      b

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "single-heading", + "status": "pass", + "expected": "

                      Only

                      ", + "actual": "

                      Only

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-headings-unchanged", + "status": "pass", + "expected": "

                      nothing

                      ", + "actual": "

                      nothing

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "many-headings", + "status": "pass", + "expected": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "actual": "

                      1

                      2

                      3

                      4

                      5

                      6

                      7

                      8

                      9

                      10

                      11

                      12

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-h2-not-counted", + "status": "pass", + "expected": "

                      Real

                      ", + "actual": "

                      Real

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class", + "status": "pass", + "expected": "

                      A

                      B

                      ", + "actual": "

                      A

                      B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T10-last-h2/trial-3/response.json b/doc-experiment/results/round-43/T10-last-h2/trial-3/response.json new file mode 100644 index 0000000000000..374aa4e74c8ab --- /dev/null +++ b/doc-experiment/results/round-43/T10-last-h2/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, repeatedly moves the same bookmark with `set_bookmark()` to remember the last matching heading, then `seek()`s back to that bookmarked tag, adds the class with `add_class()`, and returns the edited markup via `get_updated_html()`.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/judge.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/judge.json new file mode 100644 index 0000000000000..5379d5f1e4098 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(), all documented in the rendered Tag Processor docs. This is the correct flat attribute-editing processor choice, uses the documented prefix helper, preserves untouched bytes via get_updated_html(), handles the null return, and produced no _doing_it_wrong records. Execution passed 7/7." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor and documented API only; idiomatic linear tag scan plus queued attribute removals and get_updated_html(). No misuse records. Execution passed 7/7." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct Tag Processor use for byte-preserving attribute edits, documented prefix enumeration, documented removal, and documented final serialization through get_updated_html(). No misuse records. Execution passed 7/7." + } + ], + "failure_analysis": "No hidden case failed in any trial. All trials passed single-link, multiple-tags, multiple-matching-attributes, similar-prefixes-kept, uppercase-source-attribute, comments-untouched, and no-matches. The docs did well in four places: the Tag Processor Overview / Which processor should I use? section explicitly says to use the Tag Processor for flat attribute and class edits with byte-exact preservation; next_tag() says it visits real tags while ignoring tag-like text in comments/raw text and preserving source casing; get_attribute_names_with_prefix() directly documents the needed helper, lowercase returned names, and case-insensitive matching; get_updated_html() explains that queued attribute edits are read back without normalizing untouched bytes. Near-misses were not failure-causing: the prefix helper return contract could be more explicit about empty array versus null, remove_attribute() could state its case-insensitive name matching in its own method docs, and the HTML Processor copy of inherited attribute methods could call out virtual-token behavior more clearly.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute_names_with_prefix() and WP_HTML_Processor::get_attribute_names_with_prefix()", + "problem": "The return docs say null is returned when no tag opener is matched, but they do not explicitly state that a matched opener with zero matching attributes returns an empty array.", + "suggestion": "Add a sentence such as: \"Returns an empty array when currently matched on a real tag opener but no attribute names start with the prefix; returns null only when not matched on an eligible opener.\"" + }, + { + "location": "WP_HTML_Tag_Processor::remove_attribute()", + "problem": "The method-level doc does not state that attribute-name matching is ASCII case-insensitive/lowercased, even though this matters for source attributes written with uppercase or mixed-case names.", + "suggestion": "Add the same case-insensitive attribute-name contract used by the prefix helper, and mention that duplicate case-variant attributes in invalid source are removed together." + }, + { + "location": "WP_HTML_Processor inherited attribute method docs", + "problem": "The HTML Processor override for get_attribute_names_with_prefix() returns null on virtual tokens, but the rendered method text only mentions the no-opener case. This could confuse users doing structural walks over implied elements.", + "suggestion": "In the HTML Processor version, add a short note that inherited attribute mutation/enumeration methods operate only on tokens backed by source HTML and return false/null for virtual/implied tokens." + } + ] +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/candidate.php b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/execution.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/execution.json new file mode 100644 index 0000000000000..27820abd0d7f3 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/response.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/response.json new file mode 100644 index 0000000000000..42d41e97f2f6a --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag opener with `next_tag()`, then calls the documented `get_attribute_names_with_prefix( 'data-track-' )` helper to find only attributes whose names start with that exact prefix and removes each one with `remove_attribute()`. The final HTML is returned with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/candidate.php b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/execution.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/execution.json new file mode 100644 index 0000000000000..ef62ef8da5ea5 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/response.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/response.json new file mode 100644 index 0000000000000..3367ac0dab5e9 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, tag-by-tag scan of the input, which is the documented choice for attribute edits that should otherwise preserve the original HTML bytes. For each matched tag, it calls `get_attribute_names_with_prefix( 'data-track-' )` to find exactly the tracking attributes to remove, then removes each one with `remove_attribute()`, and finally returns the modified markup with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/candidate.php b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/candidate.php new file mode 100644 index 0000000000000..df0f95c384e15 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/candidate.php @@ -0,0 +1,18 @@ +next_tag() ) { + $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' ); + + if ( null === $attributes ) { + continue; + } + + foreach ( $attributes as $attribute_name ) { + $processor->remove_attribute( $attribute_name ); + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/execution.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/execution.json new file mode 100644 index 0000000000000..3d679159fa676 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/candidate.php", + "function": "strip_tracking_attributes", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "single-link", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-tags", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-matching-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "similar-prefixes-kept", + "status": "pass", + "expected": "x", + "actual": "x", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-source-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comments-untouched", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": "

                      Text

                      ", + "actual": "

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/response.json b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/response.json new file mode 100644 index 0000000000000..dc9b5a60a3a91 --- /dev/null +++ b/doc-experiment/results/round-43/T11-strip-tracking-attributes/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag in the input, because the task is purely attribute-based and does not require tree awareness. For each matched tag, it calls the documented `get_attribute_names_with_prefix( 'data-track-' )` to find only attributes whose names start with that exact prefix, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/judge.json b/doc-experiment/results/round-43/T12-unwrap-spans/judge.json new file mode 100644 index 0000000000000..8c39336575b10 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose `WP_HTML_Processor::create_fragment()` for a body fragment needing normalized serialization. All called methods are documented: `create_fragment`, `next_token`, `get_tag`, `serialize_token`, and `get_last_error`. The token-walk plus `serialize_token()` pattern is exactly the documented rewrite pattern, and using `get_tag()` alone to skip both SPAN openers and closers matches the `serialize_token()` example. Handles the unclosed-span case through the HTML Processor's virtual closer behavior." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Same correct processor and documented API usage as trial-1, with idiomatic token walking and `serialize_token()`. Minor adherence loss: on `create_fragment()` failure or parser abort it returns the original raw input. The docs allow fallback policies, but the `serialize_token()` guidance explicitly warns that returning original input is neither normalized nor the accumulated rewrite, so this is a near-miss for a function whose contract is normalized output." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly uses the HTML Processor fragment parser, a single `next_token()` loop, `get_tag()` to skip SPAN boundary tokens, and `serialize_token()` to emit normalized output. All API calls are present in the rendered docs and no `_doing_it_wrong` records occurred. The approach naturally handles nested spans, adjacent spans, discarded span attributes, and virtual closing of unclosed elements." + } + ], + "failure_analysis": "All three trials passed all seven hidden cases, so there are no failed hidden cases to attribute to misconceptions. The docs worked well for this task because the `HTML Support` overview tells readers to choose `WP_HTML_Processor` for structure and normalization, `create_fragment()` matches body-fragment input, `next_token()` explains that text and closing tokens are visited, and `serialize_token()` gives the key rewrite pattern: walk tokens, skip tokens to remove them, and append normalized serialization for the rest. The `next_token()` discussion of implicit/end-of-input closers explains why the unclosed-span case succeeds. The main near-miss is trial-2's raw-input fallback after parser failure; the relevant `serialize_token()` passage does warn that returning original input discards the rewrite and is not normalized, but the fallback-policy guidance could be sharper for normalized-output APIs. Another near-miss is that all candidates relied on `get_tag()` returning a tag name for closers and null for non-tags; this is demonstrated indirectly by the `serialize_token()` example, but the `get_tag()` contract itself does not spell out those `next_token()`-walk semantics.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_tag()` and inherited `WP_HTML_Tag_Processor::get_tag()` docblocks", + "problem": "The method docs show `next_tag()` usage, but do not explicitly define behavior while walking with `next_token()`: start tags, end tags, virtual tags, and non-tag tokens are not distinguished in the contract text.", + "suggestion": "State that during a token walk `get_tag()` returns the uppercase element name for matched tag tokens, including closers and processor-created virtual tags, and returns `null` for text/comment/doctype tokens. Point readers to `get_token_type()` and `is_tag_closer()` when they need to distinguish token kind or opener versus closer." + }, + { + "location": "`WP_HTML_Processor::serialize_token()` docblock", + "problem": "The example teaches the correct skip-and-serialize pattern, but the general rule behind wrapper removal is implicit.", + "suggestion": "Add a short general note that skipping both boundary tokens for an element while serializing intervening tokens removes the wrapper but preserves its children; matching by `get_tag()` in a `next_token()` loop is one documented way to skip both opener and closer." + }, + { + "location": "`WP_HTML_Processor::serialize_token()` fallback guidance and `WP_HTML_Processor::get_last_error()` docblock", + "problem": "Fallback policy is documented as caller-defined, but it is easy to miss that returning original input violates normalized-output or rewrite-output contracts.", + "suggestion": "Add decision guidance: for APIs promising normalized or rewritten HTML, returning original source bytes after parser abort is a fail-open/source-preserving policy, not a normalized result. Recommend an explicit sentinel such as `null` or an empty string unless the caller intentionally wants fail-open behavior." + }, + { + "location": "`WP_HTML_Processor::next_token()` and `WP_HTML_Tag_Processor::paused_at_incomplete_token()` docs", + "problem": "The distinction between unclosed elements that receive virtual closers and truly incomplete trailing syntax is spread across multiple passages.", + "suggestion": "Add paired examples contrasting an unclosed element at end of input, which can still serialize with virtual closers and no last error, with an incomplete trailing token, which is not visited and requires `paused_at_incomplete_token()` if completeness matters." + } + ] +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/candidate.php b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/candidate.php new file mode 100644 index 0000000000000..a181d2e93f569 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/execution.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/execution.json new file mode 100644 index 0000000000000..5ecb4b0e4b01c --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/response.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/response.json new file mode 100644 index 0000000000000..e11ada87c4619 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks every token with `next_token()` and rebuilds normalized HTML with `serialize_token()`, skipping any token whose tag is `SPAN`. Skipping both the opening and closing `SPAN` tokens unwraps spans while preserving and normalizing all intervening content, and `get_last_error()` is checked to avoid returning a partial rewrite after unsupported markup aborts parsing.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/candidate.php b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/candidate.php new file mode 100644 index 0000000000000..cf94fddf1dcb7 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/candidate.php @@ -0,0 +1,24 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/execution.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/execution.json new file mode 100644 index 0000000000000..f3757d237894b --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/response.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/response.json new file mode 100644 index 0000000000000..ca1e346608ce1 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, walks every token with `next_token()`, skips any token whose tag name is `SPAN` so both span openers and closers are removed, and appends every other token via `serialize_token()`. That produces normalized HTML while preserving all non-span content in place, including nested spans.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/candidate.php b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/candidate.php new file mode 100644 index 0000000000000..6e7cc87e73eed --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/candidate.php @@ -0,0 +1,25 @@ +next_token() ) { + if ( 'SPAN' === $processor->get_tag() ) { + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + return $output; +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/execution.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/execution.json new file mode 100644 index 0000000000000..2ec2448afd3a8 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/candidate.php", + "function": "unwrap_spans", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                      a b c d

                      ", + "actual": "

                      a b c d

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-spans", + "status": "pass", + "expected": "

                      outer inner tail

                      ", + "actual": "

                      outer inner tail

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-spans-normalized-passthrough", + "status": "pass", + "expected": "

                      plain & simple

                      ", + "actual": "

                      plain & simple

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "attributes-discarded", + "status": "pass", + "expected": "styled", + "actual": "styled", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "adjacent-spans", + "status": "pass", + "expected": "

                      ab

                      ", + "actual": "

                      ab

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "span-with-block-content", + "status": "pass", + "expected": "
                      before after
                      ", + "actual": "
                      before after
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-span", + "status": "pass", + "expected": "

                      runs to end

                      ", + "actual": "

                      runs to end

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/response.json b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/response.json new file mode 100644 index 0000000000000..8b2398f1d8c16 --- /dev/null +++ b/doc-experiment/results/round-43/T12-unwrap-spans/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment HTML, then walks every token with `next_token()`. For each token, it skips both opening and closing `SPAN` tags by checking `get_tag()`, and appends every other token\u2019s normalized serialization via `serialize_token()`. That produces normalized HTML while preserving the contents that were inside removed spans.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-43/codex-judges-output.json b/doc-experiment/results/round-43/codex-judges-output.json new file mode 100644 index 0000000000000..196da4d34623d --- /dev/null +++ b/doc-experiment/results/round-43/codex-judges-output.json @@ -0,0 +1,664 @@ +{ + "result": [ + { + "id": "N03-first-list-count", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), documented structural depth APIs, next_token(), bookmarks/seek, set_attribute(), get_updated_html(), paused_at_incomplete_token(), and get_last_error(). No _doing_it_wrong records. The extra finished_scan guard is consistent with the documented bounded subtree scan pattern." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and fully documented API surface. The depth-bounded next_token() loop, direct-child opener checks, bookmark/seek edit, and clean-scan checks match the docs' recipes. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation quality as trial-2: correct fragment processor, no undocumented methods, idiomatic bookmark plus depth-bounded token walk, and appropriate incomplete/unsupported fallback checks. No _doing_it_wrong records." + } + ], + "failure_analysis": "All trials passed all 11 hidden cases, so there were no failed cases to attribute to documentation gaps. The docs did unusually well for this task: the HTML Processor overview explicitly distinguishes it from the Tag Processor for structure-aware work; create_fragment() explains BODY-fragment parsing and null returns; next_tag() explains scanning for the first of multiple tag names; the 'scan a region before editing its opener' and 'test subtree membership and direct children' recipes map directly to bookmark, next_token(), depth, is_tag_closer(), get_token_type(), seek(), and clean-scan checks; get_current_depth() explains why the guard must be >= and why direct child counting must ignore closers; get_last_error() and paused_at_incomplete_token() cover unsupported markup and truncation. The only near-miss is that the correct scoped completeness policy requires combining several passages: after a bounded subtree walk, reject truncation or unsupported markup inside the region, but do not keep scanning unrelated trailing input if the target element was already closed.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() and WP_HTML_Processor::get_current_depth() docblocks", + "problem": "The scoped completeness rule is spread across multiple sections, while paused_at_incomplete_token() elsewhere says to drain all tokens for whole-document checks. This can confuse callers whose contract only depends on a completed subtree.", + "suggestion": "Add a short bounded-subtree note: once depth drops below the recorded opener depth, the walk has left that subtree; check paused_at_incomplete_token() and get_last_error() before mutating, and only drain to EOF if the caller's contract also depends on the trailing document." + }, + { + "location": "WP_HTML_Processor::get_current_depth() docblock", + "problem": "The method explains closer depth, but the direct-child element test is easier to find in the overview recipe than at the depth API itself.", + "suggestion": "Add a compact direct-child opener formula near the depth examples: require #tag, not is_tag_closer(), and current depth equal to container depth + 1." + }, + { + "location": "WP_HTML_Processor::set_attribute() docblock", + "problem": "Mutation output retrieval is documented elsewhere, but callers using HTML Processor may still reach for serialize() after set_attribute().", + "suggestion": "Add a one-line post-mutation example showing set_attribute() followed by get_updated_html(), with a cross-reference that serialize()/serialize_token() are for normalized serialization workflows, not queued attribute updates." + } + ] + } + }, + { + "id": "N04-normalize-or-placeholder", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose `WP_HTML_Processor::normalize()`, which is documented in the rendered HTML Processor docs as a public static normalizer for BODY-context fragments returning `string|null`. It uses a strict `null` fallback check and avoids unnecessary token walking or mutation APIs." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct solution as the reference: documented HTML Processor static normalization plus strict mapping of `null` to the placeholder. No undocumented API usage or `_doing_it_wrong` records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and API choice. The implementation follows the documented `normalize()` contract directly and handles unsupported input via the documented `null` return." + } + ], + "failure_analysis": "All trials passed all seven hidden cases, so there were no functional failures to attribute to documentation gaps. The rendered docs did the important work well: `html-tag-processor.md` explicitly says to use the HTML Processor for producing normalized output, while `html-processor.md` documents `WP_HTML_Processor::normalize()` as normalizing BODY-context fragments, lists normalization effects such as quoted attributes, omitted tags, table structure insertion, and text re-encoding, and states that unsupported markup makes output methods such as `serialize()` and `normalize()` return `null`. That gave subjects a direct, low-risk path to the reference solution. The only near-miss is that unsupported cases record a `trigger_error` from serialization even though `normalize()` returns `null`; because the canonical solution has the same behavior and there are no `_doing_it_wrong` records, this is not candidate misuse, but the docs could make the warning/null behavior less surprising.", + "doc_gaps": [ + { + "location": "html-processor.md `normalize()` return contract", + "problem": "The docs say `string|null`, but do not explicitly contrast unsupported `null` with valid empty-string output for an empty fragment.", + "suggestion": "Add a short return-contract note: callers should use a strict `null` check for inability to normalize; an empty input fragment may normalize to `''` and is not a failure." + }, + { + "location": "html-processor.md `normalize()` / `serialize()` unsupported-markup behavior", + "problem": "Unsupported markup returns `null`, but execution also records a serialization warning. Readers may not know whether that warning is expected API behavior or evidence of misuse.", + "suggestion": "Document whether normalization/serialization may emit a warning when the parser aborts, and distinguish that from `_doing_it_wrong` misuse." + }, + { + "location": "html-processor.md HTML Support unsupported constructs", + "problem": "The unsupported examples cover foster parenting and one mis-nested formatting case, but anchor/adoption-agency failures are less discoverable.", + "suggestion": "Broaden the unsupported-markup examples with a general note that some active-formatting-element and nested-anchor reconstruction cases can abort, with callers expected to treat `null` output as the fallback signal." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Uses the correct WP_HTML_Processor::create_fragment() parser and a documented one-pass next_token() state machine. All called API methods appear in the rendered docs, and execution recorded no _doing_it_wrong misuse. Strong handling of implied/virtual heading closers and empty headings. Main adherence loss: it appends get_modifiable_text() from SCRIPT, STYLE, TEXTAREA, and TITLE opener tokens, while the documented DOM-style subtree text recipe says ordinary text extraction should append only #text tokens unless the caller explicitly opts into special-element contents. It also checks get_last_error() but not paused_at_incomplete_token()." + }, + { + "trial_id": "trial-2", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Uses the correct HTML Processor and documented APIs only, with no _doing_it_wrong records. The closer-driven single next_token() loop matches the documented pattern that every opener receives a closing token, including implied and end-of-input virtual closers. It explicitly checks paused_at_incomplete_token() and get_last_error(). Deductions are for the same special-element over-inclusion as trial-1, and for treating any trailing incomplete syntax as a reason to discard all previously extracted headings, which is a policy choice not established by the task contract." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Closest to the documented subtree-text pattern and the canonical solution: create_fragment(), next_tag() for heading openers, get_current_depth() to bound a subtree walk, next_token(), #text filtering, and decoded get_modifiable_text(). All API methods are documented and there were no misuse records. Minor residual concern: it uses nested token loops for repeated regions despite the docs' broad warning about nested walks, though this bounded use is safe here because the outer loop does not need to process the consumed boundary token." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases, so there are no failed hidden cases to attribute. The docs did well on the most important decisions: the Tag Processor \"Which processor should I use?\" section clearly pushed subjects toward WP_HTML_Processor for tree-aware text extraction; the HTML Processor \"Recipe: collect DOM-style text from a subtree\", next_token(), and get_current_depth() sections gave the essential #text accumulation, virtual closer, implied-close, and >= depth-boundary rules. That explains why every trial handled nested inline markup, decoded entities, empty headings, uppercase source tags, and implied heading closure.\n\nNear-misses: trials 1 and 2 over-applied the get_modifiable_text() method contract. The get_modifiable_text() section accurately says SCRIPT, STYLE, TEXTAREA, and TITLE carry text on their opener tokens, but models treated that as part of ordinary element text despite the separate subtree-text recipe warning that ordinary DOM-style extraction is only #text tokens unless special-element text is explicitly requested. Trial 2 also over-read the incomplete-token guidance: the docs say fallback behavior is the caller's contract, but do not give enough read-only extraction guidance, so it discarded valid earlier results on trailing incomplete syntax such as a dangling '<'. Trial 3 exposed a documentation tension: the next_token() docs warn against nested walk loops for repeated regions, while the depth-bounded subtree recipe and this task's natural solution use an inner bounded scan safely.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() docs", + "problem": "The method explains that special elements expose modifiable text, but readers can mistake availability for inclusion in ordinary subtree text extraction.", + "suggestion": "Add a short cross-reference stating that ordinary container text walks should read get_modifiable_text() only from #text tokens; SCRIPT, STYLE, TEXTAREA, and TITLE opener text should be included only when the caller's contract explicitly asks for those element contents." + }, + { + "location": "WP_HTML_Processor::next_token() / nested walk guidance", + "problem": "The warning against nested walk loops is too broad and can appear to conflict with the documented depth-bounded subtree examples.", + "suggestion": "Clarify the distinction: nested bounded scans are acceptable when the outer loop can resume after the consumed boundary token, while a single stateful loop is preferred when the outer loop must observe every boundary or adjacent repeated region token." + }, + { + "location": "paused_at_incomplete_token() guidance and HTML Processor scan recipes", + "problem": "The docs say fallback behavior is caller-defined, but they do not distinguish mutation/rewrite safety from read-only extraction policies.", + "suggestion": "Add general guidance that mutation or complete-normalization workflows often reject incomplete trailing syntax, while read-only extraction may return data from complete tokens already visited unless its contract requires a fully complete source." + } + ] + } + }, + { + "id": "T01-add-image-class", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Tag_Processor` for a flat, byte-preserving class edit. Calls only documented APIs: constructor, `next_tag()`, `add_class()`, and `get_updated_html()`. The `while ( next_tag( 'img' ) )` loop is idiomatic, and lowercase `img` is covered by documented case-insensitive tag matching. Edge cases are handled by the documented processor behavior rather than manual parsing." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully documented solution shape as the reference: Tag Processor, filtered forward scan, `add_class()`, and `get_updated_html()`. No undocumented calls or `_doing_it_wrong` records. Correctly relies on documented semantics for existing class preservation, comments not matching as tags, and incomplete trailing tags not being modified." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and API surface throughout. The implementation uses the documented all-matches token-walking pattern with `next_tag( 'img' )`, modifies only matched real tags with `add_class()`, and returns the queued edits with `get_updated_html()`. No attribute null/true/empty-string semantics are misused because it never reads raw attributes." + } + ], + "failure_analysis": "No failed hidden cases across the three trials; all passed 8/8. The docs did well on the exact decision points this task required: the Tag Processor overview says to use it for flat attribute/class edits and byte-precise preservation; `next_tag()` documents the string shorthand, ASCII case-insensitive tag-name matching, skipping tag-like text inside comments/raw-text contexts, and pausing before incomplete trailing tags; `add_class()` documents creating a missing class attribute, appending to existing classes without removing or reordering them, and avoiding duplicates; `get_updated_html()` documents that untouched bytes are preserved and that it is the output method after queued edits. Near-miss: the HTML Processor docs also show `add_class()` in examples, but the processor-choice guidance was strong enough that all subjects picked the lighter Tag Processor.", + "doc_gaps": [ + { + "location": "`WP_HTML_Tag_Processor::add_class()` docblock", + "problem": "The method explains class creation and appending, but the placement of a newly-created `class` attribute is easier to infer from separate attribute-update documentation than from this method itself.", + "suggestion": "Add a short general note that when `add_class()` creates the `class` attribute, it follows the normal added-attribute placement rules while preserving all untouched attributes byte-for-byte." + }, + { + "location": "`WP_HTML_Tag_Processor` Usage / `next_tag()` examples", + "problem": "The first usage example demonstrates a single `if` match; the all-matches `while ( next_tag(...) )` edit-and-return idiom is present indirectly but not foregrounded as the common pattern for bulk edits.", + "suggestion": "Add a generic bulk-edit example using `while ( $processor->next_tag( 'TAG' ) ) { ... }` followed by `get_updated_html()`." + }, + { + "location": "`WP_HTML_Processor::add_class()` inherited method docs", + "problem": "The HTML Processor page lists `add_class()` but gives less detail than the Tag Processor page about append order, no-op duplicate behavior, and class-order preservation.", + "suggestion": "Ensure inherited class-helper docs on the HTML Processor page preserve or link directly to the fuller Tag Processor contract, so users landing there get the same guarantees." + } + ] + } + }, + { + "id": "T02-link-targets", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor for a flat, byte-preserving attribute edit. All calls are documented: direct construction, next_tag, get_attribute, set_attribute, and get_updated_html. The null check handles absent vs empty vs valueless href semantics, and no _doing_it_wrong records appeared." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully documented and idiomatic Tag Processor pattern as the reference: scan A openers, test href presence with get_attribute() !== null, set target, return get_updated_html(). Passed all edge semantics without undocumented API use." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Uses lower-case next_tag('a'), which is documented as ASCII case-insensitive. Otherwise matches the canonical documented pattern and correctly relies on get_attribute null/true/empty-string semantics. No hallucinated methods or misuse records." + } + ], + "failure_analysis": "All three trials passed all 8 hidden cases, so there were no failed hidden cases to attribute to a documentation failure. The docs worked well here: the Tag Processor overview and the HTML Processor support section clearly steer byte-exact flat attribute/class edits to WP_HTML_Tag_Processor; the Usage and Finding tags sections show direct construction and next_tag scanning; get_attribute documents null for absent attributes, empty string for empty attributes, and true for valueless boolean attributes; set_attribute documents overwrite behavior and placement of newly-added attributes; get_updated_html documents that queued edits are applied while untouched bytes are preserved. The main near-miss is that the safe attribute-presence idiom has to be inferred from the return-value contract rather than being named directly.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute docblock", + "problem": "The return contract contains the needed null/empty-string/true distinction, but it does not explicitly name the common presence-test idiom. Less careful readers may use truthiness and skip href=\"\" while still thinking they followed the docs.", + "suggestion": "Add a short note: to test whether an attribute is present, compare the result to null; do not use a truthiness check because empty-string and true are both present attributes." + }, + { + "location": "WP_HTML_Tag_Processor::get_attribute and set_attribute docblocks", + "problem": "Attribute name matching case-insensitivity is not prominent at the exact lookup/update methods. The uppercase-attribute case relies on this behavior.", + "suggestion": "State on both methods that attribute names are matched ASCII case-insensitively, while untouched original attribute spelling is preserved in output." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag docblock", + "problem": "The docs say next_tag finds tags and separately discuss incomplete input, but the skip behavior for markup-like text in comments/raw text is not summarized where users choose next_tag for scanning.", + "suggestion": "Add a compact note that next_tag matches real HTML tag tokens only; markup-looking text inside comments and raw/plaintext regions is not reported as a tag, and incomplete trailing tags are not matched." + }, + { + "location": "WP_HTML_Tag_Processor::set_attribute attribute placement section", + "problem": "The placement rules are documented, but the single-new-attribute case that surprises users most is easy to miss when exact output order matters.", + "suggestion": "Add a general one-line example showing that adding one new attribute to a tag with existing attributes inserts it immediately after the tag name, while updating an existing attribute keeps its position." + } + ] + } + }, + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware `WP_HTML_Processor::create_fragment()` path, depth-bounded `next_token()` walk, `#text` guard, and decoded `get_modifiable_text()`. All called API methods are present in the supplied markdown and execution recorded no `_doing_it_wrong`. Small adherence penalty: it opted into special-element opener text for SCRIPT/STYLE/TEXTAREA/TITLE/NOEMBED/NOFRAMES/XMP, which is documented but broader than the task's plain text-node contract and could include raw non-heading text in untested inputs." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct processor choice and essentially the documented subtree text recipe. `create_fragment`, `next_tag`, `get_current_depth`, `next_token`, `get_token_type`, `get_modifiable_text`, `is_tag_closer`, and `get_tag` are all documented; no `_doing_it_wrong` records. Minor penalty for the same unnecessary special-element branch, though this one limits itself to the four elements explicitly called out in the HTML Processor docs." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Matches the canonical documented pattern: create an HTML Processor fragment, find `H1`, record opener depth, walk tokens while depth remains in the subtree, append only `#text` token `get_modifiable_text()`. Handles decoded text, empty headings, no H1, nested markup, and end-of-input virtual closers without undocumented API use." + } + ], + "failure_analysis": "All trials passed all frozen cases, 8/8 each, and none produced `_doing_it_wrong` records. The docs did well on the core path: the 'Which processor should I use?' guidance points text/subtree work to `WP_HTML_Processor`; the 'Recipe: collect DOM-style text from a subtree' example is almost exactly this task; `get_current_depth()` explains why the guard must be `>=`; `next_token()` explains virtual closers for malformed or unclosed input; and `get_modifiable_text()` clearly says returned `#text` content is already decoded. The main near-miss is special elements. Trials 1 and 2 inferred that special element opener text should be included inside the H1 because the docs explain that SCRIPT/STYLE/TITLE/TEXTAREA carry text on the opener token. That behavior is documented, but the broader docs also say ordinary subtree text should append only `#text` tokens unless the caller explicitly opts into special-element content. The hidden cases did not exercise this distinction, so it did not become a functional failure.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor` overview, 'Recipe: collect DOM-style text from a subtree' plus `next_token()` special-element note", + "problem": "The docs contain both the correct ordinary subtree-text recipe and a nearby special-element exception. Test subjects over-applied the exception for a generic heading-text task.", + "suggestion": "Add a short decision table distinguishing ordinary text-node extraction, DOM-like textContent, and special-element content extraction. State which token types to include for each policy and when SCRIPT/STYLE raw text should be excluded." + }, + { + "location": "`WP_HTML_Processor::get_modifiable_text()`", + "problem": "`get_modifiable_text()` is easy to read as 'text content' for any token, even though comments and special element openers are not ordinary text nodes.", + "suggestion": "Repeat in the method contract that non-`#text` modifiable text is opt-in data, not a text-node match. Recommend checking `get_token_type() === '#text'` for ordinary extracted text, with explicit tag whitelists only for caller-requested special content." + }, + { + "location": "Special self-contained elements docs across Tag Processor and HTML Processor", + "problem": "The exact special-element set is split across sections, and candidates differed on whether to include deprecated rawtext elements such as NOEMBED/NOFRAMES/XMP.", + "suggestion": "Centralize the list of tokens whose text is carried on opener tokens for HTML Processor walks, including whether each returns decoded or raw text, and link to it from both `next_token()` and `get_modifiable_text()`." + } + ] + } + }, + { + "id": "T04-build-figure", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, which is the documented fit for filling a known literal template while preserving bytes and attribute order. All called APIs are present in the rendered docs: constructor, next_tag, set_attribute, next_token, get_token_type, set_modifiable_text, and get_updated_html. The solution follows the documented template-building recipe and correctly relies on plain-string input encoding for attributes and #text." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. It uses only documented APIs, chooses the lighter Tag Processor appropriately, predeclares src and alt in template order, walks tokens to the figcaption #text placeholder, and returns get_updated_html(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. It cleanly follows the docs' Building markup from a template example: existing attributes preserve order, placeholder text enables set_modifiable_text(), and all output is read through get_updated_html(). No undocumented calls or misuse." + } + ], + "failure_analysis": "All trials passed all seven hidden cases. The docs did especially well in the Tag Processor page under \"Which processor should I use?\", which distinguishes flat byte-preserving mutation from tree-aware parsing, and under \"Building markup from a template\", which directly explains the winning pattern: start with a literal shape, include attributes in the desired order, include placeholder text, update with set_attribute()/set_modifiable_text(), then call get_updated_html(). The set_attribute section also clearly explains that plain unescaped values are accepted and encoded, and that newly added attributes sort by name rather than call order. The get_modifiable_text/set_modifiable_text sections clarify decoded/plain text handling, preventing the common mistake of manually escaping captions or trying to parse caption HTML as markup. Near miss: the template recipe calls set_modifiable_text() without checking its return value, while the method-level docs say to always check it. In this literal-template case the invariant is strong enough, but the example slightly undercuts the defensive contract.", + "doc_gaps": [ + { + "location": "html-tag-processor.md, \"Building markup from a template\" recipe", + "problem": "The example demonstrates the exact successful pattern but does not check return values from next_tag(), set_attribute(), or set_modifiable_text(), even though set_modifiable_text() later says to always check its return value.", + "suggestion": "Either make the recipe explicitly state that the literal template guarantees these calls in the example, or show a production-safe variant that checks the cursor move and text update before returning get_updated_html()." + }, + { + "location": "html-tag-processor.md, \"Building markup from a template\" recipe", + "problem": "The recipe says the API handles necessary encoding, but the concrete examples of dangerous input are only spread across later method sections.", + "suggestion": "Add one short sentence or example line near the recipe stating that callers should pass plain decoded strings, including strings containing &, <, >, and quotes; set_attribute() and set_modifiable_text() perform the appropriate HTML encoding." + }, + { + "location": "html-tag-processor.md, set_attribute() attribute ordering notes", + "problem": "The ordering rule is documented well, but it lives primarily in set_attribute(); template construction readers may miss why empty attributes should be predeclared.", + "suggestion": "Cross-link the template recipe and set_attribute ordering note both ways, emphasizing the general contract: update existing attributes to preserve written order; newly created attributes are inserted/sorted by the processor." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked tokens with `next_token()`, read only `#text` plus whitelisted `TITLE`/`TEXTAREA` opener tokens, and relied on documented decoded `get_modifiable_text()` behavior. No `_doing_it_wrong` records." + }, + { + "trial_id": "trial-2", + "adherence": 90, + "hallucinated_methods": [], + "notes": "HTML API usage is mostly sound and all called processor methods are documented: `create_fragment`, `next_token`, `get_token_type`, `get_modifiable_text`, `is_tag_closer`, and `get_tag`. The 2/10 functional result comes from a PHP bug: `preg_match_all()` returns the number of matches, so the candidate skipped every text chunk longer than one code point. That is not an HTML API misuse." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the documented processor, token walk, token-type checks, special-element whitelist, decoded text access, and UTF-8 `mb_*` truncation. No undocumented calls or misuse records." + } + ], + "failure_analysis": "Only trial-2 failed hidden cases. The failures in `no-truncation-needed`, `truncate-mid-link`, `entities-count-decoded`, `multibyte-emoji`, `accented`, `script-excluded`, `textarea-title-counts-script-style-excluded`, and `malformed-nesting` all share the same misconception: the candidate treated `preg_match_all('/./us', $chunk, $matches)` as if success should return `1`. In PHP it returns the number of matches, so text chunks like `Just `, `Fish & Chips`, `before`, `form & field`, and `one` were discarded; only a one-codepoint whitespace chunk survived in the link/whitespace cases. The relevant HTML API docs were adequate: `WP_HTML_Processor::create_fragment()` says body fragments should use the fragment parser; `next_token()` says to use token walking when text matters and that special elements have no `#text` children; `get_modifiable_text()` says `#text`, `TITLE`, and `TEXTAREA` text is decoded UTF-8 and should be measured/sliced with an explicit encoding. This was not caused by an undocumented HTML API behavior.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_modifiable_text()` inherited docs", + "problem": "The docs mention UTF-8 slicing but only show a minimal `mb_substr()` example in this rendered file; a model still reached for ad hoc regex counting.", + "suggestion": "Show paired examples for measuring and slicing decoded modifiable text with `mb_strlen( $text, 'UTF-8' )` and `mb_substr( $text, 0, $limit, 'UTF-8' )`, without making it specific to excerpts." + }, + { + "location": "`WP_HTML_Processor::next_token()` text-walking recipe", + "problem": "The docs explain ordinary `#text` collection and special-element exceptions, but the guidance is split across sections.", + "suggestion": "Add a compact cross-reference in the text-walking recipe: for mixed token loops, use `get_token_type()` to select ordinary text, and opt into `TITLE`/`TEXTAREA` opener text with `get_token_name()` plus `! is_tag_closer()` when the caller wants those special contents." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), token walking, get_attribute() with is_string(), and #text + get_modifiable_text() correctly. All called APIs are documented and execution recorded no misuse. Slightly less canonical than the reference because it tracks A state manually rather than using a depth-bounded subtree walk, but this matches the docs' single-cursor/state guidance for repeated regions." + }, + { + "trial_id": "trial-2", + "adherence": 90, + "hallucinated_methods": [], + "notes": "Correct processor and documented APIs throughout. The main adherence issue is the final paused_at_incomplete_token() policy: for a read-only extraction task, returning an empty result on any trailing incomplete syntax can discard links already parsed. The docs describe that as a caller policy choice, not a default for extraction. Otherwise handles decoded href/text and valueless href correctly." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API calls. Uses a documented one-pass next_token() state-machine pattern and the right string-valued href check. The final get_last_error() rejection is defensible for unsupported markup, though the docs could better distinguish strict-abort extraction from best-effort partial extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen cases. The docs did well on the essentials: 'Which processor should I use?' and create_fragment() pointed subjects to WP_HTML_Processor for BODY fragments; get_attribute() documented string|true|null, which led all trials to exclude missing and valueless hrefs with is_string(); get_modifiable_text() documented decoded #text behavior; and next_token() documented one shared cursor, virtual closers, and explicit state, which the candidates followed. Near-misses: trial-2 appears to overgeneralize the incomplete-input guidance from next_token()/paused_at_incomplete_token(), treating any trailing incomplete syntax as grounds to erase collected results. The relevant docs say this depends on caller policy, but the examples are mostly mutation/rewrite-oriented, making strict rejection look like a default. Trials also rely on closer-driven A stack state; the is_tag_closer() docs imply this works, but they do not explicitly say get_tag() still names the element being closed on real and virtual closers.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / Recipe: collect DOM-style text from a subtree", + "problem": "The docs show single-subtree text extraction and a DT state-machine example, but not a general repeated-element extraction pattern that combines opener attributes, text accumulation, and closer finalization.", + "suggestion": "Add a generalized example for collecting data from repeated elements in one pass: record state on an opener, append only #text token get_modifiable_text(), finalize on the element closer, and explain when a depth-bounded inner walk is appropriate instead." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and WP_HTML_Processor::next_token() incomplete-input notes", + "problem": "The distinction between an unclosed element, which still gets a virtual closer, and an incomplete trailing syntax token, which sets paused_at_incomplete_token(), is easy to blur.", + "suggestion": "State explicitly that unclosed elements at EOF are structurally closed by the processor and are not necessarily 'incomplete tokens'; checking paused_at_incomplete_token() is a strict-source-completeness policy that may discard otherwise valid visited data." + }, + { + "location": "WP_HTML_Processor::get_last_error()", + "problem": "The docs explain how to detect unsupported markup, but mostly frame the response around output-producing methods like serialize()/normalize(). Extraction callers need clearer guidance on partial results.", + "suggestion": "Document that tokens visited before get_last_error() became non-null were parsed, but the traversal is incomplete; callers should choose and document a policy such as reject all, return partial results with a flag, or fall back." + }, + { + "location": "WP_HTML_Processor::is_tag_closer() / get_tag()", + "problem": "Closer-driven state machines depend on get_tag() returning the closed element name on closer tokens, including virtual closers. The docs imply this through examples but do not state the contract directly.", + "suggestion": "Add one sentence and a tiny example showing that when matched on a closer, is_tag_closer() is true, get_tag() returns the element being closed, while breadcrumbs/depth already reflect the parent context." + } + ] + } + }, + { + "id": "T07-nested-lists", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor::create_fragment() for structure-aware parsing. All called methods are documented in the rendered files. The implementation uses the intended token walk, get_tag(), get_breadcrumbs(), add_class(), and get_updated_html() pattern, excludes the current node from ancestor checks, handles null factory return, and checks get_last_error(). No _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API usage. This is idiomatic for the task: scan openers with next_tag(), inspect breadcrumbs for ancestors, add the class with add_class(), and return get_updated_html(). It also explicitly checks paused_at_incomplete_token() and get_last_error(), which is conservative but documented. No _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose the HTML Processor and used only documented methods. The breadcrumb handling is clean: array_pop() removes the current list before testing ancestors. Uses add_class() and get_updated_html() appropriately, handles null factory return and unsupported parser aborts via get_last_error(). No _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 7 frozen cases, and none produced _doing_it_wrong records. The docs succeeded on the main decision points: the Tag Processor page explicitly says it has no tree awareness and points structural work to WP_HTML_Processor; the HTML Processor page documents create_fragment() for body fragments; next_tag() documents opener-only walking by default; get_breadcrumbs() documents the current-node path including implicit HTML/BODY; add_class() documents class merging; and get_updated_html() documents byte-preserving output after queued edits. The only near-miss is incomplete-input policy: trial-2 rejects any paused incomplete token, while trials 1 and 3 do not. The docs describe both policies as caller-dependent, so this was not an adherence failure for this task, but it is an area where examples could make the choice more explicit for simple mutation loops.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_breadcrumbs() / Breadcrumbs section", + "problem": "The docs state that breadcrumbs include the current matched node, but they do not show the common ancestor-only idiom. This can lead models to accidentally count the current element as its own ancestor.", + "suggestion": "Add a short general note and example showing that ancestor checks should use the breadcrumb array without its last element, because the last item is the current token." + }, + { + "location": "WP_HTML_Processor::next_tag() breadcrumb query docs", + "problem": "The docs explain fixed breadcrumb sub-path matching, but do not clearly distinguish that from arbitrary ancestor membership checks or disjunctions across ancestor names.", + "suggestion": "Clarify that breadcrumb queries match a specified path shape; for conditions like 'has any ancestor matching X' or 'has one of several possible ancestors', scan matching tags and inspect get_breadcrumbs()." + }, + { + "location": "WP_HTML_Processor simple mutation examples / inherited get_updated_html() guidance", + "problem": "Incomplete-token and get_last_error() policy is documented, but mostly in region-scan and serialization contexts. For simple class/attribute mutation loops, it is less obvious whether to return updated HTML, original HTML, or null after a paused incomplete token.", + "suggestion": "Add a brief post-loop policy note for mutation examples: get_updated_html() returns queued byte-preserving edits; check get_last_error() after scanning, and check paused_at_incomplete_token() only when the caller requires complete input rather than best-effort edits to complete tokens." + } + ] + } + }, + { + "id": "T08-table-extract", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment(), used a single depth-bounded next_token() walk, and all called HTML API methods are documented. Slight loss for adding special-element opener modifiable text inside cells; that is documented API behavior, but the docs' ordinary subtree-text recipe says to append only #text tokens unless the caller explicitly opts in. No _doing_it_wrong records; passed 8/8." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Best adherence. Correct processor choice, documented methods only, #text-only extraction with get_modifiable_text(), single cursor/state-machine traversal, depth boundary, null processor handling, and get_last_error handling. Minor loss only for not making an explicit paused_at_incomplete_token policy; passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Correct processor choice and documented token-walking methods, with the right depth-bounded single-loop shape. Loses points for not checking get_last_error after a structural scan and for the same special-element opener-text over-inclusion risk as trial-1. No hallucinated methods or _doing_it_wrong records; passed 8/8." + } + ], + "failure_analysis": "No hidden case failed in execution.json: all three trials passed all 8 cases, and none recorded _doing_it_wrong. The docs did well on the core decision path: the HTML Processor overview says to choose WP_HTML_Processor when structure, containment, subtree text, implied tags, and virtual closers matter; create_fragment() covers body fragments and null returns; next_token() explains virtual closers, inserted TBODY, single-cursor traversal, and avoiding nested loops for repeated regions; get_current_depth() explicitly teaches the >= subtree guard; and the DOM-style text recipe plus get_modifiable_text() led candidates to decoded #text extraction for markup and entities. The main near-miss is special-element text. Trials 1 and 3 whitelisted SCRIPT/STYLE/TEXTAREA/TITLE opener text, and trial 1 guessed additional special tags. The relevant passages document that special elements carry modifiable text on opener tokens, while the ordinary subtree-text recipe says not to include special opener text unless the caller opts in. Those facts are present, but split enough that a reader can over-apply get_modifiable_text() when a task says text content. A hidden case with special elements inside cells would diverge from the canonical #text-only interpretation, especially because SCRIPT/STYLE-like content is raw rather than decoded. A secondary near-miss is error policy: trials 1 and 2 discard accumulated rows when get_last_error() is non-null, while the reference is best-effort for already-visited tokens. The docs correctly say unsupported markup stops the parser, but they do not make partial read-only extraction policy as explicit as mutation/serialization policy.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() and WP_HTML_Tag_Processor::get_modifiable_text() docs", + "problem": "The method docs emphasize that special elements expose modifiable text, but the warning that generic subtree text should usually read only #text tokens is easier to miss because it lives mostly in the overview recipe.", + "suggestion": "Add an immediate cross-reference and warning in the method docblock: for ordinary subtree text extraction, first require get_token_type() === '#text'; special-element opener text is an explicit opt-in and may be raw or decoded depending on the element." + }, + { + "location": "WP_HTML_Processor::next_token() special-elements paragraph", + "problem": "The paragraph says to read SCRIPT/STYLE/TITLE/TEXTAREA text from the opener token, but does not state the decision boundary between ordinary DOM-style text extraction and an intentionally inclusive special-element policy.", + "suggestion": "Add a compact decision table for token text: #text is ordinary decoded subtree text; TITLE/TEXTAREA opener text is opt-in decoded special text; SCRIPT/STYLE and similar opener text is opt-in raw text; comments and processing instructions are not DOM subtree text." + }, + { + "location": "Special atomic element lists in html-tag-processor.md and html-processor.md", + "problem": "The documented special-element set is not fully consistent or authoritative; candidates guessed extra tag names such as XMP/NOFRAMES after seeing broad wording like 'any other section'.", + "suggestion": "Make the special atomic element list authoritative and consistent across both processor docs, including exact tag names and raw-vs-decoded behavior, or link both docs to one shared list." + }, + { + "location": "WP_HTML_Processor::get_last_error(), create_fragment(), and next_token() docs", + "problem": "The docs say to check get_last_error() after scans, but partial read-only extraction policy is underspecified. Readers may discard already-collected data even when their caller contract would allow best-effort results, or keep partial data without realizing traversal aborted early.", + "suggestion": "Document that already-visited tokens remain usable but the tree was not fully traversed; show the two general policies: fail closed for mutations/normalization or strict completeness, and return accumulated data only when the caller explicitly accepts best-effort extraction." + } + ] + } + }, + { + "id": "T09-mark-keyword", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor fragment parser and the documented token-rewrite pattern: next_token(), #text guard, get_modifiable_text() for decoded matching, and serialize_token() for normalized output. All called HTML API methods are documented. Minor deduction: on get_last_error() it returns the original input, which the serialize_token docs explicitly warn is not normalized and discards the rewrite; no frozen case triggered that path." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. Processor choice, decoded text handling, comment/attribute avoidance, split text-node behavior, special element avoidance, and normalized serialization are all aligned with the docs. Minor deduction for raw-input fallback after parser abort." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Same implementation pattern as trial-1. No undocumented API calls or _doing_it_wrong records. It follows the documented serialize-token rewrite recipe closely. Minor deduction for returning unnormalized raw input on unsupported parser errors." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases, so there are no failed hidden cases to diagnose. The docs did well on this task: 'Which processor should I use?' points readers to WP_HTML_Processor when structure, implied closing tags, and normalized output matter; 'collect DOM-style text from a subtree' says to append only ordinary #text tokens and not use get_modifiable_text() as the text-node test; get_modifiable_text() clearly states decoded text semantics for #text/TITLE/TEXTAREA and raw semantics for SCRIPT/STYLE/comments; and serialize_token() explicitly describes token-by-token rewrites with added wrappers. The main near-miss is that every candidate copied a conservative get_last_error() fallback returning the original HTML. That is documented as preserving source bytes but not normalized output, so it would be wrong for an unsupported-markup case if the function contract still required normalized serialization. No provided test exercised unsupported-parser aborts.", + "doc_gaps": [ + { + "location": "html-processor.md / Recipe: rewrite while serializing tokens and serialize_token()", + "problem": "The docs correctly warn that returning original input discards the rewrite, but examples with string-returning functions can still lead models to choose raw-input fallback after get_last_error().", + "suggestion": "Add a short fallback policy table contrasting accumulated best-effort output, null/error sentinel, empty string, and original input, with explicit notes about which choices preserve normalization and which preserve source bytes only." + }, + { + "location": "html-processor.md / create_fragment()", + "problem": "The null-return guidance says to check before walking, but does not clarify how rare/null-producing conditions relate to the default BODY/UTF-8 path or normalized-output contracts.", + "suggestion": "Clarify that callers should choose a fallback consistent with their contract, and that returning raw input from a normalizer is not a normalized result." + }, + { + "location": "html-tag-processor.md / get_modifiable_text() and html-processor.md / serialize_token()", + "problem": "The decoded-text-read path and normalized-token-output path are documented separately; this task depended on combining them correctly.", + "suggestion": "Cross-reference the common pattern: inspect decoded get_modifiable_text() for #text matching, but emit serialize_token() when preserving normalized markup rather than rebuilding output from the decoded string." + } + ] + } + }, + { + "id": "T10-last-h2", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor for a flat class edit. All called APIs are documented: constructor, next_tag, set_bookmark, seek, add_class, release_bookmark, get_updated_html. The repeated single bookmark is idiomatic and all 6 hidden cases passed with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor choice and fully documented API usage: constructor, next_tag, set_bookmark, has_bookmark, seek, add_class, release_bookmark, get_updated_html. This closely matches the documented bookmark pattern for remembering the last matched tag. All 6 hidden cases passed." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor and only documented methods. The has_bookmark/seek/add_class/get_updated_html flow is idiomatic, preserves existing classes via add_class, and handles the no-H2 case unchanged. All 6 hidden cases passed." + } + ], + "failure_analysis": "All trials passed every hidden case, so there were no failed-case misconceptions to attribute. The docs did especially well in three places: the WP_HTML_Tag_Processor introduction says this class is appropriate for flat attribute/class edits and is constructed with new WP_HTML_Tag_Processor($html); next_tag() documents forward-only token walking and case-insensitive tag-name queries; and set_bookmark() explicitly describes the common use of re-setting one named bookmark to remember the last matching tag before seeking back to edit it. The add_class() section also covered the existing-class case by stating that it creates class when absent and appends without removing or reordering existing classes. A near-miss is that candidates generally did not check set_bookmark()'s return value, but because they used one literal bookmark name this stayed within the documented safe idiom and caused no misuse.", + "doc_gaps": [ + { + "location": "html-tag-processor.md / set_bookmark()", + "problem": "The return value is documented, but examples that rely on one literal bookmark name do not show whether callers should check set_bookmark() failure in ordinary single-bookmark loops.", + "suggestion": "Clarify that reusing one literal bookmark name is expected to succeed unless the processor cannot allocate/bookmark the current token, and show a compact pattern either checking the boolean or using has_bookmark() after the scan." + }, + { + "location": "html-tag-processor.md / next_tag()", + "problem": "The docs explain incomplete-token behavior and that comments/text are not tags, but this is spread across several sections.", + "suggestion": "Add a short note near the string-query examples that next_tag('H2') matches real H2 tag openers only, not text inside comments or incomplete trailing syntax." + }, + { + "location": "html-tag-processor.md / add_class()", + "problem": "The behavior for existing classes is well described in prose, but the examples could make the append-preserve contract more visible.", + "suggestion": "Add a minimal before/after example showing add_class() on an element with an existing class attribute, emphasizing that existing class order is preserved and the new class is appended." + } + ] + } + }, + { + "id": "T11-strip-tracking-attributes", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, next_tag(), get_attribute_names_with_prefix(), remove_attribute(), and get_updated_html(), all documented in the rendered Tag Processor docs. This is the correct flat attribute-editing processor choice, uses the documented prefix helper, preserves untouched bytes via get_updated_html(), handles the null return, and produced no _doing_it_wrong records. Execution passed 7/7." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct processor and documented API only; idiomatic linear tag scan plus queued attribute removals and get_updated_html(). No misuse records. Execution passed 7/7." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same implementation as trial-1. Correct Tag Processor use for byte-preserving attribute edits, documented prefix enumeration, documented removal, and documented final serialization through get_updated_html(). No misuse records. Execution passed 7/7." + } + ], + "failure_analysis": "No hidden case failed in any trial. All trials passed single-link, multiple-tags, multiple-matching-attributes, similar-prefixes-kept, uppercase-source-attribute, comments-untouched, and no-matches. The docs did well in four places: the Tag Processor Overview / Which processor should I use? section explicitly says to use the Tag Processor for flat attribute and class edits with byte-exact preservation; next_tag() says it visits real tags while ignoring tag-like text in comments/raw text and preserving source casing; get_attribute_names_with_prefix() directly documents the needed helper, lowercase returned names, and case-insensitive matching; get_updated_html() explains that queued attribute edits are read back without normalizing untouched bytes. Near-misses were not failure-causing: the prefix helper return contract could be more explicit about empty array versus null, remove_attribute() could state its case-insensitive name matching in its own method docs, and the HTML Processor copy of inherited attribute methods could call out virtual-token behavior more clearly.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::get_attribute_names_with_prefix() and WP_HTML_Processor::get_attribute_names_with_prefix()", + "problem": "The return docs say null is returned when no tag opener is matched, but they do not explicitly state that a matched opener with zero matching attributes returns an empty array.", + "suggestion": "Add a sentence such as: \"Returns an empty array when currently matched on a real tag opener but no attribute names start with the prefix; returns null only when not matched on an eligible opener.\"" + }, + { + "location": "WP_HTML_Tag_Processor::remove_attribute()", + "problem": "The method-level doc does not state that attribute-name matching is ASCII case-insensitive/lowercased, even though this matters for source attributes written with uppercase or mixed-case names.", + "suggestion": "Add the same case-insensitive attribute-name contract used by the prefix helper, and mention that duplicate case-variant attributes in invalid source are removed together." + }, + { + "location": "WP_HTML_Processor inherited attribute method docs", + "problem": "The HTML Processor override for get_attribute_names_with_prefix() returns null on virtual tokens, but the rendered method text only mentions the no-opener case. This could confuse users doing structural walks over implied elements.", + "suggestion": "In the HTML Processor version, add a short note that inherited attribute mutation/enumeration methods operate only on tokens backed by source HTML and return false/null for virtual/implied tokens." + } + ] + } + }, + { + "id": "T12-unwrap-spans", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose `WP_HTML_Processor::create_fragment()` for a body fragment needing normalized serialization. All called methods are documented: `create_fragment`, `next_token`, `get_tag`, `serialize_token`, and `get_last_error`. The token-walk plus `serialize_token()` pattern is exactly the documented rewrite pattern, and using `get_tag()` alone to skip both SPAN openers and closers matches the `serialize_token()` example. Handles the unclosed-span case through the HTML Processor's virtual closer behavior." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Same correct processor and documented API usage as trial-1, with idiomatic token walking and `serialize_token()`. Minor adherence loss: on `create_fragment()` failure or parser abort it returns the original raw input. The docs allow fallback policies, but the `serialize_token()` guidance explicitly warns that returning original input is neither normalized nor the accumulated rewrite, so this is a near-miss for a function whose contract is normalized output." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly uses the HTML Processor fragment parser, a single `next_token()` loop, `get_tag()` to skip SPAN boundary tokens, and `serialize_token()` to emit normalized output. All API calls are present in the rendered docs and no `_doing_it_wrong` records occurred. The approach naturally handles nested spans, adjacent spans, discarded span attributes, and virtual closing of unclosed elements." + } + ], + "failure_analysis": "All three trials passed all seven hidden cases, so there are no failed hidden cases to attribute to misconceptions. The docs worked well for this task because the `HTML Support` overview tells readers to choose `WP_HTML_Processor` for structure and normalization, `create_fragment()` matches body-fragment input, `next_token()` explains that text and closing tokens are visited, and `serialize_token()` gives the key rewrite pattern: walk tokens, skip tokens to remove them, and append normalized serialization for the rest. The `next_token()` discussion of implicit/end-of-input closers explains why the unclosed-span case succeeds. The main near-miss is trial-2's raw-input fallback after parser failure; the relevant `serialize_token()` passage does warn that returning original input discards the rewrite and is not normalized, but the fallback-policy guidance could be sharper for normalized-output APIs. Another near-miss is that all candidates relied on `get_tag()` returning a tag name for closers and null for non-tags; this is demonstrated indirectly by the `serialize_token()` example, but the `get_tag()` contract itself does not spell out those `next_token()`-walk semantics.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_tag()` and inherited `WP_HTML_Tag_Processor::get_tag()` docblocks", + "problem": "The method docs show `next_tag()` usage, but do not explicitly define behavior while walking with `next_token()`: start tags, end tags, virtual tags, and non-tag tokens are not distinguished in the contract text.", + "suggestion": "State that during a token walk `get_tag()` returns the uppercase element name for matched tag tokens, including closers and processor-created virtual tags, and returns `null` for text/comment/doctype tokens. Point readers to `get_token_type()` and `is_tag_closer()` when they need to distinguish token kind or opener versus closer." + }, + { + "location": "`WP_HTML_Processor::serialize_token()` docblock", + "problem": "The example teaches the correct skip-and-serialize pattern, but the general rule behind wrapper removal is implicit.", + "suggestion": "Add a short general note that skipping both boundary tokens for an element while serializing intervening tokens removes the wrapper but preserves its children; matching by `get_tag()` in a `next_token()` loop is one documented way to skip both opener and closer." + }, + { + "location": "`WP_HTML_Processor::serialize_token()` fallback guidance and `WP_HTML_Processor::get_last_error()` docblock", + "problem": "Fallback policy is documented as caller-defined, but it is easy to miss that returning original input violates normalized-output or rewrite-output contracts.", + "suggestion": "Add decision guidance: for APIs promising normalized or rewritten HTML, returning original source bytes after parser abort is a fail-open/source-preserving policy, not a normalized result. Recommend an explicit sentinel such as `null` or an empty string unless the caller intentionally wants fail-open behavior." + }, + { + "location": "`WP_HTML_Processor::next_token()` and `WP_HTML_Tag_Processor::paused_at_incomplete_token()` docs", + "problem": "The distinction between unclosed elements that receive virtual closers and truly incomplete trailing syntax is spread across multiple passages.", + "suggestion": "Add paired examples contrasting an unclosed element at end of input, which can still serialize with virtual closers and no last error, with an incomplete trailing token, which is not visited and requires `paused_at_incomplete_token()` if completeness matters." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-43/codex-trials-output.json b/doc-experiment/results/round-43/codex-trials-output.json new file mode 100644 index 0000000000000..41463ba4a3f34 --- /dev/null +++ b/doc-experiment/results/round-43/codex-trials-output.json @@ -0,0 +1,383 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-43/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "N03-first-list-count", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $item_count = 0;\n $finished_scan = false;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $list_depth ) {\n $finished_scan = true;\n break;\n }\n\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $item_count++;\n }\n }\n\n if ( ! $finished_scan || $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $item_count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because direct-child counting is tree-aware. It finds the first `UL` or `OL` with `next_tag()`, bookmarks its opener, walks forward with `next_token()` while comparing `get_current_depth()` to count only direct `LI` child openers, rejects the change if the list scan does not finish cleanly (`paused_at_incomplete_token()` or `get_last_error()`), then seeks back to the bookmarked list opener, sets `data-item-count` with `set_attribute()`, and returns the edited fragment via `get_updated_html()`.", + "confidence": 89 + }, + { + "id": "N03-first-list-count", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It finds the first `UL` or `OL` by scanning tags in document order, bookmarks that opener, records its depth with `get_current_depth()`, then walks forward with `next_token()` and counts only direct child `LI` openers using `get_token_type()`, `is_tag_closer()`, `get_tag()`, and depth equality. If the subtree scan ends with truncation or unsupported markup (`paused_at_incomplete_token()` or `get_last_error()`), it returns the original HTML unchanged; otherwise it seeks back, sets `data-item-count` via `set_attribute()`, and returns `get_updated_html()`.", + "confidence": 92 + }, + { + "id": "N03-first-list-count", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n if ( ! $processor->set_bookmark( 'first-list' ) ) {\n return $html;\n }\n\n $list_depth = $processor->get_current_depth();\n $count = 0;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $list_depth ) {\n if (\n '#tag' === $processor->get_token_type() &&\n ! $processor->is_tag_closer() &&\n 'LI' === $processor->get_tag() &&\n $processor->get_current_depth() === $list_depth + 1\n ) {\n $count++;\n }\n }\n\n $scan_finished_cleanly =\n ! $processor->paused_at_incomplete_token() &&\n null === $processor->get_last_error();\n\n if ( ! $scan_finished_cleanly ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n if ( ! $processor->seek( 'first-list' ) ) {\n $processor->release_bookmark( 'first-list' );\n return $html;\n }\n\n $processor->set_attribute( 'data-item-count', (string) $count );\n $processor->release_bookmark( 'first-list' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans forward with `next_tag()` to find the first `UL` or `OL`, bookmarks that opener with `set_bookmark()`, records its depth with `get_current_depth()`, then walks the subtree with `next_token()` and counts only direct child `LI` openers by checking `get_token_type()`, `is_tag_closer()`, `get_tag()`, and depth. It rejects truncated or unsupported scans via `paused_at_incomplete_token()` and `get_last_error()`, then seeks back with `seek()`, writes `data-item-count` using `set_attribute()`, and returns `get_updated_html()`.", + "confidence": 90 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 1, + "ok": true, + "code": "Unsupported HTML

                      '\n\t\t: $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in BODY context and returns `null` when it cannot normalize unsupported input. The function simply returns that normalized serialization when available, otherwise it returns the exact required fallback HTML.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 2, + "ok": true, + "code": "Unsupported HTML

                      ';\n }\n\n return $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes an HTML fragment in `` context and returns `null` when it cannot normalize due to unsupported input. The function maps that `null` case to the required fallback HTML and otherwise returns the normalized serialization.", + "confidence": 98 + }, + { + "id": "N04-normalize-or-placeholder", + "trial": 3, + "ok": true, + "code": "Unsupported HTML

                      ';\n\t}\n\n\treturn $normalized;\n}\n", + "explanation": "This uses the documented `WP_HTML_Processor::normalize()` API, which normalizes a BODY-context HTML fragment and returns `null` when the HTML Processor cannot normalize it. The function maps that `null` case to the required fallback HTML and otherwise returns the normalized serialization.", + "confidence": 98 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": " 1,\n 'H2' => 2,\n 'H3' => 3,\n 'H4' => 4,\n 'H5' => 5,\n 'H6' => 6,\n );\n\n while ( $processor->next_token() ) {\n $tag = $processor->get_tag();\n\n if ( null !== $tag && isset( $heading_tags[ $tag ] ) ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_heading ) {\n $toc[] = $current_heading;\n $current_heading = null;\n }\n } else {\n $current_heading = array(\n 'level' => $heading_tags[ $tag ],\n 'text' => '',\n );\n }\n\n continue;\n }\n\n if ( null === $current_heading ) {\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' === $processor->get_token_type() ) {\n $token_name = $processor->get_token_name();\n if ( null !== $token_name && ! $processor->is_tag_closer() && in_array( $token_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then walks the document once with `next_token()`. It starts a TOC entry on heading openers, appends heading text from `#text` tokens via `get_modifiable_text()`, includes text carried on special-element opener tokens when they appear inside headings, and closes each entry on the corresponding heading closer using `get_tag()`, `get_token_type()`, `get_token_name()`, and `is_tag_closer()`.", + "confidence": 84 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n $token_type = $processor->get_token_type();\n $tag_name = $processor->get_tag();\n\n if ( '#tag' === $token_type && null !== $tag_name ) {\n if ( ! $processor->is_tag_closer() && preg_match( '/^H([1-6])$/', $tag_name, $matches ) ) {\n $current_level = (int) $matches[1];\n $current_tag = $tag_name;\n $current_text = '';\n continue;\n }\n\n if ( null !== $current_level && $processor->is_tag_closer() && $tag_name === $current_tag ) {\n $toc[] = array(\n 'level' => $current_level,\n 'text' => $current_text,\n );\n\n $current_level = null;\n $current_tag = null;\n $current_text = '';\n continue;\n }\n\n if ( null !== $current_level && ! $processor->is_tag_closer() && in_array( $tag_name, $special_textual, true ) ) {\n $current_text .= $processor->get_modifiable_text();\n continue;\n }\n }\n\n if ( null !== $current_level && '#text' === $token_type ) {\n $current_text .= $processor->get_modifiable_text();\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is structure-aware. It walks the fragment once with `next_token()`, starts collecting when it sees an `H1` through `H6` opener, appends text from `#text` tokens via `get_modifiable_text()`, also includes text carried on special textual element openers (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`), and finalizes each entry when the matching heading closer is visited. It also checks `paused_at_incomplete_token()` and `get_last_error()` to avoid returning a partial TOC from truncated or unsupported input.", + "confidence": 79 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( null === $tag || ! preg_match( '/^H([1-6])$/', $tag, $matches ) ) {\n continue;\n }\n\n $level = (int) $matches[1];\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => $level,\n 'text' => $text,\n );\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n break;\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware. It scans heading openers with `next_tag()`, records each heading level from `get_tag()`, then walks that heading’s subtree with `next_token()` bounded by `get_current_depth()`, appending only ordinary `#text` token content via `get_token_type()` and `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 89 + }, + { + "id": "T01-add-image-class", + "trial": 1, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor`, which is the documented API for flat, byte-preserving attribute/class edits. It scans forward with `next_tag( 'img' )`, which matches `IMG` tags case-insensitively and ignores tag-like text inside comments or raw-text contexts, then applies `add_class( 'wp-image' )` to each match and returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T01-add-image-class", + "trial": 2, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, byte-preserving attribute/class edit. The function constructs the processor with the input HTML, loops through every `IMG` tag via `next_tag( 'img' )` (documented as case-insensitive and as ignoring tag-like text inside comments), adds the class with `add_class( 'wp-image' )`, and returns the result with `get_updated_html()` so untouched bytes remain unchanged.", + "confidence": 98 + }, + { + "id": "T01-add-image-class", + "trial": 3, + "ok": true, + "code": "next_tag( 'img' ) ) {\n $processor->add_class( 'wp-image' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor`, which is the documented API for byte-preserving attribute/class edits. The function linearly scans with `next_tag( 'img' )` to match real `IMG` tags case-insensitively, skips comment text automatically, adds `wp-image` with `add_class()`, and returns the minimally modified result with `get_updated_html()`.", + "confidence": 98 + }, + { + "id": "T02-link-targets", + "trial": 1, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit pass. It scans each `A` tag with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T02-link-targets", + "trial": 2, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` still count as present, overwrites or inserts `target` via `set_attribute( 'target', '_blank' )`, and returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T02-link-targets", + "trial": 3, + "ok": true, + "code": "next_tag( 'a' ) ) {\n if ( null === $processor->get_attribute( 'href' ) ) {\n continue;\n }\n\n $processor->set_attribute( 'target', '_blank' );\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-preserving attribute edits: it scans forward with `next_tag( 'a' )`, checks `get_attribute( 'href' ) !== null` so `href=\"\"` and boolean `href` both count as present, then overwrites or inserts `target` via `set_attribute( 'target', '_blank' )`, and finally returns the result with `get_updated_html()`.", + "confidence": 97 + }, + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $text = '';\n $h1_depth = $processor->get_current_depth();\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_token_name();\n if ( in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'NOEMBED', 'NOFRAMES', 'XMP' ), true ) ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records that element’s depth with `get_current_depth()`, then walks its subtree with `next_token()`, appending decoded text from `#text` tokens via `get_modifiable_text()`. It also includes text carried on special text-containing element opener tokens, since those do not produce separate `#text` child tokens in the HTML Processor.", + "confidence": 85 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_tag();\n if (\n 'SCRIPT' === $tag_name ||\n 'STYLE' === $tag_name ||\n 'TEXTAREA' === $tag_name ||\n 'TITLE' === $tag_name\n ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks that subtree with `next_token()`, concatenating decoded `#text` token content via `get_modifiable_text()`. It also includes text carried on special element opener tokens (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`), since those elements do not expose separate `#text` child tokens in this API.", + "confidence": 88 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 90 + }, + { + "id": "T04-build-figure", + "trial": 1, + "ok": true, + "code": "\"\"
                      .
                      ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` with a literal template so the `img` attributes already exist in `src` then `alt` order, then updates them with `set_attribute()` so the API performs the required HTML encoding. It then scans tokens with `next_token()` to find the placeholder `#text` node inside `figcaption` and replaces it with `set_modifiable_text()`, finally returning the result with `get_updated_html()`.", + "confidence": 95 + }, + { + "id": "T04-build-figure", + "trial": 2, + "ok": true, + "code": "\"\"
                      .
                      ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This builds a fixed HTML template with the required tag structure and predeclared `src` and `alt` attributes so their order stays `src` then `alt`, then uses `WP_HTML_Tag_Processor::next_tag()` and `set_attribute()` to fill those attributes and `next_token()`, `get_token_type()`, and `set_modifiable_text()` to replace a placeholder text node inside `figcaption`, relying on the HTML API to perform all necessary escaping.", + "confidence": 94 + }, + { + "id": "T04-build-figure", + "trial": 3, + "ok": true, + "code": "\"\"
                      .
                      ' );\n\n if ( $processor->next_tag( 'img' ) ) {\n $processor->set_attribute( 'src', $url );\n $processor->set_attribute( 'alt', $alt );\n }\n\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $processor->set_modifiable_text( $caption );\n break;\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This builds from a fixed HTML template so the `img` attributes already exist in the required `src` then `alt` order, then uses `WP_HTML_Tag_Processor::next_tag()` and `set_attribute()` to safely encode the attribute values, `next_token()` plus `get_token_type()` to find the placeholder text node inside `figcaption`, and `set_modifiable_text()` to safely encode the caption before returning `get_updated_html()`.", + "confidence": 95 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n $chunk = '';\n\n if ( '#text' === $processor->get_token_type() ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $token_name = $processor->get_token_name();\n if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n $text .= $chunk;\n\n if ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) {\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, walks tokens with `next_token()`, appends decoded text from ordinary `#text` tokens via `get_modifiable_text()`, and also explicitly includes decoded opener text for `TITLE` and `TEXTAREA` while excluding other special elements like `SCRIPT` and `STYLE`. Truncation is done with `mb_strlen()` and `mb_substr()` using explicit `UTF-8` so the result is limited by Unicode code points without splitting multibyte characters.", + "confidence": 90 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": " 0 && $processor->next_token() ) {\n $chunk = '';\n\n if ( '#text' === $processor->get_token_type() ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $tag_name = $processor->get_tag();\n if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n if ( 1 !== preg_match_all( '/./us', $chunk, $matches ) ) {\n continue;\n }\n\n $codepoints = $matches[0];\n $count = count( $codepoints );\n\n if ( $count <= $remaining ) {\n $excerpt .= $chunk;\n $remaining -= $count;\n continue;\n }\n\n $excerpt .= implode( '', array_slice( $codepoints, 0, $remaining ) );\n break;\n }\n\n return $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-fragment structurally, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens plus the special text-bearing `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. Truncation is done by counting Unicode code points with a UTF-8 regex so multi-byte characters are never split.", + "confidence": 83 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_token_name();\n if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) {\n return $text;\n }\n\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with structural awareness, then walks tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and, per the documented special-case behavior, from `TITLE` and `TEXTAREA` opening tags via `get_modifiable_text()`. It excludes `SCRIPT` and `STYLE` by not whitelisting them, preserves whitespace exactly as reported by the parser, and truncates by Unicode code points with `mb_strlen()`/`mb_substr()` in UTF-8.", + "confidence": 95 + }, + { + "id": "T06-collect-links", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n array_pop( $anchor_map );\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $anchor_map[] = count( $links ) - 1;\n } else {\n $anchor_map[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $anchor_map as $link_index ) {\n if ( null !== $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-style fragment, then walks tokens once with `next_token()`. On each opening `A` tag it reads `href` with `get_attribute()` and includes only string-valued attributes, tracks open anchors with a stack keyed to result indexes, appends decoded text from `#text` tokens via `get_modifiable_text()`, and pops tracked anchors on `A` closing tags using `is_tag_closer()`.", + "confidence": 92 + }, + { + "id": "T06-collect-links", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n if ( '' !== $text ) {\n foreach ( $open_links as $link_index ) {\n if ( null !== $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || 'A' !== $processor->get_tag() ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n array_pop( $open_links );\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $open_links[] = count( $links ) - 1;\n } else {\n $open_links[] = null;\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then walks the document once with `next_token()`. On each `A` opener it reads `href` via `get_attribute()` and records only string-valued attributes; on `#text` tokens it appends decoded text from `get_modifiable_text()` to every currently open collected link, and on `A` closers it pops the corresponding open link state. It returns an empty array if fragment parsing cannot be created, pauses on incomplete input, or aborts on unsupported markup.", + "confidence": 87 + }, + { + "id": "T06-collect-links", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n array_pop( $link_stack );\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $link_stack[] = count( $links ) - 1;\n } else {\n $link_stack[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $link_stack as $link_index ) {\n if ( null !== $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with tree awareness, then walks the document once with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and keeps only string-valued attributes, and on each `#text` token it appends decoded text from `get_modifiable_text()` to the currently open qualifying link entries until their closing `A` tags are reached.", + "confidence": 90 + }, + { + "id": "T07-nested-lists", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n $last_index = count( $breadcrumbs ) - 1;\n $is_nested = false;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $is_nested = true;\n break;\n }\n }\n\n if ( $is_nested ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It walks list openers with `next_tag()`, checks each matched `UL` or `OL` element’s `get_breadcrumbs()` path for any earlier `UL` or `OL` ancestor, and applies `add_class( 'nested-list' )` only to those nested lists. The final HTML is returned with `get_updated_html()`, and unsupported-parser failure is handled by returning the original input unchanged.", + "confidence": 89 + }, + { + "id": "T07-nested-lists", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n $has_list_ancestor = false;\n $last_index = count( $breadcrumbs ) - 1;\n\n for ( $i = 0; $i < $last_index; $i++ ) {\n if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) {\n $has_list_ancestor = true;\n break;\n }\n }\n\n if ( $has_list_ancestor ) {\n $processor->add_class( 'nested-list' );\n }\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then walks list openers with `next_tag()`, checks `get_breadcrumbs()` for any `UL` or `OL` ancestor above the current list, and applies `add_class( 'nested-list' )` only to those nested lists. It returns `get_updated_html()` for byte-preserving edits, but falls back to the original HTML if parsing was incomplete or aborted by checking `paused_at_incomplete_token()` and `get_last_error()`.", + "confidence": 88 + }, + { + "id": "T07-nested-lists", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( 'UL' !== $tag && 'OL' !== $tag ) {\n continue;\n }\n\n $breadcrumbs = $processor->get_breadcrumbs();\n array_pop( $breadcrumbs );\n\n foreach ( $breadcrumbs as $ancestor_tag ) {\n if ( 'UL' === $ancestor_tag || 'OL' === $ancestor_tag ) {\n $processor->add_class( 'nested-list' );\n break;\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans list openers with `next_tag()`, checks each matched tag name with `get_tag()`, inspects ancestor structure via `get_breadcrumbs()`, adds `nested-list` with `add_class()` when any ancestor is `UL` or `OL`, and returns the byte-preserving edited result with `get_updated_html()`. If the HTML Processor aborts on unsupported markup, it falls back to the original input by checking `get_last_error()`.", + "confidence": 92 + }, + { + "id": "T08-table-extract", + "trial": 1, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n $special_text_tags = array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'IFRAME', 'NOSCRIPT', 'NOEMBED', 'NOFRAMES', 'XMP' );\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n $token_type = $processor->get_token_type();\n $tag_name = $processor->get_tag();\n\n if ( '#tag' === $token_type ) {\n if ( 'TR' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n } else {\n $current_cell = '';\n }\n\n continue;\n }\n\n if (\n null !== $current_cell &&\n ! $processor->is_tag_closer() &&\n in_array( $tag_name, $special_text_tags, true )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n\n continue;\n }\n\n if ( null !== $current_cell && '#text' === $token_type ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, finds the first `TABLE` with `next_tag()`, then walks tokens with `next_token()` while staying inside that table by comparing `get_current_depth()`. It builds rows on `TR` open/close tokens, builds cells on `TD`/`TH` open/close tokens, and concatenates cell text from `#text` tokens via `get_modifiable_text()`, also including documented special-element text carried on opener tokens such as `TEXTAREA` and `SCRIPT`.", + "confidence": 88 + }, + { + "id": "T08-table-extract", + "trial": 2, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $table_depth ) {\n break;\n }\n\n $token_type = $processor->get_token_type();\n $token_name = $processor->get_token_name();\n\n if ( '#tag' === $token_type ) {\n if ( 'TR' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n $current_cell = null;\n }\n\n continue;\n }\n\n if ( 'TD' === $token_name || 'TH' === $token_name ) {\n if ( null === $current_row ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n } else {\n $current_cell = '';\n }\n\n continue;\n }\n }\n\n if ( '#text' === $token_type && null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, `next_tag('TABLE')` to find the first table, then a single `next_token()` walk bounded by the table depth from `get_current_depth()`. It starts rows on `TR` openers, starts cells on `TD`/`TH` openers, and concatenates only `#text` token contents via `get_modifiable_text()`, which the docs say already returns decoded text.", + "confidence": 86 + }, + { + "id": "T08-table-extract", + "trial": 3, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() ) {\n continue;\n }\n\n $tag_name = $processor->get_tag();\n if ( null === $tag_name ) {\n continue;\n }\n\n if ( $processor->is_tag_closer() ) {\n if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n } elseif ( 'TR' === $tag_name && null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n continue;\n }\n\n if ( 'TR' === $tag_name ) {\n $current_row = array();\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_cell = '';\n continue;\n }\n\n if (\n null !== $current_cell &&\n ( 'SCRIPT' === $tag_name || 'STYLE' === $tag_name || 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks its subtree with `next_token()` bounded by `get_current_depth()`. It builds rows on `TR` open/close, cells on `TD`/`TH` open/close, appends decoded `#text` token content via `get_modifiable_text()`, and also includes text carried on special raw-text element openers inside cells.", + "confidence": 84 + }, + { + "id": "T09-mark-keyword", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment with full structure awareness, then walks tokens with `next_token()`. For ordinary `#text` tokens only, it checks the decoded text from `get_modifiable_text()` with `strpos()` and, when matched, wraps that token’s normalized serialization from `serialize_token()` in `...`; all other tokens are emitted unchanged via `serialize_token()`, which yields normalized HTML and automatically avoids wrapping special text-bearing elements because their contents are not exposed as `#text` child tokens.", + "confidence": 87 + }, + { + "id": "T09-mark-keyword", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment, then walks tokens with `next_token()`. For each ordinary `#text` token, it checks the decoded text from `get_modifiable_text()` for the case-sensitive keyword and, when matched, emits `` around that token’s normalized serialization from `serialize_token()`. All other tokens are emitted unchanged via `serialize_token()`, producing normalized HTML while naturally excluding special text-bearing elements like `script`, `style`, `title`, and `textarea`, whose text is not exposed as ordinary `#text` tokens.", + "confidence": 84 + }, + { + "id": "T09-mark-keyword", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text = $processor->get_modifiable_text();\n\n if ( false !== strpos( $text, $keyword ) ) {\n $output .= '' . $processor->serialize_token() . '';\n continue;\n }\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a token-by-token rewrite with `next_token()`, `get_token_type()`, `get_modifiable_text()`, and `serialize_token()`. Ordinary `#text` tokens are matched against the decoded text and wrapped with literal `` tags, while every other token is emitted via `serialize_token()`, which preserves the HTML Processor’s normalized serialization and automatically excludes special text-bearing elements because their contents are not exposed as ordinary `#text` child tokens.", + "confidence": 82 + }, + { + "id": "T10-last-h2", + "trial": 1, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n $found_h2 = true;\n }\n\n if ( ! $found_h2 ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat position-based edit: it scans forward with `next_tag( 'H2' )`, reuses a single bookmark via `set_bookmark()` to remember the last matching opener, seeks back to that bookmark with `seek()`, adds the class with `add_class()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 95 + }, + { + "id": "T10-last-h2", + "trial": 2, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n }\n\n if ( $processor->has_bookmark( 'last-h2' ) && $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n }\n\n return $html;\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat position-based edit: scan forward with `next_tag( 'H2' )`, move a single bookmark named `last-h2` each time to remember the latest matching opener, then `seek()` back to that bookmark and call `add_class( 'final-section' )`. If no `H2` bookmark was ever set, the original HTML is returned unchanged.", + "confidence": 97 + }, + { + "id": "T10-last-h2", + "trial": 3, + "ok": true, + "code": "next_tag( 'H2' ) ) {\n $processor->set_bookmark( 'last-h2' );\n }\n\n if ( ! $processor->has_bookmark( 'last-h2' ) ) {\n return $html;\n }\n\n if ( $processor->seek( 'last-h2' ) ) {\n $processor->add_class( 'final-section' );\n }\n\n $processor->release_bookmark( 'last-h2' );\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat, position-based edit. It scans forward with `next_tag( 'H2' )`, repeatedly moves the same bookmark with `set_bookmark()` to remember the last matching heading, then `seek()`s back to that bookmarked tag, adds the class with `add_class()`, and returns the edited markup via `get_updated_html()`.", + "confidence": 95 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag opener with `next_tag()`, then calls the documented `get_attribute_names_with_prefix( 'data-track-' )` helper to find only attributes whose names start with that exact prefix and removes each one with `remove_attribute()`. The final HTML is returned with `get_updated_html()`, preserving all untouched bytes.", + "confidence": 98 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, tag-by-tag scan of the input, which is the documented choice for attribute edits that should otherwise preserve the original HTML bytes. For each matched tag, it calls `get_attribute_names_with_prefix( 'data-track-' )` to find exactly the tracking attributes to remove, then removes each one with `remove_attribute()`, and finally returns the modified markup with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T11-strip-tracking-attributes", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $attributes = $processor->get_attribute_names_with_prefix( 'data-track-' );\n\n if ( null === $attributes ) {\n continue;\n }\n\n foreach ( $attributes as $attribute_name ) {\n $processor->remove_attribute( $attribute_name );\n }\n }\n\n return $processor->get_updated_html();\n}\n", + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat scan of every tag in the input, because the task is purely attribute-based and does not require tree awareness. For each matched tag, it calls the documented `get_attribute_names_with_prefix( 'data-track-' )` to find only attributes whose names start with that exact prefix, removes each one with `remove_attribute()`, and returns the modified markup with `get_updated_html()`.", + "confidence": 96 + }, + { + "id": "T12-unwrap-spans", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, then walks every token with `next_token()` and rebuilds normalized HTML with `serialize_token()`, skipping any token whose tag is `SPAN`. Skipping both the opening and closing `SPAN` tokens unwraps spans while preserving and normalizing all intervening content, and `get_last_error()` is checked to avoid returning a partial rewrite after unsupported markup aborts parsing.", + "confidence": 90 + }, + { + "id": "T12-unwrap-spans", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return $html;\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment, walks every token with `next_token()`, skips any token whose tag name is `SPAN` so both span openers and closers are removed, and appends every other token via `serialize_token()`. That produces normalized HTML while preserving all non-span content in place, including nested spans.", + "confidence": 89 + }, + { + "id": "T12-unwrap-spans", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( 'SPAN' === $processor->get_tag() ) {\n continue;\n }\n\n $output .= $processor->serialize_token();\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n return $output;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body-fragment HTML, then walks every token with `next_token()`. For each token, it skips both opening and closing `SPAN` tags by checking `get_tag()`, and appends every other token’s normalized serialization via `serialize_token()`. That produces normalized HTML while preserving the contents that were inside removed spans.", + "confidence": 90 + } + ] +} diff --git a/doc-experiment/results/round-43/round-metadata.json b/doc-experiment/results/round-43/round-metadata.json new file mode 100644 index 0000000000000..78c3f033e013c --- /dev/null +++ b/doc-experiment/results/round-43/round-metadata.json @@ -0,0 +1,333 @@ +{ + "round": "round-43", + "mode": "scored-train", + "task_ids": [ + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 15, + "splits": { + "train": 15 + }, + "concepts": { + "attributes": 3, + "classes": 1, + "normalization": 1, + "serialization": 2, + "text": 3, + "traversal": 5 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "27c764f6f0c68e20466d1489c46c34697e903555", + "git_status_short": "", + "source_file_digests": { + "ref": "27c764f6f0c68e20466d1489c46c34697e903555", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "74724f1a228f65ed967dfa42def5ab6e70bfb0e36c0521d1f7649827e95b12ff", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "27c764f6f0c68e20466d1489c46c34697e903555", + "algorithm": "sha256", + "tasks": { + "N03-first-list-count": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N03-first-list-count/task.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "doc-experiment/corpus/N03-first-list-count/reference.php": "588468ac5c4009fd1a1a27a530efa208e1ce08025d34a0e4353ea5b976d705ba", + "doc-experiment/corpus/N03-first-list-count/tests.json": "e0467ccc5475d2c87cc11eb54b4f2cb40fe4d44ef82c761f745d1cca91101314" + } + }, + "N04-normalize-or-placeholder": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N04-normalize-or-placeholder/task.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "doc-experiment/corpus/N04-normalize-or-placeholder/reference.php": "e47feb6d3be887e1ee5df77e39160ffe812cb8322ca8600898de2a34e56ddeed", + "doc-experiment/corpus/N04-normalize-or-placeholder/tests.json": "417d2f770b341fc46bb8b7b46df899f1fd1e65d343aa8cc6c4e39307afcb4d18" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + }, + "T01-add-image-class": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T01-add-image-class/task.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "doc-experiment/corpus/T01-add-image-class/reference.php": "8c9cf2ac9194dfce36a45c2f987d7bbfcdb334020aa94a86fb388bb1ce28171f", + "doc-experiment/corpus/T01-add-image-class/tests.json": "0ab7c7527432604d31e5e8d57e810a0167730a111c018d8a05a1bbad24617787" + } + }, + "T02-link-targets": { + "labels": { + "split": "train", + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T02-link-targets/task.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "doc-experiment/corpus/T02-link-targets/reference.php": "8942a4ef55241d6d1deda51046430c3d8cd7c4ca716c8bc79bdf9aa723fdb6f6", + "doc-experiment/corpus/T02-link-targets/tests.json": "c21515f7d43962b86975ccee1b3a39747170edc75ec1af26416494d312870a4a" + } + }, + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "T04-build-figure": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T04-build-figure/task.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "doc-experiment/corpus/T04-build-figure/reference.php": "973bcd94265496fe52fe7f9b895c266b4ae91c99460abbfd9088f1931a5d590e", + "doc-experiment/corpus/T04-build-figure/tests.json": "45cea68d9b4606600a7d57cd46e84348edb5599ec43a1fec44002424a0820d3a" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + }, + "T06-collect-links": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T06-collect-links/task.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "doc-experiment/corpus/T06-collect-links/reference.php": "c0e5ae9ec5d4448a525962f1e27250a7243541d613209e3bbdc86913ff7e7a81", + "doc-experiment/corpus/T06-collect-links/tests.json": "34460376a147fa18739e7743d341479489f46a88b58b9f5aa5fe0d901f2b3140" + } + }, + "T07-nested-lists": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T07-nested-lists/task.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "doc-experiment/corpus/T07-nested-lists/reference.php": "f082fca4f2419435d0d8fe275068a0d486bda2359a668dc969b6cddf78100a61", + "doc-experiment/corpus/T07-nested-lists/tests.json": "79632d22e7c44696e5779df8ea2351c36e335c20aa316437a8edc885d20c40cd" + } + }, + "T08-table-extract": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T08-table-extract/task.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "doc-experiment/corpus/T08-table-extract/reference.php": "289b8c6d1b69d4ef4dd7810dbdcba35445c30a7f26b631a00835ef737b78709e", + "doc-experiment/corpus/T08-table-extract/tests.json": "cdec7559d63ee541f1b0dad34a4f9f0982ce66e58ddfea98ec482c425e1b2638" + } + }, + "T09-mark-keyword": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T09-mark-keyword/task.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "doc-experiment/corpus/T09-mark-keyword/reference.php": "0141ececbe459f50729e657f59dd75e8b74911cd1d70940b8fd5072d2f1efb60", + "doc-experiment/corpus/T09-mark-keyword/tests.json": "946bb96c922fea63cda3c31b94840b05ffded9f2e1dfa4b1547c755b99cda0c5" + } + }, + "T10-last-h2": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T10-last-h2/task.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "doc-experiment/corpus/T10-last-h2/reference.php": "dabda32a9cf4ffb8a1b6ca613feb0a87791619be3bdfd2502b93f5972ba454a5", + "doc-experiment/corpus/T10-last-h2/tests.json": "4b3be0b61681e587c88a2d2ae0b9dec88306dcdad5b0f41a9b3a6a5883aceb07" + } + }, + "T11-strip-tracking-attributes": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag" + }, + "files": { + "doc-experiment/corpus/T11-strip-tracking-attributes/task.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "doc-experiment/corpus/T11-strip-tracking-attributes/reference.php": "6ab0dceded92febd085f447eafd1ded777bf7ad6d184c6dc05b265ec41f420f0", + "doc-experiment/corpus/T11-strip-tracking-attributes/tests.json": "65032bd058bfa4819f040c3d8f9590224684a6cfff28434ad6708ab1f6ce94fc" + } + }, + "T12-unwrap-spans": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T12-unwrap-spans/task.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b", + "doc-experiment/corpus/T12-unwrap-spans/reference.php": "93d6828161817ec9a0b90765d91e7dae1c48f3b3aac35f6cad6f785b1c715797", + "doc-experiment/corpus/T12-unwrap-spans/tests.json": "fcade4b9f46bbb824b0095f58b99d4c71c8b3e696093ab0d0ae25d666aae5f53" + } + } + } + }, + "created_at_utc": "2026-06-13T15:38:33+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-43", + "staged_task_files": [ + "tasks/N03-first-list-count.md", + "tasks/N04-normalize-or-placeholder.md", + "tasks/N06-extract-toc.md", + "tasks/T01-add-image-class.md", + "tasks/T02-link-targets.md", + "tasks/T03-first-h1-text.md", + "tasks/T04-build-figure.md", + "tasks/T05-text-excerpt.md", + "tasks/T06-collect-links.md", + "tasks/T07-nested-lists.md", + "tasks/T08-table-extract.md", + "tasks/T09-mark-keyword.md", + "tasks/T10-last-h2.md", + "tasks/T11-strip-tracking-attributes.md", + "tasks/T12-unwrap-spans.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-43 exposes 2 docs and 15 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "852fa4613b5c99ae9fea547f6284eee27e4f459d7b38a0d4dec5080cc657b123", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N03-first-list-count.md": "a928648fe82d23863b2e1f8f0a1654ef97e6cb8fb579f3d37fdc0712caaa0082", + "tasks/N04-normalize-or-placeholder.md": "cef7cd04d639b6b7b435be2d33f3435b8b76f0a4d83748aa7c232204746ad3d0", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T01-add-image-class.md": "d3b65e68db2bf26d3423b5474c3adb49b592375afc8005c127d4d20eb2740e28", + "tasks/T02-link-targets.md": "86f539bd48a600db4d47492e2abdf44d3048ba0463bcca0f3ed628228f82d0f8", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T04-build-figure.md": "69c807b773b41e98c0e5122bee32eca3780ccdfd123279590b4d098f8b7848e1", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "tasks/T06-collect-links.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "tasks/T07-nested-lists.md": "481851a0b94a81599154a82cc309e4c0e6e6204ab4270a98faa8bb60fe1401b3", + "tasks/T08-table-extract.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "tasks/T09-mark-keyword.md": "5abba63d508cb56284ce7a02e374d35e4cdd45958d664174eb3002c13f1c3bce", + "tasks/T10-last-h2.md": "285eca1aece8e63d9ab86d1d72319e1eef13f1e184a95216b373bbab22de2d0d", + "tasks/T11-strip-tracking-attributes.md": "6f2bc727f767a27a8f5d6f7a5e56ebd075db494c444dc0bbc8318c1bf4f8715b", + "tasks/T12-unwrap-spans.md": "e95fcebf3ce7e2fd9004c4ff82b66a77d712b81b1317f3d020c4c89ace25eb6b" + } +} diff --git a/doc-experiment/results/round-43/round-summary.json b/doc-experiment/results/round-43/round-summary.json new file mode 100644 index 0000000000000..b819cd6bbaa05 --- /dev/null +++ b/doc-experiment/results/round-43/round-summary.json @@ -0,0 +1,566 @@ +{ + "round_score": 98.18, + "core_score": 97.89, + "by_split": { + "train": 98.18 + }, + "by_concept": { + "attributes": 100.0, + "classes": 100.0, + "normalization": 100.0, + "serialization": 99.45, + "text": 92.41, + "traversal": 99.3 + }, + "tasks": { + "N03-first-list-count": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 11, + "total": 11, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "N04-normalize-or-placeholder": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "normalization", + "processor": "html", + "split": "train" + } + }, + "N06-extract-toc": { + "score": 98.1, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 91, + "score": 97.3 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 92, + "score": 97.6 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T01-add-image-class": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "classes", + "processor": "tag", + "split": "train" + } + }, + "T02-link-targets": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "smoke", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T03-first-h1-text": { + "score": 99.3, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T04-build-figure": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 79.93, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 2, + "total": 10, + "adherence": 90, + "score": 41.0 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T06-collect-links": { + "score": 98.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 90, + "score": 97.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T07-nested-lists": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T08-table-extract": { + "score": 98.4, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 92, + "score": 97.6 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "T09-mark-keyword": { + "score": 99.1, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + }, + "T10-last-h2": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 6, + "total": 6, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "tag", + "split": "train" + } + }, + "T11-strip-tracking-attributes": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "attributes", + "processor": "tag", + "split": "train" + } + }, + "T12-unwrap-spans": { + "score": 99.8, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "serialization", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-43", + "mode": "scored-train", + "task_ids": [ + "N03-first-list-count", + "N04-normalize-or-placeholder", + "N06-extract-toc", + "T01-add-image-class", + "T02-link-targets", + "T03-first-h1-text", + "T04-build-figure", + "T05-text-excerpt", + "T06-collect-links", + "T07-nested-lists", + "T08-table-extract", + "T09-mark-keyword", + "T10-last-h2", + "T11-strip-tracking-attributes", + "T12-unwrap-spans" + ], + "task_count": 15, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "27c764f6f0c68e20466d1489c46c34697e903555", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-43/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-43/subject-isolation.json b/doc-experiment/results/round-43/subject-isolation.json new file mode 100644 index 0000000000000..7b67ba1a81606 --- /dev/null +++ b/doc-experiment/results/round-43/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-43/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/judge.json b/doc-experiment/results/round-44/N06-extract-toc/judge.json new file mode 100644 index 0000000000000..b55d0c0c1d646 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment for body-fragment structural parsing. Every HTML API method used is documented. The depth-bounded next_token subtree walk with a #text guard and get_modifiable_text follows the documented DOM-style text recipe. The is_tag_closer check after plain next_tag is redundant because next_tag skips closers by default, but harmless." + }, + { + "trial_id": "trial-2", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API calls. The single next_token loop with opener/closer state is a documented pattern and handles virtual closers, empty headings, and implied closes. The weak spot is appending get_modifiable_text from non-heading tag opener tokens inside a heading; docs say ordinary subtree text should be only #text tokens unless special-element contents are explicitly desired. This would include TEXTAREA/TITLE decoded text and SCRIPT/STYLE raw text beyond the reference policy." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Near-reference implementation: correct processor, all methods documented, depth-bounded next_token walk, #text-only accumulation, decoded text via get_modifiable_text, and null create_fragment handling. The final get_last_error fallback is documented and conservative, but it can discard already-collected headings on unsupported markup and does not separately consider paused_at_incomplete_token." + } + ], + "failure_analysis": "No failed frozen/hidden cases: all three trials passed all 7 cases. The docs did well in the key places: 'Which processor should I use?' steered subjects away from the Tag Processor for structural text extraction; 'Recipe: collect DOM-style text from a subtree', next_token(), and get_current_depth() gave the depth-bounded #text accumulation pattern; get_tag() returning uppercase handled source case; next_token() describing virtual/implied closers covered '

                      One

                      Two'; and get_modifiable_text() documenting decoded #text handled '&'. Near-misses were Trial 2 over-applying the special-element modifiable-text passage despite the ordinary-text warning, and Trial 3 choosing an unsupported-markup fallback policy that is not clearly specified for read-only extraction tasks.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The docblock explains that special elements carry modifiable text on their opener, but readers can miss that this is not ordinary subtree text.", + "suggestion": "Add a warning and cross-reference: for DOM-style subtree extraction, guard on get_token_type() === '#text'; reading modifiable text from SCRIPT, STYLE, TITLE, or TEXTAREA openers is an explicit opt-in policy." + }, + { + "location": "WP_HTML_Processor::next_token() docblock, nested-loop guidance", + "problem": "The warning against nested next_token loops can seem to discourage the valid bounded-subtree walk shown elsewhere, while not spelling out the boundary between the two patterns.", + "suggestion": "Clarify when a bounded inner walk from a matched opener is safe versus when a single stateful loop is preferred, especially around whether the terminating token itself must be processed by the outer loop." + }, + { + "location": "WP_HTML_Processor::get_last_error() and create_fragment() docs", + "problem": "Unsupported-parser abort guidance is clearer for serialization and mutation than for read-only semantic extraction.", + "suggestion": "State that read-only scans may have partial results when get_last_error() becomes non-null, and callers must choose a contract-specific fallback such as partial results, null, empty result, or error." + }, + { + "location": "WP_HTML_Processor::next_tag() docblock", + "problem": "The default tag_closers behavior is documented in the parameter table, but redundant is_tag_closer guards after plain next_tag suggest it is easy to overlook.", + "suggestion": "Add a short sentence after the signature: plain next_tag() visits opener tags only; request tag_closers => 'visit' when closer tokens are part of the algorithm." + } + ] +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-1/candidate.php b/doc-experiment/results/round-44/N06-extract-toc/trial-1/candidate.php new file mode 100644 index 0000000000000..de704e9bee00d --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-1/candidate.php @@ -0,0 +1,38 @@ +next_tag() ) { + if ( $processor->is_tag_closer() ) { + continue; + } + + $tag = $processor->get_tag(); + if ( null === $tag || ! preg_match( '/^H([1-6])$/', $tag, $matches ) ) { + continue; + } + + $heading_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $toc[] = array( + 'level' => (int) $matches[1], + 'text' => $text, + ); + } + + return $toc; +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-1/execution.json b/doc-experiment/results/round-44/N06-extract-toc/trial-1/execution.json new file mode 100644 index 0000000000000..5171cc3296fc0 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-1/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/N06-extract-toc/trial-1/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-1/response.json b/doc-experiment/results/round-44/N06-extract-toc/trial-1/response.json new file mode 100644 index 0000000000000..652a104c13ac2 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then scans tags in document order with `next_tag()`. For each `H1` through `H6` opener, it records the heading depth with `get_current_depth()` and walks forward with `next_token()`, appending only descendant `#text` token content via `get_token_type()` and `get_modifiable_text()` until the walk leaves that heading subtree.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-2/candidate.php b/doc-experiment/results/round-44/N06-extract-toc/trial-2/candidate.php new file mode 100644 index 0000000000000..23b256c0620cf --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-2/candidate.php @@ -0,0 +1,64 @@ + 1, + 'H2' => 2, + 'H3' => 3, + 'H4' => 4, + 'H5' => 5, + 'H6' => 6, + ); + + $toc = array(); + $current_heading = null; + + while ( $processor->next_token() ) { + $token_type = $processor->get_token_type(); + + if ( '#tag' === $token_type ) { + $tag = $processor->get_tag(); + + if ( null !== $tag && isset( $heading_levels[ $tag ] ) ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_heading && $current_heading['tag'] === $tag ) { + $toc[] = array( + 'level' => $current_heading['level'], + 'text' => $current_heading['text'], + ); + $current_heading = null; + } + } else { + $current_heading = array( + 'tag' => $tag, + 'level' => $heading_levels[ $tag ], + 'text' => '', + ); + } + + continue; + } + } + + if ( null === $current_heading ) { + continue; + } + + if ( '#text' === $token_type ) { + $current_heading['text'] .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $token_type && ! $processor->is_tag_closer() ) { + $current_heading['text'] .= $processor->get_modifiable_text(); + } + } + + return $toc; +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-2/execution.json b/doc-experiment/results/round-44/N06-extract-toc/trial-2/execution.json new file mode 100644 index 0000000000000..2055861e31b39 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-2/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/N06-extract-toc/trial-2/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-2/response.json b/doc-experiment/results/round-44/N06-extract-toc/trial-2/response.json new file mode 100644 index 0000000000000..2aa39be923f4e --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks tokens in document order with `next_token()`. It detects `H1` through `H6` opening and closing tags via `get_tag()` and `is_tag_closer()`, accumulates heading text from descendant `#text` tokens with `get_modifiable_text()`, and also includes modifiable text carried on special element opener tokens when they appear inside a heading.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-3/candidate.php b/doc-experiment/results/round-44/N06-extract-toc/trial-3/candidate.php new file mode 100644 index 0000000000000..50a8660c9233e --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-3/candidate.php @@ -0,0 +1,40 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( ! in_array( $tag, $heading_tags, true ) ) { + continue; + } + + $heading_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $toc[] = array( + 'level' => (int) substr( $tag, 1, 1 ), + 'text' => $text, + ); + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $toc; +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-3/execution.json b/doc-experiment/results/round-44/N06-extract-toc/trial-3/execution.json new file mode 100644 index 0000000000000..13d51efd3bb14 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-3/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/N06-extract-toc/trial-3/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/N06-extract-toc/trial-3/response.json b/doc-experiment/results/round-44/N06-extract-toc/trial-3/response.json new file mode 100644 index 0000000000000..635d887911858 --- /dev/null +++ b/doc-experiment/results/round-44/N06-extract-toc/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction depends on document structure. It scans heading openers with `next_tag()`, identifies `H1` through `H6` via `get_tag()`, then walks each heading subtree with `next_token()` while the depth from `get_current_depth()` stays inside that heading, appending only `#text` token content from `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/judge.json b/doc-experiment/results/round-44/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..befd21e3f0648 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_tag('H1'), depth-bounded next_token() walking, #text filtering, and get_modifiable_text() exactly as documented for subtree text extraction. All called methods appear in the rendered docs and execution recorded no misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully documented, idiomatic approach as the reference: HTML Processor fragment parsing, first H1 match, subtree walk guarded by get_current_depth() >= opener depth, and decoded #text accumulation. No undocumented API or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Correct processor and all methods are documented. The main #text walk is idiomatic, but the extra branch appending get_modifiable_text() from every non-closing #tag over-applies the special-element guidance. It is harmless for ordinary inline tags and passed the hidden cases, but would include SCRIPT/STYLE/TEXTAREA/TITLE opener text when the ordinary subtree-text recipe says to include only #text tokens unless the caller explicitly opts in." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases, so there were no failed hidden cases to attribute. The docs worked well because they directly exposed the needed pattern: choose WP_HTML_Processor for tree-aware text extraction, create a BODY fragment with create_fragment(), find the first element with next_tag(), record get_current_depth(), walk with next_token(), keep the guard as >=, and append only #text tokens via get_modifiable_text(). The next_token/get_current_depth docs also explain virtual closers and malformed input well enough for the unclosed-h1 case, and get_modifiable_text() clearly states that ordinary #text is already decoded, explaining the entity case. The only near-miss was trial-3: it noticed that special elements carry modifiable text on opener tokens and generalized that into a generic #tag branch. A read-only probe shows the risk: for

                      AC

                      , the reference-style #text walk returns \"AC\" while trial-3 returns \"ABC\"; for TEXTAREA it similarly appends opener text. The rendered overview recipe explicitly warns against this, but the next_token and get_modifiable_text method sections can still be read in isolation as encouragement to add opener-token text during subtree extraction.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock, special-element exception", + "problem": "The special-element paragraph says to read SCRIPT/STYLE/TITLE/TEXTAREA text from the opening token, but does not locally restate that this is an opt-in policy, not part of ordinary subtree #text extraction.", + "suggestion": "Add a sentence such as: \"Do this only when the caller explicitly wants those special-element contents; a generic DOM-style text-node walk should still append only #text tokens.\" Also mention SCRIPT/STYLE are raw, not decoded." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The method explains that many token kinds can carry modifiable text, but the method section itself does not strongly warn that get_modifiable_text() is not a predicate for ordinary text content.", + "suggestion": "Add a warning that ordinary text extraction should first check get_token_type() === '#text'; comments, processing instructions, raw-text elements, and special opener tokens require explicit whitelisting." + }, + { + "location": "HTML Processor text-extraction examples", + "problem": "The successful recipe is in the overview, while method-level readers may jump straight to next_token() or get_modifiable_text() and miss the default-vs-opt-in distinction.", + "suggestion": "Cross-link those method docs back to the \"collect DOM-style text from a subtree\" recipe, using wording that distinguishes ordinary text-node content from special-element modifiable text." + } + ] +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..6678102356346 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..dad089f9a21d0 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..b15fa9750356e --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded text content from nested elements while ignoring markup-only content like images.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..01f92158d50f2 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,22 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..b446bc41c1cf9 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..6b5bd12868c25 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, finds the first `H1` with `next_tag()`, then walks that element\u2019s subtree with `next_token()` while the depth stays within the `H1`, concatenating only `#text` tokens via `get_modifiable_text()` so nested markup contributes no markup and character references are already decoded.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..8f4e0f6bd79da --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,28 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..8aea4debc1400 --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..8a057c333e49d --- /dev/null +++ b/doc-experiment/results/round-44/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks its subtree with `next_token()`. It concatenates decoded text from `#text` tokens via `get_modifiable_text()`, and also appends modifiable text carried on special opening-tag tokens so nested text-only elements are included.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/judge.json b/doc-experiment/results/round-44/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..c00168ab19183 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Primary processor choice is correct: `WP_HTML_Processor::create_fragment()` plus `next_token()` for text-bearing tokens. All HTML API calls are documented and no `_doing_it_wrong` records appeared. Small penalty for the `WP_HTML_Tag_Processor` fallback after HTML Processor errors: it is documented, but the docs warn that Tag Processor token walking is lexical and not equivalent to DOM-style fragment text extraction." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Best adherence. Uses the documented HTML Processor fragment factory, a single `next_token()` walk, `#text` filtering, and explicit `TITLE`/`TEXTAREA` opener handling through decoded `get_modifiable_text()`. All called API methods are present in the rendered docs. Minor residual gap: no explicit post-walk unsupported-parser policy, though this task did not require rejecting unsupported input." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct documented API usage throughout: HTML Processor fragment parsing, token walking, special-element whitelist, decoded text, and `get_last_error()`. The conservative empty-string return on later parser error is a reasonable documented policy, but it is not clearly required by the task; it also collects the full text before truncating, which is less idiomatic for bounded excerpts but not an API misuse." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 10/10, with empty `doing_it_wrong` records. The docs did well at steering subjects to `WP_HTML_Processor::create_fragment()` for BODY fragments, `next_token()` instead of tag-only walking, `#text` checks before calling `get_modifiable_text()`, and the special rule that `TITLE` and `TEXTAREA` carry decoded text on opener tokens while `SCRIPT` and `STYLE` should not be included by default. The main near-miss was trial-1’s belief that a `WP_HTML_Tag_Processor` fallback applies the same token rules after an HTML Processor abort. That did not fail these tests, but it would change semantics for malformed or structurally significant HTML because the Tag Processor is lexical and lacks BODY-fragment parsing, implied elements, virtual closers, breadcrumbs, and tree order guarantees.", + "doc_gaps": [ + { + "location": "html-processor.md: Recipe: collect DOM-style text from a subtree", + "problem": "The recipe explains ordinary text extraction and special-element opt-in well, but it does not explicitly state the fallback policy for read-only extractors when `get_last_error()` becomes non-null.", + "suggestion": "Add a short policy note: after an unsupported-parser abort, any accumulated read-only extraction is partial; callers should deliberately choose partial output, empty/null, original input, or a clearly lexical fallback." + }, + { + "location": "html-tag-processor.md: Tokens and finer-grained processing", + "problem": "The docs say Tag Processor token walking is lexical, but the warning could be missed when users look for a fallback after HTML Processor unsupported markup.", + "suggestion": "Add an explicit warning that a Tag Processor fallback is not semantically equivalent to an HTML Processor text walk: it does not perform BODY-fragment parsing, implied closing, virtual closers, or tree-aware traversal." + }, + { + "location": "html-processor.md: create_fragment() / HTML Support", + "problem": "`create_fragment()` null creation failure and later `get_last_error()` aborts are documented separately, but examples focus more on mutation/serialization than read-only extraction.", + "suggestion": "Add a general read-only walking note distinguishing factory failure from mid-walk abort, and explain that text/token results collected before an abort are only a caller-defined best-effort result." + } + ] +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..a63909aadb3e6 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,107 @@ += $max_codepoints ) { + return $used >= $max_codepoints; + } + + $remaining = $max_codepoints - $used; + $length = $codepoint_length( $text ); + + if ( $length <= $remaining ) { + $excerpt .= $text; + $used += $length; + return false; + } + + $excerpt .= $codepoint_substr( $text, 0, $remaining ); + $used = $max_codepoints; + return true; + }; + + $processor = WP_HTML_Processor::create_fragment( $html ); + + if ( null !== $processor ) { + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + if ( $append_text( $processor->get_modifiable_text() ) ) { + return $excerpt; + } + + continue; + } + + if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $name = $processor->get_token_name(); + if ( 'TITLE' === $name || 'TEXTAREA' === $name ) { + if ( $append_text( $processor->get_modifiable_text() ) ) { + return $excerpt; + } + } + } + } + + if ( null === $processor->get_last_error() ) { + return $excerpt; + } + } + + $excerpt = ''; + $used = 0; + $tags = new WP_HTML_Tag_Processor( $html ); + + while ( $tags->next_token() ) { + if ( '#text' === $tags->get_token_type() ) { + if ( $append_text( $tags->get_modifiable_text() ) ) { + return $excerpt; + } + + continue; + } + + if ( '#tag' === $tags->get_token_type() && ! $tags->is_tag_closer() ) { + $name = $tags->get_token_name(); + if ( 'TITLE' === $name || 'TEXTAREA' === $name ) { + if ( $append_text( $tags->get_modifiable_text() ) ) { + return $excerpt; + } + } + } + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..0719980930dd7 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..6b3cea5d70080 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` plus `next_token()`, `get_token_type()`, `get_token_name()`, `is_tag_closer()`, and `get_modifiable_text()` to collect ordinary `#text` tokens and the decoded opener text for `TITLE` and `TEXTAREA`, then truncates by Unicode code points with UTF-8-aware slicing. If the HTML Processor aborts on unsupported markup, it falls back to a best-effort lexical scan with `WP_HTML_Tag_Processor` using the same token rules.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..7c1835f146fef --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,44 @@ + 0 && $processor->next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $tag_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + if ( $chunk_length <= $remaining ) { + $excerpt .= $chunk; + $remaining -= $chunk_length; + continue; + } + + $excerpt .= mb_substr( $chunk, 0, $remaining, 'UTF-8' ); + break; + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..f7dd9ddf175a1 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..bbcb30578e62c --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment safely, then walks tokens with `next_token()`. It appends decoded text from ordinary `#text` tokens plus decoded opener-carried text from `TITLE` and `TEXTAREA`, while naturally excluding `SCRIPT`, `STYLE`, comments, and other non-text tokens. Truncation is done by Unicode code points with `mb_strlen()` and `mb_substr()` using explicit `UTF-8`, as documented.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..05c9fa4591e9f --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,39 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $token_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..0ce8bfa14a102 --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..f17163365630e --- /dev/null +++ b/doc-experiment/results/round-44/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks every token with `next_token()`, appending decoded text from ordinary `#text` tokens plus the documented special text-bearing opener tokens `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. It then truncates by Unicode code points with `mb_substr(..., 'UTF-8')` after checking the requested limit.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-44/T06-collect-links/judge.json b/doc-experiment/results/round-44/T06-collect-links/judge.json new file mode 100644 index 0000000000000..0728aca43f05a --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() for tree-aware text collection. All HTML API calls are documented in the rendered docs. The single next_token() pass with explicit anchor state matches the documented repeated-region pattern, filters to #text before get_modifiable_text(), and uses is_string(get_attribute('href')) to exclude missing and boolean href values. Minor caveat: returning an empty array on any later get_last_error() is a policy choice not required by the task." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented HTML API usage. The single next_token() state machine is idiomatic and handles decoded text plus string/true/null href semantics correctly. Slight deduction because it never checks get_last_error() or paused_at_incomplete_token(), so unsupported markup or a final incomplete token could silently produce a partial result despite the docs explaining how to detect parser aborts/truncation." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor, next_tag('A'), get_current_depth(), a >= depth-bounded next_token() subtree walk, #text filtering, and get_modifiable_text(). All called methods are documented, including inherited paused_at_incomplete_token(). The main caveat is that it treats paused_at_incomplete_token() as grounds to discard all results; the docs say incomplete-token handling is caller-policy dependent, and the task only required handling unclosed elements, which the processor represents with virtual closers." + } + ], + "failure_analysis": "All trials passed all 8 frozen hidden cases, and execution.json recorded no _doing_it_wrong entries. The docs did well on the core concepts this task needs: the 'Which processor should I use?' guidance points subjects to WP_HTML_Processor for collecting element text; the 'Recipe: collect DOM-style text from a subtree' shows create_fragment(), next_tag(), get_current_depth(), next_token(), #text filtering, and get_modifiable_text(); get_attribute() documents string/true/null semantics; get_modifiable_text() documents decoded text; next_token()/get_current_depth() explain virtual closers, which is why the unclosed-link case passed. Near-misses were mostly policy ambiguities, not API hallucinations: trial 2 could silently return partial data after a parser abort, and trial 3 could over-reject a fragment ending in a mid-token after already collecting valid links. Neither ambiguity was exposed by the frozen cases.", + "doc_gaps": [ + { + "location": "html-processor.md: WP_HTML_Processor::get_attribute()", + "problem": "The HTML Processor method section shows string|true|null and examples, but the explicit 'string values are returned decoded' contract is present in the Tag Processor page, not repeated here.", + "suggestion": "Duplicate the decoded-attribute-value sentence in the WP_HTML_Processor get_attribute() section, since users doing structural work may read only the HTML Processor method docs." + }, + { + "location": "html-processor.md: next_token() and 'Recipe: collect DOM-style text from a subtree'", + "problem": "The docs warn that nested next_token() loops can skip boundaries, while also showing depth-bounded subtree walks. The safe boundary between those patterns is implicit.", + "suggestion": "Add a short rule of thumb: a depth-bounded inner walk is appropriate when intentionally consuming one matched subtree before resuming after it; use one outer next_token() state machine when multiple repeated regions or sibling boundaries must be tracked concurrently." + }, + { + "location": "html-processor.md: incomplete-input notes near next_token(), get_current_depth(), and serialize_token()", + "problem": "The docs mention paused_at_incomplete_token(), but the distinction between an unclosed element that receives a virtual closer and a truly incomplete final syntax token is easy to blur.", + "suggestion": "Add a compact contrast example, such as '

                      text' versus '

                      text next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + if ( ! empty( $a_stack ) ) { + array_pop( $a_stack ); + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $a_stack[] = count( $links ) - 1; + } else { + $a_stack[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $a_stack ) ) { + $text = $processor->get_modifiable_text(); + + foreach ( $a_stack as $link_index ) { + if ( null !== $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-44/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..07c7d4e6d7a0e --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-44/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..a9d3cff643c26 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk so document order and subtree structure come from the HTML Processor. On each `A` opener, it reads `href` with `get_attribute()` and keeps only string values; while inside open anchors, it appends decoded text from `#text` tokens via `get_modifiable_text()`, ignoring markup and non-text token types as the docs recommend for DOM-style text collection.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-44/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..7e5401f65af29 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,50 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + $link_index = array_pop( $anchor_stack ); + if ( null !== $link_index ) { + array_pop( $active_link_indexes ); + } + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + + $link_index = count( $links ) - 1; + $anchor_stack[] = $link_index; + $active_link_indexes[] = $link_index; + } else { + $anchor_stack[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $active_link_indexes ) ) { + $text = $processor->get_modifiable_text(); + + foreach ( $active_link_indexes as $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + + return $links; +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-44/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..9c8e5f6277c52 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-44/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..349fbecff9753 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk so text is collected with full tree awareness. On each `A` opener it reads `href` with `get_attribute()` and includes the link only when that value is a string (excluding missing and boolean `href`), then appends decoded text from `#text` tokens via `get_modifiable_text()` while the link is open, preserving document order.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-44/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..812858c223d1a --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,43 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + + if ( ! is_string( $href ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return array(); + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-44/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..6dbe629a0a073 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-44/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..52bd4954c0955 --- /dev/null +++ b/doc-experiment/results/round-44/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because collecting an anchor\u2019s text content requires tree-aware parsing. It scans `A` tags in document order with `next_tag()`, keeps only anchors whose `href` is reported as a string by `get_attribute()`, then walks each anchor\u2019s subtree with `next_token()` while `get_current_depth()` stays inside that element, concatenating only `#text` tokens via `get_modifiable_text()`. It also rejects unsupported or truncated input by checking `get_last_error()` and `paused_at_incomplete_token()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-44/T08-table-extract/judge.json b/doc-experiment/results/round-44/T08-table-extract/judge.json new file mode 100644 index 0000000000000..0ce16873f4056 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() and used documented methods only: next_tag, next_token, get_current_depth, get_token_type, get_token_name, is_tag_closer, get_modifiable_text, and get_last_error. The single depth-bounded token walk is idiomatic and matches the docs' repeated-region pattern. Minor deduction: it opts into special-element modifiable text inside cells, which the docs say should not be included for ordinary subtree text unless explicitly requested." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correct processor and no undocumented API calls. The implementation closely follows the documented pattern: create a fragment processor, find TABLE, record depth, walk once with next_token(), track TR/TD/TH state, and read decoded #text via get_modifiable_text(). Minor deduction for the redundant manual EOF flush, since the docs explain that virtual closers make closer-driven flushing reliable, including for omitted closers." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor and only documented methods: create_fragment, next_tag, next_token, get_current_depth, get_token_type, get_tag, is_tag_closer, get_modifiable_text, and get_last_error. The traversal is idiomatic and depth-bounded. Minor deduction matches trial-1: it includes SCRIPT/STYLE/TEXTAREA/TITLE opener modifiable text even though the task asked for text nodes and the docs' ordinary subtree-text recipe says to collect #text tokens unless special-element content is explicitly part of the contract." + } + ], + "failure_analysis": "All three trials passed all frozen cases: simple tables, THEAD/TBODY structure, omitted row/cell closers, inline markup in cells, decoded entities, no-table, first-table-only, and empty cells. The docs did well in three places: the Tag Processor overview explicitly says to use the HTML Processor when structure, text extraction, or implied/missing closers matter; WP_HTML_Processor::next_token() documents synthesized table structure and the single-cursor/single-loop state-machine pattern; get_modifiable_text() documents decoded #text values, which explains the entity test success. The main near-miss is special-element text. Trial-1 and trial-3 treated special element opener payloads as cell text. A probe with AC shows the reference returns AC and empty string, while those trials return ABC and D. The relevant docs exist under 'Recipe: collect DOM-style text from a subtree' and get_modifiable_text(), but the availability of modifiable text on SCRIPT/TEXTAREA/TITLE/STYLE still invited over-inclusion. Trial-2 also shows a smaller near-miss: it manually flushes any open row/cell after the walk, suggesting it did not fully trust the documented virtual closer behavior, though that did not affect the hidden cases.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock / 'Recipe: collect DOM-style text from a subtree'", + "problem": "The docs state the #text-only rule, but models still inferred that special-element modifiable text should be part of generic text extraction.", + "suggestion": "Add a compact generic example contrasting ordinary subtree text with special-element payloads, e.g. a DIV containing text, SCRIPT, TEXTAREA, and more text, and state that generic DOM-style text extraction should append only visited #text tokens unless the caller explicitly requests raw/RCDATA element payloads." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() docblock", + "problem": "The method name and broad return behavior can be mistaken for 'this token contributes text content' instead of 'this token has editable payload bytes/text'.", + "suggestion": "Strengthen the warning that non-empty modifiable text is not a text-node predicate. Explicitly say that SCRIPT/STYLE/TITLE/TEXTAREA opener payloads should not be included in generic subtree text just because get_modifiable_text() returns a string." + }, + { + "location": "WP_HTML_Processor::next_token() or get_current_depth() docblock", + "problem": "The reliable virtual-closer behavior is documented, but redundant EOF flushing suggests uncertainty about whether omitted or end-of-input closers are visited.", + "suggestion": "Add one general repeated-region example with omitted closing tags showing opener events, virtual closer events, and closer-driven flushing, emphasizing that callers usually should not add a second EOF flush unless defining a special partial-input policy." + }, + { + "location": "WP_HTML_Processor::get_last_error() / incomplete-token guidance", + "problem": "The docs mention unsupported markup and incomplete trailing syntax in several places, but the policy distinction for read-only extraction versus mutation/rewrite remains diffuse.", + "suggestion": "Add a short decision note: read-only extraction may choose best-effort partial results, while mutations or contracts requiring complete input should check paused_at_incomplete_token and get_last_error before returning transformed output." + } + ] +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-44/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..5072b3d09785d --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,82 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $text_elements = array( + 'IFRAME' => true, + 'NOEMBED' => true, + 'NOFRAME' => true, + 'NOFRAMES' => true, + 'NOSCRIPT' => true, + 'SCRIPT' => true, + 'STYLE' => true, + 'TEXTAREA' => true, + 'TITLE' => true, + 'XMP' => true, + ); + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + $token_name = $processor->get_token_name(); + + if ( '#tag' === $token_type ) { + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + } else { + $current_cell = ''; + } + + continue; + } + + if ( + null !== $current_cell && + ! $processor->is_tag_closer() && + isset( $text_elements[ $token_name ] ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( '#text' === $token_type && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-44/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..383781e57adee --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-44/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..9f9b1fd601928 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks that table subtree token-by-token with `next_token()` and `get_current_depth()`. It collects rows on `TR` open/close, cells on `TD`/`TH` open/close, appends ordinary text from `#text` tokens via `get_modifiable_text()`, and also includes text carried on special text-only element openers such as `TEXTAREA` or `SCRIPT` when they appear inside a cell.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-44/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..b38a70f172b10 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,59 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + $token_type = $processor->get_token_type(); + $tag_name = $processor->get_tag(); + + if ( '#tag' === $token_type ) { + if ( ! $processor->is_tag_closer() ) { + if ( 'TR' === $tag_name ) { + $current_row = array(); + } elseif ( null !== $current_row && ( 'TD' === $tag_name || 'TH' === $tag_name ) ) { + $current_cell = ''; + } + } else { + if ( null !== $current_cell && ( 'TD' === $tag_name || 'TH' === $tag_name ) ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( null !== $current_row && 'TR' === $tag_name ) { + $rows[] = $current_row; + $current_row = null; + } + } + + continue; + } + + if ( null !== $current_cell && '#text' === $token_type ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + return $rows; +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-44/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..a3fda168ae862 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-44/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..42f45fee09a55 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table rules, finds the first `TABLE` with `next_tag()`, then walks tokens with `next_token()` while staying inside that table by comparing `get_current_depth()`. It starts rows on `TR` openers, starts cells on `TD`/`TH` openers, appends decoded text from `#text` tokens via `get_modifiable_text()`, and finalizes cells and rows on their closing tokens.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-44/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..b456b36a957ca --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,69 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + $tag_name = $processor->get_tag(); + + if ( '#tag' === $token_type ) { + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } else { + $current_cell = ''; + } + + continue; + } + + if ( + null !== $current_cell && + ! $processor->is_tag_closer() && + in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true ) + ) { + $current_cell .= $processor->get_modifiable_text(); + } + + continue; + } + + if ( '#text' === $token_type && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-44/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..da8eb7b392ab0 --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-44/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-44/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-44/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..e8232c579078f --- /dev/null +++ b/doc-experiment/results/round-44/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then makes one depth-bounded `next_token()` walk through that table. It tracks `TR` and `TD`/`TH` openers and closers via `get_tag()` and `is_tag_closer()`, appends decoded text from `#text` tokens with `get_modifiable_text()`, and returns an empty array if no table is found or the processor aborts with `get_last_error()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-44/codex-judges-output.json b/doc-experiment/results/round-44/codex-judges-output.json new file mode 100644 index 0000000000000..a7c30ad91d76f --- /dev/null +++ b/doc-experiment/results/round-44/codex-judges-output.json @@ -0,0 +1,234 @@ +{ + "result": [ + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), next_tag('H1'), depth-bounded next_token() walking, #text filtering, and get_modifiable_text() exactly as documented for subtree text extraction. All called methods appear in the rendered docs and execution recorded no misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same fully documented, idiomatic approach as the reference: HTML Processor fragment parsing, first H1 match, subtree walk guarded by get_current_depth() >= opener depth, and decoded #text accumulation. No undocumented API or _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Correct processor and all methods are documented. The main #text walk is idiomatic, but the extra branch appending get_modifiable_text() from every non-closing #tag over-applies the special-element guidance. It is harmless for ordinary inline tags and passed the hidden cases, but would include SCRIPT/STYLE/TEXTAREA/TITLE opener text when the ordinary subtree-text recipe says to include only #text tokens unless the caller explicitly opts in." + } + ], + "failure_analysis": "All trials passed all 8 frozen cases, so there were no failed hidden cases to attribute. The docs worked well because they directly exposed the needed pattern: choose WP_HTML_Processor for tree-aware text extraction, create a BODY fragment with create_fragment(), find the first element with next_tag(), record get_current_depth(), walk with next_token(), keep the guard as >=, and append only #text tokens via get_modifiable_text(). The next_token/get_current_depth docs also explain virtual closers and malformed input well enough for the unclosed-h1 case, and get_modifiable_text() clearly states that ordinary #text is already decoded, explaining the entity case. The only near-miss was trial-3: it noticed that special elements carry modifiable text on opener tokens and generalized that into a generic #tag branch. A read-only probe shows the risk: for

                      AC

                      , the reference-style #text walk returns \"AC\" while trial-3 returns \"ABC\"; for TEXTAREA it similarly appends opener text. The rendered overview recipe explicitly warns against this, but the next_token and get_modifiable_text method sections can still be read in isolation as encouragement to add opener-token text during subtree extraction.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock, special-element exception", + "problem": "The special-element paragraph says to read SCRIPT/STYLE/TITLE/TEXTAREA text from the opening token, but does not locally restate that this is an opt-in policy, not part of ordinary subtree #text extraction.", + "suggestion": "Add a sentence such as: \"Do this only when the caller explicitly wants those special-element contents; a generic DOM-style text-node walk should still append only #text tokens.\" Also mention SCRIPT/STYLE are raw, not decoded." + }, + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The method explains that many token kinds can carry modifiable text, but the method section itself does not strongly warn that get_modifiable_text() is not a predicate for ordinary text content.", + "suggestion": "Add a warning that ordinary text extraction should first check get_token_type() === '#text'; comments, processing instructions, raw-text elements, and special opener tokens require explicit whitelisting." + }, + { + "location": "HTML Processor text-extraction examples", + "problem": "The successful recipe is in the overview, while method-level readers may jump straight to next_token() or get_modifiable_text() and miss the default-vs-opt-in distinction.", + "suggestion": "Cross-link those method docs back to the \"collect DOM-style text from a subtree\" recipe, using wording that distinguishes ordinary text-node content from special-element modifiable text." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Primary processor choice is correct: `WP_HTML_Processor::create_fragment()` plus `next_token()` for text-bearing tokens. All HTML API calls are documented and no `_doing_it_wrong` records appeared. Small penalty for the `WP_HTML_Tag_Processor` fallback after HTML Processor errors: it is documented, but the docs warn that Tag Processor token walking is lexical and not equivalent to DOM-style fragment text extraction." + }, + { + "trial_id": "trial-2", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Best adherence. Uses the documented HTML Processor fragment factory, a single `next_token()` walk, `#text` filtering, and explicit `TITLE`/`TEXTAREA` opener handling through decoded `get_modifiable_text()`. All called API methods are present in the rendered docs. Minor residual gap: no explicit post-walk unsupported-parser policy, though this task did not require rejecting unsupported input." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct documented API usage throughout: HTML Processor fragment parsing, token walking, special-element whitelist, decoded text, and `get_last_error()`. The conservative empty-string return on later parser error is a reasonable documented policy, but it is not clearly required by the task; it also collects the full text before truncating, which is less idiomatic for bounded excerpts but not an API misuse." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 10/10, with empty `doing_it_wrong` records. The docs did well at steering subjects to `WP_HTML_Processor::create_fragment()` for BODY fragments, `next_token()` instead of tag-only walking, `#text` checks before calling `get_modifiable_text()`, and the special rule that `TITLE` and `TEXTAREA` carry decoded text on opener tokens while `SCRIPT` and `STYLE` should not be included by default. The main near-miss was trial-1’s belief that a `WP_HTML_Tag_Processor` fallback applies the same token rules after an HTML Processor abort. That did not fail these tests, but it would change semantics for malformed or structurally significant HTML because the Tag Processor is lexical and lacks BODY-fragment parsing, implied elements, virtual closers, breadcrumbs, and tree order guarantees.", + "doc_gaps": [ + { + "location": "html-processor.md: Recipe: collect DOM-style text from a subtree", + "problem": "The recipe explains ordinary text extraction and special-element opt-in well, but it does not explicitly state the fallback policy for read-only extractors when `get_last_error()` becomes non-null.", + "suggestion": "Add a short policy note: after an unsupported-parser abort, any accumulated read-only extraction is partial; callers should deliberately choose partial output, empty/null, original input, or a clearly lexical fallback." + }, + { + "location": "html-tag-processor.md: Tokens and finer-grained processing", + "problem": "The docs say Tag Processor token walking is lexical, but the warning could be missed when users look for a fallback after HTML Processor unsupported markup.", + "suggestion": "Add an explicit warning that a Tag Processor fallback is not semantically equivalent to an HTML Processor text walk: it does not perform BODY-fragment parsing, implied closing, virtual closers, or tree-aware traversal." + }, + { + "location": "html-processor.md: create_fragment() / HTML Support", + "problem": "`create_fragment()` null creation failure and later `get_last_error()` aborts are documented separately, but examples focus more on mutation/serialization than read-only extraction.", + "suggestion": "Add a general read-only walking note distinguishing factory failure from mid-walk abort, and explain that text/token results collected before an abort are only a caller-defined best-effort result." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() for tree-aware text collection. All HTML API calls are documented in the rendered docs. The single next_token() pass with explicit anchor state matches the documented repeated-region pattern, filters to #text before get_modifiable_text(), and uses is_string(get_attribute('href')) to exclude missing and boolean href values. Minor caveat: returning an empty array on any later get_last_error() is a policy choice not required by the task." + }, + { + "trial_id": "trial-2", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented HTML API usage. The single next_token() state machine is idiomatic and handles decoded text plus string/true/null href semantics correctly. Slight deduction because it never checks get_last_error() or paused_at_incomplete_token(), so unsupported markup or a final incomplete token could silently produce a partial result despite the docs explaining how to detect parser aborts/truncation." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Processor, next_tag('A'), get_current_depth(), a >= depth-bounded next_token() subtree walk, #text filtering, and get_modifiable_text(). All called methods are documented, including inherited paused_at_incomplete_token(). The main caveat is that it treats paused_at_incomplete_token() as grounds to discard all results; the docs say incomplete-token handling is caller-policy dependent, and the task only required handling unclosed elements, which the processor represents with virtual closers." + } + ], + "failure_analysis": "All trials passed all 8 frozen hidden cases, and execution.json recorded no _doing_it_wrong entries. The docs did well on the core concepts this task needs: the 'Which processor should I use?' guidance points subjects to WP_HTML_Processor for collecting element text; the 'Recipe: collect DOM-style text from a subtree' shows create_fragment(), next_tag(), get_current_depth(), next_token(), #text filtering, and get_modifiable_text(); get_attribute() documents string/true/null semantics; get_modifiable_text() documents decoded text; next_token()/get_current_depth() explain virtual closers, which is why the unclosed-link case passed. Near-misses were mostly policy ambiguities, not API hallucinations: trial 2 could silently return partial data after a parser abort, and trial 3 could over-reject a fragment ending in a mid-token after already collecting valid links. Neither ambiguity was exposed by the frozen cases.", + "doc_gaps": [ + { + "location": "html-processor.md: WP_HTML_Processor::get_attribute()", + "problem": "The HTML Processor method section shows string|true|null and examples, but the explicit 'string values are returned decoded' contract is present in the Tag Processor page, not repeated here.", + "suggestion": "Duplicate the decoded-attribute-value sentence in the WP_HTML_Processor get_attribute() section, since users doing structural work may read only the HTML Processor method docs." + }, + { + "location": "html-processor.md: next_token() and 'Recipe: collect DOM-style text from a subtree'", + "problem": "The docs warn that nested next_token() loops can skip boundaries, while also showing depth-bounded subtree walks. The safe boundary between those patterns is implicit.", + "suggestion": "Add a short rule of thumb: a depth-bounded inner walk is appropriate when intentionally consuming one matched subtree before resuming after it; use one outer next_token() state machine when multiple repeated regions or sibling boundaries must be tracked concurrently." + }, + { + "location": "html-processor.md: incomplete-input notes near next_token(), get_current_depth(), and serialize_token()", + "problem": "The docs mention paused_at_incomplete_token(), but the distinction between an unclosed element that receives a virtual closer and a truly incomplete final syntax token is easy to blur.", + "suggestion": "Add a compact contrast example, such as '

                      text' versus '

                      text AC shows the reference returns AC and empty string, while those trials return ABC and D. The relevant docs exist under 'Recipe: collect DOM-style text from a subtree' and get_modifiable_text(), but the availability of modifiable text on SCRIPT/TEXTAREA/TITLE/STYLE still invited over-inclusion. Trial-2 also shows a smaller near-miss: it manually flushes any open row/cell after the walk, suggesting it did not fully trust the documented virtual closer behavior, though that did not affect the hidden cases.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock / 'Recipe: collect DOM-style text from a subtree'", + "problem": "The docs state the #text-only rule, but models still inferred that special-element modifiable text should be part of generic text extraction.", + "suggestion": "Add a compact generic example contrasting ordinary subtree text with special-element payloads, e.g. a DIV containing text, SCRIPT, TEXTAREA, and more text, and state that generic DOM-style text extraction should append only visited #text tokens unless the caller explicitly requests raw/RCDATA element payloads." + }, + { + "location": "WP_HTML_Tag_Processor::get_modifiable_text() docblock", + "problem": "The method name and broad return behavior can be mistaken for 'this token contributes text content' instead of 'this token has editable payload bytes/text'.", + "suggestion": "Strengthen the warning that non-empty modifiable text is not a text-node predicate. Explicitly say that SCRIPT/STYLE/TITLE/TEXTAREA opener payloads should not be included in generic subtree text just because get_modifiable_text() returns a string." + }, + { + "location": "WP_HTML_Processor::next_token() or get_current_depth() docblock", + "problem": "The reliable virtual-closer behavior is documented, but redundant EOF flushing suggests uncertainty about whether omitted or end-of-input closers are visited.", + "suggestion": "Add one general repeated-region example with omitted closing tags showing opener events, virtual closer events, and closer-driven flushing, emphasizing that callers usually should not add a second EOF flush unless defining a special partial-input policy." + }, + { + "location": "WP_HTML_Processor::get_last_error() / incomplete-token guidance", + "problem": "The docs mention unsupported markup and incomplete trailing syntax in several places, but the policy distinction for read-only extraction versus mutation/rewrite remains diffuse.", + "suggestion": "Add a short decision note: read-only extraction may choose best-effort partial results, while mutations or contracts requiring complete input should check paused_at_incomplete_token and get_last_error before returning transformed output." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment for body-fragment structural parsing. Every HTML API method used is documented. The depth-bounded next_token subtree walk with a #text guard and get_modifiable_text follows the documented DOM-style text recipe. The is_tag_closer check after plain next_tag is redundant because next_tag skips closers by default, but harmless." + }, + { + "trial_id": "trial-2", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Correct processor choice and no undocumented API calls. The single next_token loop with opener/closer state is a documented pattern and handles virtual closers, empty headings, and implied closes. The weak spot is appending get_modifiable_text from non-heading tag opener tokens inside a heading; docs say ordinary subtree text should be only #text tokens unless special-element contents are explicitly desired. This would include TEXTAREA/TITLE decoded text and SCRIPT/STYLE raw text beyond the reference policy." + }, + { + "trial_id": "trial-3", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Near-reference implementation: correct processor, all methods documented, depth-bounded next_token walk, #text-only accumulation, decoded text via get_modifiable_text, and null create_fragment handling. The final get_last_error fallback is documented and conservative, but it can discard already-collected headings on unsupported markup and does not separately consider paused_at_incomplete_token." + } + ], + "failure_analysis": "No failed frozen/hidden cases: all three trials passed all 7 cases. The docs did well in the key places: 'Which processor should I use?' steered subjects away from the Tag Processor for structural text extraction; 'Recipe: collect DOM-style text from a subtree', next_token(), and get_current_depth() gave the depth-bounded #text accumulation pattern; get_tag() returning uppercase handled source case; next_token() describing virtual/implied closers covered '

                      One

                      Two'; and get_modifiable_text() documenting decoded #text handled '&'. Near-misses were Trial 2 over-applying the special-element modifiable-text passage despite the ordinary-text warning, and Trial 3 choosing an unsupported-markup fallback policy that is not clearly specified for read-only extraction tasks.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_modifiable_text() docblock", + "problem": "The docblock explains that special elements carry modifiable text on their opener, but readers can miss that this is not ordinary subtree text.", + "suggestion": "Add a warning and cross-reference: for DOM-style subtree extraction, guard on get_token_type() === '#text'; reading modifiable text from SCRIPT, STYLE, TITLE, or TEXTAREA openers is an explicit opt-in policy." + }, + { + "location": "WP_HTML_Processor::next_token() docblock, nested-loop guidance", + "problem": "The warning against nested next_token loops can seem to discourage the valid bounded-subtree walk shown elsewhere, while not spelling out the boundary between the two patterns.", + "suggestion": "Clarify when a bounded inner walk from a matched opener is safe versus when a single stateful loop is preferred, especially around whether the terminating token itself must be processed by the outer loop." + }, + { + "location": "WP_HTML_Processor::get_last_error() and create_fragment() docs", + "problem": "Unsupported-parser abort guidance is clearer for serialization and mutation than for read-only semantic extraction.", + "suggestion": "State that read-only scans may have partial results when get_last_error() becomes non-null, and callers must choose a contract-specific fallback such as partial results, null, empty result, or error." + }, + { + "location": "WP_HTML_Processor::next_tag() docblock", + "problem": "The default tag_closers behavior is documented in the parameter table, but redundant is_tag_closer guards after plain next_tag suggest it is easy to overlook.", + "suggestion": "Add a short sentence after the signature: plain next_tag() visits opener tags only; request tag_closers => 'visit' when closer tokens are part of the algorithm." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-44/codex-trials-output.json b/doc-experiment/results/round-44/codex-trials-output.json new file mode 100644 index 0000000000000..b440657e0b7c3 --- /dev/null +++ b/doc-experiment/results/round-44/codex-trials-output.json @@ -0,0 +1,143 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-44/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded text content from nested elements while ignoring markup-only content like images.", + "confidence": 94 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, finds the first `H1` with `next_tag()`, then walks that element’s subtree with `next_token()` while the depth stays within the `H1`, concatenating only `#text` tokens via `get_modifiable_text()` so nested markup contributes no markup and character references are already decoded.", + "confidence": 94 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is tree-aware, finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, then walks its subtree with `next_token()`. It concatenates decoded text from `#text` tokens via `get_modifiable_text()`, and also appends modifiable text carried on special opening-tag tokens so nested text-only elements are included.", + "confidence": 88 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "= $max_codepoints ) {\n return $used >= $max_codepoints;\n }\n\n $remaining = $max_codepoints - $used;\n $length = $codepoint_length( $text );\n\n if ( $length <= $remaining ) {\n $excerpt .= $text;\n $used += $length;\n return false;\n }\n\n $excerpt .= $codepoint_substr( $text, 0, $remaining );\n $used = $max_codepoints;\n return true;\n };\n\n $processor = WP_HTML_Processor::create_fragment( $html );\n\n if ( null !== $processor ) {\n while ( $processor->next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n if ( $append_text( $processor->get_modifiable_text() ) ) {\n return $excerpt;\n }\n\n continue;\n }\n\n if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $name = $processor->get_token_name();\n if ( 'TITLE' === $name || 'TEXTAREA' === $name ) {\n if ( $append_text( $processor->get_modifiable_text() ) ) {\n return $excerpt;\n }\n }\n }\n }\n\n if ( null === $processor->get_last_error() ) {\n return $excerpt;\n }\n }\n\n $excerpt = '';\n $used = 0;\n $tags = new WP_HTML_Tag_Processor( $html );\n\n while ( $tags->next_token() ) {\n if ( '#text' === $tags->get_token_type() ) {\n if ( $append_text( $tags->get_modifiable_text() ) ) {\n return $excerpt;\n }\n\n continue;\n }\n\n if ( '#tag' === $tags->get_token_type() && ! $tags->is_tag_closer() ) {\n $name = $tags->get_token_name();\n if ( 'TITLE' === $name || 'TEXTAREA' === $name ) {\n if ( $append_text( $tags->get_modifiable_text() ) ) {\n return $excerpt;\n }\n }\n }\n }\n\n return $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` plus `next_token()`, `get_token_type()`, `get_token_name()`, `is_tag_closer()`, and `get_modifiable_text()` to collect ordinary `#text` tokens and the decoded opener text for `TITLE` and `TEXTAREA`, then truncates by Unicode code points with UTF-8-aware slicing. If the HTML Processor aborts on unsupported markup, it falls back to a best-effort lexical scan with `WP_HTML_Tag_Processor` using the same token rules.", + "confidence": 86 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": " 0 && $processor->next_token() ) {\n $chunk = '';\n\n if ( '#text' === $processor->get_token_type() ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $tag_name = $processor->get_token_name();\n if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n $chunk_length = mb_strlen( $chunk, 'UTF-8' );\n if ( $chunk_length <= $remaining ) {\n $excerpt .= $chunk;\n $remaining -= $chunk_length;\n continue;\n }\n\n $excerpt .= mb_substr( $chunk, 0, $remaining, 'UTF-8' );\n break;\n }\n\n return $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment safely, then walks tokens with `next_token()`. It appends decoded text from ordinary `#text` tokens plus decoded opener-carried text from `TITLE` and `TEXTAREA`, while naturally excluding `SCRIPT`, `STYLE`, comments, and other non-text tokens. Truncation is done by Unicode code points with `mb_strlen()` and `mb_substr()` using explicit `UTF-8`, as documented.", + "confidence": 91 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $token_name = $processor->get_token_name();\n if ( 'TEXTAREA' === $token_name || 'TITLE' === $token_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return '';\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) {\n return $text;\n }\n\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the input as a BODY fragment, then walks every token with `next_token()`, appending decoded text from ordinary `#text` tokens plus the documented special text-bearing opener tokens `TITLE` and `TEXTAREA` via `get_modifiable_text()`, while naturally excluding `SCRIPT` and `STYLE`. It then truncates by Unicode code points with `mb_substr(..., 'UTF-8')` after checking the requested limit.", + "confidence": 90 + }, + { + "id": "T06-collect-links", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n if ( ! empty( $a_stack ) ) {\n array_pop( $a_stack );\n }\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $a_stack[] = count( $links ) - 1;\n } else {\n $a_stack[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $a_stack ) ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $a_stack as $link_index ) {\n if ( null !== $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk so document order and subtree structure come from the HTML Processor. On each `A` opener, it reads `href` with `get_attribute()` and keeps only string values; while inside open anchors, it appends decoded text from `#text` tokens via `get_modifiable_text()`, ignoring markup and non-text token types as the docs recommend for DOM-style text collection.", + "confidence": 90 + }, + { + "id": "T06-collect-links", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n $link_index = array_pop( $anchor_stack );\n if ( null !== $link_index ) {\n array_pop( $active_link_indexes );\n }\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n\n $link_index = count( $links ) - 1;\n $anchor_stack[] = $link_index;\n $active_link_indexes[] = $link_index;\n } else {\n $anchor_stack[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $active_link_indexes ) ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $active_link_indexes as $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a single `next_token()` walk so text is collected with full tree awareness. On each `A` opener it reads `href` with `get_attribute()` and includes the link only when that value is a string (excluding missing and boolean `href`), then appends decoded text from `#text` tokens via `get_modifiable_text()` while the link is open, preserving document order.", + "confidence": 89 + }, + { + "id": "T06-collect-links", + "trial": 3, + "ok": true, + "code": "next_tag( 'A' ) ) {\n $href = $processor->get_attribute( 'href' );\n\n if ( ! is_string( $href ) ) {\n continue;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) {\n return array();\n }\n\n $links[] = array(\n 'href' => $href,\n 'text' => $text,\n );\n }\n\n if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because collecting an anchor’s text content requires tree-aware parsing. It scans `A` tags in document order with `next_tag()`, keeps only anchors whose `href` is reported as a string by `get_attribute()`, then walks each anchor’s subtree with `next_token()` while `get_current_depth()` stays inside that element, concatenating only `#text` tokens via `get_modifiable_text()`. It also rejects unsupported or truncated input by checking `get_last_error()` and `paused_at_incomplete_token()`.", + "confidence": 88 + }, + { + "id": "T08-table-extract", + "trial": 1, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n $text_elements = array(\n 'IFRAME' => true,\n 'NOEMBED' => true,\n 'NOFRAME' => true,\n 'NOFRAMES' => true,\n 'NOSCRIPT' => true,\n 'SCRIPT' => true,\n 'STYLE' => true,\n 'TEXTAREA' => true,\n 'TITLE' => true,\n 'XMP' => true,\n );\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n $token_type = $processor->get_token_type();\n $token_name = $processor->get_token_name();\n\n if ( '#tag' === $token_type ) {\n if ( 'TR' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $token_name || 'TH' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n } else {\n $current_cell = '';\n }\n\n continue;\n }\n\n if (\n null !== $current_cell &&\n ! $processor->is_tag_closer() &&\n isset( $text_elements[ $token_name ] )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n\n continue;\n }\n\n if ( '#text' === $token_type && null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then walks that table subtree token-by-token with `next_token()` and `get_current_depth()`. It collects rows on `TR` open/close, cells on `TD`/`TH` open/close, appends ordinary text from `#text` tokens via `get_modifiable_text()`, and also includes text carried on special text-only element openers such as `TEXTAREA` or `SCRIPT` when they appear inside a cell.", + "confidence": 86 + }, + { + "id": "T08-table-extract", + "trial": 2, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $table_depth ) {\n break;\n }\n\n $token_type = $processor->get_token_type();\n $tag_name = $processor->get_tag();\n\n if ( '#tag' === $token_type ) {\n if ( ! $processor->is_tag_closer() ) {\n if ( 'TR' === $tag_name ) {\n $current_row = array();\n } elseif ( null !== $current_row && ( 'TD' === $tag_name || 'TH' === $tag_name ) ) {\n $current_cell = '';\n }\n } else {\n if ( null !== $current_cell && ( 'TD' === $tag_name || 'TH' === $tag_name ) ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n } elseif ( null !== $current_row && 'TR' === $tag_name ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n }\n\n continue;\n }\n\n if ( null !== $current_cell && '#text' === $token_type ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_cell && null !== $current_row ) {\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table rules, finds the first `TABLE` with `next_tag()`, then walks tokens with `next_token()` while staying inside that table by comparing `get_current_depth()`. It starts rows on `TR` openers, starts cells on `TD`/`TH` openers, appends decoded text from `#text` tokens via `get_modifiable_text()`, and finalizes cells and rows on their closing tokens.", + "confidence": 89 + }, + { + "id": "T08-table-extract", + "trial": 3, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n $token_type = $processor->get_token_type();\n $tag_name = $processor->get_tag();\n\n if ( '#tag' === $token_type ) {\n if ( 'TR' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n }\n $current_cell = null;\n } else {\n $current_cell = '';\n }\n\n continue;\n }\n\n if (\n null !== $current_cell &&\n ! $processor->is_tag_closer() &&\n in_array( $tag_name, array( 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE' ), true )\n ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n\n continue;\n }\n\n if ( '#text' === $token_type && null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then makes one depth-bounded `next_token()` walk through that table. It tracks `TR` and `TD`/`TH` openers and closers via `get_tag()` and `is_tag_closer()`, appends decoded text from `#text` tokens with `get_modifiable_text()`, and returns an empty array if no table is found or the processor aborts with `get_last_error()`.", + "confidence": 88 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": "next_tag() ) {\n if ( $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag = $processor->get_tag();\n if ( null === $tag || ! preg_match( '/^H([1-6])$/', $tag, $matches ) ) {\n continue;\n }\n\n $heading_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => (int) $matches[1],\n 'text' => $text,\n );\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with tree awareness, then scans tags in document order with `next_tag()`. For each `H1` through `H6` opener, it records the heading depth with `get_current_depth()` and walks forward with `next_token()`, appending only descendant `#text` token content via `get_token_type()` and `get_modifiable_text()` until the walk leaves that heading subtree.", + "confidence": 92 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": " 1,\n 'H2' => 2,\n 'H3' => 3,\n 'H4' => 4,\n 'H5' => 5,\n 'H6' => 6,\n );\n\n $toc = array();\n $current_heading = null;\n\n while ( $processor->next_token() ) {\n $token_type = $processor->get_token_type();\n\n if ( '#tag' === $token_type ) {\n $tag = $processor->get_tag();\n\n if ( null !== $tag && isset( $heading_levels[ $tag ] ) ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_heading && $current_heading['tag'] === $tag ) {\n $toc[] = array(\n 'level' => $current_heading['level'],\n 'text' => $current_heading['text'],\n );\n $current_heading = null;\n }\n } else {\n $current_heading = array(\n 'tag' => $tag,\n 'level' => $heading_levels[ $tag ],\n 'text' => '',\n );\n }\n\n continue;\n }\n }\n\n if ( null === $current_heading ) {\n continue;\n }\n\n if ( '#text' === $token_type ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' === $token_type && ! $processor->is_tag_closer() ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n }\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks tokens in document order with `next_token()`. It detects `H1` through `H6` opening and closing tags via `get_tag()` and `is_tag_closer()`, accumulates heading text from descendant `#text` tokens with `get_modifiable_text()`, and also includes modifiable text carried on special element opener tokens when they appear inside a heading.", + "confidence": 88 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( ! in_array( $tag, $heading_tags, true ) ) {\n continue;\n }\n\n $heading_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => (int) substr( $tag, 1, 1 ),\n 'text' => $text,\n );\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction depends on document structure. It scans heading openers with `next_tag()`, identifies `H1` through `H6` via `get_tag()`, then walks each heading subtree with `next_token()` while the depth from `get_current_depth()` stays inside that heading, appending only `#text` token content from `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 90 + } + ] +} diff --git a/doc-experiment/results/round-44/round-metadata.json b/doc-experiment/results/round-44/round-metadata.json new file mode 100644 index 0000000000000..b957541f38d3b --- /dev/null +++ b/doc-experiment/results/round-44/round-metadata.json @@ -0,0 +1,159 @@ +{ + "round": "round-44", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "T05-text-excerpt", + "T06-collect-links", + "T08-table-extract", + "N06-extract-toc" + ], + "task_count": 5, + "splits": { + "train": 5 + }, + "concepts": { + "text": 3, + "traversal": 2 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "git_status_short": "", + "source_file_digests": { + "ref": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "74724f1a228f65ed967dfa42def5ab6e70bfb0e36c0521d1f7649827e95b12ff", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "algorithm": "sha256", + "tasks": { + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + }, + "T06-collect-links": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T06-collect-links/task.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "doc-experiment/corpus/T06-collect-links/reference.php": "c0e5ae9ec5d4448a525962f1e27250a7243541d613209e3bbdc86913ff7e7a81", + "doc-experiment/corpus/T06-collect-links/tests.json": "34460376a147fa18739e7743d341479489f46a88b58b9f5aa5fe0d901f2b3140" + } + }, + "T08-table-extract": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T08-table-extract/task.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "doc-experiment/corpus/T08-table-extract/reference.php": "289b8c6d1b69d4ef4dd7810dbdcba35445c30a7f26b631a00835ef737b78709e", + "doc-experiment/corpus/T08-table-extract/tests.json": "cdec7559d63ee541f1b0dad34a4f9f0982ce66e58ddfea98ec482c425e1b2638" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + } + } + }, + "created_at_utc": "2026-06-13T15:57:05+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-44", + "staged_task_files": [ + "tasks/T03-first-h1-text.md", + "tasks/T05-text-excerpt.md", + "tasks/T06-collect-links.md", + "tasks/T08-table-extract.md", + "tasks/N06-extract-toc.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-44 exposes 2 docs and 5 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "852fa4613b5c99ae9fea547f6284eee27e4f459d7b38a0d4dec5080cc657b123", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "tasks/T06-collect-links.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "tasks/T08-table-extract.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee" + } +} diff --git a/doc-experiment/results/round-44/round-summary.json b/doc-experiment/results/round-44/round-summary.json new file mode 100644 index 0000000000000..8398523c9185d --- /dev/null +++ b/doc-experiment/results/round-44/round-summary.json @@ -0,0 +1,222 @@ +{ + "round_score": 98.94, + "core_score": 98.94, + "by_split": { + "train": 98.94 + }, + "by_concept": { + "text": 99.13, + "traversal": 98.65 + }, + "tasks": { + "T03-first-h1-text": { + "score": 99.1, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 91, + "score": 97.3 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 93, + "score": 97.9 + }, + { + "trial": "trial-2", + "passed": 10, + "total": 10, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 97, + "score": 99.1 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T06-collect-links": { + "score": 99.4, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T08-table-extract": { + "score": 98.6, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 94, + "score": 98.2 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "N06-extract-toc": { + "score": 98.7, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 92, + "score": 97.6 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 97, + "score": 99.1 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-44", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "T05-text-excerpt", + "T06-collect-links", + "T08-table-extract", + "N06-extract-toc" + ], + "task_count": 5, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "git_status_short": "" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-44/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-44/subject-isolation.json b/doc-experiment/results/round-44/subject-isolation.json new file mode 100644 index 0000000000000..877059bed6a0d --- /dev/null +++ b/doc-experiment/results/round-44/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-44/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/judge.json b/doc-experiment/results/round-45/N06-extract-toc/judge.json new file mode 100644 index 0000000000000..246366cb6750c --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() for body-fragment, structure-aware traversal. All HTML API calls are documented: create_fragment, next_tag, get_tag, get_current_depth, next_token, get_token_type, get_modifiable_text, and get_last_error. The subtree walk and #text-only get_modifiable_text() use are idiomatic and handle decoded entities, nested inline markup, empty headings, uppercase source tags, and implied heading closes. Minor penalty: the final get_last_error() check discards all accumulated read-only results on unsupported markup; the docs say that is a caller policy, but this task did not specify fail-closed behavior." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Canonical use of the documented API. It chooses WP_HTML_Processor::create_fragment(), scans heading openers with next_tag(), records opener depth, walks each heading subtree with next_token() while depth remains >= the opener depth, and reads only #text tokens through get_modifiable_text(). No undocumented methods or _doing_it_wrong records. Edge cases in the frozen expectations are handled cleanly." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly uses WP_HTML_Processor::create_fragment() and a single next_token() state machine, matching the documented repeated-region pattern. All HTML API methods used are documented, including is_tag_closer(), get_token_type(), get_tag(), and get_modifiable_text(). It handles virtual/implied closers, empty headings, decoded text, and case normalization. Minor penalty: it relies on closer-driven flushing and an end-of-scan fallback without checking get_last_error()/paused_at_incomplete_token(), so unsupported or truncated scans could produce partial output without an explicit policy." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 7/7 frozen expectations with no _doing_it_wrong records. The rendered docs appear to have done the important work well. The 'Supported elements' and processor-choice language clearly pushed subjects to WP_HTML_Processor rather than the lexical Tag Processor. The 'collect DOM-style text from a subtree' recipe and get_modifiable_text() docs prevented the common mistake of appending tags, comments, or raw special-element content, and made entity decoding clear. The get_current_depth() section's explicit >= guidance maps directly to headings with nested inline markup, while the next_token() section's promise of implicit/end-of-input closing tokens explains the implied-heading-close case. Near-misses: trial-1 over-applied get_last_error() as a global fail-closed policy for read-only extraction, and trial-3 relied on get_tag()/is_tag_closer() behavior on virtual closers that is demonstrated indirectly but could be stated more directly in method docs.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_last_error() docblock", + "problem": "The method doc explains how to distinguish no match from parser abort, but does not locally repeat the read-only extraction policy: already-visited tokens remain usable, and discarding accumulated results is only appropriate when the caller requires complete-source semantics.", + "suggestion": "Add a short note that for read-only extraction, get_last_error() reports scan incompleteness/unsupported markup but does not invalidate data already visited; callers should explicitly choose partial result, empty result, or sentinel behavior." + }, + { + "location": "WP_HTML_Processor::is_tag_closer(), get_tag(), and get_token_name() docblocks", + "problem": "Closer-driven state machines depend on what name APIs return while matched on a real or virtual closing token. The docs say depth/breadcrumbs report the parent after a closer, but the tag-name contract for closers is less explicit.", + "suggestion": "State that on tag closer tokens, including virtual closers emitted by the HTML Processor, get_tag()/get_token_name() return the closed element's name even though depth and breadcrumbs have already moved to the parent context." + }, + { + "location": "WP_HTML_Processor::next_token() recipes", + "problem": "The docs separately show first-subtree text extraction and a repeated-region state machine. Subjects inferred the combination correctly, but the safe choices for repeated subtree text extraction remain spread across sections.", + "suggestion": "Add a general non-task-specific recipe for collecting text for every matching element in document order, clarifying when a next_tag()+bounded next_token() loop is acceptable and when a single next_token() state machine is preferable." + } + ] +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-1/candidate.php b/doc-experiment/results/round-45/N06-extract-toc/trial-1/candidate.php new file mode 100644 index 0000000000000..f51f6900b1fe6 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-1/candidate.php @@ -0,0 +1,47 @@ + 1, + 'H2' => 2, + 'H3' => 3, + 'H4' => 4, + 'H5' => 5, + 'H6' => 6, + ); + $toc = array(); + + while ( $processor->next_tag() ) { + $tag = $processor->get_tag(); + + if ( null === $tag || ! isset( $levels[ $tag ] ) ) { + continue; + } + + $heading_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $toc[] = array( + 'level' => $levels[ $tag ], + 'text' => $text, + ); + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $toc; +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-1/execution.json b/doc-experiment/results/round-45/N06-extract-toc/trial-1/execution.json new file mode 100644 index 0000000000000..0246ea474f2a5 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-1/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/N06-extract-toc/trial-1/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-1/response.json b/doc-experiment/results/round-45/N06-extract-toc/trial-1/response.json new file mode 100644 index 0000000000000..ac4cd9b2aad35 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is a tree-aware operation. It scans forward with `next_tag()`, filters `H1` through `H6` using `get_tag()`, records the heading opener depth with `get_current_depth()`, then walks that heading\u2019s subtree with `next_token()` and concatenates only `#text` token content via `get_token_type()` and `get_modifiable_text()`, which excludes markup tags while preserving heading text in document order.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-2/candidate.php b/doc-experiment/results/round-45/N06-extract-toc/trial-2/candidate.php new file mode 100644 index 0000000000000..e0d8c60550dec --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-2/candidate.php @@ -0,0 +1,33 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( ! in_array( $tag, array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ), true ) ) { + continue; + } + + $level = (int) substr( $tag, 1 ); + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $toc[] = array( + 'level' => $level, + 'text' => $text, + ); + } + + return $toc; +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-2/execution.json b/doc-experiment/results/round-45/N06-extract-toc/trial-2/execution.json new file mode 100644 index 0000000000000..5366e48d31a57 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-2/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/N06-extract-toc/trial-2/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-2/response.json b/doc-experiment/results/round-45/N06-extract-toc/trial-2/response.json new file mode 100644 index 0000000000000..227209e57fe2d --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware. It scans heading openers with `next_tag()`, records each heading level from `get_tag()`, then walks that heading\u2019s subtree with `next_token()` and a `get_current_depth()` boundary, appending only ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-3/candidate.php b/doc-experiment/results/round-45/N06-extract-toc/trial-3/candidate.php new file mode 100644 index 0000000000000..c07c3ec28fa34 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-3/candidate.php @@ -0,0 +1,44 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() ) { + $tag = $processor->get_tag(); + + if ( ! $processor->is_tag_closer() ) { + if ( null !== $tag && preg_match( '/^H([1-6])$/', $tag, $matches ) ) { + $current_heading = array( + 'level' => (int) $matches[1], + 'text' => '', + ); + $current_tag = $tag; + } + } elseif ( null !== $current_heading && $tag === $current_tag ) { + $toc[] = $current_heading; + $current_heading = null; + $current_tag = null; + } + + continue; + } + + if ( null !== $current_heading && '#text' === $processor->get_token_type() ) { + $current_heading['text'] .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_heading ) { + $toc[] = $current_heading; + } + + return $toc; +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-3/execution.json b/doc-experiment/results/round-45/N06-extract-toc/trial-3/execution.json new file mode 100644 index 0000000000000..0004271b4b5c4 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-3/execution.json @@ -0,0 +1,203 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/N06-extract-toc/trial-3/candidate.php", + "function": "extract_toc", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "basic-h1-h3", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "actual": [ + { + "level": 1, + "text": "Intro" + }, + { + "level": 3, + "text": "Details here" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "all-heading-levels", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "actual": [ + { + "level": 1, + "text": "Title" + }, + { + "level": 2, + "text": "Section" + }, + { + "level": 3, + "text": "Subsection" + }, + { + "level": 4, + "text": "Minor" + }, + { + "level": 5, + "text": "Small" + }, + { + "level": 6, + "text": "Tiny" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-text-and-entities", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "A B & C" + } + ], + "actual": [ + { + "level": 2, + "text": "A B & C" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-heading", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "actual": [ + { + "level": 2, + "text": "" + }, + { + "level": 3, + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "actual": [ + { + "level": 1, + "text": "Upper" + }, + { + "level": 6, + "text": "Lower" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implied-heading-close", + "status": "pass", + "expected": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "actual": [ + { + "level": 2, + "text": "One" + }, + { + "level": 3, + "text": "Two" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-matches", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/N06-extract-toc/trial-3/response.json b/doc-experiment/results/round-45/N06-extract-toc/trial-3/response.json new file mode 100644 index 0000000000000..847d753c32b48 --- /dev/null +++ b/doc-experiment/results/round-45/N06-extract-toc/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks tokens with `next_token()`. On each heading opener `H1` through `H6`, it starts a current entry; while inside that heading it appends only `#text` token contents via `get_modifiable_text()`, and when the matching heading closer is reached with `is_tag_closer()` and `get_tag()`, it stores the completed TOC item.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/judge.json b/doc-experiment/results/round-45/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..12ea6203142ca --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor fragment parser, guarded null creation and missing H1, found the first H1 with next_tag(), then used the documented depth-bounded next_token() subtree walk. It read only #text tokens and used get_modifiable_text(), which the docs state returns decoded text for text nodes. Execution passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic implementation as the reference: create_fragment(), next_tag('H1'), get_current_depth(), next_token(), get_token_type(), and get_modifiable_text(). No undocumented methods or misuse. Handles nested markup, decoded entities, no-H1 null, image-only empty string, and unclosed H1 through the HTML Processor’s tree-aware walk." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and method set, all present in the rendered docs. The explanation explicitly cites tree-aware extraction and decoded character references. The implementation follows the HTML Processor subtree text recipe and passed all cases without warnings." + } + ], + "failure_analysis": "No hidden case failed across the trials. The docs did well because the relevant guidance was direct and task-shaped without embedding this exact solution: Tag Processor / 'Which processor should I use?' says collecting element text and walking a subtree require WP_HTML_Processor; HTML Processor / 'Recipe: collect DOM-style text from a subtree' shows the depth-bounded next_token() pattern and the #text-only filter; HTML Processor / get_current_depth explains why the guard must be >=, including child closers and malformed or unclosed input; get_modifiable_text explains decoded text for #text nodes and warns not to use it as a predicate for ordinary text. The main near-miss is that create_fragment() returning null is visible in the signature and examples, but the text-extraction recipe itself does not include the null guard, so a less careful subject could omit it.", + "doc_gaps": [ + { + "location": "html-processor.md / Recipe: collect DOM-style text from a subtree", + "problem": "The recipe starts with create_fragment() and immediately calls next_tag(), while create_fragment() can return null.", + "suggestion": "Add the null guard to the text-extraction recipe, or state directly that callers should handle a null factory result before token walking." + }, + { + "location": "html-processor.md / create_fragment()", + "problem": "The nullable return is clear in the signature but easy to miss in prose.", + "suggestion": "Add a short Returns note explaining when null can occur and that callers should branch before invoking processor methods." + }, + { + "location": "html-processor.md / Overview future-direction bullets", + "problem": "The bullet saying inner-content reading is a future capability can look inconsistent with the current documented ability to collect DOM-style text by walking tokens.", + "suggestion": "Clarify that direct innerHTML/textContent-style convenience APIs are future work, while read-only text extraction is currently supported through token walking." + } + ] +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..ea896dbaa6c56 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..87c14971fa311 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..34f5a162e1da1 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware task, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded text content from the heading and its nested elements while ignoring markup itself.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..8236b31a9e465 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..69569b1ee7ef6 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks that subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..0837d6d893e55 --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..dab4a453420fe --- /dev/null +++ b/doc-experiment/results/round-45/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That follows the documented HTML Processor recipe for DOM-style subtree text and preserves decoded character references.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/judge.json b/doc-experiment/results/round-45/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..e67acaa52e029 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), walked with next_token(), filtered ordinary #text tokens, explicitly opted in TITLE/TEXTAREA opener-carried text, and used get_modifiable_text() only after token checks. All called methods are present in the rendered docs and execution recorded no _doing_it_wrong notices. Minor inefficiency: it accumulates all text before truncating instead of stopping once enough code points are collected." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Fully aligned with the documented text-extraction pattern: HTML Processor fragment parsing, single token walk, #text filtering, TITLE/TEXTAREA opt-in via opening tags, decoded text via get_modifiable_text(), and UTF-8 mb_* truncation. No undocumented API calls or misuse notices." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Fully aligned with the docs: correct processor, documented methods only, guarded use of get_modifiable_text(), explicit exclusion of SCRIPT/STYLE by whitelist, and Unicode-safe truncation. No _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 10 hidden cases, so there were no failed cases to attribute to a misconception. The docs appear to have done well in the key places: the HTML Processor overview says to choose WP_HTML_Processor for document structure and text collection; the next_token() section states that element text may be split across multiple #text tokens and that TITLE/TEXTAREA/SCRIPT/STYLE carry text on opener tokens instead of child #text nodes; the get_modifiable_text() section warns that it is not a predicate for ordinary text and explains decoded #text/TITLE/TEXTAREA versus raw SCRIPT/STYLE. The candidates’ explanations closely mirrored those passages. Near-misses were limited to robustness and performance: trial-1 did not stop after reaching the limit, and none checked incomplete-token/error state, but the task and frozen cases did not require rejecting partial parses.", + "doc_gaps": [ + { + "location": "html-processor.md / next_token() and get_modifiable_text()", + "problem": "The correct text-extraction rules are documented, but spread across narrative sections. A reader has to combine token walking, ordinary #text filtering, special-element opener text, and decoded/raw semantics.", + "suggestion": "Add a compact reference table in the get_modifiable_text() docblock listing token category, whether it represents DOM-style text content, whether character references are decoded, and whether callers should opt in explicitly." + }, + { + "location": "html-processor.md / next_token()", + "problem": "The docs mention incomplete input handling for complete-source callers, but the text-extraction examples do not show a policy decision for paused_at_incomplete_token() or get_last_error().", + "suggestion": "Add a general note to text-walk examples: after a read-only walk, decide whether partial text is acceptable; if not, check paused_at_incomplete_token() and get_last_error() before returning accumulated text." + }, + { + "location": "html-processor.md / get_modifiable_text()", + "problem": "The docs recommend mb_strlen()/mb_substr() with UTF-8, but do not explicitly distinguish Unicode code points from grapheme clusters. This can matter for emoji plus variation selectors or combining marks.", + "suggestion": "Add a short note that PHP mb_* string length/substr with UTF-8 counts code points, not user-perceived grapheme clusters, and direct callers to grapheme_* when a UI-facing character limit requires grapheme clusters." + } + ] +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..e0143302eff99 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,35 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..7bac8c3ee3d05 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..313f2fe3709c3 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks all tokens with `next_token()`, appending ordinary decoded `#text` nodes via `get_token_type()` and `get_modifiable_text()`, plus opener-carried decoded text for `TITLE` and `TEXTAREA` via `get_token_name()` and `is_tag_closer()`. It excludes `SCRIPT` and `STYLE` by only whitelisting the special text-bearing elements the task requires, and truncates with `mb_strlen()`/`mb_substr()` in UTF-8 so multi-byte characters are not split.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..b498a049952d3 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,48 @@ +next_token() ) { + $chunk = ''; + + if ( '#text' === $processor->get_token_type() ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) { + $token_name = $processor->get_token_name(); + if ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + if ( $chunk_length <= $remaining ) { + $excerpt .= $chunk; + $remaining -= $chunk_length; + + if ( 0 === $remaining ) { + break; + } + + continue; + } + + $excerpt .= mb_substr( $chunk, 0, $remaining, 'UTF-8' ); + break; + } + + return $excerpt; +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..599a0ca2748f9 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..6975597f328d5 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus opener-carried text from `TITLE` and `TEXTAREA` via `get_modifiable_text()`. That follows the documented text-extraction rules, automatically gets decoded character references where HTML defines them, excludes `SCRIPT` and `STYLE`, and truncates safely by Unicode code points with `mb_strlen()` and `mb_substr()` in UTF-8.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..b0a24f65d6297 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,33 @@ +next_token() ) { + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + $text .= $processor->get_modifiable_text(); + } elseif ( + '#tag' === $token_type && + ! $processor->is_tag_closer() && + in_array( $processor->get_token_name(), array( 'TEXTAREA', 'TITLE' ), true ) + ) { + $text .= $processor->get_modifiable_text(); + } + + if ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) { + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..554a78f47c109 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..58a69a28c0748 --- /dev/null +++ b/doc-experiment/results/round-45/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a full `next_token()` walk so extraction follows the parsed HTML fragment structure. It appends only ordinary `#text` tokens plus the documented opener-carried text for `TEXTAREA` and `TITLE` via `get_modifiable_text()`, skips `SCRIPT` and `STYLE`, and truncates by Unicode code points with `mb_strlen()`/`mb_substr()` using explicit `UTF-8` encoding as the docs require.", + "confidence": 94 +} diff --git a/doc-experiment/results/round-45/T06-collect-links/judge.json b/doc-experiment/results/round-45/T06-collect-links/judge.json new file mode 100644 index 0000000000000..3611aebf06045 --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice with WP_HTML_Processor::create_fragment(). All API calls are documented. Uses a solid one-pass next_token() state machine, get_attribute() with is_string() for href, and #text plus get_modifiable_text() for decoded link text. Minor reservation: it manually tracks anchor scope instead of using the depth/breadcrumb subtree recipe, but this is still consistent with documented closer-driven token walking." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice and no undocumented API usage. This is closest to the reference: next_tag('A'), depth-bounded next_token() walk, #text filtering, get_modifiable_text(), and string-only href handling. Main penalty: it returns an empty array whenever paused_at_incomplete_token() is true after the scan, which over-applies a complete-input policy to a read-only extraction. A probe with a valid link followed by an incomplete trailing tag returns [] here while the reference returns the collected link." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly uses WP_HTML_Processor::create_fragment() and only documented methods. The #tag guard, get_tag(), is_tag_closer(), get_attribute(), #text filtering, and get_modifiable_text() are all appropriate. Minor reservation: it appends text to every active link in a manual stack, which is a less precise mental model than using the processor's parsed subtree boundary or current-region state; it works for these cases because the HTML Processor emits structural/virtual closers." + } + ], + "failure_analysis": "No hidden case failed across the three trials: simple, no-href-excluded, entity-in-href-decoded, valueless-href, image-link-empty-text, entities-in-text, no-links, and unclosed-link all passed in every execution.json. The docs did well on the important concepts: WP_HTML_Processor::create_fragment() is clearly recommended for BODY fragments and structural text extraction; the DOM-style text recipe shows next_tag()/next_token(), get_current_depth(), #text filtering, and get_modifiable_text(); get_attribute() documents string|true|null and decoded attribute values; get_modifiable_text() documents decoded #text values; next_token() documents virtual/end-of-input closers, which explains why the unclosed-link case works. The main near-miss was trial-2's global fail-closed policy for paused_at_incomplete_token(): the docs say read-only extraction policy is caller-defined and visited tokens remain usable, but the examples still make it easy to treat truncation as a reason to erase all accumulated data.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_current_depth() and the read-only text extraction recipe", + "problem": "The docs state that paused_at_incomplete_token() is a caller policy for read-only extraction, but there is no compact example showing a successful extraction before a later incomplete trailing token. Trial-2 therefore treated any incomplete trailing syntax as a reason to return an empty result.", + "suggestion": "Add a short read-only extraction example where tokens are collected before a trailing incomplete token, and explicitly say that preserving accumulated data is valid when the function contract is best-effort or fragment-oriented; reject only when the contract requires complete source bytes." + }, + { + "location": "WP_HTML_Tag_Processor::get_attribute() and WP_HTML_Processor::get_attribute() return docs", + "problem": "The return description says boolean attributes return true, but the practical contract is broader: an attribute present without a syntactic value returns true even when the attribute name is not a known boolean attribute, such as href.", + "suggestion": "Define true as 'attribute present with no value in source', null as absent/unavailable, and '' as an explicitly empty value. Include one non-boolean valueless example alongside the boolean-style example." + }, + { + "location": "WP_HTML_Processor::next_token() repeated-region guidance", + "problem": "The docs contain a first-element subtree example and a DT state-machine example, but not a concise general recipe for collecting many repeated element subtrees in document order. Candidates split between depth-bounded nested walks and manual active stacks.", + "suggestion": "Add a general repeated-region extraction recipe: detect an opener, initialize current state, append only #text tokens while inside, and finalize on the processor-reported closer, noting that virtual closers cover implied and end-of-input closes." + }, + { + "location": "WP_HTML_Processor::get_tag() docblock", + "problem": "The docs say get_tag() returns null if no tag is matched, but do not directly spell out behavior on non-tag tokens during next_token() scans. This encourages unguarded get_tag() calls in token loops.", + "suggestion": "Add a note that text, comment, doctype, and other non-tag tokens return null from get_tag(); for tag-only logic, either use next_tag() or guard next_token() code with get_token_type() === '#tag'." + } + ] +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-1/candidate.php b/doc-experiment/results/round-45/T06-collect-links/trial-1/candidate.php new file mode 100644 index 0000000000000..cce44f1c574ec --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-1/candidate.php @@ -0,0 +1,45 @@ +next_token() ) { + if ( 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + array_pop( $active_links ); + continue; + } + + $href = $processor->get_attribute( 'href' ); + + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $active_links[] = count( $links ) - 1; + } else { + $active_links[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $active_links ) ) { + $current_link = $active_links[ count( $active_links ) - 1 ]; + + if ( null !== $current_link ) { + $links[ $current_link ]['text'] .= $processor->get_modifiable_text(); + } + } + } + + return $links; +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-45/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..1862b8e40b62a --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-45/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..ef1542af3e680 --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on HTML structure and subtree text collection. It walks the fragment once with `next_token()`, starts a link record on each opening `A` whose `get_attribute( 'href' )` returns a string, appends decoded text only from `#text` tokens via `get_modifiable_text()`, and closes link scopes when `is_tag_closer()` reports the matching `A` end token.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-45/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..1766ef0b3cdfc --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,41 @@ +next_tag( 'A' ) ) { + if ( $processor->is_tag_closer() ) { + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( ! is_string( $href ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-45/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..80fd82539ee9f --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-45/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..d568e4c4cf491 --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is structure-dependent. It finds each `A` opener with `next_tag( 'A' )`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor\u2019s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. It returns an empty array if parsing could not complete cleanly (`create_fragment()` failed, the input paused at an incomplete token, or `get_last_error()` reported unsupported markup).", + "confidence": 88 +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-45/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..7756ea84dddc5 --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,48 @@ +next_token() ) { + if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) { + if ( $processor->is_tag_closer() ) { + array_pop( $link_stack ); + continue; + } + + $href = $processor->get_attribute( 'href' ); + if ( is_string( $href ) ) { + $links[] = array( + 'href' => $href, + 'text' => '', + ); + $link_stack[] = count( $links ) - 1; + } else { + $link_stack[] = null; + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) { + $text = $processor->get_modifiable_text(); + + foreach ( $link_stack as $link_index ) { + if ( null !== $link_index ) { + $links[ $link_index ]['text'] .= $text; + } + } + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-45/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..84423b7110b12 --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-45/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..6ba5b56ff664a --- /dev/null +++ b/doc-experiment/results/round-45/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens once with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and keeps a stack so later `#text` tokens can be appended via `get_modifiable_text()` until the matching `A` closer, which yields decoded link text in document order while excluding missing or boolean `href` attributes.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-45/T08-table-extract/judge.json b/doc-experiment/results/round-45/T08-table-extract/judge.json new file mode 100644 index 0000000000000..23ea9f4fad554 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware processor and only documented methods: create_fragment, next_tag, get_current_depth, next_token, get_token_type, get_tag, is_tag_closer, get_modifiable_text, and get_last_error all appear in the rendered docs. The traversal is idiomatic: one depth-bounded token walk with row/cell state and #text-only decoded text collection. Minor deductions: the final manual flush is redundant because next_token documents virtual closers, and the get_last_error fail-closed policy could discard already-collected read-only extraction results even though the docs say that is caller policy." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment and a single token walk bounded by the matched table depth. All API calls are documented, including get_token_name for tag names and get_token_type for #text. It follows the documented state-machine pattern for repeated regions and correctly uses get_modifiable_text only after identifying ordinary text. Minor deduction for redundant EOF/current-row flushing, which suggests partial uncertainty about the documented closer-for-every-opener behavior." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Clean API use throughout: correct processor, all methods documented, one depth-bounded next_token loop, explicit #tag/#text dispatch, closer-driven row/cell flushing, and get_modifiable_text only for ordinary text tokens. This aligns closely with the rendered guidance on fragment parsing, implied table structure, virtual closers, and decoded text extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen hidden cases, so there are no failed hidden cases to attribute. The docs did especially well on the key hazards for this task: the HTML Processor docs distinguish tree-aware fragment parsing from lexical tag scanning; next_token explains implied elements, synthesized/virtual closers, and the single-cursor state-machine pattern; get_current_depth explains the >= subtree boundary; and get_modifiable_text explains decoded #text handling and warns against treating every modifiable-text token as DOM text. Near-misses were small: two candidates added redundant end-of-loop flushing despite the virtual-closer guarantee, and trial-1 treated get_last_error as a reason to erase read-only results even though the docs frame that as caller policy rather than a universal rule.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock / traversal recipe", + "problem": "The docs explain virtual closers and single-cursor traversal, but the examples stop short of a compact generic pattern for repeated nested regions inside a previously matched container.", + "suggestion": "Add a general example for collecting repeated child regions within a matched ancestor using one next_token loop, a depth boundary, state variables, and closer-driven flushing. Keep it generic, such as terms/items/sections, not this table task." + }, + { + "location": "WP_HTML_Processor::get_last_error() and WP_HTML_Tag_Processor::paused_at_incomplete_token() docs", + "problem": "The read-only extraction policy is present in narrative guidance, but method-level docs can still lead implementers to discard already-collected data whenever an error is observed.", + "suggestion": "Add a docblock note that these signals mean the scan did not complete; they do not invalidate tokens already visited. Recommend fail-closed behavior for mutation/normalization/complete-source contracts, and explicit caller policy for read-only extraction." + }, + { + "location": "WP_HTML_Processor::get_token_type(), get_token_name(), get_tag(), and is_tag_closer() docs", + "problem": "Each method is documented, but models can still be uncertain about which predicate is best for tags versus ordinary text because the comparison is distributed across separate sections.", + "suggestion": "Add a small cross-method table showing return values for opening tag, closing tag, ordinary #text, comment, and special-element opener tokens, with a note that ordinary DOM text extraction should test get_token_type() === '#text' before reading get_modifiable_text()." + } + ] +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-45/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..e7919eee4d0e7 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,81 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#text' === $processor->get_token_type() ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag_name = $processor->get_tag(); + + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } else { + if ( null === $current_row ) { + $current_row = array(); + } + $current_cell = ''; + } + } + } + + if ( null !== $current_cell ) { + if ( null === $current_row ) { + $current_row = array(); + } + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-45/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..a579b2788da9c --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-45/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..1848b4c921c6c --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-style HTML structure, including implied table elements like `TBODY` and auto-closed rows/cells. It finds the first `TABLE` with `next_tag()`, then walks tokens inside that table with `next_token()`, collecting only `#text` token contents via `get_modifiable_text()` while a `TD` or `TH` cell is open, and flushing cells/rows on their closing tags.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-45/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..5e02be8c8bff4 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,82 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + $token_name = $processor->get_token_name(); + + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell ) { + $current_row[] = $current_cell; + $current_cell = null; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell ) { + if ( null === $current_row ) { + $current_row = array(); + } + + $current_row[] = $current_cell; + $current_cell = null; + } + } else { + if ( null === $current_row ) { + $current_row = array(); + } + + $current_cell = ''; + } + + continue; + } + + if ( null !== $current_cell && '#text' === $processor->get_token_type() ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $current_cell ) { + if ( null === $current_row ) { + $current_row = array(); + } + + $current_row[] = $current_cell; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + } + + return $rows; +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-45/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..3ef8f9325c4ae --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-45/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..3fa8acfd56564 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-style HTML structure, including implied table elements. It finds the first `TABLE` with `next_tag()`, then does one `next_token()` walk bounded by the table depth from `get_current_depth()`, starting and ending rows on `TR` tokens, starting and ending cells on `TD`/`TH` tokens, and concatenating only ordinary `#text` token contents via `get_modifiable_text()`.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-45/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..927ce73bfbc41 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,54 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + if ( '#tag' === $processor->get_token_type() ) { + $tag_name = $processor->get_tag(); + + if ( 'TR' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_row && null !== $current_cell ) { + $current_row[] = $current_cell; + } + $current_cell = null; + } else { + $current_cell = ''; + } + } + + continue; + } + + if ( '#text' === $processor->get_token_type() && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + return $rows; +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-45/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..a624e6fc31911 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-45/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-45/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-45/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..0a725a5f943f8 --- /dev/null +++ b/doc-experiment/results/round-45/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table semantics, finds the first `TABLE` with `next_tag()`, then does a single `next_token()` subtree walk bounded by `get_current_depth()`. It collects row boundaries from `TR` open/close tokens, cell boundaries from `TD`/`TH` open/close tokens, and appends only `#text` token content via `get_modifiable_text()` so markup contributes nothing and character references are already decoded.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-45/VARIANT.md b/doc-experiment/results/round-45/VARIANT.md new file mode 100644 index 0000000000000..ce148002dd3ec --- /dev/null +++ b/doc-experiment/results/round-45/VARIANT.md @@ -0,0 +1,34 @@ +# Round 45 Scratch Variant + +Variant name: `html-processor-text-policy-decision-table` + +Control round: `round-44` + +Edited rendered file: `/tmp/html-api-docs-eval/round-45/html-processor.md` + +Source docblocks were not edited. This is a scratch-only rendered-doc A/B +variant. The staged `html-processor.md` SHA-256 recorded in +`round-metadata.json` is: + +```text +dbec31d2a26f4223bfa3509950485bd0cafa67b7acfb971ec7d28df15fa4e0a3 +``` + +Changed rendered documentation in three places: + +- The class-level DOM-style text recipe now has a compact policy table: + ordinary subtree text uses only `#text`; special-element opener text is an + explicit opt-in with decoded/raw behavior called out; and read-only + extraction fallback policy is separated from mutation, normalization, and + token-rewrite fail-closed policy. +- The `next_token()` special-element paragraph now frames SCRIPT, STYLE, + TITLE, and TEXTAREA opener-carried text as opt-in data for that element's + own contents, not ordinary heading, table-cell, link, or article text. +- The inherited `get_modifiable_text()` section now states that it is not a + predicate for ordinary text nodes: ordinary DOM-style extraction should + first require `get_token_type() === '#text'`. + +Purpose: test whether a compact decision table and method-local opt-in +reminders improve transfer for text extraction tasks where subjects +over-include special-element opener text or discard read-only accumulated +results after incomplete/unsupported trailing input. diff --git a/doc-experiment/results/round-45/codex-judges-output.json b/doc-experiment/results/round-45/codex-judges-output.json new file mode 100644 index 0000000000000..0485287591d63 --- /dev/null +++ b/doc-experiment/results/round-45/codex-judges-output.json @@ -0,0 +1,224 @@ +{ + "result": [ + { + "id": "T03-first-h1-text", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct WP_HTML_Processor fragment parser, guarded null creation and missing H1, found the first H1 with next_tag(), then used the documented depth-bounded next_token() subtree walk. It read only #text tokens and used get_modifiable_text(), which the docs state returns decoded text for text nodes. Execution passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented, idiomatic implementation as the reference: create_fragment(), next_tag('H1'), get_current_depth(), next_token(), get_token_type(), and get_modifiable_text(). No undocumented methods or misuse. Handles nested markup, decoded entities, no-H1 null, image-only empty string, and unclosed H1 through the HTML Processor’s tree-aware walk." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correct processor and method set, all present in the rendered docs. The explanation explicitly cites tree-aware extraction and decoded character references. The implementation follows the HTML Processor subtree text recipe and passed all cases without warnings." + } + ], + "failure_analysis": "No hidden case failed across the trials. The docs did well because the relevant guidance was direct and task-shaped without embedding this exact solution: Tag Processor / 'Which processor should I use?' says collecting element text and walking a subtree require WP_HTML_Processor; HTML Processor / 'Recipe: collect DOM-style text from a subtree' shows the depth-bounded next_token() pattern and the #text-only filter; HTML Processor / get_current_depth explains why the guard must be >=, including child closers and malformed or unclosed input; get_modifiable_text explains decoded text for #text nodes and warns not to use it as a predicate for ordinary text. The main near-miss is that create_fragment() returning null is visible in the signature and examples, but the text-extraction recipe itself does not include the null guard, so a less careful subject could omit it.", + "doc_gaps": [ + { + "location": "html-processor.md / Recipe: collect DOM-style text from a subtree", + "problem": "The recipe starts with create_fragment() and immediately calls next_tag(), while create_fragment() can return null.", + "suggestion": "Add the null guard to the text-extraction recipe, or state directly that callers should handle a null factory result before token walking." + }, + { + "location": "html-processor.md / create_fragment()", + "problem": "The nullable return is clear in the signature but easy to miss in prose.", + "suggestion": "Add a short Returns note explaining when null can occur and that callers should branch before invoking processor methods." + }, + { + "location": "html-processor.md / Overview future-direction bullets", + "problem": "The bullet saying inner-content reading is a future capability can look inconsistent with the current documented ability to collect DOM-style text by walking tokens.", + "suggestion": "Clarify that direct innerHTML/textContent-style convenience APIs are future work, while read-only text extraction is currently supported through token walking." + } + ] + } + }, + { + "id": "T05-text-excerpt", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 99, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), walked with next_token(), filtered ordinary #text tokens, explicitly opted in TITLE/TEXTAREA opener-carried text, and used get_modifiable_text() only after token checks. All called methods are present in the rendered docs and execution recorded no _doing_it_wrong notices. Minor inefficiency: it accumulates all text before truncating instead of stopping once enough code points are collected." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Fully aligned with the documented text-extraction pattern: HTML Processor fragment parsing, single token walk, #text filtering, TITLE/TEXTAREA opt-in via opening tags, decoded text via get_modifiable_text(), and UTF-8 mb_* truncation. No undocumented API calls or misuse notices." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Fully aligned with the docs: correct processor, documented methods only, guarded use of get_modifiable_text(), explicit exclusion of SCRIPT/STYLE by whitelist, and Unicode-safe truncation. No _doing_it_wrong records." + } + ], + "failure_analysis": "All three trials passed all 10 hidden cases, so there were no failed cases to attribute to a misconception. The docs appear to have done well in the key places: the HTML Processor overview says to choose WP_HTML_Processor for document structure and text collection; the next_token() section states that element text may be split across multiple #text tokens and that TITLE/TEXTAREA/SCRIPT/STYLE carry text on opener tokens instead of child #text nodes; the get_modifiable_text() section warns that it is not a predicate for ordinary text and explains decoded #text/TITLE/TEXTAREA versus raw SCRIPT/STYLE. The candidates’ explanations closely mirrored those passages. Near-misses were limited to robustness and performance: trial-1 did not stop after reaching the limit, and none checked incomplete-token/error state, but the task and frozen cases did not require rejecting partial parses.", + "doc_gaps": [ + { + "location": "html-processor.md / next_token() and get_modifiable_text()", + "problem": "The correct text-extraction rules are documented, but spread across narrative sections. A reader has to combine token walking, ordinary #text filtering, special-element opener text, and decoded/raw semantics.", + "suggestion": "Add a compact reference table in the get_modifiable_text() docblock listing token category, whether it represents DOM-style text content, whether character references are decoded, and whether callers should opt in explicitly." + }, + { + "location": "html-processor.md / next_token()", + "problem": "The docs mention incomplete input handling for complete-source callers, but the text-extraction examples do not show a policy decision for paused_at_incomplete_token() or get_last_error().", + "suggestion": "Add a general note to text-walk examples: after a read-only walk, decide whether partial text is acceptable; if not, check paused_at_incomplete_token() and get_last_error() before returning accumulated text." + }, + { + "location": "html-processor.md / get_modifiable_text()", + "problem": "The docs recommend mb_strlen()/mb_substr() with UTF-8, but do not explicitly distinguish Unicode code points from grapheme clusters. This can matter for emoji plus variation selectors or combining marks.", + "suggestion": "Add a short note that PHP mb_* string length/substr with UTF-8 counts code points, not user-perceived grapheme clusters, and direct callers to grapheme_* when a UI-facing character limit requires grapheme clusters." + } + ] + } + }, + { + "id": "T06-collect-links", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice with WP_HTML_Processor::create_fragment(). All API calls are documented. Uses a solid one-pass next_token() state machine, get_attribute() with is_string() for href, and #text plus get_modifiable_text() for decoded link text. Minor reservation: it manually tracks anchor scope instead of using the depth/breadcrumb subtree recipe, but this is still consistent with documented closer-driven token walking." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor choice and no undocumented API usage. This is closest to the reference: next_tag('A'), depth-bounded next_token() walk, #text filtering, get_modifiable_text(), and string-only href handling. Main penalty: it returns an empty array whenever paused_at_incomplete_token() is true after the scan, which over-applies a complete-input policy to a read-only extraction. A probe with a valid link followed by an incomplete trailing tag returns [] here while the reference returns the collected link." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly uses WP_HTML_Processor::create_fragment() and only documented methods. The #tag guard, get_tag(), is_tag_closer(), get_attribute(), #text filtering, and get_modifiable_text() are all appropriate. Minor reservation: it appends text to every active link in a manual stack, which is a less precise mental model than using the processor's parsed subtree boundary or current-region state; it works for these cases because the HTML Processor emits structural/virtual closers." + } + ], + "failure_analysis": "No hidden case failed across the three trials: simple, no-href-excluded, entity-in-href-decoded, valueless-href, image-link-empty-text, entities-in-text, no-links, and unclosed-link all passed in every execution.json. The docs did well on the important concepts: WP_HTML_Processor::create_fragment() is clearly recommended for BODY fragments and structural text extraction; the DOM-style text recipe shows next_tag()/next_token(), get_current_depth(), #text filtering, and get_modifiable_text(); get_attribute() documents string|true|null and decoded attribute values; get_modifiable_text() documents decoded #text values; next_token() documents virtual/end-of-input closers, which explains why the unclosed-link case works. The main near-miss was trial-2's global fail-closed policy for paused_at_incomplete_token(): the docs say read-only extraction policy is caller-defined and visited tokens remain usable, but the examples still make it easy to treat truncation as a reason to erase all accumulated data.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_current_depth() and the read-only text extraction recipe", + "problem": "The docs state that paused_at_incomplete_token() is a caller policy for read-only extraction, but there is no compact example showing a successful extraction before a later incomplete trailing token. Trial-2 therefore treated any incomplete trailing syntax as a reason to return an empty result.", + "suggestion": "Add a short read-only extraction example where tokens are collected before a trailing incomplete token, and explicitly say that preserving accumulated data is valid when the function contract is best-effort or fragment-oriented; reject only when the contract requires complete source bytes." + }, + { + "location": "WP_HTML_Tag_Processor::get_attribute() and WP_HTML_Processor::get_attribute() return docs", + "problem": "The return description says boolean attributes return true, but the practical contract is broader: an attribute present without a syntactic value returns true even when the attribute name is not a known boolean attribute, such as href.", + "suggestion": "Define true as 'attribute present with no value in source', null as absent/unavailable, and '' as an explicitly empty value. Include one non-boolean valueless example alongside the boolean-style example." + }, + { + "location": "WP_HTML_Processor::next_token() repeated-region guidance", + "problem": "The docs contain a first-element subtree example and a DT state-machine example, but not a concise general recipe for collecting many repeated element subtrees in document order. Candidates split between depth-bounded nested walks and manual active stacks.", + "suggestion": "Add a general repeated-region extraction recipe: detect an opener, initialize current state, append only #text tokens while inside, and finalize on the processor-reported closer, noting that virtual closers cover implied and end-of-input closes." + }, + { + "location": "WP_HTML_Processor::get_tag() docblock", + "problem": "The docs say get_tag() returns null if no tag is matched, but do not directly spell out behavior on non-tag tokens during next_token() scans. This encourages unguarded get_tag() calls in token loops.", + "suggestion": "Add a note that text, comment, doctype, and other non-tag tokens return null from get_tag(); for tag-only logic, either use next_tag() or guard next_token() code with get_token_type() === '#tag'." + } + ] + } + }, + { + "id": "T08-table-extract", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware processor and only documented methods: create_fragment, next_tag, get_current_depth, next_token, get_token_type, get_tag, is_tag_closer, get_modifiable_text, and get_last_error all appear in the rendered docs. The traversal is idiomatic: one depth-bounded token walk with row/cell state and #text-only decoded text collection. Minor deductions: the final manual flush is redundant because next_token documents virtual closers, and the get_last_error fail-closed policy could discard already-collected read-only extraction results even though the docs say that is caller policy." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment and a single token walk bounded by the matched table depth. All API calls are documented, including get_token_name for tag names and get_token_type for #text. It follows the documented state-machine pattern for repeated regions and correctly uses get_modifiable_text only after identifying ordinary text. Minor deduction for redundant EOF/current-row flushing, which suggests partial uncertainty about the documented closer-for-every-opener behavior." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Clean API use throughout: correct processor, all methods documented, one depth-bounded next_token loop, explicit #tag/#text dispatch, closer-driven row/cell flushing, and get_modifiable_text only for ordinary text tokens. This aligns closely with the rendered guidance on fragment parsing, implied table structure, virtual closers, and decoded text extraction." + } + ], + "failure_analysis": "All three trials passed all 8 frozen hidden cases, so there are no failed hidden cases to attribute. The docs did especially well on the key hazards for this task: the HTML Processor docs distinguish tree-aware fragment parsing from lexical tag scanning; next_token explains implied elements, synthesized/virtual closers, and the single-cursor state-machine pattern; get_current_depth explains the >= subtree boundary; and get_modifiable_text explains decoded #text handling and warns against treating every modifiable-text token as DOM text. Near-misses were small: two candidates added redundant end-of-loop flushing despite the virtual-closer guarantee, and trial-1 treated get_last_error as a reason to erase read-only results even though the docs frame that as caller policy rather than a universal rule.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock / traversal recipe", + "problem": "The docs explain virtual closers and single-cursor traversal, but the examples stop short of a compact generic pattern for repeated nested regions inside a previously matched container.", + "suggestion": "Add a general example for collecting repeated child regions within a matched ancestor using one next_token loop, a depth boundary, state variables, and closer-driven flushing. Keep it generic, such as terms/items/sections, not this table task." + }, + { + "location": "WP_HTML_Processor::get_last_error() and WP_HTML_Tag_Processor::paused_at_incomplete_token() docs", + "problem": "The read-only extraction policy is present in narrative guidance, but method-level docs can still lead implementers to discard already-collected data whenever an error is observed.", + "suggestion": "Add a docblock note that these signals mean the scan did not complete; they do not invalidate tokens already visited. Recommend fail-closed behavior for mutation/normalization/complete-source contracts, and explicit caller policy for read-only extraction." + }, + { + "location": "WP_HTML_Processor::get_token_type(), get_token_name(), get_tag(), and is_tag_closer() docs", + "problem": "Each method is documented, but models can still be uncertain about which predicate is best for tags versus ordinary text because the comparison is distributed across separate sections.", + "suggestion": "Add a small cross-method table showing return values for opening tag, closing tag, ordinary #text, comment, and special-element opener tokens, with a note that ordinary DOM text extraction should test get_token_type() === '#text' before reading get_modifiable_text()." + } + ] + } + }, + { + "id": "N06-extract-toc", + "verdict": { + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment() for body-fragment, structure-aware traversal. All HTML API calls are documented: create_fragment, next_tag, get_tag, get_current_depth, next_token, get_token_type, get_modifiable_text, and get_last_error. The subtree walk and #text-only get_modifiable_text() use are idiomatic and handle decoded entities, nested inline markup, empty headings, uppercase source tags, and implied heading closes. Minor penalty: the final get_last_error() check discards all accumulated read-only results on unsupported markup; the docs say that is a caller policy, but this task did not specify fail-closed behavior." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Canonical use of the documented API. It chooses WP_HTML_Processor::create_fragment(), scans heading openers with next_tag(), records opener depth, walks each heading subtree with next_token() while depth remains >= the opener depth, and reads only #text tokens through get_modifiable_text(). No undocumented methods or _doing_it_wrong records. Edge cases in the frozen expectations are handled cleanly." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correctly uses WP_HTML_Processor::create_fragment() and a single next_token() state machine, matching the documented repeated-region pattern. All HTML API methods used are documented, including is_tag_closer(), get_token_type(), get_tag(), and get_modifiable_text(). It handles virtual/implied closers, empty headings, decoded text, and case normalization. Minor penalty: it relies on closer-driven flushing and an end-of-scan fallback without checking get_last_error()/paused_at_incomplete_token(), so unsupported or truncated scans could produce partial output without an explicit policy." + } + ], + "failure_analysis": "No hidden case failed in any trial: all three passed 7/7 frozen expectations with no _doing_it_wrong records. The rendered docs appear to have done the important work well. The 'Supported elements' and processor-choice language clearly pushed subjects to WP_HTML_Processor rather than the lexical Tag Processor. The 'collect DOM-style text from a subtree' recipe and get_modifiable_text() docs prevented the common mistake of appending tags, comments, or raw special-element content, and made entity decoding clear. The get_current_depth() section's explicit >= guidance maps directly to headings with nested inline markup, while the next_token() section's promise of implicit/end-of-input closing tokens explains the implied-heading-close case. Near-misses: trial-1 over-applied get_last_error() as a global fail-closed policy for read-only extraction, and trial-3 relied on get_tag()/is_tag_closer() behavior on virtual closers that is demonstrated indirectly but could be stated more directly in method docs.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_last_error() docblock", + "problem": "The method doc explains how to distinguish no match from parser abort, but does not locally repeat the read-only extraction policy: already-visited tokens remain usable, and discarding accumulated results is only appropriate when the caller requires complete-source semantics.", + "suggestion": "Add a short note that for read-only extraction, get_last_error() reports scan incompleteness/unsupported markup but does not invalidate data already visited; callers should explicitly choose partial result, empty result, or sentinel behavior." + }, + { + "location": "WP_HTML_Processor::is_tag_closer(), get_tag(), and get_token_name() docblocks", + "problem": "Closer-driven state machines depend on what name APIs return while matched on a real or virtual closing token. The docs say depth/breadcrumbs report the parent after a closer, but the tag-name contract for closers is less explicit.", + "suggestion": "State that on tag closer tokens, including virtual closers emitted by the HTML Processor, get_tag()/get_token_name() return the closed element's name even though depth and breadcrumbs have already moved to the parent context." + }, + { + "location": "WP_HTML_Processor::next_token() recipes", + "problem": "The docs separately show first-subtree text extraction and a repeated-region state machine. Subjects inferred the combination correctly, but the safe choices for repeated subtree text extraction remain spread across sections.", + "suggestion": "Add a general non-task-specific recipe for collecting text for every matching element in document order, clarifying when a next_tag()+bounded next_token() loop is acceptable and when a single next_token() state machine is preferable." + } + ] + } + } + ] +} diff --git a/doc-experiment/results/round-45/codex-trials-output.json b/doc-experiment/results/round-45/codex-trials-output.json new file mode 100644 index 0000000000000..02d633fcf5927 --- /dev/null +++ b/doc-experiment/results/round-45/codex-trials-output.json @@ -0,0 +1,143 @@ +{ + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-45/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + }, + "result": [ + { + "id": "T03-first-h1-text", + "trial": 1, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware task, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That yields decoded text content from the heading and its nested elements while ignoring markup itself.", + "confidence": 94 + }, + { + "id": "T03-first-h1-text", + "trial": 2, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks that subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 94 + }, + { + "id": "T03-first-h1-text", + "trial": 3, + "ok": true, + "code": "next_tag( 'H1' ) ) {\n return null;\n }\n\n $h1_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure, then finds the first `H1` with `next_tag()`, records its depth with `get_current_depth()`, and walks its subtree with `next_token()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. That follows the documented HTML Processor recipe for DOM-style subtree text and preserves decoded character references.", + "confidence": 94 + }, + { + "id": "T05-text-excerpt", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) {\n continue;\n }\n\n $tag_name = $processor->get_token_name();\n if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) {\n return $text;\n }\n\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks all tokens with `next_token()`, appending ordinary decoded `#text` nodes via `get_token_type()` and `get_modifiable_text()`, plus opener-carried decoded text for `TITLE` and `TEXTAREA` via `get_token_name()` and `is_tag_closer()`. It excludes `SCRIPT` and `STYLE` by only whitelisting the special text-bearing elements the task requires, and truncates with `mb_strlen()`/`mb_substr()` in UTF-8 so multi-byte characters are not split.", + "confidence": 90 + }, + { + "id": "T05-text-excerpt", + "trial": 2, + "ok": true, + "code": "next_token() ) {\n $chunk = '';\n\n if ( '#text' === $processor->get_token_type() ) {\n $chunk = $processor->get_modifiable_text();\n } elseif ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {\n $token_name = $processor->get_token_name();\n if ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) {\n $chunk = $processor->get_modifiable_text();\n }\n }\n\n if ( '' === $chunk ) {\n continue;\n }\n\n $chunk_length = mb_strlen( $chunk, 'UTF-8' );\n if ( $chunk_length <= $remaining ) {\n $excerpt .= $chunk;\n $remaining -= $chunk_length;\n\n if ( 0 === $remaining ) {\n break;\n }\n\n continue;\n }\n\n $excerpt .= mb_substr( $chunk, 0, $remaining, 'UTF-8' );\n break;\n }\n\n return $excerpt;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus opener-carried text from `TITLE` and `TEXTAREA` via `get_modifiable_text()`. That follows the documented text-extraction rules, automatically gets decoded character references where HTML defines them, excludes `SCRIPT` and `STYLE`, and truncates safely by Unicode code points with `mb_strlen()` and `mb_substr()` in UTF-8.", + "confidence": 92 + }, + { + "id": "T05-text-excerpt", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n $token_type = $processor->get_token_type();\n\n if ( '#text' === $token_type ) {\n $text .= $processor->get_modifiable_text();\n } elseif (\n '#tag' === $token_type &&\n ! $processor->is_tag_closer() &&\n in_array( $processor->get_token_name(), array( 'TEXTAREA', 'TITLE' ), true )\n ) {\n $text .= $processor->get_modifiable_text();\n }\n\n if ( mb_strlen( $text, 'UTF-8' ) > $max_codepoints ) {\n return mb_substr( $text, 0, $max_codepoints, 'UTF-8' );\n }\n }\n\n return $text;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` and a full `next_token()` walk so extraction follows the parsed HTML fragment structure. It appends only ordinary `#text` tokens plus the documented opener-carried text for `TEXTAREA` and `TITLE` via `get_modifiable_text()`, skips `SCRIPT` and `STYLE`, and truncates by Unicode code points with `mb_strlen()`/`mb_substr()` using explicit `UTF-8` encoding as the docs require.", + "confidence": 94 + }, + { + "id": "T06-collect-links", + "trial": 1, + "ok": true, + "code": "next_token() ) {\n if ( 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n array_pop( $active_links );\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $active_links[] = count( $links ) - 1;\n } else {\n $active_links[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $active_links ) ) {\n $current_link = $active_links[ count( $active_links ) - 1 ];\n\n if ( null !== $current_link ) {\n $links[ $current_link ]['text'] .= $processor->get_modifiable_text();\n }\n }\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on HTML structure and subtree text collection. It walks the fragment once with `next_token()`, starts a link record on each opening `A` whose `get_attribute( 'href' )` returns a string, appends decoded text only from `#text` tokens via `get_modifiable_text()`, and closes link scopes when `is_tag_closer()` reports the matching `A` end token.", + "confidence": 89 + }, + { + "id": "T06-collect-links", + "trial": 2, + "ok": true, + "code": "next_tag( 'A' ) ) {\n if ( $processor->is_tag_closer() ) {\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( ! is_string( $href ) ) {\n continue;\n }\n\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $links[] = array(\n 'href' => $href,\n 'text' => $text,\n );\n }\n\n if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is structure-dependent. It finds each `A` opener with `next_tag( 'A' )`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor’s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` tokens via `get_token_type()` and `get_modifiable_text()`. It returns an empty array if parsing could not complete cleanly (`create_fragment()` failed, the input paused at an incomplete token, or `get_last_error()` reported unsupported markup).", + "confidence": 88 + }, + { + "id": "T06-collect-links", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() && 'A' === $processor->get_tag() ) {\n if ( $processor->is_tag_closer() ) {\n array_pop( $link_stack );\n continue;\n }\n\n $href = $processor->get_attribute( 'href' );\n if ( is_string( $href ) ) {\n $links[] = array(\n 'href' => $href,\n 'text' => '',\n );\n $link_stack[] = count( $links ) - 1;\n } else {\n $link_stack[] = null;\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && ! empty( $link_stack ) ) {\n $text = $processor->get_modifiable_text();\n\n foreach ( $link_stack as $link_index ) {\n if ( null !== $link_index ) {\n $links[ $link_index ]['text'] .= $text;\n }\n }\n }\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $links;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens once with `next_token()`. On each `A` opener it reads `href` with `get_attribute()` and keeps a stack so later `#text` tokens can be appended via `get_modifiable_text()` until the matching `A` closer, which yields decoded link text in document order while excluding missing or boolean `href` attributes.", + "confidence": 90 + }, + { + "id": "T08-table-extract", + "trial": 1, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n if ( null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n continue;\n }\n\n if ( '#tag' !== $processor->get_token_type() ) {\n continue;\n }\n\n $tag_name = $processor->get_tag();\n\n if ( 'TR' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n }\n $current_cell = null;\n } else {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_cell = '';\n }\n }\n }\n\n if ( null !== $current_cell ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-style HTML structure, including implied table elements like `TBODY` and auto-closed rows/cells. It finds the first `TABLE` with `next_tag()`, then walks tokens inside that table with `next_token()`, collecting only `#text` token contents via `get_modifiable_text()` while a `TD` or `TH` cell is open, and flushing cells/rows on their closing tags.", + "confidence": 89 + }, + { + "id": "T08-table-extract", + "trial": 2, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() ) {\n if ( $processor->get_current_depth() < $table_depth ) {\n break;\n }\n\n $token_name = $processor->get_token_name();\n\n if ( 'TR' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_cell ) {\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $token_name || 'TH' === $token_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_cell ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n\n $current_row[] = $current_cell;\n $current_cell = null;\n }\n } else {\n if ( null === $current_row ) {\n $current_row = array();\n }\n\n $current_cell = '';\n }\n\n continue;\n }\n\n if ( null !== $current_cell && '#text' === $processor->get_token_type() ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_cell ) {\n if ( null === $current_row ) {\n $current_row = array();\n }\n\n $current_row[] = $current_cell;\n }\n\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-style HTML structure, including implied table elements. It finds the first `TABLE` with `next_tag()`, then does one `next_token()` walk bounded by the table depth from `get_current_depth()`, starting and ending rows on `TR` tokens, starting and ending cells on `TD`/`TH` tokens, and concatenating only ordinary `#text` token contents via `get_modifiable_text()`.", + "confidence": 86 + }, + { + "id": "T08-table-extract", + "trial": 3, + "ok": true, + "code": "next_tag( 'TABLE' ) ) {\n return array();\n }\n\n $table_depth = $processor->get_current_depth();\n $rows = array();\n $current_row = null;\n $current_cell = null;\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) {\n if ( '#tag' === $processor->get_token_type() ) {\n $tag_name = $processor->get_tag();\n\n if ( 'TR' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row ) {\n $rows[] = $current_row;\n $current_row = null;\n }\n } else {\n $current_row = array();\n }\n\n continue;\n }\n\n if ( 'TD' === $tag_name || 'TH' === $tag_name ) {\n if ( $processor->is_tag_closer() ) {\n if ( null !== $current_row && null !== $current_cell ) {\n $current_row[] = $current_cell;\n }\n $current_cell = null;\n } else {\n $current_cell = '';\n }\n }\n\n continue;\n }\n\n if ( '#text' === $processor->get_token_type() && null !== $current_cell ) {\n $current_cell .= $processor->get_modifiable_text();\n }\n }\n\n return $rows;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table semantics, finds the first `TABLE` with `next_tag()`, then does a single `next_token()` subtree walk bounded by `get_current_depth()`. It collects row boundaries from `TR` open/close tokens, cell boundaries from `TD`/`TH` open/close tokens, and appends only `#text` token content via `get_modifiable_text()` so markup contributes nothing and character references are already decoded.", + "confidence": 89 + }, + { + "id": "N06-extract-toc", + "trial": 1, + "ok": true, + "code": " 1,\n 'H2' => 2,\n 'H3' => 3,\n 'H4' => 4,\n 'H5' => 5,\n 'H6' => 6,\n );\n $toc = array();\n\n while ( $processor->next_tag() ) {\n $tag = $processor->get_tag();\n\n if ( null === $tag || ! isset( $levels[ $tag ] ) ) {\n continue;\n }\n\n $heading_depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $heading_depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => $levels[ $tag ],\n 'text' => $text,\n );\n }\n\n if ( null !== $processor->get_last_error() ) {\n return array();\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is a tree-aware operation. It scans forward with `next_tag()`, filters `H1` through `H6` using `get_tag()`, records the heading opener depth with `get_current_depth()`, then walks that heading’s subtree with `next_token()` and concatenates only `#text` token content via `get_token_type()` and `get_modifiable_text()`, which excludes markup tags while preserving heading text in document order.", + "confidence": 89 + }, + { + "id": "N06-extract-toc", + "trial": 2, + "ok": true, + "code": "next_tag() ) {\n $tag = $processor->get_tag();\n if ( ! in_array( $tag, array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ), true ) ) {\n continue;\n }\n\n $level = (int) substr( $tag, 1 );\n $depth = $processor->get_current_depth();\n $text = '';\n\n while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) {\n if ( '#text' === $processor->get_token_type() ) {\n $text .= $processor->get_modifiable_text();\n }\n }\n\n $toc[] = array(\n 'level' => $level,\n 'text' => $text,\n );\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because heading text extraction is tree-aware. It scans heading openers with `next_tag()`, records each heading level from `get_tag()`, then walks that heading’s subtree with `next_token()` and a `get_current_depth()` boundary, appending only ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()` so nested markup contributes text but not tags.", + "confidence": 92 + }, + { + "id": "N06-extract-toc", + "trial": 3, + "ok": true, + "code": "next_token() ) {\n if ( '#tag' === $processor->get_token_type() ) {\n $tag = $processor->get_tag();\n\n if ( ! $processor->is_tag_closer() ) {\n if ( null !== $tag && preg_match( '/^H([1-6])$/', $tag, $matches ) ) {\n $current_heading = array(\n 'level' => (int) $matches[1],\n 'text' => '',\n );\n $current_tag = $tag;\n }\n } elseif ( null !== $current_heading && $tag === $current_tag ) {\n $toc[] = $current_heading;\n $current_heading = null;\n $current_tag = null;\n }\n\n continue;\n }\n\n if ( null !== $current_heading && '#text' === $processor->get_token_type() ) {\n $current_heading['text'] .= $processor->get_modifiable_text();\n }\n }\n\n if ( null !== $current_heading ) {\n $toc[] = $current_heading;\n }\n\n return $toc;\n}\n", + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment structurally, then walks tokens with `next_token()`. On each heading opener `H1` through `H6`, it starts a current entry; while inside that heading it appends only `#text` token contents via `get_modifiable_text()`, and when the matching heading closer is reached with `is_tag_closer()` and `get_tag()`, it stores the completed TOC item.", + "confidence": 90 + } + ] +} diff --git a/doc-experiment/results/round-45/round-metadata.json b/doc-experiment/results/round-45/round-metadata.json new file mode 100644 index 0000000000000..6085decfc93bf --- /dev/null +++ b/doc-experiment/results/round-45/round-metadata.json @@ -0,0 +1,167 @@ +{ + "round": "round-45", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "T05-text-excerpt", + "T06-collect-links", + "T08-table-extract", + "N06-extract-toc" + ], + "task_count": 5, + "splits": { + "train": 5 + }, + "concepts": { + "text": 3, + "traversal": 2 + }, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "git_status_short": "?? doc-experiment/results/round-44/", + "source_file_digests": { + "ref": "working-tree", + "algorithm": "sha256", + "php_behavior_fingerprint": "token_get_all with T_COMMENT, T_DOC_COMMENT, and T_WHITESPACE removed; JSON-encoded token names and text", + "files": { + "src/wp-includes/html-api/class-wp-html-tag-processor.php": { + "source_sha256": "9e431d345cc6d5563a65e3456b8a944fe1def8ceb8db299e89f20e8e9f560058", + "php_without_comments_sha256": "d073f5772a64826878e4132ed9bb30717fb1b6f3ddb5e23a014a4fa9db218ad7", + "php_without_comments_token_count": 9881 + }, + "src/wp-includes/html-api/class-wp-html-processor.php": { + "source_sha256": "74724f1a228f65ed967dfa42def5ab6e70bfb0e36c0521d1f7649827e95b12ff", + "php_without_comments_sha256": "2da9aa482f295c89a7018c5b74b622b4add7e942599c97785434408cfa72c083", + "php_without_comments_token_count": 16806 + } + } + }, + "corpus_file_digests": { + "ref": "working-tree", + "algorithm": "sha256", + "tasks": { + "T03-first-h1-text": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T03-first-h1-text/task.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "doc-experiment/corpus/T03-first-h1-text/reference.php": "a64806dcd5f7f91e254c287aad72401502aec44a7661eb691b51e56e17064f2d", + "doc-experiment/corpus/T03-first-h1-text/tests.json": "499c5da8b3d916a6739142c6ed164e8a6a05c43a6c517003bcb31d6929094533" + } + }, + "T05-text-excerpt": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T05-text-excerpt/task.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "doc-experiment/corpus/T05-text-excerpt/reference.php": "1081c297e36fa566e8a6a2cf8098fe8d3117da8ac49d4106691b9bb65b3739b6", + "doc-experiment/corpus/T05-text-excerpt/tests.json": "8b2276d754c887d6246f321b2dfad0de59718bc252c0a6bed52b65ae1bad1496" + } + }, + "T06-collect-links": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T06-collect-links/task.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "doc-experiment/corpus/T06-collect-links/reference.php": "c0e5ae9ec5d4448a525962f1e27250a7243541d613209e3bbdc86913ff7e7a81", + "doc-experiment/corpus/T06-collect-links/tests.json": "34460376a147fa18739e7743d341479489f46a88b58b9f5aa5fe0d901f2b3140" + } + }, + "T08-table-extract": { + "labels": { + "split": "train", + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/T08-table-extract/task.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee", + "doc-experiment/corpus/T08-table-extract/reference.php": "289b8c6d1b69d4ef4dd7810dbdcba35445c30a7f26b631a00835ef737b78709e", + "doc-experiment/corpus/T08-table-extract/tests.json": "cdec7559d63ee541f1b0dad34a4f9f0982ce66e58ddfea98ec482c425e1b2638" + } + }, + "N06-extract-toc": { + "labels": { + "split": "train", + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html" + }, + "files": { + "doc-experiment/corpus/N06-extract-toc/task.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "doc-experiment/corpus/N06-extract-toc/reference.php": "7e5cf569519dc0411da5659e473fc773d82eb634886b5623610b696936c0b2e2", + "doc-experiment/corpus/N06-extract-toc/tests.json": "22f8950aace5d22a842a8c76df372d5d09e770cf4adefb06099eb81f7410317e" + } + } + } + }, + "created_at_utc": "2026-06-13T15:57:10+00:00", + "isolation": { + "scratch_contains": [ + "html-tag-processor.md", + "html-processor.md", + "tasks/.md" + ], + "subjects_must_not_read": [ + "reference.php", + "tests.json", + "source files", + "logs", + "plans", + "hypothesis docs" + ] + }, + "scratch": "/tmp/html-api-docs-eval/round-45", + "staged_task_files": [ + "tasks/T03-first-h1-text.md", + "tasks/T05-text-excerpt.md", + "tasks/T06-collect-links.md", + "tasks/T08-table-extract.md", + "tasks/N06-extract-toc.md" + ], + "scratch_isolation_check": "OK: /tmp/html-api-docs-eval/round-45 exposes 2 docs and 5 task prompt(s), with no forbidden files.", + "scratch_file_sha256": { + "html-processor.md": "dbec31d2a26f4223bfa3509950485bd0cafa67b7acfb971ec7d28df15fa4e0a3", + "html-tag-processor.md": "8ca4ec32aa54723fe16a0a9f45386ca6ebeecde7a06575d8a782e53fb395a664", + "tasks/N06-extract-toc.md": "ba341a3927bb81410b8662b29f479f23a0887d4ec6f19db95702c1346d670581", + "tasks/T03-first-h1-text.md": "66a882aef20a6af4430fb6a7eafc54ba306adcf982980967e07eba7881fc0030", + "tasks/T05-text-excerpt.md": "5e69766456a6343f99ee24037574ed3f9bdf7795ab8be9a61203f6254b8fa2de", + "tasks/T06-collect-links.md": "29bc31f664b27ca880bf5cdd6a50d0da8f3828cf59d085d2897df2cb987d7d3e", + "tasks/T08-table-extract.md": "d7aee8a4765781a8f7c36b976f914b495b70989817ba0612aa9fc4589bedfeee" + }, + "shadow_doc_variant": { + "name": "html-processor-text-policy-decision-table", + "control_round": "round-44", + "edited_files": [ + "html-processor.md" + ], + "notes": "Scratch-only rendered-doc variant. Adds a compact where-text-lives / extraction-policy table and method-local reminders that ordinary DOM-style text reads #text only, special-element opener text is explicit opt-in, and read-only extraction fallback policy differs from mutation/normalization/rewrite fail-closed policy. Source docblocks are unchanged." + } +} diff --git a/doc-experiment/results/round-45/round-summary.json b/doc-experiment/results/round-45/round-summary.json new file mode 100644 index 0000000000000..38c2206e466fd --- /dev/null +++ b/doc-experiment/results/round-45/round-summary.json @@ -0,0 +1,222 @@ +{ + "round_score": 99.56, + "core_score": 99.56, + "by_split": { + "train": 99.56 + }, + "by_concept": { + "text": 99.6, + "traversal": 99.5 + }, + "tasks": { + "T03-first-h1-text": { + "score": 100.0, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T05-text-excerpt": { + "score": 99.9, + "trials": [ + { + "trial": "trial-1", + "passed": 10, + "total": 10, + "adherence": 99, + "score": 99.7 + }, + { + "trial": "trial-2", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 10, + "total": 10, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T06-collect-links": { + "score": 98.9, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 95, + "score": 98.5 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 96, + "score": 98.8 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "text", + "processor": "html", + "split": "train" + } + }, + "T08-table-extract": { + "score": 99.5, + "trials": [ + { + "trial": "trial-1", + "passed": 8, + "total": 8, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-2", + "passed": 8, + "total": 8, + "adherence": 98, + "score": 99.4 + }, + { + "trial": "trial-3", + "passed": 8, + "total": 8, + "adherence": 100, + "score": 100.0 + } + ], + "labels": { + "role": "core", + "commonness": "medium", + "concept": "traversal", + "processor": "html", + "split": "train" + } + }, + "N06-extract-toc": { + "score": 99.5, + "trials": [ + { + "trial": "trial-1", + "passed": 7, + "total": 7, + "adherence": 97, + "score": 99.1 + }, + { + "trial": "trial-2", + "passed": 7, + "total": 7, + "adherence": 100, + "score": 100.0 + }, + { + "trial": "trial-3", + "passed": 7, + "total": 7, + "adherence": 98, + "score": 99.4 + } + ], + "labels": { + "role": "core", + "commonness": "high", + "concept": "traversal", + "processor": "html", + "split": "train" + } + } + }, + "round_metadata": { + "round": "round-45", + "mode": "shadow-doc-a/b", + "task_ids": [ + "T03-first-h1-text", + "T05-text-excerpt", + "T06-collect-links", + "T08-table-extract", + "N06-extract-toc" + ], + "task_count": 5, + "trials_per_task": 3, + "subject": { + "model": "gpt-5.4", + "reasoning_effort": "medium", + "service_tier": "priority" + }, + "judge": { + "model": "gpt-5.5", + "reasoning_effort": "xhigh", + "service_tier": "priority" + }, + "git_head": "ac41d6448e9a316d5675f67b7d8e42dc9bf4add7", + "git_status_short": "?? doc-experiment/results/round-44/" + }, + "subject_isolation": { + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-45/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." + } +} diff --git a/doc-experiment/results/round-45/subject-isolation.json b/doc-experiment/results/round-45/subject-isolation.json new file mode 100644 index 0000000000000..66bbae34872b8 --- /dev/null +++ b/doc-experiment/results/round-45/subject-isolation.json @@ -0,0 +1,19 @@ +{ + "enforced": true, + "agent_type": "codex-cli-isolated-workdir", + "isolation_mode": "isolated-workdir", + "runner": "codex exec", + "input_delivery": "prompt-embedded-docs", + "sandbox_mode": "read-only", + "approval_policy": "never", + "project_rules_loaded": false, + "user_config_loaded": false, + "repo_available_to_subject": false, + "input_files": [ + "html-processor.md", + "html-tag-processor.md", + "task.md" + ], + "work_root": "/var/folders/v7/flqy7j3s3q72cql9ppnrbqth0000gn/T/html-api-docs-eval/round-45/codex-cli-trials", + "equivalent_boundary_notes": "Each subject process runs from a private non-repo directory containing only the two staged rendered docs, one task prompt, and the output schema. The task and rendered docs are embedded directly in the subject prompt because local codex exec does not expose the experiment's Read/Grep-only tools. Codex project rules and user config are ignored; the process uses a read-only sandbox and approval policy never." +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/judge.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/judge.json new file mode 100644 index 0000000000000..a16029bcb73cd --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 91, + "hallucinated_methods": [], + "notes": "Used the right processor (`WP_HTML_Processor::create_fragment`) and only documented methods. Strong use of `next_token()`, bookmarks, depth-bounded subtree scanning, `serialize_token()`, and fallback on `get_last_error()` / `paused_at_incomplete_token()`. Minor adherence issues: it uses nested `next_token()` loops for repeated regions despite the docs recommending a single stateful loop, and it treats any visited token as paragraph content rather than checking whether the token has normalized serialized output." + }, + { + "trial_id": "trial-2", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Best aligned with the docs: HTML Processor, one stateful token walk, delayed emission of the opener, normalized output through `serialize_token()`, and explicit incomplete/unsupported fallback. All called APIs are documented and no `_doing_it_wrong` records occurred. The main near-miss is that a token with empty serialization, such as a presumptuous end tag, would still cause the pending element opener to be emitted." + }, + { + "trial_id": "trial-3", + "adherence": 93, + "hallucinated_methods": [], + "notes": "Correct processor choice and all methods are documented. It follows the documented one-cursor/state-machine style and handles parser aborts and incomplete input. Slightly less precise than trial 2 because it recognizes `P` openers without a `#tag` token-type guard and infers the pending element's closer from depth alone. Like the other trials, it counts any visited token as content even if `serialize_token()` would emit an empty string." + } + ], + "failure_analysis": "All three trials passed all 11 frozen hidden cases, with no runtime API misuse recorded. The docs did well on the key concepts for this task: the processor-choice sections clearly steer structural and normalized-output work to `WP_HTML_Processor`; `create_fragment()` explains body-fragment parsing and null creation; `next_token()` explains implicit and end-of-input closers; `get_current_depth()` explains why subtree walks use `>=` and why an element's own closer reports a lower depth; `serialize_token()` explains token-by-token normalized rewriting; and the error/incomplete-token passages led every candidate to return the original input when the parse did not finish cleanly.\n\nThe main near-miss is not covered by the frozen cases: all three candidates treat token presence as content. A probe with `

                      ` shows the reference returns an empty string, because the empty end tag is ignored and `serialize_token()` returns `''`; all candidates return `

                      `. The relevant docs do say presumptuous end tags are ignored and may serialize to an empty string, but that fact is not connected strongly enough to conditional subtree-emission decisions. The models learned how to walk and serialize, but not quite that normalized output content is not the same thing as “any visited token.”", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::serialize_token()` docblock and rewrite recipe", + "problem": "The docs mention that some visited tokens serialize to an empty string, but examples do not make clear that rewrite decisions based on whether a subtree emits content must ignore empty-serialization parser artifacts.", + "suggestion": "Add a short note and generic example: when deciding whether a visited region has emitted normalized content, test the serialized token string, because tokens such as presumptuous end tags may be visited but produce no output." + }, + { + "location": "`WP_HTML_Processor::next_token()` state-machine examples", + "problem": "The docs show collecting text and removing known wrappers, but not the common delayed-emission pattern for conditionally dropping an element after inspecting whether its subtree produced any normalized output.", + "suggestion": "Add a general recipe for holding an opener pending, emitting it only after the first non-empty serialized descendant/token is seen, and dropping both opener and closer if no emitted content appears. Keep it element-agnostic rather than using this task's paragraph case." + }, + { + "location": "`WP_HTML_Processor::get_tag()` / `get_namespace()` docs", + "problem": "The method docs do not explicitly remind readers that tag-name based structural rewrites should normally guard on `get_token_type() === '#tag'` and, when HTML semantics matter, the `html` namespace.", + "suggestion": "Add a matching-pattern note showing `#tag`, `! is_tag_closer()`, tag name, and namespace checks together for transformations that target HTML elements." + } + ] +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/candidate.php b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/candidate.php new file mode 100644 index 0000000000000..e4de557d5e857 --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/candidate.php @@ -0,0 +1,52 @@ +next_token() ) { + if ( 'P' === $processor->get_tag() && ! $processor->is_tag_closer() ) { + if ( ! $processor->set_bookmark( 'p-start' ) ) { + return $html; + } + + $p_depth = $processor->get_current_depth(); + $has_content = false; + + while ( $processor->next_token() && $processor->get_current_depth() >= $p_depth ) { + $has_content = true; + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return $html; + } + + if ( $has_content ) { + if ( ! $processor->seek( 'p-start' ) ) { + return $html; + } + + do { + $output .= $processor->serialize_token(); + } while ( $processor->next_token() && $processor->get_current_depth() >= $p_depth ); + + $output .= $processor->serialize_token(); + } + + $processor->release_bookmark( 'p-start' ); + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/execution.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/execution.json new file mode 100644 index 0000000000000..01059d9cf80bc --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

                      Keep me

                      ", + "actual": "

                      Keep me

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

                      \n\t

                      Text

                      ", + "actual": "

                      \n\t

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

                       

                      A B

                      ", + "actual": "

                       

                      A B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


                      ", + "actual": "


                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

                      keep

                      ", + "actual": "

                      keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

                      One

                      Block

                      Two

                      ", + "actual": "

                      One

                      Block

                      Two

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

                      Keep

                      ", + "actual": "

                      Keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
                      Nothing to remove
                      ", + "actual": "
                      Nothing to remove
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/response.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/response.json new file mode 100644 index 0000000000000..b3eaa517de930 --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment safely, then walks tokens with `next_token()` and rewrites the output with `serialize_token()`. For each `P` opener, it uses `set_bookmark()`, `get_current_depth()`, and a bounded subtree scan to detect whether the paragraph has any tokens inside; empty paragraphs are skipped, non-empty ones are replayed from the bookmark, and any unsupported or incomplete input detected via `get_last_error()` or `paused_at_incomplete_token()` causes the original HTML to be returned unchanged.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/candidate.php b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/candidate.php new file mode 100644 index 0000000000000..256e74217d0bf --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/candidate.php @@ -0,0 +1,45 @@ +next_token() ) { + if ( null !== $pending_p ) { + if ( 'P' === $processor->get_tag() && $processor->is_tag_closer() && $processor->get_current_depth() === $pending_p_depth - 1 ) { + $pending_p = null; + $pending_p_depth = 0; + continue; + } + + $output .= $pending_p; + $pending_p = null; + $pending_p_depth = 0; + } + + if ( '#tag' === $processor->get_token_type() && 'P' === $processor->get_tag() && ! $processor->is_tag_closer() ) { + $pending_p = $processor->serialize_token(); + $pending_p_depth = $processor->get_current_depth(); + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + if ( null !== $pending_p ) { + $output .= $pending_p; + } + + return $output; +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/execution.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/execution.json new file mode 100644 index 0000000000000..31671a1f97f03 --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

                      Keep me

                      ", + "actual": "

                      Keep me

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

                      \n\t

                      Text

                      ", + "actual": "

                      \n\t

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

                       

                      A B

                      ", + "actual": "

                       

                      A B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


                      ", + "actual": "


                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

                      keep

                      ", + "actual": "

                      keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

                      One

                      Block

                      Two

                      ", + "actual": "

                      One

                      Block

                      Two

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

                      Keep

                      ", + "actual": "

                      Keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
                      Nothing to remove
                      ", + "actual": "
                      Nothing to remove
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/response.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/response.json new file mode 100644 index 0000000000000..0f938b30b7d9b --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment, then rewrites the normalized output token-by-token with `next_token()` and `serialize_token()`. A `P` opener is held back briefly; if the very next visited token is that paragraph\u2019s own closer, the pair is dropped as an empty paragraph, otherwise the opener is emitted and processing continues. If parsing cannot fully complete, detected with `paused_at_incomplete_token()` or `get_last_error()`, the function returns the original HTML unchanged.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/candidate.php b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/candidate.php new file mode 100644 index 0000000000000..87903b9258080 --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/candidate.php @@ -0,0 +1,45 @@ +next_token() ) { + if ( null !== $pending_p_depth ) { + if ( $processor->get_current_depth() < $pending_p_depth ) { + // This is the pending paragraph's closer, so the paragraph was empty. + $pending_p_depth = null; + $pending_p_html = ''; + continue; + } + + $output .= $pending_p_html; + $pending_p_depth = null; + $pending_p_html = ''; + } + + if ( 'P' === $processor->get_tag() && ! $processor->is_tag_closer() ) { + $pending_p_depth = $processor->get_current_depth(); + $pending_p_html = $processor->serialize_token(); + continue; + } + + $output .= $processor->serialize_token(); + } + + if ( null !== $pending_p_depth ) { + $output .= $pending_p_html; + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $output; +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/execution.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/execution.json new file mode 100644 index 0000000000000..8e3600ddde3ea --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/candidate.php", + "function": "remove_empty_paragraphs", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "mixed-paragraphs", + "status": "pass", + "expected": "

                      Keep me

                      ", + "actual": "

                      Keep me

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-whitespace", + "status": "pass", + "expected": "

                      \n\t

                      Text

                      ", + "actual": "

                      \n\t

                      Text

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-content", + "status": "pass", + "expected": "

                       

                      A B

                      ", + "actual": "

                       

                      A B

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "element-only-kept", + "status": "pass", + "expected": "


                      ", + "actual": "


                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "comment-and-script-kept", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "self-closing-paragraph-syntax", + "status": "pass", + "expected": "

                      keep

                      ", + "actual": "

                      keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "implicit-paragraph-close", + "status": "pass", + "expected": "

                      One

                      Block

                      Two

                      ", + "actual": "

                      One

                      Block

                      Two

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-insensitive-source", + "status": "pass", + "expected": "

                      Keep

                      ", + "actual": "

                      Keep

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-paragraphs", + "status": "pass", + "expected": "
                      Nothing to remove
                      ", + "actual": "
                      Nothing to remove
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-input-unchanged", + "status": "pass", + "expected": "

                      ", + "actual": "

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/response.json b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/response.json new file mode 100644 index 0000000000000..ecaeb77fbe7c2 --- /dev/null +++ b/doc-experiment/results/round-46/H04-remove-empty-paragraphs/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then rewrites the normalized output token-by-token with `next_token()`, `get_tag()`, `is_tag_closer()`, `get_current_depth()`, and `serialize_token()`. A `P` opener is held back until the next token proves the paragraph has content; if the next visited token is that paragraph\u2019s closer, both tokens are skipped. If parsing aborts or pauses on incomplete input, it returns the original HTML unchanged using `get_last_error()` and `paused_at_incomplete_token()`.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/judge.json b/doc-experiment/results/round-46/N01-remove-external-class/judge.json new file mode 100644 index 0000000000000..97a17e0563397 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/judge.json @@ -0,0 +1,35 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Tag_Processor, documented next_tag(array('tag_name'=>'A','class_name'=>'external')), documented remove_class(), and get_updated_html(). Passed all 7 cases with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct Tag Processor pattern. Lowercase tag_name 'a' is documented as ASCII case-insensitive. Passed all 7 cases with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same correct documented API usage and idiomatic flat class edit loop. Passed all 7 cases with no _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs supported this task well: the Tag Processor overview says to use it for flat tag/class/attribute edits; the Usage section shows direct construction with new WP_HTML_Tag_Processor($html); the Finding tags table documents next_tag(array('tag_name'=>..., 'class_name'=>...)); the CSS class section documents add_class()/remove_class() as safe without pre-checks and says removing the only class removes the whole class attribute; get_updated_html() is identified as the way to retrieve queued edits. Near miss: the docs make class-name case behavior discoverable but not crisp at the point of use. add_class() says exact byte-for-byte comparison, while remove_class() only says 'Removes a class name', and has_class() mentions ASCII case-insensitive behavior without foregrounding quirks-mode nuance. The trials still passed the EXTERNAL case because the API behavior was sufficient, but this is the only place a weaker reader could plausibly infer the wrong class matching semantics.", + "doc_gaps": [ + { + "location": "WP_HTML_Tag_Processor::remove_class() docblock", + "problem": "The method description does not explicitly state class-name comparison semantics or that removing the last remaining class removes the class attribute.", + "suggestion": "Add a short contract mirroring add_class(): class names are compared according to the processor compatibility mode, byte-for-byte in no-quirks mode, and removing the final class removes the class attribute." + }, + { + "location": "WP_HTML_Tag_Processor::next_tag() $query['class_name'] documentation", + "problem": "The query parameter says the tag must contain the whole class name, but does not state class-name case/compatibility behavior where the parameter is introduced.", + "suggestion": "State the class_name matching rule directly in the parameter description, including the no-quirks versus quirks-mode distinction." + } + ] +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-1/candidate.php b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/candidate.php new file mode 100644 index 0000000000000..62ea9dc242915 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/candidate.php @@ -0,0 +1,11 @@ +next_tag( array( 'tag_name' => 'A', 'class_name' => 'external' ) ) ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-1/execution.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/execution.json new file mode 100644 index 0000000000000..d0af66b971415 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N01-remove-external-class/trial-1/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "
                      not a link
                      link", + "actual": "
                      not a link
                      link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-1/response.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/response.json new file mode 100644 index 0000000000000..0a1f6b18f7eb6 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: it constructs the processor with the input HTML, repeatedly finds `A` tags matching the `external` class via `next_tag()`, removes that class with `remove_class()`, and returns the modified document with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-2/candidate.php b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/candidate.php new file mode 100644 index 0000000000000..3c7101176574b --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/candidate.php @@ -0,0 +1,13 @@ +next_tag( array( + 'tag_name' => 'a', + 'class_name' => 'external', + ) ) ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-2/execution.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/execution.json new file mode 100644 index 0000000000000..258a37757736c --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N01-remove-external-class/trial-2/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "
                      not a link
                      link", + "actual": "
                      not a link
                      link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-2/response.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/response.json new file mode 100644 index 0000000000000..849b2793201d5 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: it constructs a processor from the input HTML, repeatedly finds `A` tags matching the `external` class with `next_tag()`, removes that class with `remove_class()`, and returns the modified document with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-3/candidate.php b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/candidate.php new file mode 100644 index 0000000000000..8df91d9b9826c --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/candidate.php @@ -0,0 +1,10 @@ +next_tag( array( 'tag_name' => 'a', 'class_name' => 'external' ) ) ) { + $processor->remove_class( 'external' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-3/execution.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/execution.json new file mode 100644 index 0000000000000..8c55ed42c5f2a --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N01-remove-external-class/trial-3/candidate.php", + "function": "remove_external_class", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "among-others", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "only-class-removes-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-class-untouched", + "status": "pass", + "expected": "stay", + "actual": "stay", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "case-sensitive-not-removed", + "status": "pass", + "expected": "caps", + "actual": "caps", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-links", + "status": "pass", + "expected": "123", + "actual": "123", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "non-link-untouched", + "status": "pass", + "expected": "
                      not a link
                      link", + "actual": "
                      not a link
                      link", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "middle-of-list", + "status": "pass", + "expected": "mid", + "actual": "mid", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N01-remove-external-class/trial-3/response.json b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/response.json new file mode 100644 index 0000000000000..0a1f6b18f7eb6 --- /dev/null +++ b/doc-experiment/results/round-46/N01-remove-external-class/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat attribute/class edit: it constructs the processor with the input HTML, repeatedly finds `A` tags matching the `external` class via `next_tag()`, removes that class with `remove_class()`, and returns the modified document with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/judge.json b/doc-experiment/results/round-46/N02-collect-figure-images/judge.json new file mode 100644 index 0000000000000..ee99026ddbe6a --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 97, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware `WP_HTML_Processor::create_fragment()` and only documented methods: `next_token()`, `get_token_type()`, `get_tag()`, `is_tag_closer()`, and `get_attribute()`. The single token-walk with explicit figure state is documented and passed all cases, including valueless/empty `src`, decoded entities, and an unclosed figure. Minor idiom deduction: for this specific containment query, `get_breadcrumbs()` is the clearer documented structural API than maintaining a manual figure counter." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Closely matches the documented ideal pattern: HTML Processor fragment parsing, `next_tag( 'IMG' )` for document-order image openers, `get_breadcrumbs()` for ancestor membership, and `get_attribute()` with `is_string` plus non-empty filtering. No undocumented API use or `_doing_it_wrong` records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same strong pattern as trial-2: correct processor, documented methods only, breadcrumb-based containment at any depth, decoded attribute access, and correct handling of missing, valueless, and empty `src` values. No misuse recorded." + } + ], + "failure_analysis": "All trials passed all 9 hidden cases, so there are no failed hidden cases to attribute to documentation failures. The docs did well on the key decision points: the Tag Processor overview explicitly says it has no tree awareness and that `get_breadcrumbs()` belongs to `WP_HTML_Processor`; the HTML Processor overview and Breadcrumbs section show structure-aware matching; `create_fragment()` documents the null check; `next_tag()` documents opener-only default behavior; `next_token()` documents generated closers for unclosed elements; and `get_attribute()` documents null/true/empty-string semantics, with decoded string semantics visible in the inherited Tag Processor method docs. Near-misses: the HTML Processor `get_attribute()` method page itself does not repeat the decoded-value contract, and the Breadcrumbs docs emphasize direct breadcrumb paths more than the common 'current element has any ancestor at any depth' check. A weaker subject could have used `array( 'FIGURE', 'IMG' )` as a descendant query and failed the nested-depth case, or could have double-decoded `src` if they only read the HTML Processor method entry.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_attribute()` docblock", + "problem": "The method entry shows null/true semantics but omits the inherited guarantee that string attribute values are returned decoded.", + "suggestion": "Repeat the decoded string contract directly in the HTML Processor method docs, including a short `&` example and a warning not to decode again." + }, + { + "location": "HTML Processor Breadcrumbs section and `get_breadcrumbs()` docblock", + "problem": "The docs show exact/direct breadcrumb paths but do not explicitly show the common 'is the current element inside ancestor X at any depth?' pattern.", + "suggestion": "Add a general ancestor-membership example using `next_tag( 'IMG' )`, `array_slice( $processor->get_breadcrumbs(), 0, -1 )`, and `in_array( 'FIGURE', ... )`; state that breadcrumbs include the current node and implicit `HTML`/`BODY`." + }, + { + "location": "`next_tag()` breadcrumb query parameter docs", + "problem": "`breadcrumbs` can be mistaken for a descendant selector when it is closer to a child-combinator path/suffix match.", + "suggestion": "Clarify that `array( 'FIGURE', 'IMG' )` matches an IMG directly on that breadcrumb path, not any-depth descendants; recommend `get_breadcrumbs()` or a token walk for arbitrary-depth containment." + }, + { + "location": "`next_token()` structural-walk docs", + "problem": "The docs explain generated closers and explicit state, but do not contrast manual container counters with breadcrumb checks for simple containment queries.", + "suggestion": "Add guidance that manual state is useful when aggregating across regions, while checking the current token's ancestors is usually simpler and less fragile with `get_breadcrumbs()`." + } + ] +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/candidate.php b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/candidate.php new file mode 100644 index 0000000000000..8110a886304f9 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/candidate.php @@ -0,0 +1,43 @@ +next_token() ) { + if ( '#tag' !== $processor->get_token_type() ) { + continue; + } + + $tag_name = $processor->get_tag(); + + if ( 'FIGURE' === $tag_name ) { + if ( $processor->is_tag_closer() ) { + if ( $figure_nesting > 0 ) { + $figure_nesting--; + } + } else { + $figure_nesting++; + } + + continue; + } + + if ( 'IMG' !== $tag_name || $processor->is_tag_closer() || $figure_nesting < 1 ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + if ( is_string( $src ) && '' !== $src ) { + $images[] = $src; + } + } + + return $images; +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/execution.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/execution.json new file mode 100644 index 0000000000000..e2ea5dc98c336 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/response.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/response.json new file mode 100644 index 0000000000000..8bed3d514b01f --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` for tree-aware parsing, then walks the fragment once with `next_token()`, tracking when the cursor is inside a `FIGURE` by counting `FIGURE` openers and closers via `get_tag()` and `is_tag_closer()`. When an `IMG` opener appears while inside a figure, it reads the already-decoded `src` with `get_attribute()` and collects only non-empty string values in document order.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/candidate.php b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/candidate.php new file mode 100644 index 0000000000000..a7dac8fce38e8 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/candidate.php @@ -0,0 +1,25 @@ +next_tag( 'IMG' ) ) { + if ( ! in_array( 'FIGURE', $processor->get_breadcrumbs(), true ) ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + + if ( is_string( $src ) && '' !== $src ) { + $images[] = $src; + } + } + + return $images; +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/execution.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/execution.json new file mode 100644 index 0000000000000..677da9c438c9b --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/response.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/response.json new file mode 100644 index 0000000000000..02f4a6473c993 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans `IMG` tags in document order with `next_tag( 'IMG' )`, checks ancestor membership via `get_breadcrumbs()`, and reads decoded `src` values with `get_attribute()`, skipping missing or empty values.", + "confidence": 95 +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/candidate.php b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/candidate.php new file mode 100644 index 0000000000000..7c772e2e3c78c --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/candidate.php @@ -0,0 +1,26 @@ +next_tag( 'IMG' ) ) { + if ( ! in_array( 'FIGURE', $processor->get_breadcrumbs(), true ) ) { + continue; + } + + $src = $processor->get_attribute( 'src' ); + if ( ! is_string( $src ) || '' === $src ) { + continue; + } + + $images[] = $src; + } + + return $images; +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/execution.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/execution.json new file mode 100644 index 0000000000000..ed9f67769cdf3 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/execution.json @@ -0,0 +1,129 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/candidate.php", + "function": "collect_figure_images", + "passed": 9, + "total": 9, + "cases": [ + { + "id": "in-and-out", + "status": "pass", + "expected": [ + "in.jpg" + ], + "actual": [ + "in.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-depth", + "status": "pass", + "expected": [ + "deep.jpg" + ], + "actual": [ + "deep.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-figures", + "status": "pass", + "expected": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "actual": [ + "a.jpg", + "b.jpg", + "c.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-figures", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-and-valueless-src-skipped", + "status": "pass", + "expected": [ + "yes.jpg" + ], + "actual": [ + "yes.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-decoded-src", + "status": "pass", + "expected": [ + "/i?a=1&b=2" + ], + "actual": [ + "/i?a=1&b=2" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "figcaption-sibling", + "status": "pass", + "expected": [ + "pic.jpg", + "cap.jpg" + ], + "actual": [ + "pic.jpg", + "cap.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-figure", + "status": "pass", + "expected": [ + "open.jpg", + "later.jpg" + ], + "actual": [ + "open.jpg", + "later.jpg" + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/response.json b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/response.json new file mode 100644 index 0000000000000..e8f4f4d73c8f6 --- /dev/null +++ b/doc-experiment/results/round-46/N02-collect-figure-images/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans `IMG` openers in document order with `next_tag( 'IMG' )`. For each image, it checks `get_breadcrumbs()` to confirm a `FIGURE` ancestor exists at any depth, and reads the already-decoded `src` with `get_attribute()`, skipping missing or empty values.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-46/N03-first-list-count/judge.json b/doc-experiment/results/round-46/N03-first-list-count/judge.json new file mode 100644 index 0000000000000..981e0b2ff2c43 --- /dev/null +++ b/doc-experiment/results/round-46/N03-first-list-count/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment for structural parsing, scanned for the first UL/OL in document order, bookmarked the opener, walked the subtree with next_token() and get_current_depth(), counted only LI openers at depth + 1, checked paused_at_incomplete_token() and get_last_error(), sought back, set the attribute, released the bookmark, and returned get_updated_html(). Every called API method appears in the rendered docs; execution recorded no _doing_it_wrong misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same documented pattern as trial-1: correct HTML Processor choice, documented token walk and depth guard, bookmark/seek edit, clean-scan checks, set_attribute(), and get_updated_html(). All API calls are documented in the two markdown files and there were no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly separated finding the first list from scanning its subtree, then used the documented bookmark, next_token(), get_token_type(), is_tag_closer(), get_current_depth(), paused_at_incomplete_token(), get_last_error(), seek(), set_attribute(), release_bookmark(), and get_updated_html() APIs. No hallucinated methods or runtime misuse." + } + ], + "failure_analysis": "All three trials passed all 11 frozen cases, so there were no failed hidden cases to attribute to misconceptions. The rendered docs did unusually well for this task: the HTML Processor overview explicitly says to choose WP_HTML_Processor when document structure matters, while the Tag Processor page warns that it has no nesting depth or ancestor awareness. The next_tag() docs explain that tag_name is not an alternatives list and show scanning any tag then branching on get_tag(), which matches the first-UL-or-OL requirement. The region-before-editing recipe gives the exact bookmark -> next_token() subtree scan -> clean-scan check -> seek back pattern. The direct-child recipe states the three necessary checks: #tag, not a closer, and current depth equal to container depth + 1. The get_current_depth() and next_token() docs also explain why a bounded walk must use >= or break only when depth drops below the opener depth, which prevents undercounting around nested lists and omitted LI closers. The incomplete/unsupported cases were covered by passages warning that virtual closers prove structural exit but not byte completeness, and by the guidance to check paused_at_incomplete_token() and get_last_error() before applying a mutation. A near-miss remains: the rendered next_token() section still includes a stale Since note saying “Added for internal support; do not use,” even though the same page teaches it as the public tool for structural token walks. These subjects followed the examples anyway, but a cautious model could have avoided next_token() because of that contradiction.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() docblock / rendered Since section", + "problem": "The method is documented with extensive public examples, but its historical Since note still says it was added for internal support and should not be used. That contradicts the surrounding guidance and could discourage the documented structural-walk pattern.", + "suggestion": "Replace the stale “do not use” changelog text with a clear public-use statement, or move any remaining caveat into prose that explains when to prefer next_tag() versus next_token()." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and WP_HTML_Processor subtree-walk examples", + "problem": "The docs say to drain all tokens before interpreting paused_at_incomplete_token(), while bounded subtree scans intentionally stop once the container closes. The examples imply the right behavior, but the contract could be sharper for callers whose result depends only on a region rather than the whole document.", + "suggestion": "Add a short note that for bounded structural scans, paused_at_incomplete_token() and get_last_error() reflect only what has been scanned so far; truncation or unsupported markup after a closed region is not observed unless the caller continues scanning or requires whole-document validation." + }, + { + "location": "WP_HTML_Processor::set_bookmark() / inherited bookmark documentation", + "problem": "The HTML Processor can visit parser-inserted virtual tokens, and bookmarks cannot be set on tokens absent from the original source. The docs mention this, but the failure mode is easy to miss when applying structural recipes that use bookmarks.", + "suggestion": "Add an HTML Processor-specific bookmark note near the structural-walk recipes: always check set_bookmark() because virtual tokens cannot be bookmarked, and bookmark an original source token before walking forward when the later edit must return to that token." + } + ] +} diff --git a/doc-experiment/results/round-46/N03-first-list-count/trial-1/candidate.php b/doc-experiment/results/round-46/N03-first-list-count/trial-1/candidate.php new file mode 100644 index 0000000000000..adf3ef6110088 --- /dev/null +++ b/doc-experiment/results/round-46/N03-first-list-count/trial-1/candidate.php @@ -0,0 +1,57 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + if ( ! $processor->set_bookmark( 'first-list' ) ) { + return $html; + } + + $list_depth = $processor->get_current_depth(); + $count = 0; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $list_depth ) { + break; + } + + if ( + '#tag' === $processor->get_token_type() && + ! $processor->is_tag_closer() && + 'LI' === $processor->get_tag() && + $processor->get_current_depth() === $list_depth + 1 + ) { + $count++; + } + } + + $scan_finished_cleanly = + ! $processor->paused_at_incomplete_token() && + null === $processor->get_last_error(); + + if ( ! $scan_finished_cleanly ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + if ( ! $processor->seek( 'first-list' ) ) { + $processor->release_bookmark( 'first-list' ); + return $html; + } + + $processor->set_attribute( 'data-item-count', (string) $count ); + $processor->release_bookmark( 'first-list' ); + + return $processor->get_updated_html(); + } + + return $html; +} diff --git a/doc-experiment/results/round-46/N03-first-list-count/trial-1/execution.json b/doc-experiment/results/round-46/N03-first-list-count/trial-1/execution.json new file mode 100644 index 0000000000000..98f7775e29877 --- /dev/null +++ b/doc-experiment/results/round-46/N03-first-list-count/trial-1/execution.json @@ -0,0 +1,107 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/N03-first-list-count/trial-1/candidate.php", + "function": "add_first_list_item_count", + "passed": 11, + "total": 11, + "cases": [ + { + "id": "simple-ul", + "status": "pass", + "expected": "
                      • A
                      • B
                      • C
                      ", + "actual": "
                      • A
                      • B
                      • C
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ol", + "status": "pass", + "expected": "
                      1. A
                      2. B
                      ", + "actual": "
                      1. A
                      2. B
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-list", + "status": "pass", + "expected": "

                      No list here.

                      ", + "actual": "

                      No list here.

                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-count-overwritten", + "status": "pass", + "expected": "
                      • A
                      ", + "actual": "
                      • A
                      ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-li-closers", + "status": "pass", + "expected": "
                      • one
                      • two", + "actual": "
                        • one
                        • two", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-list-counts-direct-children", + "status": "pass", + "expected": "
                            • x
                          • y", + "actual": "
                              • x
                            • y", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-token-inside-list", + "status": "pass", + "expected": "
                              • ", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                                Nothing here.

                                ", + "actual": "

                                Nothing here.

                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                                text

                                text

                                next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T01-add-image-class/trial-2/execution.json b/doc-experiment/results/round-46/T01-add-image-class/trial-2/execution.json new file mode 100644 index 0000000000000..0c40bbe4bab62 --- /dev/null +++ b/doc-experiment/results/round-46/T01-add-image-class/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T01-add-image-class/trial-2/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                                ", + "actual": "

                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                                ", + "actual": "
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                                Nothing here.

                                ", + "actual": "

                                Nothing here.

                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                                text

                                text

                                next_tag( 'img' ) ) { + $processor->add_class( 'wp-image' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T01-add-image-class/trial-3/execution.json b/doc-experiment/results/round-46/T01-add-image-class/trial-3/execution.json new file mode 100644 index 0000000000000..fa8fe5816cc91 --- /dev/null +++ b/doc-experiment/results/round-46/T01-add-image-class/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T01-add-image-class/trial-3/candidate.php", + "function": "add_image_class", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "

                                ", + "actual": "

                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple", + "status": "pass", + "expected": "
                                ", + "actual": "
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-classes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-tag", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-images", + "status": "pass", + "expected": "

                                Nothing here.

                                ", + "actual": "

                                Nothing here.

                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unquoted-attributes", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "incomplete-tag-at-end", + "status": "pass", + "expected": "

                                text

                                text

                                next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-1/execution.json b/doc-experiment/results/round-46/T02-link-targets/trial-1/execution.json new file mode 100644 index 0000000000000..8aa0226def49c --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T02-link-targets/trial-1/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-1/response.json b/doc-experiment/results/round-46/T02-link-targets/trial-1/response.json new file mode 100644 index 0000000000000..7eb5299ce975b --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for a flat, byte-preserving attribute edit: it scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so present-empty and boolean `href` both count while missing `href` does not, then overwrites or adds `target` via `set_attribute()` and returns the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-2/candidate.php b/doc-experiment/results/round-46/T02-link-targets/trial-2/candidate.php new file mode 100644 index 0000000000000..b69b5917371f2 --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-2/candidate.php @@ -0,0 +1,15 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-2/execution.json b/doc-experiment/results/round-46/T02-link-targets/trial-2/execution.json new file mode 100644 index 0000000000000..e2b6a8465034a --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T02-link-targets/trial-2/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-2/response.json b/doc-experiment/results/round-46/T02-link-targets/trial-2/response.json new file mode 100644 index 0000000000000..f4f7d1c55d503 --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` for byte-precise attribute edits: it scans only `A` tag openers with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so empty-string and boolean `href` values still count as present, overwrites or adds `target` via `set_attribute( 'target', '_blank' )`, and returns the modified markup with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-3/candidate.php b/doc-experiment/results/round-46/T02-link-targets/trial-3/candidate.php new file mode 100644 index 0000000000000..972d891e974db --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-3/candidate.php @@ -0,0 +1,14 @@ +next_tag( 'A' ) ) { + if ( null === $processor->get_attribute( 'href' ) ) { + continue; + } + + $processor->set_attribute( 'target', '_blank' ); + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-3/execution.json b/doc-experiment/results/round-46/T02-link-targets/trial-3/execution.json new file mode 100644 index 0000000000000..04c94c5a86939 --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T02-link-targets/trial-3/candidate.php", + "function": "add_link_targets", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-skipped", + "status": "pass", + "expected": "staygo", + "actual": "staygo", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href-counts", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-target-overwritten", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "uppercase-attribute", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "inside-comment-ignored", + "status": "pass", + "expected": "go", + "actual": "go", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup-in-link", + "status": "pass", + "expected": "bold move", + "actual": "bold move", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T02-link-targets/trial-3/response.json b/doc-experiment/results/round-46/T02-link-targets/trial-3/response.json new file mode 100644 index 0000000000000..39eb20e39dbe6 --- /dev/null +++ b/doc-experiment/results/round-46/T02-link-targets/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` because the task is a flat attribute rewrite and requires preserving untouched bytes exactly. The function scans only `A` tags with `next_tag( 'A' )`, checks `get_attribute( 'href' )` so present-but-empty (`\"\"`) and boolean (`true`) `href` values still count as present, then overwrites or creates `target` with `set_attribute( 'target', '_blank' )`, and finally returns the result with `get_updated_html()`.", + "confidence": 98 +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/judge.json b/doc-experiment/results/round-46/T03-first-h1-text/judge.json new file mode 100644 index 0000000000000..da2f62ca42fe1 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the correct tree-aware processor and only documented calls: WP_HTML_Processor::create_fragment(), next_tag(), get_current_depth(), next_token(), get_token_type(), and get_modifiable_text(). The implementation follows the documented subtree text recipe: record opener depth, walk tokens while depth is >= that depth, append only #text token modifiable text, and distinguish no H1 from an empty H1. execution.json passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical pattern as trial-1. Processor choice, documented method use, depth-bounded token walking, #text filtering, decoded text handling, no-H1 null return, image-only empty string, and unclosed H1 behavior all align with the rendered docs. execution.json passed 8/8 with no _doing_it_wrong records." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same canonical pattern as the reference. It uses the HTML Processor for structural text extraction, avoids broad get_modifiable_text() reads on non-text tokens, and relies on the documented virtual-closing/depth behavior for malformed input. execution.json passed 8/8 with no _doing_it_wrong records." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs worked well because the HTML Processor overview explicitly says to choose WP_HTML_Processor when collecting an element's text or walking a subtree; the 'Recipe: collect DOM-style text from a subtree' gives the exact depth-bounded #text-token pattern; get_modifiable_text() documents decoded text semantics; next_token() and get_current_depth() explain that unclosed elements still get closing tokens and that the guard must be >=, not >. Near-misses: the candidates did not discuss unsupported-parser errors or special-element opt-in text, but those were not required by this task and the chosen #text-only policy matches ordinary H1 text extraction.", + "doc_gaps": [ + { + "location": "/tmp/html-api-docs-eval/round-46/html-processor.md - Recipe: collect DOM-style text from a subtree", + "problem": "The recipe implies, but does not directly state, the return-value distinction between 'target element exists but has no ordinary text descendants' and 'target element was not found'.", + "suggestion": "Add a general note that subtree text collectors should initialize the accumulator only after the target element is found; an existing element with no included text tokens yields an empty string, while absence of the target is a caller-defined not-found value such as null." + }, + { + "location": "/tmp/html-api-docs-eval/round-46/html-processor.md - next_token() / get_current_depth()", + "problem": "The examples correctly use >=, but variable names like $depth_inside_li can obscure that the recorded value is the opener's depth and that equality is intentionally part of the subtree.", + "suggestion": "Use names such as $container_depth or $opener_depth in examples and state once more that descendant text and nested closers may report depth equal to the opener, so <= is the wrong break condition." + }, + { + "location": "/tmp/html-api-docs-eval/round-46/html-processor.md - Recipe: collect DOM-style text from a subtree", + "problem": "The docs explain incomplete-token and unsupported-markup checks mostly in mutation/rewrite contexts; read-only extraction policy is left to inference.", + "suggestion": "Add a short policy note for read-only scans: if partial best-effort text is unacceptable, check paused_at_incomplete_token() and get_last_error() after the walk and return the caller's fallback value." + } + ] +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-1/candidate.php b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-1/execution.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/execution.json new file mode 100644 index 0000000000000..d9aaaf3a4b767 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T03-first-h1-text/trial-1/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-1/response.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/response.json new file mode 100644 index 0000000000000..d890d71a67c75 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware task. It finds the first `H1` with `next_tag()`, records that element\u2019s depth with `get_current_depth()`, then walks forward with `next_token()` while still inside that subtree and concatenates only `#text` tokens via `get_modifiable_text()`, which already returns decoded text.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-2/candidate.php b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/candidate.php new file mode 100644 index 0000000000000..46bfdf8965376 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/candidate.php @@ -0,0 +1,24 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-2/execution.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/execution.json new file mode 100644 index 0000000000000..12986817ad655 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T03-first-h1-text/trial-2/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-2/response.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/response.json new file mode 100644 index 0000000000000..a86010208ca1e --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure. It finds the first `H1` with `next_tag()`, records that opener's depth with `get_current_depth()`, then walks the subtree with `next_token()` while the depth stays within the heading, appending only ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()`, which already returns decoded text.", + "confidence": 92 +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-3/candidate.php b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/candidate.php new file mode 100644 index 0000000000000..9853515781c4d --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/candidate.php @@ -0,0 +1,23 @@ +next_tag( 'H1' ) ) { + return null; + } + + $h1_depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $h1_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + return $text; +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-3/execution.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/execution.json new file mode 100644 index 0000000000000..bc51e08355296 --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/execution.json @@ -0,0 +1,80 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T03-first-h1-text/trial-3/candidate.php", + "function": "get_first_h1_text", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "Hello", + "actual": "Hello", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-markup", + "status": "pass", + "expected": "A B C", + "actual": "A B C", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-decoded", + "status": "pass", + "expected": "Fish & Chips — daily", + "actual": "Fish & Chips — daily", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-h1-null", + "status": "pass", + "expected": null, + "actual": null, + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-only-empty-string", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-of-two", + "status": "pass", + "expected": "First", + "actual": "First", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "nested-in-div", + "status": "pass", + "expected": "Deep title", + "actual": "Deep title", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-h1", + "status": "pass", + "expected": "Runs to the end", + "actual": "Runs to the end", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T03-first-h1-text/trial-3/response.json b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/response.json new file mode 100644 index 0000000000000..0db2208504d3d --- /dev/null +++ b/doc-experiment/results/round-46/T03-first-h1-text/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction is a tree-aware operation. It finds the first `H1` with `next_tag('H1')`, records that element's depth with `get_current_depth()`, then walks forward with `next_token()` while still inside the `H1` subtree and concatenates only `#text` tokens via `get_modifiable_text()`, which provides decoded text content.", + "confidence": 93 +} diff --git a/doc-experiment/results/round-46/T04-build-figure/judge.json b/doc-experiment/results/round-46/T04-build-figure/judge.json new file mode 100644 index 0000000000000..e2138282c305a --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/judge.json @@ -0,0 +1,35 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly used WP_HTML_Tag_Processor with a literal template, predeclared src/alt attributes to preserve order, walked tokens to the #text placeholder, used set_modifiable_text() for encoded caption text, and returned get_updated_html(). All called methods appear in the rendered docs and execution recorded no misuse." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same reference-quality pattern as trial-1. Processor choice, documented method usage, token walking, text replacement, attribute encoding, and get_updated_html() retrieval all match the documented template-building guidance." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same reference-quality pattern as trial-1. It relies on documented plaintext input semantics for set_attribute() and set_modifiable_text(), so quotes, ampersands, angle brackets, and script-like caption text are encoded rather than parsed." + } + ], + "failure_analysis": "No hidden cases failed in any trial. The rendered docs did especially well on this task: the Tag Processor overview clearly chose WP_HTML_Tag_Processor for flat byte-preserving edits, the 'Building markup from a template' section directly explained using a literal template with pre-existing attributes and placeholder text, set_attribute() documented plaintext input plus attribute-order behavior, set_modifiable_text() documented replacing only modifiable text tokens, and get_updated_html() was clearly presented as the way to retrieve queued edits. The only near-miss is that all candidates copied the recipe's unchecked set_modifiable_text() call; this is harmless for a fixed trusted template and a #text guard, but for variable templates it could silently do nothing if no placeholder text node exists.", + "doc_gaps": [ + { + "location": "html-tag-processor.md: Building markup from a template / set_modifiable_text()", + "problem": "The method section says set_modifiable_text() returns false when the current token is not modifiable and says to check the return value, but the template-building recipe does not model that check.", + "suggestion": "Add a short note or example branch explaining that a fixed trusted template with a guarded #text token is deterministic, while variable templates should handle a missing placeholder or false return." + }, + { + "location": "html-tag-processor.md: Building markup from a template", + "problem": "The docs imply, but do not state explicitly, that the Tag Processor builds new fragments by modifying an existing valid template rather than by appending or creating arbitrary nodes.", + "suggestion": "Add one general sentence: when constructing markup, include every required element, attribute slot, and text placeholder in the template, then replace values through the API." + } + ] +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-1/candidate.php b/doc-experiment/results/round-46/T04-build-figure/trial-1/candidate.php new file mode 100644 index 0000000000000..ca65d3d8c6a0c --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-1/candidate.php @@ -0,0 +1,17 @@ +
                                .
                                ' ); + + $processor->next_tag( 'img' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-1/execution.json b/doc-experiment/results/round-46/T04-build-figure/trial-1/execution.json new file mode 100644 index 0000000000000..c94ef4cb3e9fd --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T04-build-figure/trial-1/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                                \"A
                                My dog
                                ", + "actual": "
                                \"A
                                My dog
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                                \"Pair\"
                                Fish & Chips
                                ", + "actual": "
                                \"Pair\"
                                Fish & Chips
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                                \"The
                                Caption
                                ", + "actual": "
                                \"The
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                                \"Alt\"
                                Caption
                                ", + "actual": "
                                \"Alt\"
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "actual": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "actual": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "actual": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-1/response.json b/doc-experiment/results/round-46/T04-build-figure/trial-1/response.json new file mode 100644 index 0000000000000..7e6e1a5f73ff6 --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a fixed HTML template so the output shape and the `src` then `alt` attribute order are preserved exactly. It updates the `` attributes with `next_tag()` and `set_attribute()`, then walks tokens with `next_token()` until the placeholder `#text` node inside `
                                ` and replaces it with `set_modifiable_text()`, finally returning the result with `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-2/candidate.php b/doc-experiment/results/round-46/T04-build-figure/trial-2/candidate.php new file mode 100644 index 0000000000000..ca65d3d8c6a0c --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-2/candidate.php @@ -0,0 +1,17 @@ +
                                .
                                ' ); + + $processor->next_tag( 'img' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-2/execution.json b/doc-experiment/results/round-46/T04-build-figure/trial-2/execution.json new file mode 100644 index 0000000000000..39be7b38bb5ca --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T04-build-figure/trial-2/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                                \"A
                                My dog
                                ", + "actual": "
                                \"A
                                My dog
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                                \"Pair\"
                                Fish & Chips
                                ", + "actual": "
                                \"Pair\"
                                Fish & Chips
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                                \"The
                                Caption
                                ", + "actual": "
                                \"The
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                                \"Alt\"
                                Caption
                                ", + "actual": "
                                \"Alt\"
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "actual": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "actual": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "actual": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-2/response.json b/doc-experiment/results/round-46/T04-build-figure/trial-2/response.json new file mode 100644 index 0000000000000..ba45e2e5c2043 --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Tag_Processor` with a literal template so the fragment shape and attribute order stay exact. It updates the existing `img` attributes with `set_attribute()` and replaces the placeholder figcaption text node with `set_modifiable_text()`, relying on the documented HTML API to perform all required HTML encoding for attribute values and text.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-3/candidate.php b/doc-experiment/results/round-46/T04-build-figure/trial-3/candidate.php new file mode 100644 index 0000000000000..b5d7a41b97f48 --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-3/candidate.php @@ -0,0 +1,18 @@ +
                                .
                                ' ); + + $processor->next_tag( 'img' ); + $processor->set_attribute( 'src', $url ); + $processor->set_attribute( 'alt', $alt ); + + while ( $processor->next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $processor->set_modifiable_text( $caption ); + break; + } + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-3/execution.json b/doc-experiment/results/round-46/T04-build-figure/trial-3/execution.json new file mode 100644 index 0000000000000..82cda6597389c --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T04-build-figure/trial-3/candidate.php", + "function": "build_figure", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": "
                                \"A
                                My dog
                                ", + "actual": "
                                \"A
                                My dog
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ampersand-in-caption", + "status": "pass", + "expected": "
                                \"Pair\"
                                Fish & Chips
                                ", + "actual": "
                                \"Pair\"
                                Fish & Chips
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "quotes-in-alt", + "status": "pass", + "expected": "
                                \"The
                                Caption
                                ", + "actual": "
                                \"The
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "special-chars-in-url", + "status": "pass", + "expected": "
                                \"Alt\"
                                Caption
                                ", + "actual": "
                                \"Alt\"
                                Caption
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "angle-brackets-in-caption", + "status": "pass", + "expected": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "actual": "
                                \"Code\"
                                Use <em> tags & enjoy
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unicode", + "status": "pass", + "expected": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "actual": "
                                \"Schnée
                                Winter 🌨️ scene
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "html-in-caption-not-parsed", + "status": "pass", + "expected": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "actual": "
                                \"alt\"
                                <script>alert(1)</script>
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T04-build-figure/trial-3/response.json b/doc-experiment/results/round-46/T04-build-figure/trial-3/response.json new file mode 100644 index 0000000000000..37a8da90eb659 --- /dev/null +++ b/doc-experiment/results/round-46/T04-build-figure/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This builds the fragment from a fixed HTML template so the output shape and attribute order stay exact, then uses `WP_HTML_Tag_Processor::set_attribute()` to safely encode `src` and `alt`, and `next_token()` plus `get_token_type()` and `set_modifiable_text()` to replace the placeholder figcaption text with the provided caption before returning `get_updated_html()`.", + "confidence": 96 +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/judge.json b/doc-experiment/results/round-46/T05-text-excerpt/judge.json new file mode 100644 index 0000000000000..83bd603ce590e --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path and only documented API calls: `next_token()`, `get_token_type()`, `get_modifiable_text()`, `is_tag_closer()`, `get_token_name()`, and `get_last_error()`. The text-token policy is otherwise idiomatic and handles decoded text, `TITLE`/`TEXTAREA`, and `SCRIPT`/`STYLE` exclusion. Minor adherence loss: it scans past the requested limit and then returns empty on any later parser error, which is a caller-policy choice not required for this read-only prefix extraction." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Best match to the documented API contract. It chooses the HTML Processor, checks factory `null`, walks one token stream, reads only ordinary `#text` plus whitelisted opening `TITLE`/`TEXTAREA` tokens, relies on documented decoded UTF-8 text, excludes raw special elements, and truncates with `mb_*` using explicit UTF-8 while stopping once the requested prefix is complete." + }, + { + "trial_id": "trial-3", + "adherence": 94, + "hallucinated_methods": [], + "notes": "Equivalent API usage to trial-1. All called HTML API methods are documented and there were no `_doing_it_wrong` records. The implementation follows the documented special-element text handling, but shares the same overbroad post-scan `get_last_error()` fallback and no early stop after the limit, which can discard a valid prefix if unsupported markup appears later." + } + ], + "failure_analysis": "All three trials passed all 10 frozen cases, so there are no failed hidden cases to attribute. The docs did well on the main hazards: the Tag Processor overview says to use the HTML Processor for structure and DOM-style text extraction; the HTML Processor `next_token()` docs explain that text may be split across multiple `#text` tokens and that `SCRIPT`, `STYLE`, `TITLE`, and `TEXTAREA` do not produce child `#text` tokens; `get_modifiable_text()` states that `#text`, `TITLE`, and `TEXTAREA` are decoded UTF-8 while `SCRIPT` and `STYLE` are raw. The near-miss is trials 1 and 3: they interpreted the `create_fragment()`/`get_last_error()` guidance as a reason to discard the whole read-only result after any later unsupported markup. In a probe, the reference and trial-2 return `abc` for `

                                abcdef

                                onetwothree` with limit 3, while trials 1 and 3 return empty because they continue scanning into the unsupported misnesting and then reject. That did not appear in the frozen cases, but it shows an ambiguity between mutation/serialization safety guidance and best-effort read-only extraction. Incomplete trailing syntax was not explicitly tested beyond malformed nesting; none of the candidates checked `paused_at_incomplete_token()`, which is acceptable only if the caller's policy is best-effort accumulation of visited text.", + "doc_gaps": [ + { + "location": "`WP_HTML_Processor::get_last_error()` and `WP_HTML_Processor::create_fragment()` docs", + "problem": "The docs say to detect unsupported markup after scanning, but they do not clearly separate read-only extraction policy from mutation or serialization policy. This can lead callers to throw away already collected data even when their contract only needs a bounded prefix.", + "suggestion": "Clarify that non-null `get_last_error()` means the walk stopped before completing the document; mutation and serialization routines should reject or fall back, while read-only extractors must choose and document whether partial accumulated data is acceptable." + }, + { + "location": "`WP_HTML_Processor::next_token()` text-walking examples", + "problem": "The examples emphasize complete scans, but bounded reads are a common pattern. Continuing after the caller has enough data can expose later unsupported markup and change the result under an overbroad error policy.", + "suggestion": "Add a general note that callers collecting a prefix, count, or first match may stop once the result is satisfied, and that any subsequent unsupported markup is irrelevant unless the caller's contract requires validating the whole input." + }, + { + "location": "`WP_HTML_Tag_Processor::paused_at_incomplete_token()` cross-references from `WP_HTML_Processor::next_token()`", + "problem": "Incomplete-input behavior is documented, but the read-only extraction consequence is spread across sections: incomplete trailing tokens are not visited, while already visited text remains available.", + "suggestion": "Add a concise policy note for token collectors: check `paused_at_incomplete_token()` only when complete source bytes are required; otherwise accumulated text from visited tokens is a best-effort result and incomplete trailing syntax contributes nothing." + } + ] +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-1/candidate.php b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/candidate.php new file mode 100644 index 0000000000000..a2e98bc562a27 --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/candidate.php @@ -0,0 +1,40 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TITLE' === $tag_name || 'TEXTAREA' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-1/execution.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/execution.json new file mode 100644 index 0000000000000..046a787fea29b --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T05-text-excerpt/trial-1/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-1/response.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/response.json new file mode 100644 index 0000000000000..5ee9cf1e67de4 --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks all tokens with `next_token()`, appending decoded text from ordinary `#text` tokens and, as the docs require, also appending opener-token text from `TITLE` and `TEXTAREA` via `get_modifiable_text()`. It skips other special-element text such as `SCRIPT` and `STYLE`, and truncates by Unicode code points with `mb_strlen()` and `mb_substr()` using explicit `UTF-8` encoding.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-2/candidate.php b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/candidate.php new file mode 100644 index 0000000000000..12ec53d93ca29 --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/candidate.php @@ -0,0 +1,44 @@ + 0 && $processor->next_token() ) { + $chunk = ''; + $token_type = $processor->get_token_type(); + + if ( '#text' === $token_type ) { + $chunk = $processor->get_modifiable_text(); + } elseif ( '#tag' === $token_type && ! $processor->is_tag_closer() ) { + $token_name = $processor->get_token_name(); + if ( 'TITLE' === $token_name || 'TEXTAREA' === $token_name ) { + $chunk = $processor->get_modifiable_text(); + } + } + + if ( '' === $chunk ) { + continue; + } + + $chunk_length = mb_strlen( $chunk, 'UTF-8' ); + if ( $chunk_length <= $remaining ) { + $text .= $chunk; + $remaining -= $chunk_length; + continue; + } + + $text .= mb_substr( $chunk, 0, $remaining, 'UTF-8' ); + break; + } + + return $text; +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-2/execution.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/execution.json new file mode 100644 index 0000000000000..d4e8d014610eb --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T05-text-excerpt/trial-2/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-2/response.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/response.json new file mode 100644 index 0000000000000..6739f4d1ef15c --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY-context fragment structurally, then walks tokens with `next_token()`, appending only ordinary `#text` tokens plus opening `TITLE` and `TEXTAREA` tokens whose text is exposed via `get_modifiable_text()`. It preserves whitespace exactly as reported by the parser, relies on the documented decoded-text behavior for character references, excludes non-text-bearing tokens like `SCRIPT` and `STYLE`, and truncates safely by Unicode code points with `mb_strlen()`/`mb_substr()` in UTF-8.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-3/candidate.php b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/candidate.php new file mode 100644 index 0000000000000..e728bf3dca642 --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/candidate.php @@ -0,0 +1,39 @@ +next_token() ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + continue; + } + + if ( '#tag' !== $processor->get_token_type() || $processor->is_tag_closer() ) { + continue; + } + + $tag_name = $processor->get_token_name(); + if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { + $text .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return ''; + } + + if ( mb_strlen( $text, 'UTF-8' ) <= $max_codepoints ) { + return $text; + } + + return mb_substr( $text, 0, $max_codepoints, 'UTF-8' ); +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-3/execution.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/execution.json new file mode 100644 index 0000000000000..0fbcaa3eeeca0 --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/execution.json @@ -0,0 +1,98 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T05-text-excerpt/trial-3/candidate.php", + "function": "html_text_excerpt", + "passed": 10, + "total": 10, + "cases": [ + { + "id": "no-truncation-needed", + "status": "pass", + "expected": "Just a link to content.", + "actual": "Just a link to content.", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "truncate-mid-link", + "status": "pass", + "expected": "Just a l", + "actual": "Just a l", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-count-decoded", + "status": "pass", + "expected": "Fish &", + "actual": "Fish &", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multibyte-emoji", + "status": "pass", + "expected": "ab🌨️", + "actual": "ab🌨️", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "accented", + "status": "pass", + "expected": "cafés", + "actual": "cafés", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "script-excluded", + "status": "pass", + "expected": "beforeafter", + "actual": "beforeafter", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "textarea-title-counts-script-style-excluded", + "status": "pass", + "expected": "form & fieldDoc & TitleBody", + "actual": "form & fieldDoc & TitleBody", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "interelement-whitespace", + "status": "pass", + "expected": "a b", + "actual": "a b", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "zero-limit", + "status": "pass", + "expected": "", + "actual": "", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "malformed-nesting", + "status": "pass", + "expected": "onetwotail", + "actual": "onetwotail", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T05-text-excerpt/trial-3/response.json b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/response.json new file mode 100644 index 0000000000000..8a5a546349ffc --- /dev/null +++ b/doc-experiment/results/round-46/T05-text-excerpt/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then walks tokens with `next_token()`. It appends decoded text from ordinary `#text` tokens via `get_token_type()` and `get_modifiable_text()`, and explicitly opt-ins to special opener tokens `TITLE` and `TEXTAREA` using `get_token_name()` and `is_tag_closer()`, which excludes `SCRIPT` and `STYLE` as required. The final excerpt is truncated by Unicode code points with `mb_strlen()` and `mb_substr()` using UTF-8.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-46/T06-collect-links/judge.json b/doc-experiment/results/round-46/T06-collect-links/judge.json new file mode 100644 index 0000000000000..6f4a03343ddb1 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used WP_HTML_Processor::create_fragment(), the documented depth-bounded next_token() subtree walk, #text filtering, get_modifiable_text() for decoded text, and is_string(get_attribute('href')) to exclude absent and valueless attributes while preserving empty-string href values. All called methods appear in the rendered docs; no _doing_it_wrong records." + }, + { + "trial_id": "trial-2", + "adherence": 95, + "hallucinated_methods": [], + "notes": "Same correct API pattern as the reference, and all called methods are documented. The final paused_at_incomplete_token()/get_last_error() rejection is overbroad for this read-only extraction contract: a valid collected link followed by a truncated trailing token would be discarded. Hidden tests still passed and no _doing_it_wrong records appeared." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Used the intended HTML Processor, documented token walk, depth boundary, #text token filtering, decoded text retrieval, and string-only href filtering. All called methods appear in the rendered docs; no _doing_it_wrong records." + } + ], + "failure_analysis": "All trials passed all 8 hidden cases, so there are no failed hidden cases to attribute. The docs did well at steering models toward WP_HTML_Processor instead of WP_HTML_Tag_Processor: the processor-choice sections explicitly say text extraction and subtree walking need structural awareness. The strongest passage was the HTML Processor recipe for collecting DOM-style text from a subtree, plus next_token()/get_current_depth() guidance showing the >= depth guard, split #text tokens, virtual closers for malformed input, and decoded get_modifiable_text(). Attribute handling was also mostly clear: get_attribute() documents string|true|null, boolean attributes returning true, absent attributes returning null, and decoded attribute values in the Tag Processor page. The only near-miss was trial-2's global incomplete-input rejection. The docs say incomplete-token handling is caller policy, but examples showing $scan_finished_cleanly after subtree walks can be read as a default extraction pattern rather than a policy choice for mutations or strict-input callers.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_attribute() rendered method docs", + "problem": "The HTML Processor page lists the string|true|null contract but omits the decoded-string paragraph that appears on the Tag Processor page. A model using WP_HTML_Processor directly has to infer inherited decoding semantics from the other file.", + "suggestion": "Duplicate or inherit-render the key contract on the HTML Processor method: string values are already decoded, valueless boolean attributes return true, absent/unavailable attributes return null, and an explicit empty value returns ''. " + }, + { + "location": "WP_HTML_Processor::next_token() and get_current_depth() incomplete-input guidance", + "problem": "The examples correctly mention paused_at_incomplete_token(), but they do not sharply separate read-only best-effort extraction from strict validation or mutation workflows. This encouraged trial-2 to discard valid collected data because unrelated trailing syntax was incomplete.", + "suggestion": "Add a policy note: virtual closers make subtree extraction structurally reliable even for malformed/unclosed elements; check paused_at_incomplete_token() only when the caller contract requires rejecting truncated source, and avoid throwing away already-collected read-only results by default." + }, + { + "location": "Inherited methods on the WP_HTML_Processor page", + "problem": "paused_at_incomplete_token() is callable on WP_HTML_Processor through inheritance but is only fully documented on the Tag Processor page. The HTML Processor page references it without a local inherited-method entry explaining the same semantics in processor terms.", + "suggestion": "Render inherited public methods used by processor workflows, or add a short inherited-methods section linking to paused_at_incomplete_token() with HTML Processor-specific wording about scanning to the end before reading the flag." + } + ] +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-1/candidate.php b/doc-experiment/results/round-46/T06-collect-links/trial-1/candidate.php new file mode 100644 index 0000000000000..9650e76cce616 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-1/candidate.php @@ -0,0 +1,34 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + + if ( ! is_string( $href ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + return $links; +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-1/execution.json b/doc-experiment/results/round-46/T06-collect-links/trial-1/execution.json new file mode 100644 index 0000000000000..389158060daff --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-1/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T06-collect-links/trial-1/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-1/response.json b/doc-experiment/results/round-46/T06-collect-links/trial-1/response.json new file mode 100644 index 0000000000000..f701249fb7ce4 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because text extraction depends on document structure. It scans `A` openers with `next_tag( 'A' )`, filters to links whose `href` is reported as a string via `get_attribute()`, then walks that anchor\u2019s subtree with `next_token()` and `get_current_depth()`, concatenating only `#text` tokens with `get_modifiable_text()` so markup contributes nothing and character references are already decoded by the API.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-2/candidate.php b/doc-experiment/results/round-46/T06-collect-links/trial-2/candidate.php new file mode 100644 index 0000000000000..a574e77a18672 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-2/candidate.php @@ -0,0 +1,36 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + if ( ! is_string( $href ) ) { + continue; + } + + $depth = $processor->get_current_depth(); + $text = ''; + + while ( $processor->next_token() && $processor->get_current_depth() >= $depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $links; +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-2/execution.json b/doc-experiment/results/round-46/T06-collect-links/trial-2/execution.json new file mode 100644 index 0000000000000..1fce0c083259d --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-2/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T06-collect-links/trial-2/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-2/response.json b/doc-experiment/results/round-46/T06-collect-links/trial-2/response.json new file mode 100644 index 0000000000000..bff7b20e04eb9 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because link text extraction is a tree-aware task. It scans each `A` opener with `next_tag( 'A' )`, reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor\u2019s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` tokens via `get_modifiable_text()` so markup contributes nothing and character references stay decoded as documented.", + "confidence": 86 +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-3/candidate.php b/doc-experiment/results/round-46/T06-collect-links/trial-3/candidate.php new file mode 100644 index 0000000000000..5457a096209c1 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-3/candidate.php @@ -0,0 +1,32 @@ +next_tag( 'A' ) ) { + $href = $processor->get_attribute( 'href' ); + if ( ! is_string( $href ) ) { + continue; + } + + $text = ''; + $a_depth = $processor->get_current_depth(); + + while ( $processor->next_token() && $processor->get_current_depth() >= $a_depth ) { + if ( '#text' === $processor->get_token_type() ) { + $text .= $processor->get_modifiable_text(); + } + } + + $links[] = array( + 'href' => $href, + 'text' => $text, + ); + } + + return $links; +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-3/execution.json b/doc-experiment/results/round-46/T06-collect-links/trial-3/execution.json new file mode 100644 index 0000000000000..b408c4e3d9d1a --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-3/execution.json @@ -0,0 +1,148 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T06-collect-links/trial-3/candidate.php", + "function": "collect_links", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "actual": [ + { + "href": "/a", + "text": "First" + }, + { + "href": "/b", + "text": "second link" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-href-excluded", + "status": "pass", + "expected": [ + { + "href": "/only", + "text": "real" + } + ], + "actual": [ + { + "href": "/only", + "text": "real" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entity-in-href-decoded", + "status": "pass", + "expected": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "actual": [ + { + "href": "/search?q=a&b", + "text": "query" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "valueless-href", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "image-link-empty-text", + "status": "pass", + "expected": [ + { + "href": "/img", + "text": "" + } + ], + "actual": [ + { + "href": "/img", + "text": "" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-text", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "actual": [ + { + "href": "/x", + "text": "Fish & Chips" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-links", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "unclosed-link", + "status": "pass", + "expected": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "actual": [ + { + "href": "/x", + "text": "runs to the end" + } + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T06-collect-links/trial-3/response.json b/doc-experiment/results/round-46/T06-collect-links/trial-3/response.json new file mode 100644 index 0000000000000..3879f0ca5e704 --- /dev/null +++ b/doc-experiment/results/round-46/T06-collect-links/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the BODY fragment structurally, then scans `A` openers with `next_tag('A')`. For each anchor, it reads `href` with `get_attribute()` and keeps only string-valued attributes, then walks that anchor\u2019s subtree with `next_token()` bounded by `get_current_depth()`, concatenating only `#text` token contents via `get_modifiable_text()` so character references are decoded and markup contributes nothing.", + "confidence": 91 +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/judge.json b/doc-experiment/results/round-46/T07-nested-lists/judge.json new file mode 100644 index 0000000000000..6add73b5c0b43 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Correctly chose WP_HTML_Processor::create_fragment for a body fragment requiring ancestor awareness. All called methods are documented in the two rendered docs. Uses next_tag(), get_tag(), get_breadcrumbs(), add_class(), get_updated_html(), get_last_error(), and paused_at_incomplete_token() idiomatically; excludes the current list from the ancestor check and falls back on unsupported or incomplete input. Passed 7/7." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Correct processor and documented API surface throughout. The token walk and breadcrumb ancestor check are idiomatic, and get_updated_html() is the right output path after add_class(). Minor edge-case gap: it checks get_last_error() but not paused_at_incomplete_token(), even though the docs describe incomplete trailing syntax as a separate condition. Passed 7/7." + }, + { + "trial_id": "trial-3", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Same strong API use as trial-1: structural processor, documented methods only, correct breadcrumb ancestor logic, add_class() for preserving existing class values, and get_updated_html() for byte-preserving output. It also handles null processor creation, unsupported markup, and incomplete trailing tokens. Passed 7/7." + } + ], + "failure_analysis": "All trials passed every hidden case: simple nested OL in UL, top-level lists left untouched, UL inside OL, deep descendant lists, preserving an existing class, multiple nested levels, and mixed top-level/nested content. The docs did well in the places this task depended on: the Tag Processor overview explicitly says it has no tree awareness and points structural work to WP_HTML_Processor; the HTML Processor overview and Supported elements sections explain fragment creation and structural awareness; the Breadcrumbs section says get_breadcrumbs() returns the full root-to-current path, which led subjects to ignore the final breadcrumb when checking ancestors; add_class() documentation explains class creation/appending/preservation; and get_updated_html() is documented as the correct byte-preserving output method after queued class edits. The only near-miss was incomplete input handling: trial-2 did not check paused_at_incomplete_token(), likely because that inherited method is documented primarily on the Tag Processor page and only referenced from HTML Processor prose/examples rather than being easy to discover as part of the HTML Processor method surface.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::get_breadcrumbs() docblock / Breadcrumbs guide", + "problem": "The docs state that breadcrumbs include the currently matched node, but they do not explicitly call out the common ancestor-test pattern or the off-by-one risk.", + "suggestion": "Add a short note: when testing ancestors of the current token, ignore the last breadcrumb entry because it is the current matched node; use the full array only when matching the complete path including the current node." + }, + { + "location": "WP_HTML_Processor inherited method documentation for paused_at_incomplete_token()", + "problem": "paused_at_incomplete_token() is usable on WP_HTML_Processor through inheritance and appears in examples, but it is easier to discover on the Tag Processor page than in the HTML Processor method surface.", + "suggestion": "Expose inherited public parser-status methods in the HTML Processor docs, or add a dedicated see-also note near get_last_error() explaining that unsupported markup and incomplete trailing syntax are separate checks." + }, + { + "location": "WP_HTML_Processor::add_class() docblock", + "problem": "The HTML Processor add_class() entry is brief, while the detailed class-preservation semantics live on the Tag Processor page.", + "suggestion": "Add a concise inherited-behavior summary or direct cross-reference stating that add_class() creates a class attribute when missing, appends without removing existing classes, avoids duplicate exact class names, and should be read back with get_updated_html()." + } + ] +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-1/candidate.php b/doc-experiment/results/round-46/T07-nested-lists/trial-1/candidate.php new file mode 100644 index 0000000000000..27a6e3749447c --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-1/candidate.php @@ -0,0 +1,36 @@ +next_tag() ) { + $tag = $processor->get_tag(); + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $is_nested = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $is_nested = true; + break; + } + } + + if ( $is_nested ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( null !== $processor->get_last_error() || $processor->paused_at_incomplete_token() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-1/execution.json b/doc-experiment/results/round-46/T07-nested-lists/trial-1/execution.json new file mode 100644 index 0000000000000..f6077f43c6068 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-1/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T07-nested-lists/trial-1/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                                • One
                                  1. Nested
                                ", + "actual": "
                                • One
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                                1. Top
                                • Also top
                                ", + "actual": "
                                1. Top
                                • Also top
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                                1. One
                                  • Nested
                                ", + "actual": "
                                1. One
                                  • Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                                  1. Deep
                                ", + "actual": "
                                  1. Deep
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                                  1. Nested
                                ", + "actual": "
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                                • A
                                  1. B
                                    • C
                                ", + "actual": "
                                • A
                                  1. B
                                    • C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "actual": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-1/response.json b/doc-experiment/results/round-46/T07-nested-lists/trial-1/response.json new file mode 100644 index 0000000000000..97e61b1afd022 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans list openers with `next_tag()`, checks each matched tag\u2019s `get_breadcrumbs()` for any ancestor `UL` or `OL`, adds the class with `add_class()`, and returns the edited fragment with `get_updated_html()`. If the processor cannot safely parse the fragment or stops on unsupported/incomplete input, it returns the original HTML unchanged.", + "confidence": 90 +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-2/candidate.php b/doc-experiment/results/round-46/T07-nested-lists/trial-2/candidate.php new file mode 100644 index 0000000000000..fb33203ad0453 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-2/candidate.php @@ -0,0 +1,32 @@ +next_tag() ) { + $tag_name = $processor->get_tag(); + + if ( 'UL' !== $tag_name && 'OL' !== $tag_name ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $limit = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $limit; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $processor->add_class( 'nested-list' ); + break; + } + } + } + + if ( null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-2/execution.json b/doc-experiment/results/round-46/T07-nested-lists/trial-2/execution.json new file mode 100644 index 0000000000000..806c2fc7a4203 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-2/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T07-nested-lists/trial-2/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                                • One
                                  1. Nested
                                ", + "actual": "
                                • One
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                                1. Top
                                • Also top
                                ", + "actual": "
                                1. Top
                                • Also top
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                                1. One
                                  • Nested
                                ", + "actual": "
                                1. One
                                  • Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                                  1. Deep
                                ", + "actual": "
                                  1. Deep
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                                  1. Nested
                                ", + "actual": "
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                                • A
                                  1. B
                                    • C
                                ", + "actual": "
                                • A
                                  1. B
                                    • C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "actual": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-2/response.json b/doc-experiment/results/round-46/T07-nested-lists/trial-2/response.json new file mode 100644 index 0000000000000..faee11955acfe --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure. It scans list openers with `next_tag()`, checks each matched element\u2019s ancestors via `get_breadcrumbs()`, and calls `add_class()` only when a `UL` or `OL` appears earlier in that breadcrumb trail. The final HTML is returned with `get_updated_html()`, and the original input is preserved unchanged if fragment creation fails or the HTML processor aborts on unsupported markup.", + "confidence": 84 +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-3/candidate.php b/doc-experiment/results/round-46/T07-nested-lists/trial-3/candidate.php new file mode 100644 index 0000000000000..60a5e456ac67f --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-3/candidate.php @@ -0,0 +1,37 @@ +next_tag() ) { + $tag = $processor->get_tag(); + + if ( 'UL' !== $tag && 'OL' !== $tag ) { + continue; + } + + $breadcrumbs = $processor->get_breadcrumbs(); + $has_list_ancestor = false; + $last_index = count( $breadcrumbs ) - 1; + + for ( $i = 0; $i < $last_index; $i++ ) { + if ( 'UL' === $breadcrumbs[ $i ] || 'OL' === $breadcrumbs[ $i ] ) { + $has_list_ancestor = true; + break; + } + } + + if ( $has_list_ancestor ) { + $processor->add_class( 'nested-list' ); + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return $html; + } + + return $processor->get_updated_html(); +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-3/execution.json b/doc-experiment/results/round-46/T07-nested-lists/trial-3/execution.json new file mode 100644 index 0000000000000..a3d66de26f902 --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-3/execution.json @@ -0,0 +1,71 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T07-nested-lists/trial-3/candidate.php", + "function": "mark_nested_lists", + "passed": 7, + "total": 7, + "cases": [ + { + "id": "simple-ol-inside-ul", + "status": "pass", + "expected": "
                                • One
                                  1. Nested
                                ", + "actual": "
                                • One
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "top-level-lists-untouched", + "status": "pass", + "expected": "
                                1. Top
                                • Also top
                                ", + "actual": "
                                1. Top
                                • Also top
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "ul-inside-ol", + "status": "pass", + "expected": "
                                1. One
                                  • Nested
                                ", + "actual": "
                                1. One
                                  • Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "deep-descendant", + "status": "pass", + "expected": "
                                  1. Deep
                                ", + "actual": "
                                  1. Deep
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "existing-class-preserved", + "status": "pass", + "expected": "
                                  1. Nested
                                ", + "actual": "
                                  1. Nested
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "multiple-nested-levels", + "status": "pass", + "expected": "
                                • A
                                  1. B
                                    • C
                                ", + "actual": "
                                • A
                                  1. B
                                    • C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "mixed-document", + "status": "pass", + "expected": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "actual": "

                                intro

                                • A
                                  1. B
                                1. C
                                ", + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T07-nested-lists/trial-3/response.json b/doc-experiment/results/round-46/T07-nested-lists/trial-3/response.json new file mode 100644 index 0000000000000..b768583e353bd --- /dev/null +++ b/doc-experiment/results/round-46/T07-nested-lists/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` because the task depends on document structure, then scans tag openers with `next_tag()`. For each `UL` or `OL`, it checks `get_breadcrumbs()` for any ancestor list element and calls `add_class( 'nested-list' )` only when such an ancestor exists, finally returning `get_updated_html()` so untouched bytes remain unchanged; if parsing is incomplete or unsupported, it returns the original HTML unchanged.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-46/T08-table-extract/judge.json b/doc-experiment/results/round-46/T08-table-extract/judge.json new file mode 100644 index 0000000000000..d3aa01ad9e901 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/judge.json @@ -0,0 +1,45 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 92, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correctly chose WP_HTML_Processor::create_fragment(), used a single depth-bounded next_token() walk, relied on virtual closers, and read decoded #text with get_modifiable_text(). All called API methods are documented. Main adherence issue: it opted in SCRIPT/STYLE/TEXTAREA/TITLE opener text inside cells, but the docs' subtree-text recipe says ordinary text extraction should append only #text tokens unless the caller explicitly asks for special-element contents; SCRIPT/STYLE would also be raw, not decoded." + }, + { + "trial_id": "trial-2", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Passed 8/8. This is the cleanest match to the documented pattern: HTML Processor, first TABLE, one bounded token walk, closer-driven row/cell flushing, #text-only accumulation, and get_last_error() check. All API calls appear in the rendered docs and no _doing_it_wrong records were reported. Only minor gap is that it does not make an explicit paused_at_incomplete_token() policy, though its behavior is reasonable for extraction." + }, + { + "trial_id": "trial-3", + "adherence": 96, + "hallucinated_methods": [], + "notes": "Passed 8/8. Correct processor and documented APIs throughout, with idiomatic one-pass state tracking and #text-only decoded text collection. It also checks paused_at_incomplete_token(), which is documented, but applies a blanket empty-array fallback on truncated syntax. The docs frame that as a caller policy decision, so this is slightly over-strict for a browser-style extraction task that can still produce virtual closers and partial text." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs did well on the central risks for this task: the Tag Processor overview explicitly steers structural and text-content work to WP_HTML_Processor; the HTML Processor next_token() docs explain virtual closers, implied table structure such as TBODY, one-cursor state-machine walking, and depth-bounded subtree scans; get_current_depth() emphasizes the >= guard; get_modifiable_text() explains decoded #text. Near-misses: trial-1 over-read the special-element opt-in guidance and would include SCRIPT/STYLE/TEXTAREA contents even though the ordinary subtree-text recipe says not to; trial-3 treated paused_at_incomplete_token() as mandatory rejection rather than a contract-dependent policy.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::next_token() / subtree text recipe", + "problem": "The docs distinguish ordinary #text extraction from special-element modifiable text, but a subject still interpreted special elements as part of normal subtree text collection.", + "suggestion": "Add a short docblock note that 'ordinary text descendants' means visited #text tokens only, and that SCRIPT/STYLE/TEXTAREA/TITLE opener text is opt-in with different decoding/raw-text semantics." + }, + { + "location": "WP_HTML_Tag_Processor::paused_at_incomplete_token() and WP_HTML_Processor::next_token()", + "problem": "The incomplete-input guidance does not fully spell out that HTML Processor may still emit virtual closers and usable parsed content before reporting a paused lexical token.", + "suggestion": "Clarify that paused_at_incomplete_token() is a caller policy signal: extraction APIs may accept the accumulated result, while mutations or contracts requiring complete source should reject or fall back." + }, + { + "location": "WP_HTML_Processor::get_last_error()", + "problem": "Bounded scans can stop before unprocessed later markup, so get_last_error() only reflects tokens the processor actually reached.", + "suggestion": "Document that callers needing whole-input validation must drain the processor; callers scanning one subtree should treat get_last_error() as applying to the processed region only." + }, + { + "location": "Rendered method index for WP_HTML_Processor", + "problem": "Private parser internals such as step_in_table(), close_cell(), and insertion-mode helpers are rendered beside public methods, which can distract API users or invite private API use.", + "suggestion": "Filter private methods from consumer docs or mark the private/internal section much more prominently as not callable by plugin/theme code." + } + ] +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-1/candidate.php b/doc-experiment/results/round-46/T08-table-extract/trial-1/candidate.php new file mode 100644 index 0000000000000..bb2517e0fee21 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-1/candidate.php @@ -0,0 +1,57 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $special_text_tags = array( 'SCRIPT' => true, 'STYLE' => true, 'TEXTAREA' => true, 'TITLE' => true ); + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + $tag_name = $processor->get_tag(); + + if ( '#tag' === $token_type ) { + if ( $processor->is_tag_closer() ) { + if ( ( 'TD' === $tag_name || 'TH' === $tag_name ) && null !== $current_cell && is_array( $current_row ) ) { + $current_row[] = $current_cell; + $current_cell = null; + } elseif ( 'TR' === $tag_name && is_array( $current_row ) ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + if ( 'TR' === $tag_name ) { + $current_row = array(); + } elseif ( 'TD' === $tag_name || 'TH' === $tag_name ) { + if ( ! is_array( $current_row ) ) { + $current_row = array(); + } + $current_cell = ''; + } elseif ( null !== $current_cell && isset( $special_text_tags[ $tag_name ] ) ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + continue; + } + + if ( '#text' === $token_type && null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-1/execution.json b/doc-experiment/results/round-46/T08-table-extract/trial-1/execution.json new file mode 100644 index 0000000000000..5e59a779ddf3a --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-1/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T08-table-extract/trial-1/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-1/response.json b/doc-experiment/results/round-46/T08-table-extract/trial-1/response.json new file mode 100644 index 0000000000000..963f71d4b3bc2 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-1/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, finds the first `TABLE` with `next_tag()`, then does a single `next_token()` walk bounded by the table depth from `get_current_depth()`. It builds rows on `TR` open/close, cells on `TD`/`TH` open/close, appends ordinary text from `#text` tokens via `get_modifiable_text()`, and also opts in to special text-bearing element openers (`SCRIPT`, `STYLE`, `TEXTAREA`, `TITLE`) as documented for subtree text collection.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-2/candidate.php b/doc-experiment/results/round-46/T08-table-extract/trial-2/candidate.php new file mode 100644 index 0000000000000..7098fb39e5c7a --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-2/candidate.php @@ -0,0 +1,66 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + $in_cell = false; + + while ( $processor->next_token() ) { + if ( $processor->get_current_depth() < $table_depth ) { + break; + } + + if ( '#tag' === $processor->get_token_type() ) { + $tag = $processor->get_tag(); + + if ( 'TR' === $tag ) { + if ( $processor->is_tag_closer() ) { + if ( is_array( $current_row ) ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $tag || 'TH' === $tag ) { + if ( $processor->is_tag_closer() ) { + if ( $in_cell && is_array( $current_row ) ) { + $current_row[] = $current_cell; + } + $current_cell = null; + $in_cell = false; + } else { + $current_cell = ''; + $in_cell = true; + } + + continue; + } + } + + if ( $in_cell && '#text' === $processor->get_token_type() ) { + $current_cell .= $processor->get_modifiable_text(); + } + } + + if ( null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-2/execution.json b/doc-experiment/results/round-46/T08-table-extract/trial-2/execution.json new file mode 100644 index 0000000000000..7e61ee0f67e08 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-2/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T08-table-extract/trial-2/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-2/response.json b/doc-experiment/results/round-46/T08-table-extract/trial-2/response.json new file mode 100644 index 0000000000000..7b469fc5673ea --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-2/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` to parse the body fragment with browser-like table handling, finds the first `TABLE` via `next_tag()`, then walks its subtree with `next_token()` and `get_current_depth()`. It tracks `TR`, `TD`, and `TH` openers/closers with `get_tag()` and `is_tag_closer()`, and concatenates only `#text` token contents using `get_modifiable_text()`, which provides decoded text.", + "confidence": 89 +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-3/candidate.php b/doc-experiment/results/round-46/T08-table-extract/trial-3/candidate.php new file mode 100644 index 0000000000000..a47c3c5920109 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-3/candidate.php @@ -0,0 +1,68 @@ +next_tag( 'TABLE' ) ) { + return array(); + } + + $table_depth = $processor->get_current_depth(); + $rows = array(); + $current_row = null; + $current_cell = null; + + while ( $processor->next_token() && $processor->get_current_depth() >= $table_depth ) { + $token_type = $processor->get_token_type(); + $token_name = $processor->get_token_name(); + + if ( '#text' === $token_type ) { + if ( null !== $current_cell ) { + $current_cell .= $processor->get_modifiable_text(); + } + continue; + } + + if ( '#tag' !== $token_type ) { + continue; + } + + if ( 'TR' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + } + + if ( null !== $current_row ) { + $rows[] = $current_row; + $current_row = null; + } + } else { + $current_row = array(); + } + + continue; + } + + if ( 'TD' === $token_name || 'TH' === $token_name ) { + if ( $processor->is_tag_closer() ) { + if ( null !== $current_cell && null !== $current_row ) { + $current_row[] = $current_cell; + $current_cell = null; + } + } elseif ( null !== $current_row ) { + $current_cell = ''; + } + } + } + + if ( $processor->paused_at_incomplete_token() || null !== $processor->get_last_error() ) { + return array(); + } + + return $rows; +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-3/execution.json b/doc-experiment/results/round-46/T08-table-extract/trial-3/execution.json new file mode 100644 index 0000000000000..079dcc4d756b0 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-3/execution.json @@ -0,0 +1,172 @@ +{ + "candidate": "/Users/jonsurrell/a8c/wordpress-develop/html-api-docs-improvement/doc-experiment/results/round-46/T08-table-extract/trial-3/candidate.php", + "function": "table_to_array", + "passed": 8, + "total": 8, + "cases": [ + { + "id": "simple", + "status": "pass", + "expected": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "actual": [ + [ + "Name", + "Age" + ], + [ + "Ada", + "36" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "thead-tbody", + "status": "pass", + "expected": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "actual": [ + [ + "H" + ], + [ + "a" + ], + [ + "b" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "omitted-closers", + "status": "pass", + "expected": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "actual": [ + [ + "one", + "two" + ], + [ + "three", + "four" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "markup-in-cells", + "status": "pass", + "expected": [ + [ + "bold text", + "link" + ] + ], + "actual": [ + [ + "bold text", + "link" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "entities-in-cells", + "status": "pass", + "expected": [ + [ + "Fish & Chips" + ] + ], + "actual": [ + [ + "Fish & Chips" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "no-table", + "status": "pass", + "expected": [], + "actual": [], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "first-table-only", + "status": "pass", + "expected": [ + [ + "first" + ] + ], + "actual": [ + [ + "first" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + }, + { + "id": "empty-cells", + "status": "pass", + "expected": [ + [ + "", + "x" + ] + ], + "actual": [ + [ + "", + "x" + ] + ], + "error": null, + "doing_it_wrong": [], + "trigger_error": [] + } + ] +} diff --git a/doc-experiment/results/round-46/T08-table-extract/trial-3/response.json b/doc-experiment/results/round-46/T08-table-extract/trial-3/response.json new file mode 100644 index 0000000000000..272e4ee0f1646 --- /dev/null +++ b/doc-experiment/results/round-46/T08-table-extract/trial-3/response.json @@ -0,0 +1,5 @@ +{ + "ok": true, + "explanation": "This uses `WP_HTML_Processor::create_fragment()` so table parsing follows browser-like HTML structure, including implied table elements. It finds the first `TABLE` with `next_tag()`, then makes one bounded `next_token()` walk through that table subtree, collecting decoded text only from `#text` tokens while tracking `TR`, `TD`, and `TH` openers/closers with `get_token_type()`, `get_token_name()`, `is_tag_closer()`, `get_current_depth()`, and `get_modifiable_text()`.", + "confidence": 88 +} diff --git a/doc-experiment/results/round-46/T09-mark-keyword/judge.json b/doc-experiment/results/round-46/T09-mark-keyword/judge.json new file mode 100644 index 0000000000000..436b2537dd4c2 --- /dev/null +++ b/doc-experiment/results/round-46/T09-mark-keyword/judge.json @@ -0,0 +1,40 @@ +{ + "trials": [ + { + "trial_id": "trial-1", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Used the correct `WP_HTML_Processor::create_fragment()` path, walked tokens with `next_token()`, guarded on `#text`, matched decoded text via `get_modifiable_text()`, and rebuilt normalized output with `serialize_token()`. All called HTML API methods are documented, and execution passed 8/8. Minor near-miss: returning raw `$html` on `create_fragment()` null or `get_last_error()` conflicts with a normalized-output contract; the docs warn that original input is neither normalized nor rewritten." + }, + { + "trial_id": "trial-2", + "adherence": 100, + "hallucinated_methods": [], + "notes": "Fully idiomatic use of the documented API: HTML Processor fragment parsing, token walking, `#text` filtering, decoded text comparison, and token-by-token serialization with wrappers. All called methods are documented and there were no `_doing_it_wrong` records. Execution passed 8/8." + }, + { + "trial_id": "trial-3", + "adherence": 98, + "hallucinated_methods": [], + "notes": "Same correct documented pattern as trial 1: `create_fragment()`, `next_token()`, `get_token_type()`, `get_modifiable_text()`, `serialize_token()`, and `get_last_error()` are all present in the rendered docs. Execution passed 8/8. Minor near-miss: raw-input fallback on parser creation/error is not normalized output." + } + ], + "failure_analysis": "No hidden case failed in any trial. The docs worked well for this task: `create_fragment()` and the HTML Support overview made the HTML Processor the clear choice for BODY fragments and normalization; the DOM-style text recipe warned to use only ordinary `#text` tokens, which avoided comments, attributes, and special text-bearing elements; `get_modifiable_text()` documented decoded text for `#text` nodes, which handled entity-encoded keywords; and `serialize_token()` documented token-by-token normalized rewriting, which led all trials to wrap serialized tokens rather than mutate raw strings. The main near-miss was error fallback policy: two trials returned the original raw input on parser failure, even though the `serialize_token()` docs say this discards accumulated rewrites and is not normalized.", + "doc_gaps": [ + { + "location": "WP_HTML_Processor::serialize_token() / rewrite-while-serializing guidance", + "problem": "The docs mention that returning original input is not normalized, but two trials still chose that fallback for parser errors in a function whose contract requires normalized output.", + "suggestion": "Make the fallback guidance more prescriptive: for normalized-output rewrites, return a caller-defined failure sentinel such as `null`/`''` or documented partial output; return original input only when the contract explicitly prioritizes preserving source bytes over normalization and emitted edits." + }, + { + "location": "WP_HTML_Processor::next_token() method docs", + "problem": "The public method page recommends `next_token()` throughout, but its changelog still says `Added for internal support; do not use`, which contradicts the rendered recipes.", + "suggestion": "Remove or qualify the `do not use` phrase in rendered public docs, or replace it with current guidance about when token walking is appropriate." + }, + { + "location": "WP_HTML_Processor::get_last_error() example", + "problem": "The documented unsupported-markup example appears stale in the probed environment: the shown `