Skip to content

Document corpus refresh processed-item conventions #2491

Open
chubes4 wants to merge 2 commits into main from issue/2489-corpus-refresh-conventions



Member

@chubes4 chubes4 commented Jun 3, 2026

Summary

  • Adds generic corpus refresh processed-item key conventions for document, chunk, and embedding revision dedupe.
  • Documents the corpus refresh source types, key shapes, and batch metadata counters.
  • Surfaces selected and retried batch result counts through run metrics and covers the convention with smoke tests.
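
For readers who have not opened the diff, the idea behind revision-keyed dedupe and batch counters can be sketched as follows. This is a language-neutral illustration in Python, not the PHP added in this PR: the source type names, the `id@digest` key shape, the counter names, and the helpers `revision_key` and `refresh_batch` are all hypothetical, and the real conventions live in inc/Core/Corpus/CorpusRefreshConventions.php.

```python
import hashlib

# Hypothetical source types; the PR defines its own names for the
# document, chunk, and embedding levels.
SOURCE_TYPES = ("corpus_document", "corpus_chunk", "corpus_embedding")


def revision_key(item_id: str, content: str) -> str:
    """Build a processed-item key that changes whenever the content changes.

    Assumed shape: "<item id>@<first 16 hex chars of SHA-256 of content>".
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"{item_id}@{digest}"


def refresh_batch(source_type, items, processed, previously_failed=None):
    """Process one batch and return its metadata counters.

    items: iterable of (item_id, content) pairs selected for this batch.
    processed: set of (source_type, key) claims, mutated in place.
    previously_failed: item ids whose last attempt failed (assumption).
    """
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type}")
    previously_failed = previously_failed or set()
    counters = {"selected": 0, "processed": 0, "skipped": 0, "retried": 0}
    for item_id, content in items:
        counters["selected"] += 1
        claim = (source_type, revision_key(item_id, content))
        if claim in processed:
            # Same revision was already handled, so dedupe skips it.
            counters["skipped"] += 1
            continue
        processed.add(claim)
        counters["processed"] += 1
        if item_id in previously_failed:
            counters["retried"] += 1
    return counters
```

Under these assumptions, re-running a batch with unchanged content skips every item, while an edited document produces a new key and is processed again; the returned counters are the kind of per-batch numbers a run-metrics layer could aggregate.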

Closes #2489

Tests run

  • php tests/corpus-refresh-conventions-smoke.php
  • php tests/run-metrics-smoke.php
  • php tests/processed-item-claims-smoke.php
  • ./vendor/bin/phpcs inc/Core/Corpus/CorpusRefreshConventions.php inc/Core/RunMetrics.php tests/corpus-refresh-conventions-smoke.php tests/run-metrics-smoke.php

Remaining risks

  • This PR defines generic conventions and metrics plumbing only; corpus consumers still need to adopt these keys in their own refresh jobs.
  • No dependency on another issue or PR.

AI assistance

  • AI assistance: Yes
  • Tool(s): OpenCode (gpt-5.5)
  • Used for: Drafted the implementation, documentation, smoke tests, self-review cleanup, and verification commands for Chris to review.

Contributor

homeboy-ci Bot commented Jun 3, 2026

Homeboy Results — data-machine

Lint

lint — passed

ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 6745e42

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26920077793

Test

test — passed

  • 212 passed

ℹ️ Auto-fix lint issues: homeboy refactor data-machine --from lint --write
ℹ️ Collect coverage: homeboy test data-machine --coverage
ℹ️ Save test baseline: homeboy test data-machine --baseline
ℹ️ Pass args to test runner: homeboy test -- [args]
ℹ️ Full options: homeboy docs commands/test
Deep dive: homeboy test data-machine --changed-since 6745e42

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26920077793

Audit

audit — passed

  • audit — 4 finding(s)
  • Total: 4 finding(s)

Deep dive: homeboy audit data-machine --changed-since 6745e42

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26920077793
Tooling versions
  • Homeboy CLI: homeboy 0.220.1+26d94b5e
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: 1f9ab9d7
  • Action: unknown@unknown



Development

Successfully merging this pull request may close these issues.

Add processed-item and batch conventions for corpus document refreshes

1 participant