chore(docs): redact personal handle for qukaizen.com/docs publish #60

Open
cdarnell wants to merge 10 commits into main from qukaizen/arail-docs-pii-redact

Summary

PII pass over the docs corpus ahead of publishing to qukaizen.com/docs. Replaces all cdarnell/<repo> references with qukaizen/<repo> and fixes the stale autoresearch-lab repo name in install guides.

  • cdarnell/autoresearch-lab → qukaizen/arail (clone URLs in INSTALL/MACOS/LINUX/WSL + the bare cd autoresearch-lab line)
  • cdarnell/arail → qukaizen/arail
  • cdarnell/aerollm → qukaizen/aerollm
  • cdarnell/airllm → qukaizen/airllm
  • ghcr.io/cdarnell/* → ghcr.io/qukaizen/*

20 substitutions across 10 files.
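
For reviewers, the substitution list above can be sketched as an ordered rule table. This is an illustration only, not the script used for this PR: `RULES` and `redact` are hypothetical names, and the `cd arail` replacement for the bare `cd autoresearch-lab` line is an assumption (the PR states only that the line was fixed). The order matters because the stale repo name has to be rewritten before the generic handle swap.

```python
import re

# Ordered rules (hypothetical). The stale repo name goes first; otherwise the
# generic handle rule would turn it into qukaizen/autoresearch-lab.
RULES = [
    (re.compile(r"cdarnell/autoresearch-lab\b"), "qukaizen/arail"),
    # Assumption: the bare `cd` line now targets the new clone directory.
    (re.compile(r"\bcd autoresearch-lab\b"), "cd arail"),
    # Covers cdarnell/arail, cdarnell/aerollm, cdarnell/airllm, ghcr.io/cdarnell/*.
    (re.compile(r"\bcdarnell/"), "qukaizen/"),
]


def redact(text: str) -> tuple[str, int]:
    """Apply every rule in order and return (new_text, substitution_count)."""
    total = 0
    for pattern, replacement in RULES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Summing the returned count over every touched file is one way to cross-check the substitution total reported above.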

PII auditor flagged only these CRITICAL items. No emails, local paths, API keys/tokens, internal hostnames, teammate names, or paperagents references in the corpus. Buddy-prompt frontmatter from Sprints 1-3 also audited clean.

This PR stacks on top of #59 (Sprint 3). When #59 merges, this rebases trivially.

Note: The replacement URLs assume the qukaizen GitHub org exists (or will) and hosts public mirrors of arail, aerollm, airllm. Verify before /docs ships — otherwise every install guide will 404.

Test plan

  • grep -rn "cdarnell\|autoresearch-lab" docs/ BLUEPRINTS.md returns no matches
  • pytest tests/test_docs_*.py -q → 111 passed (verified locally pre-push)
  • Confirm qukaizen/arail, qukaizen/aerollm, qukaizen/airllm repos exist publicly (or queue org/repo creation)
  • Re-run the install guide on a clean machine end-to-end to confirm the rewritten clone URLs work

🤖 Generated with Claude Code

cdarnell and others added 10 commits May 16, 2026 21:57
…+ E + cleanup)

Closure sprint of the Docs Hub effort. Scope: LanceDB ingest of docs/,
cross-link audit, delete docs/INDEX.md, fix sys.modules test hazard.
All four are Sprint 2 carry-overs. Scope ceiling: 8 files, 120 LOC
production, 300 LOC tests, 0 new deps. Implementation order is
revert-able step-by-step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ys.modules with reload + setattr

The old helper used `del sys.modules[mod_name]` + re-import, creating a
new module object.  arail.portal.app._docs_registry kept the old object,
so patches applied via the helper were invisible to app routes — the F5
carry-over flake from TEST_REPORT.md.

Fix:
- _fresh_registry now calls importlib.reload() on the existing module
  object (preserving identity) then patches _repo_root and _docs_registry
  on app.py via monkeypatch.setattr for proper teardown.
- test_live_repo_* tests switch from del sys.modules to importlib.reload
  for the same reason — they were the contamination source that caused
  the ordering-dependent failures.
- New test: test_fresh_registry_rebinds_app_module_reference pins the fix.

All 126 docs tests pass together (125 pre-existing + 1 new).
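
The identity problem described in this commit can be reproduced in isolation. The sketch below is not the project's `_fresh_registry` helper: it uses the stdlib `colorsys` module as a stand-in for the registry module, and `held` plays the role of the reference that app.py keeps. Both function names are hypothetical.

```python
import importlib
import sys

import colorsys as held  # stand-in for the reference another module holds


def reload_in_place(name: str):
    """Re-execute the module's code inside the existing module object."""
    return importlib.reload(sys.modules[name])


def reimport_after_delete(name: str):
    """Old approach: drop the cache entry, then import a brand-new object."""
    del sys.modules[name]
    return importlib.import_module(name)


if __name__ == "__main__":
    # Identity preserved: anything patched on the module is visible via `held`.
    print(reload_in_place("colorsys") is held)
    # Identity lost: `held` still points at the old, unpatched module object.
    print(reimport_after_delete("colorsys") is held)
```

Any module that captured a reference at import time keeps seeing the old object after the delete-and-reimport path, which is why patches applied to the new object never reached the app routes.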

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a slim sticky strip below the meter bar with six anchor chips —
Mission · Status · Activity · Report · Knowledge · Network — that
smooth-scroll to each card. Quick Actions stays where it is (those are
real verbs, not nav).

- Invisible .dash-anchor spans placed just above each card carry the
  #dash-* targets so we don't have to repurpose existing card ids.
- scroll-margin-top: 56px keeps the heading clear of the sticky TOC.
- Translucent dark background + backdrop blur so the strip reads as
  chrome, not a section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n.md/wiki.md links

New test file tests/test_docs_cross_links.py:
- test_cross_link_audit_all_internal_links_resolve: walks the registered
  docs corpus, strips fenced code blocks, asserts every [label](*.md) link
  resolves to an existing file (F3, F4).
- test_cross_link_audit_allowlist_is_minimal: pins allowlist size ≤10 so
  unexpected growth surfaces in CI.
- test_cross_link_audit_code_fence_false_negative_is_blocked: confirms
  links inside ``` fences are stripped before the regex sweep (F4).
- test_cross_link_audit_real_link_outside_fence_is_caught: positive case.
- test_cross_link_audit_perf_under_one_second: full audit < 1s (F10).

Broken links fixed (edit the doc, not the allowlist — per architect spec):
- docs/build-and-finetune-plan.md: ./design.md → ./portal-design.md
- ROADMAP.md: docs/design.md → docs/portal-design.md
- CONTRIBUTING.md: docs/wiki.md removed (no wiki.md exists; pointed at /wiki tab)

Broken-link count in registered docs: 3 → 0.
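
A minimal sketch of the audit idea described above, assuming relative `[label](*.md)` links resolved against the linking file's directory. It is not the code in tests/test_docs_cross_links.py: `FENCE_RE`, `LINK_RE` and `broken_links` are hypothetical names, and the allowlist and mixed-case `.MD` handling are left out.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example holds no literal fence

# A fenced block: an opening fence line through the end of the closing fence line.
FENCE_RE = re.compile(rf"^{FENCE}.*?^{FENCE}[^\n]*$", re.DOTALL | re.MULTILINE)

# [label](target.md), optionally followed by an #anchor or ?query.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#?]+\.md)(?:[#?][^)]*)?\)")


def broken_links(doc_path: Path) -> list[str]:
    """Return .md link targets in doc_path that do not exist on disk."""
    # Strip fenced code first so example links inside fences are not audited.
    text = FENCE_RE.sub("", doc_path.read_text(encoding="utf-8"))
    broken = []
    for target in LINK_RE.findall(text):
        if "://" in target:  # external URL, out of scope for this audit
            continue
        if not (doc_path.parent / target).exists():
            broken.append(target)
    return broken
```

Stripping fences before the regex sweep is what keeps sample links in code blocks from being reported as broken.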

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rect + regression tests

- docs/INDEX.md deleted: the legacy Hub placeholder that the denylist
  has blocked since Sprint 1. /docs now renders the registry-driven Hub
  directly; the file served no purpose on disk.

- app.py: add GET /docs/INDEX.md → 301 /docs permanent redirect handler.
  The handler intentionally does NOT read the file — it fires regardless
  of whether INDEX.md is on disk, preserving any external bookmarks (F6).

- tests/test_docs_routes.py:
  - test_viewer_renders_doc_without_registry_entry: updated from 200
    assertion (Sprint 2, file existed) to 301 assertion (Sprint 3, file
    deleted). Documents the intent change inline.
  - test_docs_index_md_redirect_still_works: primary F6 sentinel — asserts
    301 to /docs after file deletion.
  - test_index_md_file_does_not_exist: belt-and-suspenders file-gone check
    so a future restore is caught immediately.

All 133 docs tests pass.
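
The redirect behaviour described above can be illustrated without committing to a web framework. The commit does not say which framework app.py uses, so this sketch is a plain WSGI callable with a hypothetical name; the point it shows is that the handler never touches the filesystem, so it fires whether or not INDEX.md exists.

```python
def docs_app(environ, start_response):
    """Hypothetical WSGI app: permanently redirect the legacy index path."""
    path = environ.get("PATH_INFO", "")
    if path == "/docs/INDEX.md":
        # No file read here: the redirect does not depend on INDEX.md
        # being present on disk, so old bookmarks keep working.
        start_response(
            "301 Moved Permanently",
            [("Location", "/docs"), ("Content-Length", "0")],
        )
        return [b""]
    body = b"not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

A regression test in the spirit of the F6 sentinel would call the app with that path and assert both the 301 status and the `/docs` Location header.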

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des registered docs

src/arail/pkb.py:
- New helper _build_docs_rows(): iterates docs_registry.all_docs(), builds
  one LanceDB row per doc with source_kind='docs'.  Row schema matches PKB
  (path/name/vector/mtime/source_kind) so search callers need no changes.
  Namespaces paths as "docs/<slug>.md" or "root/<slug>.md" to prevent future
  PKB/docs collisions.  Body capped at 4 KB (same as PKB, per F1 perf guard).
  Registry failures log a warning and return [] — docs ingest never blocks
  PKB ingest (F8 isolation contract).
- index_all() gains include_docs=True kwarg (default preserves prior behaviour).
  Return dict gains indexed_docs key (0 when include_docs=False or registry empty).
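
A rough sketch of the row-building contract described above, under stated assumptions: the `Doc` dataclass, the `in_docs_dir` flag, the `embed` callable, and a cap measured in UTF-8 bytes are all hypothetical stand-ins, since the commit message gives only the row fields and the behaviour. It is not the code in src/arail/pkb.py and does not touch LanceDB.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)

BODY_CAP_BYTES = 4 * 1024  # assumption: the 4 KB cap is measured in bytes


@dataclass
class Doc:  # hypothetical registry entry
    slug: str
    body: str
    mtime: float
    in_docs_dir: bool = True


def build_docs_rows(all_docs, embed):
    """Build one row per registered doc; return [] if the registry fails."""
    try:
        docs = list(all_docs())
    except Exception:
        # Isolation: a docs registry failure must never block the PKB ingest.
        log.warning("docs registry unavailable; skipping docs ingest", exc_info=True)
        return []
    rows = []
    for doc in docs:
        prefix = "docs" if doc.in_docs_dir else "root"
        raw = doc.body.encode("utf-8")[:BODY_CAP_BYTES]
        body = raw.decode("utf-8", errors="ignore")  # drop a split trailing char
        rows.append({
            "path": f"{prefix}/{doc.slug}.md",  # namespaced to avoid collisions
            "name": doc.slug,
            "vector": embed(body),
            "mtime": doc.mtime,
            "source_kind": "docs",
        })
    return rows
```

Returning an empty list on failure lets the caller report `indexed_docs` as 0 and carry on with the PKB rows unchanged.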

tests/test_docs_ingest.py (new, 7 tests):
- test_index_all_includes_docs_rows: indexed_docs ≥ 1, source_kind='docs' (contract)
- test_index_all_include_docs_false_skips_docs: opt-out works, no docs rows in DB
- test_index_all_handles_registry_failure_gracefully: PKB still indexes on failure (F8)
- test_index_all_empty_body_doc_does_not_crash: empty body slice '' is safe (F7)
- test_index_all_stale_doc_removed_on_reingest: replace() semantics removes stale (F2)
- test_index_all_source_kind_docs_does_not_pollute_pkb_source_kind: isolation (F8)
- test_index_all_perf_under_2s: synthetic-50 PKB + real-24 docs < 2.0s (F1)

Full 140-test docs suite passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…plete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tests

QA findings: 0 FAIL, 2 LOW (mixed-case .MD audit gap pinned; F5
cross-domain test contamination is pre-existing, not introduced by
this sprint — recommended session-scoped autouse fixture follow-up).

Coverage: LanceDB re-ingest idempotency, 4 KB embedding cap, empty
docs/ dir, missing frontmatter, INDEX.md redirect (301 + query string
+ case variants), 3-way concurrent index_all (no deadlock), cross-link
edge cases (anchors, query strings, backticks, mixed-case .MD),
sys.modules rebind hermeticity over repeated calls.

Closes docs-hub-sprint-3. Ready to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace cdarnell/<repo> with qukaizen/<repo> across user-facing docs
and fix the stale autoresearch-lab repo name in install guides:

- cdarnell/autoresearch-lab → qukaizen/arail (clone URLs + the bare
  `cd autoresearch-lab` line in INSTALL.md)
- cdarnell/arail → qukaizen/arail
- cdarnell/aerollm → qukaizen/aerollm
- cdarnell/airllm → qukaizen/airllm
- ghcr.io/cdarnell/* → ghcr.io/qukaizen/*

Touched: docs/INSTALL.md, docs/MACOS.md, docs/LINUX.md, docs/WSL.md,
docs/CERTIFIED_MODELS.md, docs/tuning-loop.md, docs/build-and-finetune-plan.md,
docs/maximus.plan.md, docs/plans/oracle-frontier.md, BLUEPRINTS.md.

PII audit findings: only the personal handle and the stale repo name
were flagged as CRITICAL. No emails, local paths, secrets, or
teammate names found. The buddy_prompt frontmatter and other
recently-added Sprint 1-3 content audited clean.

All 111 docs-related tests pass (test_docs_registry, _qa,
_routes, _ingest, _cross_links, _sprint3_qa).

Prepares the corpus for the qukaizen.com/docs publish path
(submodule + build-time MDX render — Option B from the publish
audit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>