feat(health): add tools/health generator and recipe-health matrix#1304
feat(health): add tools/health generator and recipe-health matrix#1304njhensley wants to merge 1 commit into
Conversation
Add a hermetic, offline recipe-health generator paralleling tools/bom (minus the live Helm render). tools/health calls pkg/health.Compute() and renders a Markdown matrix — per-recipe rolled-up status, a declared- coverage column (R:n D:n P:n C:n), and a literal `pending` Evidence column — with -deterministic/-no-title flags for committable output. Publication mirrors the BOM precedent: - make recipe-health-docs splices the matrix into the marked region of docs/user/recipe-health.md; make recipe-health-check is an advisory staleness check (not wired into make qualify). Named recipe-health-* to avoid the existing check-health/component-health chainsaw family. - docs/user/recipe-health.md: hand-written header (layer-1 chart_pinned and uniform pending-Evidence notes) plus the spliced region. - Registered in docs/index.yml, cross-linked from component-catalog.md, owned via CODEOWNERS, and excluded from the MDX check (splice markers). Tests cover render content, byte-stable determinism, the marker-presence guard, and a compute-budget gate over the full catalog (ADR-009 §2). The matrix body is intentionally deferred behind an AICR-HEALTH-DEFERRED sentinel: pkg/health's chart_pinned signal currently over-reports fail for manifest-only Helm components (no external chart to pin), so the published matrix would mislead. The tooling is fully functional; recipe-health-check skips while the sentinel is present. Implements NVIDIA#1229. Design: ADR-009.
📝 WalkthroughWalkthroughThis pull request introduces a recipe-health documentation tool and integrates it into the project. It adds a new Go CLI ( Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/health/main_test.go`:
- Around line 169-181: TestCoverageCell currently contains two inline
assertions; refactor it into a table-driven test to match the style of
TestDimCell and make adding cases easier. Replace the body of TestCoverageCell
with a slice of test cases (each with name, input *health.DeclaredCoverage,
expected string) and iterate using t.Run for each case, calling coverageCell and
comparing results; keep references to the existing coverageCell function and
health.DeclaredCoverage/health.PhaseCoverage types so the same inputs ("nil" and
the sample cov with Readiness/Deployment/Conformance checks) are used as test
cases.
In `@tools/health/main.go`:
- Around line 88-90: The current code double-wraps errors from renderMatrix by
wrapping mdErr again with errors.Wrap(errors.ErrCodeInternal, "render
markdown"); instead, detect if mdErr != nil and return mdErr directly to
preserve the original structured pkg/errors context—update the error handling in
the block that checks mdErr (the variable returned from renderMatrix) to return
mdErr rather than wrapping it, leaving other error paths unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: db6fee11-770a-432e-a132-1de1fd5d3cbd
📒 Files selected for processing (11)
.github/CODEOWNERS.gitignoreMakefiledocs/design/009-recipe-health-tracking.mddocs/index.ymldocs/user/component-catalog.mddocs/user/recipe-health.mdtools/check-docs-mdxtools/health/main.gotools/health/main_test.gotools/health/markdown.go
| func TestCoverageCell(t *testing.T) { | ||
| if got := coverageCell(nil); got != "—" { | ||
| t.Errorf("coverageCell(nil) = %q, want em dash", got) | ||
| } | ||
| cov := &health.DeclaredCoverage{ | ||
| Readiness: health.PhaseCoverage{Checks: []string{"r1", "r2"}}, | ||
| Deployment: health.PhaseCoverage{Checks: []string{"d1"}}, | ||
| Conformance: health.PhaseCoverage{Checks: []string{"c1", "c2", "c3"}}, | ||
| } | ||
| if got := coverageCell(cov); got != "R:2 D:1 P:0 C:3" { | ||
| t.Errorf("coverageCell() = %q, want %q", got, "R:2 D:1 P:0 C:3") | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider table-driven structure for consistency.
This test validates two independent scenarios (nil coverage and non-nil coverage). Converting to table-driven format would improve consistency with TestDimCell and make it easier to add coverage edge cases in the future.
♻️ Suggested table-driven refactor
func TestCoverageCell(t *testing.T) {
- if got := coverageCell(nil); got != "—" {
- t.Errorf("coverageCell(nil) = %q, want em dash", got)
- }
- cov := &health.DeclaredCoverage{
- Readiness: health.PhaseCoverage{Checks: []string{"r1", "r2"}},
- Deployment: health.PhaseCoverage{Checks: []string{"d1"}},
- Conformance: health.PhaseCoverage{Checks: []string{"c1", "c2", "c3"}},
- }
- if got := coverageCell(cov); got != "R:2 D:1 P:0 C:3" {
- t.Errorf("coverageCell() = %q, want %q", got, "R:2 D:1 P:0 C:3")
- }
+ tests := []struct {
+ name string
+ in *health.DeclaredCoverage
+ want string
+ }{
+ {"nil coverage is em dash", nil, "—"},
+ {"non-nil formats per-phase counts", &health.DeclaredCoverage{
+ Readiness: health.PhaseCoverage{Checks: []string{"r1", "r2"}},
+ Deployment: health.PhaseCoverage{Checks: []string{"d1"}},
+ Conformance: health.PhaseCoverage{Checks: []string{"c1", "c2", "c3"}},
+ }, "R:2 D:1 P:0 C:3"},
+ }
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ if got := coverageCell(tt.in); got != tt.want {
+ t.Errorf("coverageCell() = %q, want %q", got, tt.want)
+ }
+ })
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestCoverageCell(t *testing.T) { | |
| if got := coverageCell(nil); got != "—" { | |
| t.Errorf("coverageCell(nil) = %q, want em dash", got) | |
| } | |
| cov := &health.DeclaredCoverage{ | |
| Readiness: health.PhaseCoverage{Checks: []string{"r1", "r2"}}, | |
| Deployment: health.PhaseCoverage{Checks: []string{"d1"}}, | |
| Conformance: health.PhaseCoverage{Checks: []string{"c1", "c2", "c3"}}, | |
| } | |
| if got := coverageCell(cov); got != "R:2 D:1 P:0 C:3" { | |
| t.Errorf("coverageCell() = %q, want %q", got, "R:2 D:1 P:0 C:3") | |
| } | |
| } | |
| func TestCoverageCell(t *testing.T) { | |
| tests := []struct { | |
| name string | |
| in *health.DeclaredCoverage | |
| want string | |
| }{ | |
| {"nil coverage is em dash", nil, "—"}, | |
| {"non-nil formats per-phase counts", &health.DeclaredCoverage{ | |
| Readiness: health.PhaseCoverage{Checks: []string{"r1", "r2"}}, | |
| Deployment: health.PhaseCoverage{Checks: []string{"d1"}}, | |
| Conformance: health.PhaseCoverage{Checks: []string{"c1", "c2", "c3"}}, | |
| }, "R:2 D:1 P:0 C:3"}, | |
| } | |
| for _, tt := range tests { | |
| t.Run(tt.name, func(t *testing.T) { | |
| if got := coverageCell(tt.in); got != tt.want { | |
| t.Errorf("coverageCell() = %q, want %q", got, tt.want) | |
| } | |
| }) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/health/main_test.go` around lines 169 - 181, TestCoverageCell currently
contains two inline assertions; refactor it into a table-driven test to match
the style of TestDimCell and make adding cases easier. Replace the body of
TestCoverageCell with a slice of test cases (each with name, input
*health.DeclaredCoverage, expected string) and iterate using t.Run for each
case, calling coverageCell and comparing results; keep references to the
existing coverageCell function and health.DeclaredCoverage/health.PhaseCoverage
types so the same inputs ("nil" and the sample cov with
Readiness/Deployment/Conformance checks) are used as test cases.
| if mdErr != nil { | ||
| return errors.Wrap(errors.ErrCodeInternal, "render markdown", mdErr) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and show the relevant section in tools/health/main.go
FILE="tools/health/main.go"
echo "=== File: $FILE (lines 1-200) ==="
sed -n '1,200p' "$FILE" | nl -ba | sed -n '1,200p'
# Show the exact lines around 88-90 (defensive; line numbers in snippet are 88-90)
echo "=== File: $FILE (lines 70-110) ==="
sed -n '70,110p' "$FILE" | nl -ba
# Find renderMatrix definition and show its body
echo "=== renderMatrix usages/definition ==="
rg -n "func\s+renderMatrix|renderMatrix\s*\(" "$FILE"
# Show renderMatrix function body (best-effort: from definition to next func)
if rg -n "func\s+renderMatrix" "$FILE" >/dev/null; then
START=$(rg -n "func\s+renderMatrix" "$FILE" | head -n1 | cut -d: -f1)
echo "renderMatrix starts at line: $START"
# print a window around it
sed -n "$START,$((START+200))p" "$FILE" | nl -ba
fi
# Also check for pkg/errors import and error code conventions in this file
echo "=== Imports (top of file) ==="
sed -n '1,60p' "$FILE" | nl -baRepository: NVIDIA/aicr
Length of output: 165
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="tools/health/main.go"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE"
exit 1
fi
echo "=== File head (1-120) ==="
awk '{printf "%5d\t%s\n", NR, $0}' "$FILE" | sed -n '1,120p'
echo "=== Imports (top block) ==="
awk '{printf "%5d\t%s\n", NR, $0}' "$FILE" | sed -n '1,80p'
echo "=== renderMatrix definition (search) ==="
rg -n "func[[:space:]]+renderMatrix" "$FILE" || true
# If found, print from definition start up to next func
START_LINE="$(rg -n "func[[:space:]]+renderMatrix" "$FILE" | head -n1 | cut -d: -f1)"
if [ -n "${START_LINE:-}" ]; then
echo "=== renderMatrix body (from $START_LINE for ~250 lines) ==="
awk -v start="$START_LINE" 'NR>=start && NR<start+250{printf "%5d\t%s\n", NR, $0}' "$FILE"
fi
echo "=== Exact block around 80-100 ==="
awk '{printf "%5d\t%s\n", NR, $0}' "$FILE" | sed -n '80,110p'Repository: NVIDIA/aicr
Length of output: 8039
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="tools/health/main.go"
echo "=== wc -l ==="
wc -l "$FILE"
echo "=== tail 120 lines ==="
tail -n 120 "$FILE" | awk '{printf "%5d\t%s\n", NR+('$(
wc -l "$FILE" | awk '{print $1}' )'-120), $0}' || true
echo "=== search for renderMatrix in repo ==="
rg -n "renderMatrix" tools/health || true
rg -n "func[[:space:]]+renderMatrix" . || trueRepository: NVIDIA/aicr
Length of output: 5692
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="tools/health/markdown.go"
echo "=== wc -l markdown.go ==="
wc -l "$FILE"
echo "=== renderMatrix definition + body ==="
# Print from renderMatrix start to end of function (best-effort by window)
START_LINE="$(rg -n "func[[:space:]]+renderMatrix" "$FILE" | head -n1 | cut -d: -f1)"
echo "renderMatrix starts at: $START_LINE"
awk -v start="$START_LINE" 'NR>=start && NR<start+220{printf "%5d\t%s\n", NR, $0}' "$FILE" | sed -n '1,220p'
echo "=== exact error return sites inside renderMatrix ==="
awk -v start="$START_LINE" 'NR>=start && NR<start+220{print}' "$FILE" | rg -n "return .*errors\.|Wrap|Propagate|ErrCode" || trueRepository: NVIDIA/aicr
Length of output: 3692
Avoid double-wrapping renderMatrix errors in tools/health/main.go
renderMatrix already returns a structured pkg/errors error (errors.Wrap(errors.ErrCodeInternal, "failed to write recipe-health markdown", sw.err)), so wrapping mdErr again in the if mdErr != nil block is redundant. Return mdErr directly to preserve the original error context.
Proposed fix
if mdErr != nil {
- return errors.Wrap(errors.ErrCodeInternal, "render markdown", mdErr)
+ return mdErr
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if mdErr != nil { | |
| return errors.Wrap(errors.ErrCodeInternal, "render markdown", mdErr) | |
| } | |
| if mdErr != nil { | |
| return mdErr | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/health/main.go` around lines 88 - 90, The current code double-wraps
errors from renderMatrix by wrapping mdErr again with
errors.Wrap(errors.ErrCodeInternal, "render markdown"); instead, detect if mdErr
!= nil and return mdErr directly to preserve the original structured pkg/errors
context—update the error handling in the block that checks mdErr (the variable
returned from renderMatrix) to return mdErr rather than wrapping it, leaving
other error paths unchanged.
Source: Learnings
Summary
Add a hermetic, offline recipe-health generator (
tools/health) plus the publication machinery (make targets, public doc, nav/ownership) that surfaces per-recipe structural health across the whole criteria matrix — mirroring the BOM generate-and-PR precedent, but with no live Helm render. The matrix body is intentionally deferred (see Implementation Notes); the tooling itself is complete and functional.Motivation / Context
ADR-009 (Recipe Health Tracking, V1) calls for a hermetic generator, BOM-style make targets that splice into a marked region, and a public matrix doc. This PR delivers the
tools/healthgenerator + targets + doc; the validation logic it consumes (pkg/health.Compute) already landed via #1226/#1225.Fixes: N/A — implements the #1229 tooling; public-matrix publication is deferred (see Implementation Notes), so I left #1229 open for the maintainer to close after the matrix publishes.
Related: #1229, #1224 (epic), #1226/#1227 (deps), #1230 (blocked: weekly refresh)
Type of Change
Component(s) Affected
docs/,examples/)tools/healthgenerator +Makefiletargets +tools/check-docs-mdxImplementation Notes
tools/health) parallelstools/bomminus the network render: it callspkg/health.Compute()against the embedded catalog (Provider: nil) and renders a Markdown matrix — per-recipe rolled-up status, a declared-coverage column (R:n D:n P:n C:n), and a literalpendingEvidence column. Flags-deterministic/-no-titlemake the output byte-stable and embeddable.bom-docs/bom-check:make recipe-health-docssplices the body between<!-- BEGIN/END AICR-HEALTH -->;make recipe-health-checkis advisory (not wired intomake qualify/lint/merge gate). Namedrecipe-health-*, not the ADR's originalhealth-*, to avoid thecheck-health/check-health-all/component-healthchainsaw family (homophone footgun); ADR-009 §5 updated to match.AICR-HEALTH-DEFERREDsentinel.pkg/health'schart_pinnedsignal currently over-reportsfailfor manifest-only Helm components (declaredtype: Helmbut shipping local manifests with no external chart to pin), which would make 17/32 recipes showfailand render a misleading public matrix. The generator/targets are fully functional;recipe-health-checkskips while the sentinel is present. Publishing is a follow-up gated on fixingclassifyChartPinned(apkg/healthchange, CODEOWNERS-pinned to the pkg/health: core Compute() loop, resolves signal, rollup, determinism #1225/pkg/health: chart_pinned + declared_coverage signals #1226 owners).Status(+ coverage), never per-dimension keys, andpkg/health.rollup()is generic over the dimension map — soconstraints_wellformedfolds in with zero code change, just a doc regen.docs/user/recipe-health.mdexcluded from the MDX check (carries splice-marker HTML comments, exactly likecontainer-images.md).Testing
Note: I ran
make lint+ race/coverage ontools/health+ the target/splice behavior, not the fullmake qualify(e2e/scan) — CI covers those. New package coverage 72.1% (above the 70% floor); no pre-existing package's coverage changed (no edits outsidetools/health+ docs/build files).Risk Assessment
Rollout notes: No migration. The public matrix stays a placeholder until a follow-up fixes the
chart_pinnedfalse-positive and runsmake recipe-health-docs.Checklist
make testwith-race)make lint)git commit -S)