diff --git a/docs/superpowers/baselines/2026-04-17/BASELINE.md b/docs/superpowers/baselines/2026-04-17/BASELINE.md
index 60550846..1d0eccfc 100644
--- a/docs/superpowers/baselines/2026-04-17/BASELINE.md
+++ b/docs/superpowers/baselines/2026-04-17/BASELINE.md
@@ -227,6 +227,16 @@ Ordered by severity. Each item cites the raw artifact it was derived from.
 
 - **Pipeline serve-smoke failed on both seed repos** (`health=fail`, `stats=null`). `index` and `enrich` succeeded (petclinic 8+13s, express 5+10s) but the 8-second sleep between starting `serve` and `curl /actuator/health` is at the low end of the documented 8–16s Spring Boot + embedded Neo4j cold-start window (see CLAUDE.md §Gotchas). Fix in Phase F hardening: poll `/actuator/health` with a retry budget instead of a fixed sleep.
   - Raw: `raw/pipeline/spring-petclinic/`, `raw/pipeline/realworld-express/`.
+  - **RESOLVED (2026-04-17, branch `phase-a/fixups-pipeline-smoke`)**: patched `run-pipeline.sh` to poll `/api/stats` (up to 60s at 2s interval) as the readiness probe and to capture `/actuator/health` only as a diagnostic. Root cause was *not* a too-short sleep — the server cold-starts in 10–11s on both seeds and `/api/stats` responds with real data, but `/actuator/health` returns HTTP **503 `OUT_OF_SERVICE`** because the `GraphHealthIndicator` reports OUT_OF_SERVICE even after the graph loads. Captured baseline numbers below.
+
+    | Seed | index | enrich | ready (stats) | nodes | edges | files | languages | frameworks | health HTTP |
+    |---|---:|---:|---:|---:|---:|---:|---|---|---:|
+    | spring-petclinic | 4s | 11s | 11s | 691 | 1,836 | 67 | java 18 | spring_boot 24 | 503 |
+    | realworld-express | 5s | 10s | 10s | 224 | 297 | 39 | typescript 6 | express 20, prisma 7 | 503 |
+
+    Follow-up split out below.
+
+- **`GraphHealthIndicator` reports `OUT_OF_SERVICE` (503) even when the graph is loaded.** Discovered during the pipeline smoke-test fix. `/actuator/health` body: `{"groups":["liveness","readiness"],"status":"OUT_OF_SERVICE"}`. The server is fully functional (`/api/stats` returns real data) but the health indicator makes `/actuator/health` unusable as a readiness probe for orchestrators (K8s, Compose, CI). Fix in `src/main/java/io/github/randomcodespace/iq/health/GraphHealthIndicator.java`. Low for baseline use; High when we start Dockerizing or targeting K8s.
 - **SpotBugs: 8 HIGH-priority findings (priority=1) + 1,484 at priority=2.** Total 1,492. HIGH findings must be triaged individually (read `raw/spotbugs.xml`). Noise-dominant rules (`NM_METHOD_NAMING_CONVENTION`=730, `SF_SWITCH_NO_DEFAULT`=448) should be filtered via a SpotBugs exclude file so real signal surfaces; real-concern patterns that deserve review now: `NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE` (26), `BC_UNCONFIRMED_CAST` (55), `UL_UNRELEASED_LOCK_EXCEPTION_PATH` (1), `WMI_WRONG_MAP_ITERATOR` (2), `ES_COMPARING_STRINGS_WITH_EQ` (2), `MT_CORRECTNESS` category (1).
   - Raw: `raw/spotbugs.xml`, `raw/spotbugs-summary.json`.
 
diff --git a/scripts/baseline/run-pipeline.sh b/scripts/baseline/run-pipeline.sh
index a5650f8f..97970542 100755
--- a/scripts/baseline/run-pipeline.sh
+++ b/scripts/baseline/run-pipeline.sh
@@ -18,6 +18,8 @@ fi
 
 # Clean any prior state in the seed repo.
 rm -rf "$SEED/.code-intelligence" "$SEED/.osscodeiq"
+# Truncate timings file so re-runs don't append stale entries.
+: > "$OUT/timings.txt"
 
 timer() {
   local label="$1"; shift
@@ -37,13 +39,34 @@ PORT=18080
 java -jar "$JAR" serve "$SEED" --port "$PORT" > "$OUT/serve.log" 2>&1 &
 PID=$!
 trap "kill $PID 2>/dev/null || true" EXIT
-sleep 8
-if curl -sf "http://127.0.0.1:$PORT/actuator/health" > "$OUT/health.json"; then
-  echo "health=ok" >> "$OUT/timings.txt"
+# Poll /api/stats up to 60s (30 x 2s) as the readiness probe. Spring Boot
+# cold-start + embedded Neo4j page-cache warm-up is documented 8-16s (see
+# CLAUDE.md §Gotchas). We deliberately do NOT poll /actuator/health: the
+# GraphHealthIndicator currently reports OUT_OF_SERVICE (503) even after the
+# graph has loaded (tracked as a known gap), so it is not a reliable readiness
+# signal. /api/stats is the public REST surface and returns graph data iff
+# the server has finished starting and loaded the graph.
+ready_t0=$(date +%s)
+ready_ok="no"
+for _ in $(seq 1 30); do
+  if curl -sf "http://127.0.0.1:$PORT/api/stats" > "$OUT/stats.json"; then
+    ready_ok="yes"; break
+  fi
+  sleep 2
+done
+ready_elapsed=$(( $(date +%s) - ready_t0 ))
+if [[ "$ready_ok" == "yes" ]]; then
+  echo "stats=ok ready_after_s=${ready_elapsed}" | tee -a "$OUT/timings.txt"
 else
-  echo "health=fail" >> "$OUT/timings.txt"
+  echo "stats=fail ready_after_s=${ready_elapsed}" | tee -a "$OUT/timings.txt"
+  echo '{"error":"/api/stats never returned 2xx within 60s"}' > "$OUT/stats.json"
 fi
-curl -sf "http://127.0.0.1:$PORT/api/stats" > "$OUT/stats.json" || true
+
+# Capture /actuator/health as a diagnostic snapshot (may be 503 today;
+# still useful for tracking the health-indicator fix over time).
+health_http=$(curl -s -o "$OUT/health.json" -w '%{http_code}' \
+  "http://127.0.0.1:$PORT/actuator/health" 2>/dev/null || echo "000")
+echo "health_http=${health_http}" | tee -a "$OUT/timings.txt"
 
 kill $PID 2>/dev/null || true
 wait $PID 2>/dev/null || true
@@ -54,11 +77,13 @@ def load(p):
     try: return json.load(open(p))
     except Exception: return None
 t=open("$OUT/timings.txt").read().strip().splitlines()
+stats = load("$OUT/stats.json")
 print(json.dumps({
     "seed": "$NAME",
     "timings": t,
-    "stats": load("$OUT/stats.json"),
-    "health_ok": load("$OUT/health.json") is not None,
+    "stats": stats,
+    "stats_ok": isinstance(stats, dict) and "graph" in stats,
+    "health_raw": load("$OUT/health.json"),
 }, indent=2))
 PY
 cat "$OUT/summary.json"