EKFQ: Cholesky-failure observability counter (partial #593)#621
EKFQ: Cholesky-failure observability counter (partial #593)#621sritchie wants to merge 7 commits into
Conversation
Adds three uint32_t instance members and three public accessors for post-flight observability of correct()'s non-PSD Cholesky abort path. Counters survive init() — a failure burst before a reseed is still diagnostic. Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
update() bumps updateCallCount_ once per call. The Cholesky guard inside correct() bumps failedUpdateCount_ and stamps lastFailedCallNum_ before the early return. Pure observability; no algorithmic change. Regression goldens unchanged. Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five tests covering: zero-init, monotonic call-count on normal updates, counter bump on a synthetically degenerate measurement, counter persistence across init(), and the batch-form single-increment semantic (one failure per update(), not one per measurement). Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three accessors exposing EKFQ's getUpdateCallCount / getFailedUpdateCount / getLastFailedCallNum through the sketch-side AHRS wrapper, plus IsEkfqActive() so ConsoleSerial can decide whether to print the diagnostic line. Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the active algorithm is EKFQ, the TASKS command appends one line summarising the Cholesky-failure counter, last-failed call number, and total update() call count. Omitted when Madgwick is active so the output isn't misleading about an algorithm that's not running. Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 4's code-quality review renamed AHRS wrapper methods for self-consistency: - EkfqFailedUpdates → EkfqFailedUpdateCount - EkfqLastFailedCall → EkfqLastFailedCallNum Updates the in-repo plan doc to match the merged code. Also trims the IsEkfqActive() doc comment to match the implementation (removed caller reference per CLAUDE.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EkfqPipeline::Step() calls EKFQ::predict() and EKFQ::correct() separately (to apply comp-fade to TAS between them), bypassing the update() wrapper. The previous increment inside update() never fired in production, so the TASKS console line would have printed "0 total" forever regardless of uptime. Move the increment to the top of correct() so it fires once per production frame. The update() convenience wrapper transitively inherits the increment. Adds a regression test exercising the production path (predict()+correct() called directly) to lock in the fix. Refs #593 item #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Firmware ArtifactsMain firmware (Gen3 box)
Each External display firmware
Each The huVVer-AVI binary is built but not yet validated on real hardware — see the bring-up checklist for what to verify on first flash. X-Plane plugin
Drop the Built from
|
EKFQ: Cholesky-failure observability counter
When
EKFQ::correct()aborts on a non-PSD Cholesky diagonal (sum <= 0.0fatEKFQ.cpp:601), the entire 8-measurement batch update is silently dropped. The filter falls back to predict-only for that frame and the pilot has no way to know. This PR adds a session-persistent counter and surfaces it via the existingTASKSconsole command when EKFQ is the active algorithm.Strict subset of #593 item #1. Items #2 (stack hoist — filed separately as #617), #3 (log provenance — deferred for coordinated log-with-config work), and #4 (audio-sweep τ — currently a no-op since shared
kCompFadeTauSecmakes sweep.py's hardcoded0.5correct) are out of scope.Changes
EKFQgains three privateuint32_tmembers (updateCallCount_,failedUpdateCount_,lastFailedCallNum_) and three public accessors. Counters surviveinit()— a failure burst before a reseed is still diagnostic.++updateCallCount_fires at the top ofEKFQ::correct(). (Originally at the top ofEKFQ::update()— butEkfqPipeline::Step()callspredict()andcorrect()separately rather than going throughupdate(), so increment atcorrect()is the production-correct site. Caught by the final whole-branch review; fix in commit2ee45b2d.)The Cholesky guard inside
correct()bumpsfailedUpdateCount_and stampslastFailedCallNum_before its earlyreturn.The sketch-side
AHRSwrapper exposes pass-through accessorsEkfqUpdateCallCount(),EkfqFailedUpdateCount(),EkfqLastFailedCallNum(), plusIsEkfqActive().The
TASKSconsole command appends one line when EKFQ is active:(Three values: failure count, last-failed call number, total
correct()call count. Thetotaldenominator was an addition during implementation — the spec called for two values; the third makes the counter ratio interpretable post-flight without external context.)Width
%-16smatchesPrintTaskInfo's existing format for column alignment.6 new native unit tests in
test/test_ekfq/test_ekfq.cpp:test_ekfq_counter_starts_at_zerotest_ekfq_counter_unchanged_on_normal_updatetest_ekfq_counter_bumps_on_degenerate_Stest_ekfq_counter_persists_across_inittest_ekfq_counter_increments_by_one_on_batch_failure(documents batch semantics — one failure-per-correct(), not one per measurement)test_ekfq_counter_advances_via_direct_predict_correct(locks in the production-path fix from2ee45b2d)Relationship to PERF telemetry (#605/#612/#615)
This counter is intentionally separate from the PERF subsystem. PERF measures wall-clock duration of
EkfqCorrectscopes and is compile-gated (-DONSPEED_PERF_ENABLED, only built in theesp32s3-v4p-perfenv). Production firmware carries zero PERF instrumentation by design.PERF cannot answer this counter's question: "of the 208
EkfqCorrectcalls this second, did the Cholesky guard fire?" A scope emits one event per call whethercorrect()did real work or hitsum<=0and bailed. The new counter is always-on in production firmware and per-session persistent.Testing
Bench-test recommendations (post-merge)
total ≈ 6200(30 s × 208 Hz). Iftotalis0, the production-path wiring regressed.0 0 0).totalreflects only post-switchcorrect()calls.totalkeeps climbing,failed updatesstays at 0.Spec → branch drift (cosmetic, not blocking)
Spec line 145 named the branch
chore/ekfq-failed-update-counter; actual branch ischore/ekfq-cholesky-counter. Pure cosmetic.🤖 Generated with Claude Code