Implement automatic hashrate anomaly detection and ASIC recovery #1655
cniweb wants to merge 7 commits into
Integrates TCH-specific stability improvements: when hashrate drops below a dynamic threshold for 3 consecutive polls, the ASICs are automatically reinitialized using live recovery mode (ASIC_INIT_RECOVERY) without a full system reboot. This prevents sustained hashrate loss caused by TPS546 or chip domain failures, particularly beneficial for multi-ASIC configurations like Bitaxe 800 (Gamma Turbo) and SupraHex.

The lower threshold is computed dynamically based on expected hashrate, ASIC count, and hash domain count. Recovery is skipped when mining is paused. Measurements are cleared after recovery to prevent hashrate spikes from stale counters.

Inspired by TinyChipHub ESP-Miner-TCH stability improvements.

Agent-Logs-Url: https://github.com/cniweb/ESP-Miner/sessions/a360443a-9b9c-42db-9b77-f65bed100cbe
Co-authored-by: cniweb <2334906+cniweb@users.noreply.github.com>
- Separate low/high anomaly detection: low hashrate requires previous highest (prevents ramp-up false positives), high spikes detected independently
- Add ANOMALY_CONSECUTIVE_THRESHOLD and RECOVERY_STABILIZATION_DELAY_MS named constants
- Add detailed comment explaining the 2.0x domain contribution margin in threshold calculation

Agent-Logs-Url: https://github.com/cniweb/ESP-Miner/sessions/a360443a-9b9c-42db-9b77-f65bed100cbe
Co-authored-by: cniweb <2334906+cniweb@users.noreply.github.com>
- Rename low_hashrate_count to consecutive_anomaly_count since it tracks both low and high anomalies
- Move check_hashrate_anomaly() before highest_hashrate update so the low-anomaly guard (current < highest) correctly ignores ramp-up
- Clarify fallback threshold comment

Agent-Logs-Url: https://github.com/cniweb/ESP-Miner/sessions/a360443a-9b9c-42db-9b77-f65bed100cbe
Co-authored-by: cniweb <2334906+cniweb@users.noreply.github.com>
Add automatic hashrate anomaly detection and ASIC recovery (TCH stability improvements)
    // Track highest observed hashrate (after anomaly check)
    if (current_hashrate > highest_hashrate) {
        highest_hashrate = current_hashrate;
        ESP_LOGI(TAG, "New highest hashrate: %.3f Gh/s", highest_hashrate);
    }
why? if we're looking for abnormal hashrates we shouldn't be tracking the highest, and this isn't used anywhere
    // Compute dynamic lower hashrate threshold based on hardware configuration.
    // The formula detects when hashrate drops by more than twice the contribution
    // of a single hash domain on a single ASIC. Multiplying by 2.0 provides a
    // margin so that losing one domain triggers detection, while normal variance
    // (which is less than one full domain) does not cause false positives.
    float expected_hr = GLOBAL_STATE->POWER_MANAGEMENT_MODULE.expected_hashrate;
    if (expected_hr > 0.0f && asic_count > 0 && hash_domains > 0) {
        float per_domain_contribution = expected_hr / asic_count / hash_domains;
        lower_threshold_hashrate_pct = 1.0f - (per_domain_contribution * 2.0f / expected_hr);
        ESP_LOGI(TAG, "Hashrate anomaly lower threshold: %.0f%% of expected", lower_threshold_hashrate_pct * 100.0f);
    }
why not a simpler 75% threshold? the more chips you add, the higher the threshold becomes: the gamma turbo's threshold would be 0.875 and a nerdoctaxe's would be 0.992. the gamma's threshold alone is 0.5, meaning this code actually needs 2 domains to go down before it triggers the reset.
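To make the arithmetic in this thread easier to follow, here is a minimal sketch of the quoted formula as a standalone function. The name lower_threshold_fraction and the -1.0f error return are hypothetical and not part of the PR; only the two arithmetic lines are taken from the diff. Since expected_hr cancels out, the result reduces to 1 - 2 / (asic_count * hash_domains).

```c
/* Hypothetical helper (not in the PR) wrapping the formula from the diff.
 * Returns the fraction of the expected hashrate below which a poll would
 * count as "low", or -1.0f to signal invalid inputs (an assumed convention). */
static float lower_threshold_fraction(float expected_hr, int asic_count, int hash_domains)
{
    if (expected_hr <= 0.0f || asic_count <= 0 || hash_domains <= 0) {
        return -1.0f;
    }
    /* Same two steps as the diff: share of one hash domain on one ASIC,
     * then a margin of twice that share below the expected hashrate. */
    float per_domain_contribution = expected_hr / asic_count / hash_domains;
    return 1.0f - (per_domain_contribution * 2.0f / expected_hr);
}
```

Assuming a single ASIC with four hash domains (a domain count inferred from the 0.5 figure above, not stated in this thread), the function returns 0.5 for any positive expected hashrate. Losing one of four domains leaves roughly 0.75 of the expected hashrate, which stays above that threshold, so a second domain has to drop before the check fires.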
        // Reset measurements to avoid hashrate spike from stale counters
        hashrate_monitor_reset_measurements(GLOBAL_STATE);
    } else {
        ESP_LOGE(TAG, "ASIC recovery failed - chip count 0");
this should be a full reboot, instead of leaving the user in a worse state than before
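One way to read this suggestion is as a small decision step after the re-init attempt. The sketch below is an assumption-laden illustration, not the PR's code: the enum and the function post_recovery_action are hypothetical names, and the only input taken from the diff is the chip count reported by the recovery attempt.

```c
/* Hypothetical decision helper (names are illustrative, not from the PR). */
typedef enum {
    POST_RECOVERY_RESET_MEASUREMENTS, /* chips answered: clear stale counters */
    POST_RECOVERY_FULL_REBOOT         /* no chips answered: restart the system */
} post_recovery_action_t;

/* Maps the chip count seen after live recovery to the follow-up action.
 * A count of zero (or a negative error value) escalates to a full reboot
 * instead of leaving the miner running without working ASICs. */
static post_recovery_action_t post_recovery_action(int chip_count)
{
    return (chip_count > 0) ? POST_RECOVERY_RESET_MEASUREMENTS
                            : POST_RECOVERY_FULL_REBOOT;
}
```

On the device, the caller would presumably invoke hashrate_monitor_reset_measurements(GLOBAL_STATE) for the first outcome and a system restart such as ESP-IDF's esp_restart() for the second; which restart path the project actually uses is not shown in this thread.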
…implify warmup logic, clean up comments

Agent-Logs-Url: https://github.com/cniweb/ESP-Miner/sessions/0ee049d8-a3ca-4b05-8d8c-6c13556aa397
Co-authored-by: cniweb <2334906+cniweb@users.noreply.github.com>

…lure

Agent-Logs-Url: https://github.com/cniweb/ESP-Miner/sessions/0ee049d8-a3ca-4b05-8d8c-6c13556aa397
Co-authored-by: cniweb <2334906+cniweb@users.noreply.github.com>
Fix hashrate anomaly detection: simplify thresholds, add reboot fallback
This is overly complex. Every case of diminished hashrate happens because one or more domains of an ASIC drop out, i.e. the hash counter register stops changing. This can be detected without any heuristics. The problem, however, is that this is almost always a sign of a crappy power supply. If the device is powered by a decent power supply, this never happens. So simply restarting the ASIC hides the problem. Maybe we should first add a notification message when one or more of the domains drop out, so the user can try adjusting frequency or voltage to remedy the issue.
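The heuristic-free check described in this comment could look roughly like the sketch below. It assumes the firmware can take a snapshot of one hash counter per domain on each poll; the function count_stalled_domains, its parameters, and the bitmask output are all hypothetical and only illustrate the comparison, not how the counters are read from the ASIC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares per-domain hash counter snapshots from two
 * consecutive polls. A domain whose counter did not change is treated as
 * stalled. Returns the number of stalled domains and, if stalled_mask is
 * non-NULL, sets bit i for each stalled domain i (first 32 domains only). */
static size_t count_stalled_domains(const uint32_t *prev, const uint32_t *curr,
                                    size_t domain_count, uint32_t *stalled_mask)
{
    size_t stalled = 0;
    uint32_t mask = 0;

    for (size_t i = 0; i < domain_count && i < 32; i++) {
        if (curr[i] == prev[i]) {
            mask |= (uint32_t)1 << i;
            stalled++;
        }
    }
    if (stalled_mask != NULL) {
        *stalled_mask = mask;
    }
    return stalled;
}
```

A non-zero result could drive the notification proposed above (for example, a log line or UI warning naming the affected domains) before any automatic restart is attempted.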
This pull request introduces an automatic hashrate anomaly detection and recovery mechanism to the hashrate_monitor_task. The main goal is to improve system robustness by monitoring for abnormal hashrate drops or spikes and automatically reinitializing the ASICs if anomalies persist, thus reducing the need for manual intervention or full system reboots. The detection thresholds are dynamically computed based on hardware configuration to minimize false positives.

Hashrate anomaly detection and recovery:
- Adds a check_hashrate_anomaly function that monitors for low or high hashrate anomalies and triggers ASIC live recovery if anomalies persist for a configurable number of consecutive polls. The function is documented in hashrate_monitor_task.h and called from the main monitoring loop before updating the highest observed hashrate. [1] [2] [3]

Integration and robustness improvements:
- Includes asic_init.h and driver/uart.h to support the new recovery logic.
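As a rough illustration of the consecutive-poll logic summarized above, the sketch below counts low readings and signals recovery once the count reaches the limit. ANOMALY_CONSECUTIVE_THRESHOLD is the constant name mentioned in the commits and 3 is the value from the PR description; the struct, the function anomaly_tracker_update, resetting the counter while paused, and the omission of high-spike detection are assumptions made to keep the example short.

```c
#include <stdbool.h>

#define ANOMALY_CONSECUTIVE_THRESHOLD 3 /* "3 consecutive polls" per the PR description */

/* Hypothetical state holder; the PR keeps a consecutive_anomaly_count. */
typedef struct {
    int consecutive_anomaly_count;
} anomaly_tracker_t;

/* Feeds one poll into the tracker. Returns true when recovery should be
 * triggered on this poll. Only the low-hashrate side is sketched here.
 * lower_fraction is the threshold as a fraction of expected_hr, e.g. 0.75. */
static bool anomaly_tracker_update(anomaly_tracker_t *t, float current_hr,
                                   float expected_hr, float lower_fraction,
                                   bool mining_paused)
{
    /* Assumption: a paused miner or unknown expected hashrate clears the streak. */
    if (mining_paused || expected_hr <= 0.0f) {
        t->consecutive_anomaly_count = 0;
        return false;
    }

    if (current_hr < expected_hr * lower_fraction) {
        t->consecutive_anomaly_count++;
    } else {
        t->consecutive_anomaly_count = 0; /* a healthy poll breaks the streak */
    }

    if (t->consecutive_anomaly_count >= ANOMALY_CONSECUTIVE_THRESHOLD) {
        t->consecutive_anomaly_count = 0; /* start fresh after recovery */
        return true;
    }
    return false;
}
```

Requiring several consecutive low polls before acting keeps a single noisy reading from triggering a re-init, at the cost of a few poll intervals of delay before a real drop is handled.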