Add fraud-service Prometheus metrics and error propagation for external APIs #153

Merged
kenahrens merged 3 commits into master from feat/fraud-service-metrics-and-errors on Jun 10, 2026

Conversation

@kenahrens

Member

Summary

  • Adds fraud_external_requests_total and fraud_external_request_duration_seconds Prometheus metrics tracking outbound calls to Stripe, Sift, and MaxMind
  • When all three providers return errors and the transaction has elevated risk (score > 0.3), fraud-service rejects the transaction with reason external-verification-unavailable. This makes fraud-service failures visible in the Grafana error dashboard via the existing transactions-service rejection path
  • Adds two new panels to the Banking Application Errors dashboard: "Third-party API Health" (bar chart of non-2xx responses by provider) and "Fraud Check Rejections" (stat showing rejection counts by reason)

Test plan

  • go build ./... passes
  • go test ./... passes
  • Verify fraud-service metrics appear at :9091/metrics after deploy
  • Verify new dashboard panels render in Grafana
  • Verify external-verification-unavailable rejections show up in the Errors by Endpoint table
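
The metrics check in the test plan could be scripted along these lines. This is a sketch under assumptions: `missingMetrics` and `fetch` are hypothetical helpers, the built-in sample body is made up, and the URL passed on the command line (for example `http://localhost:9091/metrics` through a port-forward) depends on how the service is exposed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// wanted lists the metric names from the PR summary.
var wanted = []string{
	"fraud_external_requests_total",
	"fraud_external_request_duration_seconds",
}

// sample is a made-up metrics body used when no URL is given.
const sample = `fraud_external_requests_total{provider="stripe"} 1
fraud_external_request_duration_seconds_count{provider="stripe"} 1
`

// missingMetrics returns the names that do not appear anywhere in body.
func missingMetrics(body string, names []string) []string {
	var missing []string
	for _, n := range names {
		if !strings.Contains(body, n) {
			missing = append(missing, n)
		}
	}
	return missing
}

// fetch downloads the metrics page at url and returns its body.
func fetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	body := sample
	if len(os.Args) > 1 { // a URL argument switches to a live check
		var err error
		if body, err = fetch(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, "fetch failed:", err)
			os.Exit(1)
		}
	}
	if m := missingMetrics(body, wanted); len(m) > 0 {
		fmt.Println("missing metrics:", m)
		os.Exit(1)
	}
	fmt.Println("all expected metrics present")
}
```

Labelled metrics often do not show up until at least one outbound call has been recorded, so a "missing" result right after deploy may just mean no transaction has been scored yet.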

🤖 Generated with Claude Code

kenahrens and others added 3 commits June 10, 2026 15:31
When all three external providers (Stripe, Sift, MaxMind) return errors,
medium-risk transactions are now rejected with reason
"external-verification-unavailable" — making the failures visible in the
Grafana error dashboard. New Prometheus metrics track outbound call
status and latency per provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Servlet filter returns 503 on ~30% of requests during the :10 and :40
minute windows (2-min duration each), matching the sim client error
spike pattern. Skips /actuator endpoints. Configurable via
error-spike.* properties; enabled by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deploys Promtail on every node to tail pod logs from the banking-app
namespace (excluding sim-client noise) and ship to Loki. Adds an
"Application Logs — Errors" panel to the error dashboard filtering for
ERROR/WARN/Exception/panic lines alongside the existing eBPF traffic
panel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kenahrens kenahrens merged commit e8b0bbc into master Jun 10, 2026
16 checks passed