Summary
microsvc-staging-decoy appears healthy at the platform level (Argo Synced/Healthy, all deployments 1/1) but functional user flows fail intermittently.
During smoke testing on staging (do-nyc1-staging-decoy), user registration/login requests through api-gateway returned 500 and 504, and user-service logs showed java.lang.OutOfMemoryError: Metaspace.
Environment
- Repo: speedscale/microsvc
- Cluster context: do-nyc1-staging-decoy
- Namespace: banking-app
- Argo app: argocd/microsvc-staging-decoy
- Revision at test time: 58e3990bd0c250c7504678d6f2fde80881c5e680
Observed behavior
- Health checks pass:
  - GET /actuator/health (api-gateway): 200
  - GET /api/healthz (frontend): 200
- Functional checks fail intermittently:
  - POST /api/users/register (with generateDemoData=true): 500
  - POST /api/users/login: 500
- Retry test with generateDemoData=false: 504 for both register and login
- user-service logs include: java.lang.OutOfMemoryError: Metaspace
Impact
Staging looks healthy from Kubernetes/Argo status but core app behavior (register/login) is unreliable.
Repro steps
- Port-forward the API gateway service in staging:
  kubectl --context do-nyc1-staging-decoy -n banking-app port-forward service/api-gateway 18080:80
- Attempt registration and login against the gateway:
  POST http://127.0.0.1:18080/api/users/register
  POST http://127.0.0.1:18080/api/users/login
- Inspect user-service logs:
  kubectl --context do-nyc1-staging-decoy -n banking-app logs deploy/user-service --tail=200
- Observe intermittent 500/504 responses and OutOfMemoryError: Metaspace in logs.
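To make the log inspection step easier to repeat, the check for the OOM signature can be scripted. This is a minimal sketch, not part of the repo: the helper name count_metaspace_oom is hypothetical, and it only counts lines matching the exact error string quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count log lines containing the Metaspace OOM signature.
# Reads log text on stdin and prints the number of matching lines.
count_metaspace_oom() {
  # grep -c prints 0 but exits 1 when nothing matches, so mask the exit status.
  grep -cF 'java.lang.OutOfMemoryError: Metaspace' || true
}

# Example usage against the staging logs from the repro steps:
#   kubectl --context do-nyc1-staging-decoy -n banking-app \
#     logs deploy/user-service --tail=200 | count_metaspace_oom
```

A non-zero count alongside 500/504 responses from the gateway matches the failure described in this issue.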
Notes
A staging evidence/checklist doc was added at docs/staging-stability-checklist.md.
Suggested next actions
- Investigate user-service JVM memory/metaspace settings and runtime memory limits.
- Verify no regressions from simulation traffic and demo data generation paths.
- Add a functional smoke gate (register/login) to staging validation so this failure mode is caught even when deployments are Ready.
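One possible shape for the smoke gate suggested above is sketched here. It assumes the port-forward from the repro steps is active and that curl is available. The function names and the BASE_URL, REGISTER_BODY and LOGIN_BODY variables are hypothetical; the request payload schema is not shown in this issue, so the caller has to supply the JSON bodies.

```shell
#!/bin/sh
# Sketch of a functional smoke gate for staging; names here are illustrative.
# BASE_URL assumes the api-gateway port-forward from the repro steps.
BASE_URL="${BASE_URL:-http://127.0.0.1:18080}"

# POST a JSON body and print only the HTTP status code.
# curl prints 000 when it gets no HTTP response at all.
post_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
    -H 'Content-Type: application/json' -d "$2" "${BASE_URL}$1"
}

# Succeed only for 2xx status codes.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Run register, then login; return non-zero if either is not 2xx.
# REGISTER_BODY and LOGIN_BODY must hold the JSON payloads (schema not assumed).
smoke_gate() {
  if [ -z "${REGISTER_BODY:-}" ] || [ -z "${LOGIN_BODY:-}" ]; then
    echo "set REGISTER_BODY and LOGIN_BODY to the JSON payloads" >&2
    return 2
  fi
  gate_failed=0
  gate_status="$(post_status /api/users/register "$REGISTER_BODY")"
  is_success "$gate_status" || { echo "register failed: HTTP $gate_status" >&2; gate_failed=1; }
  gate_status="$(post_status /api/users/login "$LOGIN_BODY")"
  is_success "$gate_status" || { echo "login failed: HTTP $gate_status" >&2; gate_failed=1; }
  return "$gate_failed"
}
```

Calling smoke_gate after a sync and failing the validation step on a non-zero return would flag the 500/504 responses seen here, even while Argo reports Synced/Healthy and the actuator health check returns 200.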