zoohelp-backend is the Rust-first backend infrastructure for ZooHelp, a geolocation-driven animal rescue, adoption, NGO coordination, trust, notification, and community protection platform.
The system is designed around one operational problem:
animal in need -> trusted report -> geospatial prioritization -> nearby helpers/NGOs -> coordinated rescue outcome
ZooHelp is not only an adoption app. The backend is being shaped as a real-time animal protection coordination layer: posts, rescue alerts, chat, nearby search, NGO profiles, trust signals, media moderation, donation intents, support workflows, and AI-assisted operational tooling.
ZooHelp Hybrid Core provides the backend runtime for a modern animal rescue network.
The operational MVP is documented in docs/operational-mvp.md. It keeps the first scope focused on verified NGOs, manual trust review, real rescue coordination, and measurable rescue outcomes before expanding into heavier AI or automation layers.
The platform is intended to support cases such as:
- a person finds an injured animal and posts an urgent request
- GPS coordinates are attached to the rescue post
- nearby users, volunteers, vets, and NGOs are identified
- rescue alerts are generated with deep links and action payloads
- the feed prioritizes urgent and nearby cases
- chat coordinates the rescue operation
- trust signals reduce abuse, fraud, and low-quality reports
- media moderation and AI workers assist content safety
- donations, support tickets, and NGO profiles support the broader ecosystem
The operating model is:
always available -> mostly local relevance -> urgent cases prioritized -> nearby response activated
The core product claim is:
geospatial rescue coordination + trust-aware community feed + NGO operational network + AI-assisted safety
The backend is intentionally split by responsibility:
- Rust owns latency-sensitive, high-concurrency, user-facing systems.
- Python owns ML, automation, moderation, analytics, and experimental intelligence layers.
This separation keeps the operational path fast while allowing the intelligence layer to evolve without slowing the critical rescue flow.
The codebase is organized around practical production constraints:
- keep urgent rescue creation fast and deterministic
- require real geolocation for emergency fan-out
- prioritize nearby cases before generic feed content
- separate core backend from AI workers
- make fraud, trust, moderation, and reporting first-class systems
- preserve mobile/backend contract compatibility with tests
- avoid putting heavy ML inference in the request hot path
- treat observability and readiness as production features, not afterthoughts
[Mobile App]
| HTTPS / WebSocket
v
[Rust API Gateway - Axum/Tokio]
|-- Auth / Users / Sessions
|-- Feed / Posts / Media / Search
|-- Chat HTTP + WebSocket
|-- Geo Nearby / Rescue Alerts
|-- ONG Profiles / Follow / Trust
|-- Donations / Support / Reports
|
| events / jobs
v
[Notification Engine]
|-- nearby recipient selection
|-- rescue alert generation
|-- deep link action payloads
|-- push-token subscription registry
|
v
[PostgreSQL latitude/longitude] [Redis Geospatial Cache - production scale] [NATS/Kafka - planned production queue]
|
v
[Python Intelligence Layer]
|-- image moderation
|-- NLP classification
|-- fraud model experiments
|-- recommendation jobs
|-- analytics and admin automation
Versioned architecture diagrams live in docs/architecture/README.md.
Included diagrams:
- C4 context
- C4 container
- rescue lifecycle
- rescue creation sequence
- notification flow
- event flow
- chat realtime flow
- benchmark evidence flow
The active rescue publication flow is intentionally linear.
- Mobile user writes a rescue/help request.
- Mobile captures photo, urgency, and GPS coordinates.
- Rust validates post payload and media contract.
- Emergency posts must include latitude and longitude.
- Rust creates the post contract.
- Fraud text scoring runs in the request path as a cheap deterministic signal.
- Urgent/emergency cases create a durable rescue_fanout_state.
- The rescue fanout worker expands by operational phase and ranks nearby candidates.
- Push jobs are created through the existing durable notification infrastructure.
- Estou indo ("I'm on my way") creates a real rescue response and pauses aggressive expansion.
- Feed/search/notifications expose the rescue case back to the app with operational status.
Simplified:
mobile GPS -> post create -> validation -> fanout state -> phased nearby push -> response -> feed/chat coordination
The current backend supports a production-shaped rescue coordination contract.
Emergency posts require:
- postType = emergency or urgent = true
- latitude
- longitude
- description
- location label
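The emergency requirements above can be sketched as a small validation function. This is a minimal illustration, not the backend's actual code: the `CreatePost` struct, the `PostValidationError` enum, the function names, and the coordinate range check are all assumptions introduced here.

```rust
/// Hypothetical request shape. Only the field meanings come from the contract
/// above; the Rust names are assumptions for illustration.
#[derive(Debug, Clone)]
pub struct CreatePost {
    pub post_type: String,
    pub urgent: bool,
    pub latitude: Option<f64>,
    pub longitude: Option<f64>,
    pub description: String,
    pub location_label: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PostValidationError {
    MissingCoordinates,
    CoordinatesOutOfRange,
    EmptyDescription,
    EmptyLocationLabel,
}

/// A post counts as an emergency when `postType = emergency` or `urgent = true`.
pub fn is_emergency(post: &CreatePost) -> bool {
    post.post_type == "emergency" || post.urgent
}

/// Checks the emergency-only requirements; other post types pass through.
pub fn validate_emergency(post: &CreatePost) -> Result<(), PostValidationError> {
    if !is_emergency(post) {
        return Ok(());
    }
    let (lat, lon) = match (post.latitude, post.longitude) {
        (Some(lat), Some(lon)) => (lat, lon),
        _ => return Err(PostValidationError::MissingCoordinates),
    };
    // Assumption: a plain WGS84 range check. NaN fails both `contains` calls.
    if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
        return Err(PostValidationError::CoordinatesOutOfRange);
    }
    if post.description.trim().is_empty() {
        return Err(PostValidationError::EmptyDescription);
    }
    if post.location_label.trim().is_empty() {
        return Err(PostValidationError::EmptyLocationLabel);
    }
    Ok(())
}
```

Keeping this check free of I/O is one way to keep urgent post creation fast and deterministic, since it can run before any database or queue work.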
When accepted, the backend creates a durable fanout state and returns operational rescue metadata:
{
"rescueFanoutStateId": "uuid",
"rescueOperational": {
"fanoutPhase": 1,
"helpGoingCount": 0,
"helpArrivedCount": 0,
"operationalLabel": "Precisa de ajuda"
}
}
The fanout worker is controlled by RESCUE_FANOUT_WORKER_ENABLED. It claims due fanout states with row locking, selects candidates using geo filters plus operational score, creates push jobs, records attempts, and pauses expansion when a real helper confirms Estou indo.
MVP fanout phases:
| Phase | Radius | Purpose |
|---|---|---|
| 1 | 0.3 km | sniper local, critical-alert users and recently active helpers |
| 2 | 0.7 km | controlled local expansion |
| 3 | 1.0 km | neighborhood expansion |
| 4 | 3.0 km | broader city-area expansion |
| 5 | verified/ONG/provider escalation | include trusted institutional actors with wider radius |
| 6 | 10 km specialists | local specialist search |
| 7 | 30 km specialists | regional specialist search |
| 8 | 100 km specialists | state-level specialist search |
| 9 | 300 km agencies/specialists | environmental agency or rare-case escalation |
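The phase table can be read as a lookup from phase number to radius and audience. The sketch below is one possible encoding, not the worker's real implementation: `FanoutAudience`, `phase_plan`, and `next_phase` are hypothetical names. Phase 5 has no numeric radius in the table, so it is left as `None` rather than guessed.

```rust
/// Hypothetical audience buckets derived from the Purpose column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FanoutAudience {
    NearbyHelpers,
    TrustedInstitutions,
    Specialists,
}

/// Maps a fanout phase to (radius in km, audience).
/// The radius is `None` where the table gives no number (phase 5).
pub fn phase_plan(phase: u8) -> Option<(Option<f64>, FanoutAudience)> {
    use FanoutAudience::*;
    match phase {
        1 => Some((Some(0.3), NearbyHelpers)),
        2 => Some((Some(0.7), NearbyHelpers)),
        3 => Some((Some(1.0), NearbyHelpers)),
        4 => Some((Some(3.0), NearbyHelpers)),
        5 => Some((None, TrustedInstitutions)),
        6 => Some((Some(10.0), Specialists)),
        7 => Some((Some(30.0), Specialists)),
        8 => Some((Some(100.0), Specialists)),
        9 => Some((Some(300.0), Specialists)),
        _ => None,
    }
}

/// Returns the next phase to run, or `None` when expansion should stop.
/// Expansion pauses as soon as a helper has confirmed they are going.
pub fn next_phase(current: u8, helper_confirmed: bool) -> Option<u8> {
    if helper_confirmed || current >= 9 {
        None
    } else {
        Some(current + 1)
    }
}
```

Modelling the pause as "no next phase" keeps the rule that a confirmed helper stops aggressive expansion in one place.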
Specialist escalation uses rescue_specialist_providers and rescue_escalation_attempts. It searches for competent responders by animal scope and provider type before falling back to verified/ONG/vet/admin users. It does not broadcast regional alerts to generic unverified users.
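As a rough illustration of the specialist rule above, the following sketch filters providers by distance, animal scope, and provider type, and signals a fallback when nobody matches. The `Provider` struct, the `Escalation` enum, and the string values used for scopes and provider types are assumptions. The real rescue_specialist_providers schema is not shown in this document.

```rust
/// Hypothetical in-memory view of a specialist provider row.
#[derive(Debug, Clone, PartialEq)]
pub struct Provider {
    pub id: u32,
    pub provider_type: String,
    pub animal_scopes: Vec<String>,
    pub distance_km: f64,
}

/// Outcome of one escalation attempt in this sketch.
#[derive(Debug, PartialEq)]
pub enum Escalation {
    /// Matching specialist ids, in input order.
    Specialists(Vec<u32>),
    /// No competent specialist found; fall back to verified/ONG/vet/admin users.
    FallbackToTrustedUsers,
}

pub fn select_specialists(
    providers: &[Provider],
    animal_scope: &str,
    allowed_types: &[&str],
    radius_km: f64,
) -> Escalation {
    let ids: Vec<u32> = providers
        .iter()
        .filter(|p| p.distance_km <= radius_km)
        .filter(|p| p.animal_scopes.iter().any(|s| s == animal_scope))
        .filter(|p| allowed_types.contains(&p.provider_type.as_str()))
        .map(|p| p.id)
        .collect();
    if ids.is_empty() {
        Escalation::FallbackToTrustedUsers
    } else {
        Escalation::Specialists(ids)
    }
}
```

Neither branch reaches generic unverified users, which matches the rule that regional alerts are never broadcast to them.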
The old rescue alert preview endpoint remains useful for contract preview, but production delivery should use the persisted fanout state, specialist escalation state and durable push jobs.
- POST /v1/auth/login
- POST /v1/auth/register
- POST /v1/auth/password-reset
- DELETE /v1/me
Supports personal users, NGOs, and vet-style accounts at the frontend contract level.
- GET /v1/feed
- POST /v1/posts
- GET /v1/posts/:id
- POST /v1/posts/:id/like
- POST /v1/posts/:id/comments
- POST /v1/posts/:id/report
- POST /v1/posts/:id/rescue-response
Posts support adoption, lost, found, emergency, campaign, and general community post types.
POST /v1/media/upload-intents
Cloudinary upload-intent flow is used for image/video media before post creation.
- GET /v1/chat/rooms
- GET /v1/chat/rooms/:id
- GET /v1/chat/rooms/:id/messages
- POST /v1/chat/rooms/:id/messages
- GET /v1/chat/rooms/:id/ws
HTTP chat routes and a WebSocket room path are present for real-time coordination.
GET /v1/geo/nearby
Nearby logic is based on geographic distance and is aligned with rescue, feed, and map usage.
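A geographic distance check of this kind is commonly built on the haversine formula. The sketch below assumes haversine on a spherical Earth. The document does not say which formula the backend uses, and `haversine_km` and `within_radius` are hypothetical names.

```rust
/// Mean Earth radius in kilometres (spherical approximation).
const EARTH_RADIUS_KM: f64 = 6371.0088;

/// Great-circle distance in kilometres between two points given in degrees.
pub fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let phi1 = lat1.to_radians();
    let phi2 = lat2.to_radians();
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing the argument of asin above 1.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// True when the two points are at most `radius_km` apart.
pub fn within_radius(lat1: f64, lon1: f64, lat2: f64, lon2: f64, radius_km: f64) -> bool {
    haversine_km(lat1, lon1, lat2, lon2) <= radius_km
}
```

At the sub-kilometre radii used by the early fanout phases, the spherical approximation error is small compared with typical mobile GPS error.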
- POST /v1/posts/:id/rescue-response
- POST /v1/rescue/active/:id/responses
The response endpoint records helper intent such as confirmed or arrived. A confirmed response means someone is going, not that the case is resolved. The backend must not mark a post or rescue session as resolved from this action alone.
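The rule that a helper response never resolves a case can be made explicit in the state update. This is a hedged sketch: `HelperIntent`, `RescueOperational`, and `apply_response` are hypothetical names, and the counters only mirror the helpGoingCount and helpArrivedCount fields from the response metadata shown earlier. How an arrived response affects expansion is not described in this document, so the sketch only counts it.

```rust
/// Helper intents recorded by the response endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HelperIntent {
    Confirmed,
    Arrived,
}

/// Hypothetical operational state for one rescue case.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RescueOperational {
    pub help_going_count: u32,
    pub help_arrived_count: u32,
    pub expansion_paused: bool,
    pub resolved: bool,
}

/// Applies a helper response. `resolved` is deliberately never touched here:
/// someone going or arriving does not mean the case is closed.
pub fn apply_response(state: &mut RescueOperational, intent: HelperIntent) {
    match intent {
        HelperIntent::Confirmed => {
            state.help_going_count += 1;
            // A real helper is on the way, so aggressive expansion pauses.
            state.expansion_paused = true;
        }
        HelperIntent::Arrived => {
            state.help_arrived_count += 1;
        }
    }
}
```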
- GET /v1/notifications
- PATCH /v1/notifications/:id/mark-as-read
- POST /v1/notifications/:id/ack
- POST /v1/notifications/push-token
- POST /v1/notifications/rescue-alerts/:post_id/preview
The notification layer supports rescue alert modeling, push token registration, dedupe keys, categories, deep links, and critical flags.
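To show what a dedupe key does, here is a minimal in-memory sketch. The key format (`category:post_id:recipient_id`) and the `DedupeRegistry` type are assumptions made for illustration. In the real system the dedupe state would presumably live in durable storage rather than a `HashSet`.

```rust
use std::collections::HashSet;

/// Builds a dedupe key. The format is an assumption for this sketch.
pub fn dedupe_key(category: &str, post_id: &str, recipient_id: &str) -> String {
    format!("{category}:{post_id}:{recipient_id}")
}

/// Hypothetical in-memory registry of keys that were already sent.
#[derive(Debug, Default)]
pub struct DedupeRegistry {
    seen: HashSet<String>,
}

impl DedupeRegistry {
    /// Returns true the first time a key is seen and false on every repeat.
    pub fn should_send(&mut self, key: String) -> bool {
        self.seen.insert(key)
    }
}
```

Leaving the fanout phase out of the key means a recipient alerted in an early phase is not alerted again for the same post when a wider phase runs.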
- GET /v1/ongs
- GET /v1/ongs/:id
- POST /v1/ongs/:id/follow
- GET /v1/trust/score/:subject_id
- POST /v1/donations/intents
- GET /v1/support/meta
- GET /v1/support/tickets
- POST /v1/support/tickets
- GET /v1/search
Python is reserved for auxiliary intelligence and automation, not the latency-sensitive request core.
Intended Python responsibilities:
- image moderation
- content classification
- NLP risk tagging
- advanced recommendations
- analytics pipelines
- internal dashboards
- fraud model experiments
- admin automation scripts
Production rule:
Rust handles real-time user operations. Python handles intelligence and background automation.
The practical decision model for rescue visibility is based on signals that can be measured and replayed:
- emergency status
- user location
- post coordinates
- radius in kilometers
- recipient subscription radius
- trust state
- content category
- notification dedupe state
- feed freshness and proximity
The rescue alert radius is phase-based:
- phase 1 sniper local: 0.3 km
- phase 2: 0.7 km
- phase 3: 1.0 km
- phase 4: 3.0 km
- phase 5: verified/ONG/provider escalation with wider operational reach
30 m may remain a technical lower bound for validation or preview paths, but it is not the operational phase-1 radius.
Compact distance rule:
Candidate ordering is not distance-only. The worker ranks by expected operational response using proximity, recent activity, trust, role, verification, critical-alert preference, and fatigue/cooldown.
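The ranking idea can be sketched as a weighted score over the listed signals. Everything numeric below is an assumption: the weights, the recency decay, and the fatigue penalty are illustrative, not the worker's tuned values. `Candidate`, `operational_score`, and `rank` are hypothetical names, and the role signal is folded into `verified` for brevity.

```rust
/// Hypothetical candidate snapshot used for ranking.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub distance_km: f64,
    pub minutes_since_active: f64,
    /// Trust score normalised to 0..=1.
    pub trust: f64,
    pub verified: bool,
    pub critical_alerts_opt_in: bool,
    /// Recent alert count, used as a fatigue/cooldown signal.
    pub alerts_last_24h: u32,
}

/// Expected-response score: higher is better. All weights are illustrative.
pub fn operational_score(c: &Candidate, radius_km: f64) -> f64 {
    let proximity = if radius_km > 0.0 {
        (1.0 - c.distance_km / radius_km).clamp(0.0, 1.0)
    } else {
        0.0
    };
    let recency = 1.0 / (1.0 + c.minutes_since_active.max(0.0) / 60.0);
    let fatigue = (c.alerts_last_24h as f64 * 0.1).min(0.5);
    0.40 * proximity
        + 0.20 * recency
        + 0.20 * c.trust.clamp(0.0, 1.0)
        + 0.10 * (c.verified as u8 as f64)
        + 0.10 * (c.critical_alerts_opt_in as u8 as f64)
        - fatigue
}

/// Drops candidates outside the radius, then sorts best-first by score.
pub fn rank(mut candidates: Vec<Candidate>, radius_km: f64) -> Vec<Candidate> {
    candidates.retain(|c| c.distance_km <= radius_km);
    candidates.sort_by(|a, b| {
        operational_score(b, radius_km).total_cmp(&operational_score(a, radius_km))
    });
    candidates
}
```

With these weights, a trusted and recently active helper at 0.25 km outranks an inactive, heavily alerted user at 0.05 km, which is the sense in which ordering is not distance-only.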
Current local validation is based on automated tests and contract checks.
cargo test currently validates:
- frontend feed filters
- auth register frontend shape
- post validation
- media upload contract
- emergency coordinate requirement
- emergency rescue fanout state creation
- geospatial distance calculations
- notification recipient filtering
- fraud scoring
- trust scoring
- JWT/password auth services
Latest local result for the fanout integration:
cargo check passed
cargo test compiled; route tests hit local DB pool timeout
The mobile app type contract has been validated with:
pnpm --filter zoohelp-mobile run typecheck
This validates the TypeScript contract across:
- post creation
- latitude/longitude forwarding
- rescue alert response typing
- mobile/backend post mapping
- feed micro-composer integration
The architecture is designed for low-latency rescue coordination, but performance claims should be backed by measured output.
Executable benchmark assets live in benchmarks/.
Quick commands:
k6 run .\benchmarks\k6\http-rescue-feed.js
k6 run .\benchmarks\k6\websocket-chat.js
locust -f .\benchmarks\locust\locustfile.py --host http://127.0.0.1:8080
vegeta attack -duration=60s -rate=100 -targets=.\benchmarks\vegeta\feed.targets | vegeta report
Candidate WebSocket scale run:
$env:ROOM_ID = "<chat-room-id>"
$env:ACCESS_TOKEN = "<jwt>"
$env:K6_WS_VUS = "10000"
$env:K6_DURATION = "10m"
k6 run .\benchmarks\k6\websocket-chat.js
Benchmark reports should be attached under benchmarks/reports/ before using any public throughput claim.
Useful evidence for production hardening:
- post creation latency p50/p95/p99
- feed latency p50/p95/p99
- WebSocket connection count and fan-out latency
- rescue alert fan-out time by recipient count
- Redis geospatial query latency
- PostgreSQL coordinate query latency
- push delivery success and delay by provider
- image upload success rate and moderation delay
- report/fraud false-positive review rate
No global-scale throughput claim should be treated as production proof until it has been benchmarked with PostgreSQL, Redis, queue, upload, WebSocket, and push delivery all enabled.
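When turning raw latency samples into the p50/p95/p99 figures listed above, the percentile definition should be stated alongside the numbers. The sketch below uses the nearest-rank method. This is an assumption, since tools such as k6 or vegeta may use a different interpolation, and `percentile` is a hypothetical helper.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns `None` for empty input or a percentile outside 0..=100.
pub fn percentile(samples_ms: &[f64], p: f64) -> Option<f64> {
    if samples_ms.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest rank covering at least p percent of samples.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```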
The production target is:
Mobile App
-> Cloudflare / Edge Protection
-> Rust API Gateway
-> PostgreSQL latitude/longitude
-> Redis Geospatial / Rate Limit / Session Cache
-> NATS or Kafka Event Bus
-> Notification Delivery Workers
-> FCM / APNs
-> Python AI Workers
-> Observability Stack
Recommended durability split:
| Layer | Production Role |
|---|---|
| PostgreSQL | authoritative relational state and latitude/longitude storage |
| Redis | low-latency geospatial lookup, cache, rate limits |
| NATS/Kafka | durable rescue alert and moderation events target |
| Rust workers | notification fan-out, realtime coordination, trust/fraud core |
| Python workers | AI moderation, NLP, analytics, model experiments |
| Cloudinary/S3/R2 | media storage and delivery |
| FCM/APNs | push notification delivery |
Current queue reality:
- critical rescue notification state is persisted in PostgreSQL through notification_events, push_delivery_jobs, rescue_fanout_states, rescue_fanout_attempts, rescue_responses, rescue_specialist_providers, and rescue_escalation_attempts
- workers claim due jobs with row locking and persist retry/dead-letter state
- NATS is present for cross-process realtime fanout, but the current implementation uses plain pub/sub, not JetStream/Kafka-style durable replay
- WebSocket broadcast channels are in-memory delivery surfaces only; the authoritative chat and rescue history remains in PostgreSQL
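The retry/dead-letter decision a worker persists after a failed job can be sketched as a pure function. The policy below (exponential backoff with a cap, dead-letter after a maximum attempt count) is an assumption chosen for illustration, since the document does not specify the actual policy. `JobOutcome` and `on_failure` are hypothetical names.

```rust
/// What the worker would persist on the job row after a failed attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum JobOutcome {
    /// Make the job due again after this many seconds.
    RetryAfterSecs(u64),
    /// Stop retrying and move the job to dead-letter state.
    DeadLetter,
}

/// Decides the outcome after a failure.
/// `attempts_so_far` counts attempts made before the one that just failed.
pub fn on_failure(
    attempts_so_far: u32,
    max_attempts: u32,
    base_secs: u64,
    cap_secs: u64,
) -> JobOutcome {
    let attempts = attempts_so_far.saturating_add(1);
    if attempts >= max_attempts {
        return JobOutcome::DeadLetter;
    }
    // Exponential backoff: base, 2x base, 4x base, ... limited to `cap_secs`.
    let exponent = (attempts - 1).min(20);
    let delay = base_secs.saturating_mul(1u64 << exponent).min(cap_secs);
    JobOutcome::RetryAfterSecs(delay)
}
```

Because the decision is a pure function of the stored attempt count, a restarted worker reaches the same outcome from the row alone.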
ZooHelp is a trust-sensitive system. The backend assumes abuse will happen.
Security and integrity controls:
- JWT-based auth surface
- password hashing service
- refresh tokens persisted in PostgreSQL with revocation timestamps
- account deletion endpoint
- report endpoint for content moderation
- trust scoring service
- fraud text scoring
- media moderation queue status
- push-token registration contract
- support tickets and operational escalation
- validation on critical request payloads
- emergency geolocation requirement
Production hardening still required:
- access-token revocation before expiry through a session table, jti denylist, or user token-version check
- role-based authorization beyond contract shape
- full audit log
- rate limits enforced at edge and API levels
- durable report/moderation workflow
- FCM/APNs delivery receipts
- durable NATS JetStream/Kafka consumers for replayable realtime/domain events
- API restart and worker restart evidence proving no critical rescue/chat state is lost
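Of the three revocation options named in the list above, the user token-version check is the simplest to illustrate. This is a sketch of a hardening item that is still required, not of existing code: `AccessClaims`, `UserAuthState`, `AuthDecision`, and `check_access` are hypothetical, and the sketch assumes the token carries a version number that is compared with the user's current stored version.

```rust
/// Hypothetical claims decoded from an already signature-verified access token.
#[derive(Debug, Clone)]
pub struct AccessClaims {
    pub sub: String,
    pub token_version: u32,
    pub exp_unix: u64,
}

/// Hypothetical per-user auth state loaded from storage.
#[derive(Debug, Clone)]
pub struct UserAuthState {
    pub token_version: u32,
    pub banned: bool,
    pub deleted: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthDecision {
    Allow,
    Expired,
    Revoked,
}

/// Rejects tokens that are expired, belong to a banned or deleted account,
/// or were issued before the user's token version was bumped.
pub fn check_access(claims: &AccessClaims, user: &UserAuthState, now_unix: u64) -> AuthDecision {
    if now_unix >= claims.exp_unix {
        return AuthDecision::Expired;
    }
    if user.banned || user.deleted || claims.token_version != user.token_version {
        return AuthDecision::Revoked;
    }
    AuthDecision::Allow
}
```

Bumping the stored version on ban, deletion, or compromise invalidates every outstanding access token for that user without tracking individual token ids.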
Current reliability-oriented surfaces:
- /healthz
- /readyz
- /metrics
- /v1/observability
- structured rescue alert logging
- validation tests for frontend/backend contracts
- Docker Compose local infrastructure
Production reliability targets:
- readiness tied to PostgreSQL/Redis/NATS availability
- OpenTelemetry traces across post -> alert -> push delivery
- Sentry or equivalent error aggregation
- Prometheus dashboards for API, queue, push, and websocket metrics
- alerting for notification delay, failed uploads, auth failures, and WebSocket disconnect spikes
Production readiness gate:
- docs/production-readiness.md
- persistence completeness
- durable notifications
- queue guarantees
- retries and DLQ
- staging evidence before public scale claims
Core backend:
BIND_ADDR
DATABASE_URL
REDIS_URL
NATS_URL
AI_WORKER_URL
JWT_SECRET
ACCESS_TOKEN_TTL_MINUTES
REFRESH_TOKEN_TTL_DAYS
PUSH_WORKER_ENABLED
RESCUE_FANOUT_WORKER_ENABLED
POSTGIS_ENABLED
Production guardrails:
- outside development, JWT_SECRET must be a real non-placeholder secret with at least 32 characters
- outside development, ACCESS_TOKEN_TTL_MINUTES must be between 1 and 60
- outside development, PUSH_WORKER_ENABLED=true and RESCUE_FANOUT_WORKER_ENABLED=true are required
- NATS_URL is required outside development, but NATS currently supports realtime fanout only; durable queue semantics still come from PostgreSQL job tables until JetStream/Kafka is implemented
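The guardrails above can be expressed as one startup check. The sketch assumes a `Settings` struct with a `development` flag and a small list of placeholder substrings. Both are hypothetical, since the document does not show how the environment is detected or what counts as a placeholder.

```rust
/// Hypothetical parsed configuration. Field names mirror the variables above.
#[derive(Debug, Clone)]
pub struct Settings {
    pub development: bool,
    pub jwt_secret: String,
    pub access_token_ttl_minutes: u32,
    pub push_worker_enabled: bool,
    pub rescue_fanout_worker_enabled: bool,
    pub nats_url: Option<String>,
}

/// Assumption: a short illustrative list of placeholder markers.
fn looks_like_placeholder(secret: &str) -> bool {
    let lower = secret.to_lowercase();
    ["changeme", "change-me", "placeholder"]
        .iter()
        .any(|marker| lower.contains(*marker))
}

/// Returns one message per violated guardrail; empty means the config passes.
pub fn check_guardrails(s: &Settings) -> Vec<&'static str> {
    let mut errors = Vec::new();
    if s.development {
        return errors; // guardrails only apply outside development
    }
    if s.jwt_secret.chars().count() < 32 || looks_like_placeholder(&s.jwt_secret) {
        errors.push("JWT_SECRET must be a non-placeholder secret of at least 32 characters");
    }
    if !(1..=60).contains(&s.access_token_ttl_minutes) {
        errors.push("ACCESS_TOKEN_TTL_MINUTES must be between 1 and 60");
    }
    if !(s.push_worker_enabled && s.rescue_fanout_worker_enabled) {
        errors.push("PUSH_WORKER_ENABLED and RESCUE_FANOUT_WORKER_ENABLED must both be true");
    }
    if s.nats_url.as_deref().map_or(true, |url| url.trim().is_empty()) {
        errors.push("NATS_URL is required");
    }
    errors
}
```

Collecting every violation instead of failing on the first lets an operator fix a misconfigured deployment in a single pass.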
Cloudinary media:
CLOUDINARY_CLOUD_NAME
CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET
CLOUDINARY_URL
Mobile/API integration:
EXPO_PUBLIC_API_BASE_URL
Operational note:
Do not commit .env, secrets, Cloudinary API secrets, tokens, database dumps, local target artifacts, or private operational datasets.
Requirements:
- Rust toolchain
- Docker
- Docker Compose
- PostgreSQL
- Redis
- Python 3.11+
- pnpm for the mobile workspace
Start infrastructure:
cd backend
cp .env.example .env
docker compose up -d
Run backend:
cargo run
Run tests:
cargo fmt --check
cargo test
Run mobile type contract:
cd ../client
pnpm --filter zoohelp-mobile run typecheck
backend/
Cargo.toml
docker-compose.yml
migrations/
0001_init.sql
...
0018_rescue_fanout_progressive.sql
src/
main.rs
config.rs
domain.rs
error.rs
state.rs
routes/
auth.rs
chat.rs
donations.rs
feed.rs
geo.rs
media.rs
notifications.rs
ongs.rs
posts.rs
rescue.rs
search.rs
support.rs
trust.rs
services/
auth.rs
fraud.rs
geo.rs
notifications.rs
rescue_fanout.rs
rate_limit.rs
trust.rs
python-workers/
app/
main.py
requirements.txt
This backend is production-shaped, but not yet fully production-complete.
Strong current surfaces:
- Rust Axum API structure
- mobile/backend contract alignment
- auth/register contract
- feed/post/search/ONG/support/donation routes
- media upload-intent contract
- chat HTTP and WebSocket route surface
- geospatial rescue alert modeling
- emergency coordinate validation
- notification subscription and alert preview contracts
- tests for the key frontend/backend paths
Known hardening gaps before real public scale:
- remove any remaining public seed/mock fallback paths and prove critical endpoints are PostgreSQL-backed
- move realtime/domain event delivery from plain NATS pub/sub to durable JetStream/Kafka consumers where replay is required
- add immediate access-token invalidation for banned, deleted, or compromised accounts
- deliver push notifications through FCM/APNs workers
- enforce production rate limits and abuse controls
- complete durable moderation and report review flows
- publish measured benchmark reports for PostgreSQL, Redis, queue, upload, WebSocket, and push delivery paths
- run and document API restart plus worker restart tests for chat, rescue sessions, push jobs, and Estou indo
- add production observability dashboards and alerting
ZooHelp Hybrid Core is intended to become a global animal rescue coordination backend.
The strategic direction is narrow and operational:
- fast rescue post creation
- real geolocation
- nearby helper discovery
- trusted community coordination
- NGO operational profiles
- chat-based response
- donation and support infrastructure
- AI-assisted moderation and fraud prevention
The operating thesis is:
simple mobile action -> reliable backend coordination -> nearby human response -> measurable animal impact
