Helpin Hybrid Core

zoohelp-backend is the Rust-first backend infrastructure for ZooHelp, a geolocation-driven animal rescue, adoption, NGO coordination, trust, notification, and community protection platform.

The system is designed around one operational problem:

animal in need -> trusted report -> geospatial prioritization -> nearby helpers/NGOs -> coordinated rescue outcome

ZooHelp is not only an adoption app. The backend is being shaped as a real-time animal protection coordination layer: posts, rescue alerts, chat, nearby search, NGO profiles, trust signals, media moderation, donation intents, support workflows, and AI-assisted operational tooling.

Objective

ZooHelp Hybrid Core provides the backend runtime for a modern animal rescue network.

The operational MVP is documented in docs/operational-mvp.md. It keeps the first scope focused on verified NGOs, manual trust review, real rescue coordination, and measurable rescue outcomes before expanding into heavier AI or automation layers.

The platform is intended to support cases such as:

a person finds an injured animal and posts an urgent request
GPS coordinates are attached to the rescue post
nearby users, volunteers, vets, and NGOs are identified
rescue alerts are generated with deep links and action payloads
the feed prioritizes urgent and nearby cases
chat coordinates the rescue operation
trust signals reduce abuse, fraud, and low-quality reports
media moderation and AI workers assist content safety
donations, support tickets, and NGO profiles support the broader ecosystem

The operating model is:

always available -> mostly local relevance -> urgent cases prioritized -> nearby response activated

Product Positioning

The core product claim is:

geospatial rescue coordination + trust-aware community feed + NGO operational network + AI-assisted safety

The backend is intentionally split by responsibility:

Rust owns latency-sensitive, high-concurrency, user-facing systems.
Python owns ML, automation, moderation, analytics, and experimental intelligence layers.

This separation keeps the operational path fast while allowing the intelligence layer to evolve without slowing the critical rescue flow.

Design Principles

The codebase is organized around practical production constraints:

keep urgent rescue creation fast and deterministic
require real geolocation for emergency fan-out
prioritize nearby cases before generic feed content
separate core backend from AI workers
make fraud, trust, moderation, and reporting first-class systems
preserve mobile/backend contract compatibility with tests
avoid putting heavy ML inference in the request hot path
treat observability and readiness as production features, not afterthoughts

Architecture Overview

[Mobile App]
   |  HTTPS / WebSocket
   v
[Rust API Gateway - Axum/Tokio]
   |-- Auth / Users / Sessions
   |-- Feed / Posts / Media / Search
   |-- Chat HTTP + WebSocket
   |-- Geo Nearby / Rescue Alerts
   |-- ONG Profiles / Follow / Trust
   |-- Donations / Support / Reports
   |
   | events / jobs
   v
[Notification Engine]
   |-- nearby recipient selection
   |-- rescue alert generation
   |-- deep link action payloads
   |-- push-token subscription registry
   |
   v
[PostgreSQL latitude/longitude]  [Redis Geospatial Cache - production scale]  [NATS/Kafka - planned production queue]
   |
   v
[Python Intelligence Layer]
   |-- image moderation
   |-- NLP classification
   |-- fraud model experiments
   |-- recommendation jobs
   |-- analytics and admin automation

Visual Architecture

Versioned architecture diagrams live in docs/architecture/README.md.

Included diagrams:

C4 context
C4 container
rescue lifecycle
rescue creation sequence
notification flow
event flow
chat realtime flow
benchmark evidence flow

Runtime Pipeline

The active rescue publication flow is intentionally linear.

Mobile user writes a rescue/help request.
Mobile captures photo, urgency, and GPS coordinates.
Rust validates post payload and media contract.
Emergency posts must include latitude and longitude.
Rust creates the post contract.
Fraud text scoring runs in the request path as a cheap deterministic signal.
Urgent/emergency cases create a durable rescue_fanout_state.
The rescue fanout worker expands by operational phase and ranks nearby candidates.
Push jobs are created through the existing durable notification infrastructure.
A helper's Estou indo ("I'm on my way") confirmation creates a real rescue response and pauses aggressive expansion.
Feed/search/notifications expose the rescue case back to the app with operational status.

Simplified:

mobile GPS -> post create -> validation -> fanout state -> phased nearby push -> response -> feed/chat coordination

Operational Rescue Alert Model

The current backend supports a production-shaped rescue coordination contract.

Emergency posts require:

postType = emergency or urgent = true
latitude
longitude
description
location label
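
As an illustration of the requirement list above, here is a minimal validation sketch. The `NewPost` struct, its field names, the error enum, and the coordinate range check are assumptions made for this example; they are not taken from the actual `src/` code.

```rust
/// Hypothetical request shape; field names are assumptions for illustration.
#[derive(Debug, Clone, Default)]
pub struct NewPost {
    pub post_type: String,
    pub urgent: bool,
    pub latitude: Option<f64>,
    pub longitude: Option<f64>,
    pub description: String,
    pub location_label: String,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    MissingCoordinates,
    CoordinatesOutOfRange,
    MissingDescription,
    MissingLocationLabel,
}

/// A post is treated as an emergency when postType = emergency or urgent = true.
pub fn is_emergency(post: &NewPost) -> bool {
    post.post_type == "emergency" || post.urgent
}

/// Applies the emergency-only requirements; other post types pass through.
pub fn validate_emergency(post: &NewPost) -> Result<(), ValidationError> {
    if !is_emergency(post) {
        return Ok(());
    }
    let (lat, lon) = match (post.latitude, post.longitude) {
        (Some(lat), Some(lon)) => (lat, lon),
        _ => return Err(ValidationError::MissingCoordinates),
    };
    // Assumed sanity check: reject NaN and values outside the WGS84 ranges.
    if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
        return Err(ValidationError::CoordinatesOutOfRange);
    }
    if post.description.trim().is_empty() {
        return Err(ValidationError::MissingDescription);
    }
    if post.location_label.trim().is_empty() {
        return Err(ValidationError::MissingLocationLabel);
    }
    Ok(())
}
```

Returning a typed error per missing field keeps the check deterministic and cheap, which matches the goal of keeping urgent rescue creation fast.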

When accepted, the backend creates a durable fanout state and returns operational rescue metadata:

{
  "rescueFanoutStateId": "uuid",
  "rescueOperational": {
    "fanoutPhase": 1,
    "helpGoingCount": 0,
    "helpArrivedCount": 0,
    "operationalLabel": "Precisa de ajuda"
  }
}

The fanout worker is controlled by RESCUE_FANOUT_WORKER_ENABLED. It claims due fanout states with row locking, selects candidates using geo filters plus operational score, creates push jobs, records attempts, and pauses expansion when a real helper confirms Estou indo.

MVP fanout phases:

Phase	Radius	Purpose
1	`0.3 km`	sniper local, critical-alert users and recently active helpers
2	`0.7 km`	controlled local expansion
3	`1.0 km`	neighborhood expansion
4	`3.0 km`	broader city-area expansion
5	verified/ONG/provider escalation	include trusted institutional actors with wider radius
6	`10 km` specialists	local specialist search
7	`30 km` specialists	regional specialist search
8	`100 km` specialists	state-level specialist search
9	`300 km` agencies/specialists	environmental agency or rare-case escalation
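
The phase table can be read as a lookup from phase number to radius and audience. The sketch below encodes it that way; the `Audience` enum and both function names are hypothetical, and the "stop expanding once someone is going" rule is a simplification of the pause behavior described above.

```rust
/// Hypothetical audience labels for the MVP fanout phases.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Audience {
    NearbyHelpers,
    TrustedInstitutions,
    Specialists,
    AgenciesAndSpecialists,
}

/// Radius in km (None where the table gives no fixed radius) and audience
/// for a fanout phase. Returns None for phases outside 1..=9.
pub fn phase_plan(phase: u8) -> Option<(Option<f64>, Audience)> {
    match phase {
        1 => Some((Some(0.3), Audience::NearbyHelpers)),
        2 => Some((Some(0.7), Audience::NearbyHelpers)),
        3 => Some((Some(1.0), Audience::NearbyHelpers)),
        4 => Some((Some(3.0), Audience::NearbyHelpers)),
        5 => Some((None, Audience::TrustedInstitutions)),
        6 => Some((Some(10.0), Audience::Specialists)),
        7 => Some((Some(30.0), Audience::Specialists)),
        8 => Some((Some(100.0), Audience::Specialists)),
        9 => Some((Some(300.0), Audience::AgenciesAndSpecialists)),
        _ => None,
    }
}

/// Next phase to expand into, or None when expansion should not continue:
/// either a helper has already confirmed, or the last phase was reached.
pub fn next_phase(current: u8, help_going_count: u32) -> Option<u8> {
    if help_going_count > 0 {
        return None;
    }
    if (1..9).contains(&current) {
        Some(current + 1)
    } else {
        None
    }
}
```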

Specialist escalation uses rescue_specialist_providers and rescue_escalation_attempts. It searches for competent responders by animal scope and provider type before falling back to verified/ONG/vet/admin users. It does not broadcast regional alerts to generic unverified users.

The old rescue alert preview endpoint remains useful for contract preview, but production delivery should use the persisted fanout state, the specialist escalation state, and durable push jobs.

Core Backend Surface

Auth and Identity

POST /v1/auth/login
POST /v1/auth/register
POST /v1/auth/password-reset
DELETE /v1/me

Supports personal users, NGOs, and vet-style accounts at the frontend contract level.

Feed and Posts

GET /v1/feed
POST /v1/posts
GET /v1/posts/:id
POST /v1/posts/:id/like
POST /v1/posts/:id/comments
POST /v1/posts/:id/report
POST /v1/posts/:id/rescue-response

Posts support adoption, lost, found, emergency, campaign, and general community post types.

Media

POST /v1/media/upload-intents

Cloudinary upload-intent flow is used for image/video media before post creation.

Chat

GET /v1/chat/rooms
GET /v1/chat/rooms/:id
GET /v1/chat/rooms/:id/messages
POST /v1/chat/rooms/:id/messages
GET /v1/chat/rooms/:id/ws

HTTP chat endpoints and a WebSocket room path are present for real-time coordination.

Geolocation

GET /v1/geo/nearby

Nearby logic is based on geographic distance and is aligned with rescue, feed, and map usage.

Rescue Coordination

POST /v1/posts/:id/rescue-response
POST /v1/rescue/active/:id/responses

The response endpoint records helper intent such as confirmed or arrived. A confirmed response means someone is going, not that the case is resolved. The backend must not mark a post or rescue session as resolved from this action alone.
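
A small sketch of that rule, assuming an in-memory case record. The counter names mirror `helpGoingCount` and `helpArrivedCount` from the operational metadata; the `RescueCase` struct, the `resolved` flag, and `record_response` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResponseKind {
    Confirmed,
    Arrived,
}

/// Hypothetical in-memory view of a rescue case.
#[derive(Debug, Default, PartialEq)]
pub struct RescueCase {
    pub help_going_count: u32,
    pub help_arrived_count: u32,
    pub resolved: bool,
}

impl RescueCase {
    /// Records helper intent. It only updates counters and deliberately
    /// never touches `resolved`: someone going is not a rescue outcome.
    pub fn record_response(&mut self, kind: ResponseKind) {
        match kind {
            ResponseKind::Confirmed => self.help_going_count += 1,
            ResponseKind::Arrived => self.help_arrived_count += 1,
        }
    }
}
```

Keeping resolution out of this code path means a case can only be closed by a separate, explicit action.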

Notifications

GET /v1/notifications
PATCH /v1/notifications/:id/mark-as-read
POST /v1/notifications/:id/ack
POST /v1/notifications/push-token
POST /v1/notifications/rescue-alerts/:post_id/preview

The notification layer supports rescue alert modeling, push token registration, dedupe keys, categories, deep links, and critical flags.
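
To show what a dedupe key buys, here is a minimal in-memory sketch. The key format (`category:post_id:recipient_id`) and the `DedupeRegistry` type are assumptions for illustration; this README does not specify the real key layout or storage.

```rust
use std::collections::HashSet;

/// Hypothetical key format: one alert per category, post, and recipient.
pub fn dedupe_key(category: &str, post_id: &str, recipient_id: &str) -> String {
    format!("{}:{}:{}", category, post_id, recipient_id)
}

/// In-memory stand-in for whatever store holds already-sent keys.
#[derive(Default)]
pub struct DedupeRegistry {
    seen: HashSet<String>,
}

impl DedupeRegistry {
    /// Returns true the first time a key is seen, false on any repeat.
    pub fn should_send(&mut self, key: String) -> bool {
        self.seen.insert(key)
    }
}
```

With a key like this, a retried job or a repeated fanout phase does not push the same alert to the same recipient twice.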

NGOs, Trust, Donations, Support, Search

GET /v1/ongs
GET /v1/ongs/:id
POST /v1/ongs/:id/follow
GET /v1/trust/score/:subject_id
POST /v1/donations/intents
GET /v1/support/meta
GET /v1/support/tickets
POST /v1/support/tickets
GET /v1/search

Hybrid Intelligence Layer

Python is reserved for auxiliary intelligence and automation, not the latency-sensitive request core.

Intended Python responsibilities:

image moderation
content classification
NLP risk tagging
advanced recommendations
analytics pipelines
internal dashboards
fraud model experiments
admin automation scripts

Production rule:

Rust handles real-time user operations. Python handles intelligence and background automation.

Geospatial Decision Framework

The practical decision model for rescue visibility is based on signals that can be measured and replayed:

emergency status
user location
post coordinates
radius in kilometers
recipient subscription radius
trust state
content category
notification dedupe state
feed freshness and proximity

The rescue alert radius is phase-based:

phase 1 sniper local: 0.3 km
phase 2: 0.7 km
phase 3: 1.0 km
phase 4: 3.0 km
phase 5: verified/ONG/provider escalation with wider operational reach

30 m may remain a technical lower bound for validation or preview paths, but it is not the operational phase-1 radius.

Compact distance rule:

$$\text{recipient\_eligible} = \bigl(\operatorname{distance}(\text{post}, \text{subscriber}) \le \min(\text{phase\_radius}, \text{subscriber\_radius})\bigr)$$
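
The distance rule can be sketched in a few lines. The haversine formula and the 6371 km mean Earth radius are assumptions here, since this README only says "geographic distance"; the function names are hypothetical.

```rust
/// Mean Earth radius in km (assumed for this sketch).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance in km between two points given in degrees.
pub fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// A recipient is eligible when the distance fits inside both the current
/// phase radius and the radius the recipient subscribed to.
pub fn recipient_eligible(
    distance_km: f64,
    phase_radius_km: f64,
    subscriber_radius_km: f64,
) -> bool {
    distance_km <= phase_radius_km.min(subscriber_radius_km)
}
```

For example, a subscriber 0.25 km away with a 0.2 km subscription radius is not eligible in phase 1, even though 0.25 km is inside the 0.3 km phase radius.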

Candidate ordering is not distance-only. The worker ranks by expected operational response using proximity, recent activity, trust, role, verification, critical-alert preference, and fatigue/cooldown.
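
One way to combine those signals is a weighted score. Everything in the sketch below is hypothetical: the `Candidate` fields, the weights, and the cooldown penalty are illustrative values, not the worker's actual ranking.

```rust
/// Hypothetical candidate record; `trust` is assumed to lie in 0.0..=1.0.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub distance_km: f64,
    pub minutes_since_active: f64,
    pub trust: f64,
    pub verified: bool,
    pub critical_alerts_opt_in: bool,
    pub in_cooldown: bool,
}

/// Illustrative score: higher means more likely to respond usefully.
pub fn operational_score(c: &Candidate, phase_radius_km: f64) -> f64 {
    let proximity = if phase_radius_km > 0.0 {
        (1.0 - c.distance_km / phase_radius_km).clamp(0.0, 1.0)
    } else {
        0.0
    };
    // Recency decays smoothly: 1.0 when active now, 0.5 after one hour.
    let recency = 1.0 / (1.0 + c.minutes_since_active.max(0.0) / 60.0);
    let mut score = 0.4 * proximity + 0.25 * recency + 0.2 * c.trust.clamp(0.0, 1.0);
    if c.verified {
        score += 0.1;
    }
    if c.critical_alerts_opt_in {
        score += 0.05;
    }
    // Fatigue/cooldown: heavily demote recently notified helpers.
    if c.in_cooldown {
        score *= 0.25;
    }
    score
}

/// Sorts candidates from highest to lowest score.
pub fn rank(mut candidates: Vec<Candidate>, phase_radius_km: f64) -> Vec<Candidate> {
    candidates.sort_by(|a, b| {
        operational_score(b, phase_radius_km).total_cmp(&operational_score(a, phase_radius_km))
    });
    candidates
}
```

Under these weights, a verified, recently active helper at 0.2 km outranks a closer helper who is in cooldown and has been inactive for hours.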

Operational Evidence

Current local validation is based on automated tests and contract checks.

Backend Test Surface

cargo test currently validates:

frontend feed filters
auth register frontend shape
post validation
media upload contract
emergency coordinate requirement
emergency rescue fanout state creation
geospatial distance calculations
notification recipient filtering
fraud scoring
trust scoring
JWT/password auth services

Latest local result for the fanout integration:

cargo check passed
cargo test compiled; route tests hit local DB pool timeout

Mobile Contract Validation

The mobile app type contract has been validated with:

pnpm --filter zoohelp-mobile run typecheck

This validates the TypeScript contract across:

post creation
latitude/longitude forwarding
rescue alert response typing
mobile/backend post mapping
feed micro-composer integration

Performance and Scaling Notes

The architecture is designed for low-latency rescue coordination, but performance claims should be backed by measured output.

Executable benchmark assets live in benchmarks/.

Quick commands:

k6 run .\benchmarks\k6\http-rescue-feed.js
k6 run .\benchmarks\k6\websocket-chat.js
locust -f .\benchmarks\locust\locustfile.py --host http://127.0.0.1:8080
vegeta attack -duration=60s -rate=100 -targets=.\benchmarks\vegeta\feed.targets | vegeta report

Candidate WebSocket scale run:

$env:ROOM_ID = "<chat-room-id>"
$env:ACCESS_TOKEN = "<jwt>"
$env:K6_WS_VUS = "10000"
$env:K6_DURATION = "10m"
k6 run .\benchmarks\k6\websocket-chat.js

Benchmark reports should be attached under benchmarks/reports/ before using any public throughput claim.

Useful evidence for production hardening:

post creation latency p50/p95/p99
feed latency p50/p95/p99
WebSocket connection count and fan-out latency
rescue alert fan-out time by recipient count
Redis geospatial query latency
PostgreSQL coordinate query latency
push delivery success and delay by provider
image upload success rate and moderation delay
report/fraud false-positive review rate

No global-scale throughput claim should be treated as production proof until it has been benchmarked with PostgreSQL, Redis, the queue, uploads, WebSocket, and push delivery all enabled.

Production Architecture Target

The production target is:

Mobile App
  -> Cloudflare / Edge Protection
  -> Rust API Gateway
  -> PostgreSQL latitude/longitude
  -> Redis Geospatial / Rate Limit / Session Cache
  -> NATS or Kafka Event Bus
  -> Notification Delivery Workers
  -> FCM / APNs
  -> Python AI Workers
  -> Observability Stack

Recommended durability split:

Layer	Production Role
PostgreSQL	authoritative relational state and latitude/longitude storage
Redis	low-latency geospatial lookup, cache, rate limits
NATS/Kafka	planned target for durable rescue alert and moderation events
Rust workers	notification fan-out, realtime coordination, trust/fraud core
Python workers	AI moderation, NLP, analytics, model experiments
Cloudinary/S3/R2	media storage and delivery
FCM/APNs	push notification delivery

Current queue reality:

critical rescue notification state is persisted in PostgreSQL through notification_events, push_delivery_jobs, rescue_fanout_states, rescue_fanout_attempts, rescue_responses, rescue_specialist_providers, and rescue_escalation_attempts
workers claim due jobs with row locking and persist retry/dead-letter state
NATS is present for cross-process realtime fanout, but the current implementation uses plain pub/sub, not JetStream/Kafka-style durable replay
WebSocket broadcast channels are in-memory delivery surfaces only; the authoritative chat and rescue history remains in PostgreSQL
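
The retry/dead-letter decision for a failed job can be sketched as a pure function. The backoff schedule (5 s doubling, capped at 5 minutes) and the names below are assumptions; this README does not state the real retry policy, and the row-locking claim itself happens in PostgreSQL rather than in code like this.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum JobOutcome {
    /// Persist the job as due again after this delay.
    RetryAfter(Duration),
    /// Persist the job as dead-lettered; no further automatic attempts.
    DeadLetter,
}

/// Decides what to persist after a failed delivery attempt.
/// `attempts_so_far` counts attempts made before the one that just failed.
pub fn on_failure(attempts_so_far: u32, max_attempts: u32) -> JobOutcome {
    let attempts = attempts_so_far.saturating_add(1);
    if attempts >= max_attempts {
        return JobOutcome::DeadLetter;
    }
    // Assumed schedule: 5s, 10s, 20s, ... capped at 300s.
    let exponent = (attempts - 1).min(16);
    let secs = 5u64.saturating_mul(1u64 << exponent).min(300);
    JobOutcome::RetryAfter(Duration::from_secs(secs))
}
```

Because the outcome is written back to the job row, a worker restart between attempts does not lose the retry state.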

Security and Trust Model

ZooHelp is a trust-sensitive system. The backend assumes abuse will happen.

Security and integrity controls:

JWT-based auth surface
password hashing service
refresh tokens persisted in PostgreSQL with revocation timestamps
account deletion endpoint
report endpoint for content moderation
trust scoring service
fraud text scoring
media moderation queue status
push-token registration contract
support tickets and operational escalation
validation on critical request payloads
emergency geolocation requirement

Production hardening still required:

access-token revocation before expiry through a session table, jti denylist, or user token-version check
role-based authorization beyond contract shape
full audit log
rate limits enforced at edge and API levels
durable report/moderation workflow
FCM/APNs delivery receipts
durable NATS JetStream/Kafka consumers for replayable realtime/domain events
API restart and worker restart evidence proving no critical rescue/chat state is lost
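
Of the three access-token revocation options listed above, the user token-version check is the simplest to illustrate. The sketch assumes the version is embedded in the token claims and compared against the stored user record; all type and field names are hypothetical, and signature verification is out of scope.

```rust
/// Hypothetical claims decoded from an already signature-verified JWT.
pub struct AccessClaims {
    pub user_id: u64,
    pub token_version: u32,
    pub expires_at_unix: u64,
}

/// Hypothetical user row loaded from PostgreSQL.
pub struct UserRecord {
    pub id: u64,
    pub token_version: u32,
    pub banned: bool,
}

/// A token is accepted only if it is unexpired, belongs to this user,
/// the user is not banned, and the version still matches.
pub fn token_is_valid(claims: &AccessClaims, user: &UserRecord, now_unix: u64) -> bool {
    claims.user_id == user.id
        && now_unix < claims.expires_at_unix
        && !user.banned
        && claims.token_version == user.token_version
}

/// Bumping the version invalidates every access token issued before it.
pub fn revoke_all_tokens(user: &mut UserRecord) {
    user.token_version = user.token_version.wrapping_add(1);
}
```

The trade-off is one user lookup per request (or a cached version), in exchange for immediate invalidation of banned, deleted, or compromised accounts.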

Reliability Controls

Current reliability-oriented surfaces:

/healthz
/readyz
/metrics
/v1/observability
structured rescue alert logging
validation tests for frontend/backend contracts
Docker Compose local infrastructure

Production reliability targets:

readiness tied to PostgreSQL/Redis/NATS availability
OpenTelemetry traces across post -> alert -> push delivery
Sentry or equivalent error aggregation
Prometheus dashboards for API, queue, push, and websocket metrics
alerting for notification delay, failed uploads, auth failures, and WebSocket disconnect spikes

Production readiness gate:

docs/production-readiness.md
persistence completeness
durable notifications
queue guarantees
retries and DLQ
staging evidence before public scale claims

Environment Variables

Core backend:

BIND_ADDR
DATABASE_URL
REDIS_URL
NATS_URL
AI_WORKER_URL
JWT_SECRET
ACCESS_TOKEN_TTL_MINUTES
REFRESH_TOKEN_TTL_DAYS
PUSH_WORKER_ENABLED
RESCUE_FANOUT_WORKER_ENABLED
POSTGIS_ENABLED

Production guardrails:

outside development, JWT_SECRET must be a real non-placeholder secret with at least 32 characters
outside development, ACCESS_TOKEN_TTL_MINUTES must be between 1 and 60
outside development, PUSH_WORKER_ENABLED=true and RESCUE_FANOUT_WORKER_ENABLED=true are required
NATS_URL is required outside development, but NATS currently supports realtime fanout only; durable queue semantics still come from PostgreSQL job tables until JetStream/Kafka is implemented
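
These guardrails lend themselves to a startup check. The sketch below is an assumption-heavy illustration: the `Settings` struct, the `development` flag, and the placeholder markers are hypothetical and are not taken from `config.rs`.

```rust
/// Hypothetical settings mirroring the documented environment variables.
#[derive(Debug, Clone)]
pub struct Settings {
    pub development: bool,
    pub jwt_secret: String,
    pub access_token_ttl_minutes: u32,
    pub push_worker_enabled: bool,
    pub rescue_fanout_worker_enabled: bool,
    pub nats_url: Option<String>,
}

/// Assumed placeholder markers; the real list is not given in this README.
const PLACEHOLDER_MARKERS: [&str; 3] = ["changeme", "placeholder", "example"];

/// Returns the names of variables that violate the production guardrails.
/// An empty list means startup may proceed.
pub fn check_guardrails(s: &Settings) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if s.development {
        return problems; // guardrails apply only outside development
    }
    let secret = s.jwt_secret.trim().to_lowercase();
    let looks_placeholder = PLACEHOLDER_MARKERS.iter().any(|m| secret.contains(*m));
    if secret.chars().count() < 32 || looks_placeholder {
        problems.push("JWT_SECRET");
    }
    if !(1..=60).contains(&s.access_token_ttl_minutes) {
        problems.push("ACCESS_TOKEN_TTL_MINUTES");
    }
    if !s.push_worker_enabled {
        problems.push("PUSH_WORKER_ENABLED");
    }
    if !s.rescue_fanout_worker_enabled {
        problems.push("RESCUE_FANOUT_WORKER_ENABLED");
    }
    if s.nats_url.as_deref().map_or(true, |u| u.trim().is_empty()) {
        problems.push("NATS_URL");
    }
    problems
}
```

Collecting all violations instead of failing on the first one lets an operator fix a misconfigured deployment in a single pass.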

Cloudinary media:

CLOUDINARY_CLOUD_NAME
CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET
CLOUDINARY_URL

Mobile/API integration:

EXPO_PUBLIC_API_BASE_URL

Operational note:

Do not commit .env, secrets, Cloudinary API secrets, tokens, database dumps, local target artifacts, or private operational datasets.

Running Locally

Requirements:

Rust toolchain
Docker
Docker Compose
PostgreSQL
Redis
Python 3.11+
pnpm for the mobile workspace

Start infrastructure:

cd backend
cp .env.example .env
docker compose up -d

Run backend:

cargo run

Run tests:

cargo fmt --check
cargo test

Run mobile type contract:

cd ../client
pnpm --filter zoohelp-mobile run typecheck

Repository Layout

backend/
  Cargo.toml
  docker-compose.yml
  migrations/
    0001_init.sql
    ...
    0018_rescue_fanout_progressive.sql
  src/
    main.rs
    config.rs
    domain.rs
    error.rs
    state.rs
    routes/
      auth.rs
      chat.rs
      donations.rs
      feed.rs
      geo.rs
      media.rs
      notifications.rs
      ongs.rs
      posts.rs
      rescue.rs
      search.rs
      support.rs
      trust.rs
    services/
      auth.rs
      fraud.rs
      geo.rs
      notifications.rs
      rescue_fanout.rs
      rate_limit.rs
      trust.rs
  python-workers/
    app/
      main.py
    requirements.txt

Current Boundaries

This backend is production-shaped, but not yet fully production-complete.

Strong current surfaces:

Rust Axum API structure
mobile/backend contract alignment
auth/register contract
feed/post/search/ONG/support/donation routes
media upload-intent contract
chat HTTP and WebSocket route surface
geospatial rescue alert modeling
emergency coordinate validation
notification subscription and alert preview contracts
tests for the key frontend/backend paths

Known hardening gaps before real public scale:

remove any remaining public seed/mock fallback paths and prove critical endpoints are PostgreSQL-backed
move realtime/domain event delivery from plain NATS pub/sub to durable JetStream/Kafka consumers where replay is required
add immediate access-token invalidation for banned, deleted, or compromised accounts
deliver push notifications through FCM/APNs workers
enforce production rate limits and abuse controls
complete durable moderation and report review flows
publish measured benchmark reports for PostgreSQL, Redis, queue, upload, WebSocket, and push delivery paths
run and document API restart plus worker restart tests for chat, rescue sessions, push jobs, and Estou indo
add production observability dashboards and alerting

Production Intent

ZooHelp Hybrid Core is intended to become a global animal rescue coordination backend.

The strategic direction is narrow and operational:

fast rescue post creation
real geolocation
nearby helper discovery
trusted community coordination
NGO operational profiles
chat-based response
donation and support infrastructure
AI-assisted moderation and fraud prevention

The operating thesis is:

simple mobile action -> reliable backend coordination -> nearby human response -> measurable animal impact
