A polyglot microservices e-commerce application with end-to-end observability — distributed tracing, metrics, and log aggregation — powered by OpenTelemetry, AWS ADOT, and a full cloud-native stack.
Try it →
helm repo add observeflow https://vanshshah174.github.io/ObserveFlow && helm install demo observeflow/observeflow
ObserveFlow deploys 7 microservices written in 4 languages (Node.js, Python, Go, Java) as a functional e-commerce app. The real focus is the observability layer: OpenTelemetry auto-instrumentation injects tracing into every service with zero code changes, an OTel Collector pipelines telemetry to Jaeger, Prometheus, and Loki for local clusters — or to AWS ADOT → X-Ray, Amazon Managed Prometheus (AMP), and CloudWatch Logs on EKS. The entire infrastructure (VPC, EKS, ECR, Karpenter, IAM, KMS) is provisioned via Terraform, with CI/CD through GitHub Actions (OIDC → ECR) and GitOps via ArgoCD.
┌─────────────────────────────────────────────────────────────────────────┐
│ KUBERNETES CLUSTER │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Frontend │───▶│Product Service│ │ User Service │ │
│ │ (React) │ │ (Node.js) │ │ (Python) │ │
│ └────┬─────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ├──────────┐ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │Cart Svc │◀─│ Order Service │─▶│Notification Service│ │
│ │(Node.js) │ │ (Node.js) │ │ (Go) │ │
│ └──────────┘ └──────┬───────┘ └───────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │Inventory Service│ │
│ │ (Java) │ │
│ └─────────────────┘ │
│ │
│ ═══════════════════ OBSERVABILITY LAYER ═══════════════════════════ │
│ │
│ ┌────────────────────┐ │
│ │ OTel Operator │ ← Auto-injects instrumentation into pods │
│ │ (Instrumentation) │ │
│ └────────┬───────────┘ │
│ ▼ │
│ ┌────────────────────┐ ┌──────────┐ ┌──────┐ ┌──────┐ │
│ │ OTel Collector │───▶│ Jaeger │ │ Prom │ │ Loki │ │
│ │ (traces/metrics/ │───▶│ (traces) │ │(met.)│ │(logs)│ │
│ │ logs pipeline) │───▶│ │ │ │ │ │ │
│ └────────────────────┘ └──────────┘ └──┬───┘ └──┬───┘ │
│ │ │ │
│ ┌─────▼─────────▼────┐ │
│ │ Grafana │ │
│ │ (unified UI) │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
- Kubernetes cluster (Kind, Minikube, EKS, or any k8s)
- Helm v3
- kubectl
bash scripts/setup.sh my-namespace
This script installs everything in order:
- cert-manager (TLS for OTel Operator webhooks)
- OpenTelemetry Operator (auto-instrumentation injection)
- ObserveFlow chart (7 services + Prometheus + Grafana + Loki + Jaeger)
- Triggers pod restart to inject instrumentation
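# Add the ObserveFlow Helm repository, plus the repositories that host the prerequisite charts
helm repo add observeflow https://vanshshah174.github.io/ObserveFlow
helm repo add jetstack https://charts.jetstack.io
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update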
# Install prerequisites
helm install cert-manager jetstack/cert-manager \
-n cert-manager --create-namespace \
--set crds.enabled=true --wait
helm install otel-operator open-telemetry/opentelemetry-operator \
-n otel-system --create-namespace \
--set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib" \
--wait
# Install ObserveFlow
helm install demo observeflow/observeflow \
-n observeflow --create-namespace
# Restart pods to inject auto-instrumentation
kubectl rollout restart deployment -n observeflow
# Frontend (e-commerce app)
kubectl port-forward svc/frontend-service 3000:3000 -n observeflow
# Grafana (metrics + logs)
kubectl port-forward svc/demo-grafana 3001:80 -n observeflow
# → http://localhost:3001 (admin / admin)
# Jaeger (distributed traces)
kubectl port-forward svc/demo-jaeger-query 16686:16686 -n observeflow
# → http://localhost:16686
bash scripts/generate-load.sh 60
📖 New here? Check out the Complete Setup Guide for detailed step-by-step instructions with explanations.
ObserveFlow supports two deployment modes with different observability backends:
| | Local Mode (Kind/Minikube) | AWS Mode (EKS) |
|---|---|---|
| Traces | Jaeger (in-cluster) | AWS X-Ray |
| Metrics | Prometheus + Grafana (in-cluster) | Amazon Managed Prometheus (AMP) |
| Logs | Loki + Grafana (in-cluster) | CloudWatch Logs (KMS-encrypted) |
| Collector | OTel Collector | ADOT Collector (DaemonSet) |
| Auth | — | Pod Identity (no keys) |
| Autoscaling | — | Karpenter (spot instances) |
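As a rough illustration of how the two columns relate, the sketch below maps a parsed values object to the backends listed in the table, keyed on `observability.aws.enabled`. The names `BACKENDS` and `resolveBackends` are hypothetical and this is not code from the chart; it only restates the table as a lookup.

```javascript
// Backend lookup mirroring the mode table above (illustrative only).
const BACKENDS = {
  local: {
    traces: "Jaeger",
    metrics: "Prometheus",
    logs: "Loki",
    collector: "OTel Collector",
  },
  aws: {
    traces: "AWS X-Ray",
    metrics: "Amazon Managed Prometheus",
    logs: "CloudWatch Logs",
    collector: "ADOT Collector",
  },
};

// `values` stands in for a parsed values.yaml; only observability.aws.enabled is read.
function resolveBackends(values) {
  const awsEnabled = Boolean(values?.observability?.aws?.enabled);
  return awsEnabled ? BACKENDS.aws : BACKENDS.local;
}

console.log(resolveBackends({ observability: { aws: { enabled: true } } }).traces);
console.log(resolveBackends({}).traces);
```

Anything other than an explicit `enabled: true` falls back to the local backends in this sketch.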
Switch between modes with a single flag:
# values.yaml
observability:
  aws:
    enabled: true # ← switches to ADOT → X-Ray/AMP/CloudWatch
| Concept | Implementation |
|---|---|
| Distributed Tracing | Auto-instrumented traces across 7 services → OTel Collector → Jaeger / X-Ray |
| Metrics Collection | Custom business metrics (Counter, UpDownCounter, Histogram) → Prometheus / AMP |
| Log Aggregation | Structured JSON logs with trace correlation → Loki / CloudWatch Logs |
| Trace-Log Correlation | Every log line includes traceId + spanId for direct jump from log → trace |
| Auto-Instrumentation | Zero-code OTel injection via Kubernetes Operator (no SDK changes needed) |
| Dual-Mode Observability | Same app, two backends: local (Jaeger/Prom/Loki) or AWS (X-Ray/AMP/CloudWatch) |
| Helm-based Deployment | One helm install deploys everything: 7 services + full observability stack |
| Service | Language | Port | Responsibility |
|---|---|---|---|
| frontend-service | React + Nginx | 3000 | E-commerce UI with product browsing, cart, and checkout |
| product-service | Node.js (Express) | 4000 | Product catalog (10 items with images, pricing) |
| cart-service | Node.js (Express) | 4001 | Shopping cart CRUD + custom OTel metrics |
| order-service | Node.js (Express) | 4002 | Order creation (calls cart → inventory → notification) |
| user-service | Python (Flask) | 4003 | User profiles and membership tiers |
| notification-service | Go (net/http) | 4004 | Event notifications with W3C traceparent parsing |
| inventory-service | Java (Spring Boot) | 4005 | Stock management with reserve/release/restock |
Every HTTP request generates a distributed trace, which can span several services:
Frontend → Product Service → (user browses)
Frontend → Cart Service → (adds item)
Frontend → Order Service → Cart Service → Inventory Service → Notification Service (checkout)
Each trace shows:
- Full request lifecycle across services
- Individual span durations (middleware, handlers, downstream calls)
- Service metadata (pod name, namespace, container)
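Spans from different services end up in the same trace because each outgoing call carries the trace ID forward, typically in a W3C `traceparent` header. The sketch below shows that mechanism in plain Node.js, assuming the version `00` header layout. In ObserveFlow the auto-instrumentation does this for you; `parseTraceparent` and `childTraceparent` are hypothetical helper names, not part of any service.

```javascript
const crypto = require("crypto");

// W3C Trace Context (version 00): version-traceid-parentid-flags, lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed context, or null if the header is missing or malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header || "").trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the specification.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Header for a downstream call: same trace ID, fresh span ID, same sampling flag.
function childTraceparent(parent) {
  const spanId = crypto.randomBytes(8).toString("hex");
  const flags = parent.sampled ? "01" : "00";
  return `00-${parent.traceId}-${spanId}-${flags}`;
}

const incoming = parseTraceparent("00-2ebad82367f07b870f4c3a8c44ff8058-0234e0930ce83e94-01");
console.log(incoming.traceId);
console.log(childTraceparent(incoming));
```

Every hop keeps `traceId` unchanged and replaces only the span ID, which is how a backend such as Jaeger can stitch the spans of a checkout into one trace.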
Custom business metrics (cart-service):
cart_items_added_total → Counter: total items added to carts
cart_items_current → UpDownCounter: items currently in active carts
cart_operation_duration_seconds → Histogram: p50/p95/p99 of cart operations
PromQL examples:
# Request rate per service
sum(rate(observeflow_http_server_duration_milliseconds_count[5m])) by (job)
# Cart items added by product
sum(observeflow_cart_items_added_total) by (product_id)
# P95 latency
histogram_quantile(0.95, sum(rate(observeflow_http_server_duration_milliseconds_bucket[5m])) by (le, job))
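To make the P95 query less of a black box, here is a simplified sketch of the linear interpolation that `histogram_quantile` applies to classic cumulative buckets. It is not Prometheus source code, the function name `histogramQuantile` is hypothetical, and the bucket counts are made up for illustration. PromQL runs the same arithmetic on per-second bucket rates rather than raw counts.

```javascript
// Estimate a quantile from cumulative histogram buckets.
// `buckets` must be sorted by `le` ascending, contain at least one finite
// bucket, and end with le = Infinity (the +Inf bucket). Assumes 0 < q <= 1.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);
  // If the rank falls in +Inf, report the highest finite upper bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  if (inBucket === 0) return lower;
  // Assume observations are spread evenly inside the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical request durations in milliseconds (made-up counts).
const sample = [
  { le: 5, count: 10 },
  { le: 10, count: 60 },
  { le: 25, count: 90 },
  { le: 50, count: 98 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, sample)); // 40.625
```

The result is only as precise as the bucket boundaries: here the 95th observation falls in the 25 to 50 ms bucket, so the estimate is interpolated within that range.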
All services emit structured JSON logs with trace correlation:
{
"timestamp": "2026-06-04T12:00:01.234Z",
"service": "cart-service",
"method": "POST",
"path": "/cart/user-1/items",
"statusCode": 201,
"durationMs": 5,
"traceId": "2ebad82367f07b870f4c3a8c44ff8058",
"spanId": "0234e0930ce83e94"
}
LogQL examples:
# All cart service logs (excluding health checks)
{k8s_container_name="cart-service"} | json | path!="/health"
# Filter by trace ID (jump from Jaeger → Loki)
{k8s_container_name=~".+"} | json | traceId="2ebad82367f07b870f4c3a8c44ff8058"
The key power of this setup is correlation across all three signals:
- See a spike in Grafana (metric) → click to see which traces were slow
- Open a trace in Jaeger → copy the traceId → search in Loki logs
- Find an error in logs (Loki) → use traceId to see the full distributed trace
This is possible because:
- Auto-instrumentation injects trace context into every request
- Each service's logging middleware captures traceId + spanId
- OTel Collector routes all signals through a single pipeline
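The second point can be sketched as a small function that builds one log record in the shape of the JSON example shown earlier. How the real middleware obtains the IDs is not shown in this document; as a stand-in, this sketch reads them from a `traceparent` header value. `buildLogRecord` and its parameter names are hypothetical.

```javascript
// Build one structured log record with trace correlation fields.
// `traceparent` layout: version-traceId-spanId-flags.
function buildLogRecord({ service, method, path, statusCode, durationMs, traceparent, now = new Date() }) {
  const parts = String(traceparent || "").split("-");
  // Without a well-formed header the IDs stay undefined and are
  // dropped by JSON.stringify, so the log line simply omits them.
  const [traceId, spanId] = parts.length === 4 ? [parts[1], parts[2]] : [undefined, undefined];
  return {
    timestamp: now.toISOString(),
    service,
    method,
    path,
    statusCode,
    durationMs,
    traceId,
    spanId,
  };
}

const record = buildLogRecord({
  service: "cart-service",
  method: "POST",
  path: "/cart/user-1/items",
  statusCode: 201,
  durationMs: 5,
  traceparent: "00-2ebad82367f07b870f4c3a8c44ff8058-0234e0930ce83e94-01",
});
console.log(JSON.stringify(record));
```

Because `traceId` is a top-level JSON field, a query such as `| json | traceId="..."` can match it directly.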
Local mode: Services → OTLP HTTP/protobuf → OTel Collector → Jaeger (traces) + Prometheus (metrics) + Loki (logs)
AWS mode: Services → OTLP → ADOT DaemonSet → X-Ray (traces) + AMP (metrics) + CloudWatch (logs)
AWS auth: Pod Identity → IAM Role (zero credentials in code)
The entire AWS stack is provisioned via Terraform (75 resources):
🏗️ AWS Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ AWS ACCOUNT │
│ │
│ ┌────────────────── VPC (10.0.0.0/16) ───────────────────────────┐ │
│ │ │ │
│ │ ┌─── Public Subnets ───┐ ┌─── Private Subnets ───┐ │ │
│ │ │ • NAT Gateway │ │ • EKS Node Group │ │ │
│ │ │ • Internet Gateway │ │ • Karpenter Nodes │ │ │
│ │ │ • Load Balancers │ │ • Application Pods │ │ │
│ │ └──────────────────────┘ └────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──── EKS Cluster ─────────────────────────────────────────────────┐ │
│ │ K8s v1.35 │ API + Config Map auth │ Pod Identity Agent │ │
│ │ Addons: VPC-CNI, CoreDNS, kube-proxy, ADOT, Pod Identity │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──── ECR ─────────┐ ┌── Observability ────────────────────────┐ │
│ │ 7 repositories │ │ • Amazon Managed Prometheus (AMP) │ │
│ │ Immutable tags │ │ • CloudWatch Logs (365d retention) │ │
│ │ Scan on push │ │ • AWS X-Ray (distributed tracing) │ │
│ │ Auto-cleanup │ │ • KMS encryption for logs │ │
│ └───────────────────┘ └────────────────────────────────────────┘ │
│ │
│ ┌──── Karpenter ────────────────────────────────────────────────┐ │
│ │ • Pod Identity for EC2 management │ │
│ │ • Spot interruption handling (SQS + EventBridge) │ │
│ │ • Auto-discovery via subnet/SG tags │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
| File | What it provisions |
|---|---|
| vpc.tf | VPC, 2 public + 2 private subnets, IGW, NAT, route tables |
| eks.tf | EKS cluster, managed node group, addons (VPC-CNI, CoreDNS, Pod Identity) |
| ecr.tf | 7 ECR repos with immutable tags, scan-on-push, lifecycle cleanup |
| observability.tf | AMP workspace, CloudWatch log group (KMS), ADOT IAM role + Pod Identity |
| karpenter.tf | Controller IAM role, node role + instance profile, SQS interruption queue, EventBridge rules |
| github-oidc.tf | OIDC provider, IAM role for GitHub Actions (ECR push) |
cd terraform
terraform init && terraform apply
aws eks update-kubeconfig --name observeflow-eks --region us-east-1
Detect Changes → Lint (ESLint) → SCA (npm audit) → Hadolint → Build + Trivy → Push (ECR + Docker Hub) → Update Helm
- Dynamic service detection — only builds changed services
- Trivy CVE scan (blocks CRITICAL/HIGH)
- Multi-registry push via GitHub OIDC (no stored credentials)
- Auto-updates values-dev.yaml → triggers ArgoCD
ArgoCD watches values-dev.yaml — when CI updates an image tag, ArgoCD auto-deploys with prune + self-heal enabled.
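The "dynamic service detection" step in the pipeline can be approximated as mapping changed file paths to service directories. The actual workflow logic is not shown in this document, so treat the following as a sketch that assumes the `src/<name>-service/` layout from the project structure; `changedServices` and the file names in the example are hypothetical.

```javascript
// Map a list of changed file paths to the services that need a rebuild.
// Assumes each service lives in src/<name>-service/.
function changedServices(changedPaths) {
  const services = new Set();
  for (const filePath of changedPaths) {
    const match = /^src\/([^/]+-service)\//.exec(filePath);
    if (match) services.add(match[1]);
  }
  // Sorted so the resulting build matrix is stable between runs.
  return [...services].sort();
}

console.log(
  changedServices([
    "src/order-service/app.js",
    "src/cart-service/index.js",
    "src/cart-service/package.json",
    "README.md",
  ])
);
```

In a CI job the input list would come from something like a diff against the base branch; files outside `src/` are ignored, so a docs-only change builds nothing.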
| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite + React Router |
| Backend | Node.js, Python (Flask), Go, Java (Spring Boot) |
| Reverse Proxy | Nginx |
| Orchestration | Kubernetes (EKS / Kind) + Helm v3 |
| Tracing | OpenTelemetry → Jaeger (local) / AWS X-Ray (prod) |
| Metrics | Prometheus + Grafana (local) / Amazon Managed Prometheus (prod) |
| Logs | Loki + Grafana (local) / CloudWatch Logs (prod) |
| Collector | OpenTelemetry Collector (local) / ADOT Collector (AWS) |
| Auto-Instrumentation | OpenTelemetry Operator (K8s CRD) |
| Infrastructure | Terraform (VPC, EKS, ECR, IAM, KMS, AMP, Karpenter) |
| Autoscaling | Karpenter (spot instances + interruption handling) |
| Container Registry | ECR (AWS) + Docker Hub (public multi-arch) |
| CI/CD | GitHub Actions (OIDC → ECR) + ArgoCD (GitOps) |
| Security | Pod Identity, KMS encryption, Trivy, Hadolint, OIDC |
📂 Project Structure
ObserveFlow/
├── src/
│ ├── frontend-service/ # React + Vite + Nginx reverse proxy
│ ├── product-service/ # Node.js — product catalog
│ ├── cart-service/ # Node.js — cart + custom OTel metrics
│ ├── order-service/ # Node.js — order processing (multi-service calls)
│ ├── user-service/ # Python Flask — user profiles
│ ├── notification-service/ # Go — event notifications
│ └── inventory-service/ # Java Spring Boot — stock management
├── helm/observeflow/ # Helm chart (single installable package)
├── terraform/ # AWS infrastructure (EKS, ECR, AMP, Karpenter)
├── argocd/ # GitOps manifest
├── scripts/ # setup.sh, generate-load.sh, port-forward.sh
├── .github/workflows/ # CI + Helm release pipelines
└── docs/ # Architecture decisions and journey docs
⚙️ Helm Configuration Options
# Enable/disable the observability stack
observability:
  enabled: true # Set false to deploy only microservices
# Per-service image override
cartService:
  image:
    repository: vanshshah17/observeflow-cart-service
    tag: "latest"
  replicas: 1
# AWS mode (for EKS with ADOT)
observability:
  aws:
    enabled: true
    region: "us-east-1"
    ampEndpoint: "https://aps-workspaces.us-east-1..."
The chart is namespace-agnostic: install into any namespace and all internal service references resolve automatically.
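One common way to get this behaviour is to rely on Kubernetes DNS: a bare service name resolves within the caller's own namespace, while the fully qualified form names the namespace explicitly. Whether the chart uses exactly this approach is an assumption. The sketch below only illustrates the two forms, with a hypothetical helper `serviceUrl` and the default `cluster.local` cluster domain.

```javascript
// Build an in-cluster URL for a service.
// Without a namespace, the short name is used and Kubernetes DNS resolves it
// in the calling pod's namespace; with one, the fully qualified name is used.
function serviceUrl(service, port, namespace) {
  const host = namespace ? `${service}.${namespace}.svc.cluster.local` : service;
  return `http://${host}:${port}`;
}

console.log(serviceUrl("cart-service", 4001));
console.log(serviceUrl("cart-service", 4001, "observeflow"));
```

The short form is what keeps a configuration portable across namespaces, since nothing in the URL has to change when the release is installed elsewhere.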
All Docker images are built for both linux/amd64 and linux/arm64:
- ✅ Intel/AMD machines (Windows, Linux)
- ✅ Apple Silicon Macs (M1/M2/M3)
- ✅ AWS Graviton instances







