DevOps / SRE Engineer building toward AI Infrastructure. Based in New York City, NY.
I run production infra for a social trading platform — 30+ EC2 nodes, 10+ microservices, EKS, Prometheus/Grafana/Loki, Vault, APISIX. Lately I've been going deep on GPU serving, LLM inference, and Kubernetes at scale because that's where the interesting problems are.
current_role: DevOps/MLOps @ Traderware Inc.
focus: LLM inference infra · GPU workload scheduling · EKS at scale
learning: CKA · KV-cache-aware autoscaling · vLLM internals
open_to: AI Infrastructure roles (Anthropic, CoreWeave, Together AI, Baseten, Replicate)
Orchestration & Compute
EKS · Kubernetes · Helm · Karpenter · EC2 g5.xlarge (A10G) · GPU Operator
LLM & GPU Infra
vLLM · Mistral 7B · Flash Attention v2 · CUDA Graphs · DCGM Exporter · NVIDIA NCA-AIIO
Observability
Prometheus · Grafana · Loki · Vector.dev · Wazuh HIDS · cAdvisor · Node Exporter
Data & Storage
Qdrant · QuestDB · Redis · S3 · EBS
Security & Platform
HashiCorp Vault · Keycloak · APISIX · cert-manager · Semgrep · Trivy
CI/CD & IaC
GitHub Actions · Terraform · ArgoCD · Self-hosted runners · OIDC federation
Out of the box, HPA scales on CPU/memory, which is useless for LLM inference where KV-cache pressure is the real bottleneck. Building a custom autoscaler that scales vLLM replicas based on cache hit rate, GPU memory utilization, and queued requests. The goal: cut cold-start cost while protecting tail latency.
vLLM · Custom Metrics API · Prometheus Adapter · EKS
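A rough sketch of the scaling decision, not the actual autoscaler: it applies the same rule HPA uses (desired = ceil(current × observed / target), taking the largest result across metrics) to GPU memory utilization and queue depth. The `ReplicaMetrics` type, the `desired_replicas` function, and every target value are hypothetical placeholders. Cache hit rate is left out because how it should map to a replica count is still an open design question.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplicaMetrics:
    """Per-replica signals, assumed to be scraped from Prometheus (hypothetical shape)."""
    gpu_mem_util: float    # fraction of GPU memory in use, 0.0 to 1.0
    queued_requests: int   # requests waiting to be scheduled on this replica


def desired_replicas(
    metrics: list[ReplicaMetrics],
    current: int,
    target_gpu_mem: float = 0.8,   # assumed target, tune per model
    target_queue: float = 4.0,     # assumed target queue depth per replica
    min_replicas: int = 1,
    max_replicas: int = 8,
) -> int:
    """Return a replica count using the HPA-style ratio rule on two signals."""
    def clamp(n: int) -> int:
        return max(min_replicas, min(max_replicas, n))

    if not metrics:
        # No data: hold the current size rather than guess.
        return clamp(current)

    avg_mem = sum(m.gpu_mem_util for m in metrics) / len(metrics)
    avg_queue = sum(m.queued_requests for m in metrics) / len(metrics)

    # Each signal proposes a scale ratio; the most demanding one wins.
    ratio = max(avg_mem / target_gpu_mem, avg_queue / target_queue)
    return clamp(math.ceil(current * ratio))
```

Taking the max across signals means a deep queue can trigger scale-up even while GPU memory looks healthy, which is the failure mode CPU/memory-based scaling misses.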
Priority-based job queueing + cost-aware scheduling for mixed inference/training workloads on a shared GPU fleet. Spot-aware fallback. Per-team quota enforcement.
Kubernetes · Karpenter · DCGM · Custom Scheduler
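A toy model of the queueing logic, assuming a single in-memory queue rather than a real Kubernetes scheduler plugin. `Job`, `GpuQueue`, the team names, and the pool labels are all hypothetical. It shows the three ideas together: highest priority first, per-team GPU quotas, and preemption-tolerant jobs trying spot capacity before falling back to on-demand.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    team: str
    priority: int   # higher value is scheduled first
    gpus: int
    spot_ok: bool   # True if the job tolerates preemption (e.g. checkpointed training)


class GpuQueue:
    def __init__(self, quotas: dict[str, int]):
        self.quotas = dict(quotas)              # max GPUs per team
        self.used = {team: 0 for team in quotas}
        self._heap = []
        self._seq = itertools.count()           # tie-breaker keeps FIFO order within a priority

    def submit(self, job: Job) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-job.priority, next(self._seq), job))

    def schedule(self, free_spot: int, free_on_demand: int) -> list[tuple[str, str]]:
        """Place as many queued jobs as fit; return (job name, pool) pairs."""
        placed, deferred = [], []
        while self._heap:
            entry = heapq.heappop(self._heap)
            job = entry[2]
            if self.used.get(job.team, 0) + job.gpus > self.quotas.get(job.team, 0):
                deferred.append(entry)          # over quota: stays queued
                continue
            if job.spot_ok and job.gpus <= free_spot:
                free_spot -= job.gpus           # cheapest option first
                pool = "spot"
            elif job.gpus <= free_on_demand:
                free_on_demand -= job.gpus      # fallback, or required for inference
                pool = "on-demand"
            else:
                deferred.append(entry)          # no capacity in this pass
                continue
            self.used[job.team] = self.used.get(job.team, 0) + job.gpus
            placed.append((job.name, pool))
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return placed
```

Note that this sketch backfills: a lower-priority job can start while a higher-priority one waits on quota or capacity. Whether that is acceptable depends on how strictly priority should be honored, and releasing GPUs when a job finishes is omitted for brevity.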
End-to-end: vLLM serving Mistral 7B → RAGAS evaluation → MLflow tracking → ArgoCD GitOps. Models ship the same way services do.
vLLM · RAGAS · MLflow · ArgoCD · EKS
Repos go live as I finish each milestone. Pinned below as they ship.
Always down to talk GPU schedulers, inference serving, or why your Prometheus is OOMing again.
