SF · 37.7749°N · 122.4194°W · STATUS: AVAILABLE · JARVIS · v4.7 · ONLINE
Senior engineer with 6+ years shipping production systems across fintech, media, research, and early-stage startups — at the intersection of LLMs, real-time voice, and distributed backends. I build agentic systems that survive contact with production, and I'm precise about what the numbers mean: I'd rather defend a smaller claim on a whiteboard than oversell a bigger one.
- 🛰️ Currently — AI Infrastructure Engineer @ Neurologyca · multi-agent orchestration & persistent-memory architectures
- 🔬 Building — geometric, low-LLM knowledge graphs · context-routed RAG · agentic recommendation systems
- 🎯 Focus — agentic systems · real-time voice AI · RAG & knowledge graphs · event-driven backends
- 📡 Reach me — nikhil.bindal@outlook.com · nikhilbindal.com
- Geometric, low-LLM knowledge field over research papers: resolution & validation run in embedding/rule space, not an LLM call per step. HippoRAG Personalized PageRank retrieval · Hyperbolic (Poincaré) concept embeddings
- Personal interest-graph & agentic recommendation engine modeling cross-platform signals into a typed graph. 5 agents behind one Guardian gate · hard $0.10/user/day cost cap · Prompt-injection-resistant scorer (LLM never writes the verdict)
- Turns any document into a knowledge graph, auto-routing it (append vs. create) to a context bucket via embedding similarity + LLM tie-breaker. LangGraph state machine · CrewAI extract→resolve→validate · Per-bucket inferred ontology · GraphRAG · graceful degradation
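
The append-vs-create routing described for the document-to-graph project can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Bucket` type, the `route` function, the two thresholds (0.80 and 0.60), and the `tie_breaker` callable standing in for the LLM call are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Bucket:
    """Hypothetical context bucket, summarized by one centroid embedding."""
    name: str
    centroid: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(
    doc_vec: list[float],
    buckets: list[Bucket],
    tie_breaker: Callable[[Bucket], bool],
    append_at: float = 0.80,     # assumed threshold: clearly the same context
    create_below: float = 0.60,  # assumed threshold: clearly a new context
) -> tuple[str, Optional[str]]:
    """Return ("append", bucket_name) or ("create", None)."""
    if not buckets:
        return ("create", None)
    best = max(buckets, key=lambda b: cosine(doc_vec, b.centroid))
    score = cosine(doc_vec, best.centroid)
    if score >= append_at:
        return ("append", best.name)
    if score < create_below:
        return ("create", None)
    # Ambiguous band: only here is the (stand-in) LLM tie-breaker consulted.
    return ("append", best.name) if tie_breaker(best) else ("create", None)
```

The design point is that the tie-breaker is only called inside the ambiguous band, so most documents are routed by embedding similarity alone.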
73% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░ AI latency cut · 6-agent DAG re-architecture (36s → 9.5s)
90% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░ decision latency cut · parallelized underwriting (8.7s → 890ms)
8.4M ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░ daily requests · TOI+ platform (edge-offloaded origin)
~22K ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░ events/sec · partitioned Kafka event tier (load-tested)
~40% ▓▓▓▓▓▓▓▓░░░░░░░░░░░░ retrieval relevance lift · hybrid RAG + rank fusion
15+ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░ specialized agents · production multi-agent orchestration
Numbers are designed to be defensible: the 22K is event-tier throughput (partitions × per-consumer rate) measured under load — distinct from the lower decision throughput — and disbursement is effectively-once via idempotency, not exactly-once across an external bank call.
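
To make the effectively-once claim concrete, here is a minimal in-memory sketch of idempotency-keyed disbursement. It is an illustration under assumptions, not the production design: `IdempotentDisburser`, the `send` callable standing in for the external bank call, and the receipt strings are all hypothetical.

```python
import threading
from typing import Callable


class IdempotentDisburser:
    """Deduplicates disbursement requests by idempotency key (in-memory sketch)."""

    def __init__(self, send: Callable[[str, int], str]):
        # `send` is a hypothetical stand-in for the external bank call.
        self._send = send
        self._results: dict[str, str] = {}
        self._lock = threading.Lock()

    def disburse(self, idempotency_key: str, amount_cents: int) -> str:
        # Holding the lock across the call keeps the sketch simple; a real
        # system would more likely use a durable store with a unique key.
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a completed request: return the stored receipt
                # without calling the bank again.
                return self._results[idempotency_key]
            # The key is also passed to the bank. If the process dies after
            # this call but before the result is recorded, a retry re-sends,
            # and only bank-side deduplication on the key prevents a double
            # payout. That gap is why this is effectively-once, not
            # exactly-once.
            receipt = self._send(idempotency_key, amount_cents)
            self._results[idempotency_key] = receipt
            return receipt
```

Calling `disburse` twice with the same key invokes `send` once and returns the same receipt both times.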
AI / ML ▸ Python · FastAPI · LangGraph · CrewAI · LlamaIndex · RAG · embeddings · GPT-4o · Claude · Gemini
Voice & RT ▸ LiveKit · WebRTC · Deepgram · Cartesia · streaming STT/TTS · sub-200ms pipelines
Backend ▸ Node.js · Express · NestJS · TypeScript · Python · event-driven microservices
Data ▸ PostgreSQL (+pgvector) · Redis · Kafka · MongoDB · Neo4j · Qdrant
Infra ▸ Docker · Kubernetes · GCP Cloud Run · AWS · Terraform · Prometheus · Grafana
Frontend ▸ React · Next.js · TypeScript · TailwindCSS · D3.js · micro-frontends
Workflow ▸ Claude Code · Cursor · GitHub Copilot · AI-native engineering
01 · craft · Production-first: build like a 3am page is coming. Observability and idempotency designed in, not bolted on.
02 · vision · AI as leverage: agentic systems compound human decisions, never replace judgment. Human-in-the-loop on every escape hatch.
03 · velocity · Ship, then polish: instrument production first, optimize on data, not on the design doc.
04 · judgment · Strong opinions, loose grip: argue hard with evidence, drop the position cleanly when better evidence arrives.
