Software Engineer II | Specialized in AI Infrastructure & Retrieval Systems
I build the plumbing that makes LLMs production-ready. Currently at Cadence, I focus on architecting high-scale RAG (Retrieval-Augmented Generation) platforms, optimizing vector search performance, and bridging the gap between raw data and intelligent answers.
- AI & Search: RAG Orchestration, Vector Search (Qdrant), Multi-model Embeddings (OpenAI, Gemini, BGE), Reranking Logic.
- Backend Engineering: Java (Spring Boot), Node.js, Python, ONNX Runtime for server-side inference.
- Systems & Data: Microservices Architecture, API Design, SQL/NoSQL, Latency Optimization.
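The retrieval core of a stack like this can be sketched without any vendor SDK: a brute-force cosine-similarity search over stored embeddings, followed by a toy rerank pass. All names and data here are illustrative, not Cadence code — at scale, a vector database such as Qdrant replaces the linear scan with an ANN index, and a cross-encoder model replaces the term-overlap rerank:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def vector_search(query_vec, index, top_k=3):
    """Brute-force nearest-neighbour search over (text, embedding) pairs.
    A vector DB (e.g. Qdrant's HNSW index) does this sub-linearly."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def rerank(query_terms, candidates):
    """Toy rerank: boost candidates sharing terms with the query.
    Production rerankers use a cross-encoder model instead."""
    def overlap(candidate):
        return len(set(candidate[1].lower().split()) & set(query_terms))
    return sorted(candidates, key=lambda c: (overlap(c), c[0]), reverse=True)

# Tiny illustrative index: (text, embedding) pairs with fake 3-d vectors.
index = [
    ("vector search with qdrant", [0.9, 0.1, 0.0]),
    ("spring boot microservices", [0.1, 0.9, 0.0]),
    ("chunking strategies for rag", [0.7, 0.2, 0.1]),
]
hits = vector_search([1.0, 0.0, 0.0], index, top_k=2)
ranked = rerank({"qdrant", "search"}, hits)
```

The two-stage shape (fast approximate retrieval, then a slower, higher-quality rerank over a small candidate set) is what keeps latency bounded while preserving answer quality.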
- Latency Transformation: Re-engineered retrieval pipelines to drop query latency from 14s to <300ms.
- Cost Efficiency: Optimized chunking and context selection, reducing LLM token overhead by 35%.
- Force Multiplier: Automated documentation-to-tool drift detection, cutting manual verification time by 70%.
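The chunking side of that token-cost work can be sketched generically: a sliding window with overlap keeps context intact at chunk boundaries while capping what gets sent to the model. This is a minimal sketch with made-up defaults — real pipelines count tokens with the model's tokenizer, not whitespace words:

```python
def chunk_words(text, size=100, overlap=20):
    """Split text into word windows of `size` words, each sharing
    `overlap` words with the previous chunk so boundary context
    survives retrieval. Sizes here are illustrative, not tuned values."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks

# Example: 250 numbered "words" -> windows [0:100], [80:180], [160:250].
doc = " ".join(str(i) for i in range(250))
chunks = chunk_words(doc, size=100, overlap=20)
```

Smaller, overlapping chunks mean fewer irrelevant words inside each retrieved passage, which is where most of the context-selection savings come from.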
While I spend my days building enterprise AI at Cadence, this GitHub is my personal space for experimentation. You'll find a mix of:
- AI Experiments: Early-stage RAG wrappers and embedding benchmarks.
- Legacy Learning: My journey from student projects to SDE II.
- Open Source: Contributions and utility scripts that catch my interest.
- LinkedIn: linkedin.com/in/siddhantjan
- Focus Areas: LLM Evaluation, High-Performance Retrieval, Backend Scalability.
"Building systems that don't just work, but scale."