Oladri Renuka oladri-renuka

Renuka Oladri

2027 New Grad · MS Applied Machine Learning, University of Maryland College Park (CMNS Science Academy)
LLM inference systems · Mechanistic interpretability · Production ML infrastructure · Agentic AI

Experience

AI Data & Analytics Intern, HARMAN International (Samsung) · Dec 2024 - Jun 2025
Multi-agent TypeScript testing system: 93% code coverage vs GitHub Copilot's 70-80%. Found 5 Redis reliability gaps before production. Designed 10+ unit test cases validating memory, tool-use, and prompt components.

Research Assistant, Woxsen University · Aug 2022 - Dec 2024
Led 4-person team on Battery Management System ML (Random Forest, 90-95% accuracy). Co-authored 5 peer-reviewed publications. Translated ML outputs into technical documentation for non-specialist faculty.

Junior Data Analyst Intern, SeriGreen Technologies · Feb 2024 - Jul 2024
Transformed Karnataka cocoon market datasets (10K-100K records) using SQL and Python. Built the SeriGreen Farm Management Web Application (MERN stack). Findings presented directly to founders.

Research Intern, AppsTek Corp · Feb 2023 - Jul 2023
Built multimodal sentiment classification combining video frames, audio, and transcripts via deep learning. 90%+ accuracy across 3 labels. Demoed to AI team.

Research Highlights

Can you predict a reasoning model will fail before it visibly fails? A linear probe on DeepSeek-R1 hidden states detects failure at 150 tokens with AUC 0.612 vs 0.445 behavioral baseline (p=0.001). The signal emerges at 100 tokens when surface features are anti-informative. early_detection

Do vision-language models fail differently across image domains? Yes, and the pattern is stark. LLaVA scores 0% on chart OCR while InternVL2 hits 71.1% on identical probes. 945 probes, chi-square p<0.0001. LLaVA has 88% yes-bias on adversarial existence questions. vlm-hallucination

Do more thinking tokens help reasoning models? Mostly no. On GSM8K and MATH-500, accuracy plateaus at 256 tokens. On AIME, the problems split into two groups: 57% converge at ~4,100 tokens (96.5% acc), while 43% never converge even at 10,000 tokens (11.5% acc). token-efficiency-math-reasoning

Systems and Infrastructure

Project	What it does	Numbers
inference-server	Continuous batching + paged KV-cache for GPT-2 from scratch (no vLLM). 3 backends benchmarked. Static batching underperforms naive serial under mixed-length traffic.	2.91 req/s, 0 failures, SSE streaming
feature-store	Kafka ingestion with 15% reordering, 5% dupes, 10% late arrivals. 3-node Redis Cluster, hash-tag sharding, schema registry. Three-way consistency validation.	0 mismatches / 800 checks, p95 4.8ms, 9,300 req/s
adaptive_agent	LangGraph 6-node state graph routing to Haiku 4.5 or Sonnet 4 via OpenRouter. Input guard (regex + LLM injection detection). Output guard (hallucination, completeness, format).	98% routing accuracy, 28.2% cost reduction
recsys	SASRec on MovieLens-1M deployed on AWS EC2. Full-ranking eval exposes sampled metric inflation (6.23% vs 70.68% Hit@10). 93% popularity bias documented.	HR@10 78.49%, NDCG@10 58.11%, 8,366 req/s
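
The hash-tag sharding mentioned for feature-store relies on a documented Redis Cluster rule: a key maps to one of 16384 slots via CRC16 (XMODEM variant), and when the key contains a non-empty `{...}` section, only that section is hashed. The sketch below illustrates the rule in Python. It is not the project's code, and the key names are hypothetical.

```python
import binascii

NUM_SLOTS = 16384  # fixed slot count in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster slot for a key, honoring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(key, 0) % NUM_SLOTS


# Hypothetical feature keys for one entity share a tag, so they share a slot.
same_slot = key_slot(b"{user:42}:clicks") == key_slot(b"{user:42}:views")
print(same_slot)
```

Keeping all features for one entity under a shared tag is what would allow multi-key reads for that entity to stay on a single node.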

Research and Evaluation

Project	What it found	Numbers
early_detection	Activation probing on DeepSeek-R1-Distill-Qwen-7B predicts reasoning failure before behavioral signals exist. 200 AIME problems.	AUC 0.612 vs 0.445 at 150 tokens (p=0.001)
vlm-hallucination	LLaVA-1.5-7B vs InternVL2-8B across 4 domains. 6-category failure taxonomy. Complete capability absence, not gradual degradation.	945 probes, chi-square p<0.0001
llm-post-training-pipeline	SFT, reward model, DPO on LLaMA-3.2-1B. Diagnosed TRL bug causing negative KL divergence across 8 failed PPO runs.	+9pp factual (p=0.030), -16.7pp format (p=0.0003)
knowledge-agent	Belief graph from documents with cross-entity contradiction detection. MCP server exposing 5 tools. 2 embedding calls per document regardless of size.	936 entities, 32 conflicts, 0 false positives
factuality-verification	Compared 3 fact-checking methods on 14,525 atomic facts. Calibration matters more than model choice. NLI threshold 0.50 to 0.10 improves F1 by +0.076.	F1 0.727, Precision 0.919

Low-Level Performance

Project	What it demonstrated	Numbers
cuda-attention-kernel	Naive vs tiled attention kernels on A100. Diagnosed why tiling underperforms theory: 40MB L2 cache masks benefit below seq_len=2048. Connects to Flash Attention design rationale.	515 GFLOP/s tiled, ~145x over CPU
cpp-simd-quant	ARM NEON SIMD on Apple Silicon. Proves SIMD helps attention (11.1x) but not Black-Scholes (1.03x) because 89% of runtime is transcendental functions. Roofline analysis.	31.88 GFLOP/s attention, 103.8M options/sec
sparse-factor-modeling	9 LASSO solvers from scratch. Walk-forward backtest, no look-ahead bias. Novel finding: FISTA degrades at high sparsity. KKT-based factor ranking.	Sharpe 5.061, Spearman rho=0.906

Agent Systems

Project	What it does	Numbers
code-memory-agent	Coding agent with persistent SQLite memory. SHA-256 staleness detection as non-bypassable gate. Indexes file purposes, symbols, cross-file dependencies.	42.9% fewer file reads, 19 decision-reuse events
mindmirror	Real-time interview coach analyzing eye contact, facial expressions, speech, vocal patterns every 2 seconds. MediaPipe + faster-whisper + LangGraph.	~1.2s full pipeline cycle, 6 behavioral states
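
The SHA-256 staleness gate described for code-memory-agent can be pictured roughly as follows. This is a minimal sketch under assumptions: the function names and the idea of storing one digest per remembered file are hypothetical and do not come from the project itself.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_stale(path: Path, recorded_digest: str) -> bool:
    """True when a memory entry should not be trusted.

    The entry is stale if the file is gone or if its contents no longer
    match the digest that was recorded when the entry was written.
    """
    if not path.is_file():
        return True
    return file_digest(path) != recorded_digest
```

An agent following this pattern would call `is_stale` before reusing any remembered fact about a file and re-index the file whenever the check returns True.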

Publications

Paper	Venue	Year
Stem Cell Reviews and Reports	Springer Nature	2025
Digital Forensics and Cybersecurity	Wiley-Scrivener	2024
Economic Perspectives	IGI Global	2024
YOLOv8 Traffic Sign Detection (80.64% acc)	IJSRA	2024
BERT Sentiment Analysis (F1 0.88)	J. Trends in CS	2024

Certifications & Awards


Udacity Agentic AI Nanodegree	Jan 2026
Oracle OCI 2025 Certified AI Foundations Associate	Dec 2025
National Hackathon Best Demonstration Award	Oriental Institute of Science & Technology, Bhopal 2023 (Team Leader)
Dean's List + Best Student for Research Inclination	Woxsen University

36 repositories · 5 publications · Python, C++, CUDA, TypeScript
