Jonathan Sánchez jsanchez-ds

🌐 English · Español

Hi, I'm Jonathan Sánchez

Senior Data Scientist · ML Engineer · LLM / AI Engineer

Industrial Engineer from Universidad de Chile (distinción máxima) currently at ClaroVTR as Efficiencies Engineer — building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~$30M CLP/month identified savings).

Based in Santiago, Chile. Comfortable in Spanish and English.

🏆 Flagship trilogy (2026) — Classical ML · LLM/RAG · Real-time Agent

Three sibling projects designed as one coherent platform. Project 3 consumes Project 1's registered model via MLflow and Project 2's RAG endpoint as an agent tool — the code and the data flow are connected, not three unrelated demos.

#	Project	What it proves	Headline result
1	energy-forecasting-databricks ⚡	Classical ML + Databricks Unity Catalog · LightGBM vs LSTM vs Isolation Forest · SHAP · Medallion architecture	LightGBM MAPE 1.81% on 2 years of CAISO data; local vs Databricks ≡ 1.81 % vs 1.83 % (reproducible)
2	energyscholar-rag 📚	LLM engineering · provider-agnostic (Groq/Claude/OpenAI/OpenRouter) · hybrid BM25+vector retrieval · cross-encoder rerank · RAGAS gated on CI	RAGAS context_precision 0.81, answer_relevancy 0.996 on arXiv energy papers
3	gridpulse-realtime-agent 🚨	Real-time streaming · custom LLM agent with OpenAI-compatible function calling · integrates Projects 1 + 2 as tools	Agent composed a fully grounded incident report (no hallucinated figures) and posted to Discord HTTP 204 ✓ after 5 tool calls

The story these three tell together: detect an anomaly in streaming telemetry → ask the classical forecaster what the expected value was → ask the RAG what the literature says → compose an incident report grounded in tool outputs → alert on-call. That's the system, not three isolated repos.

1️⃣ energy-forecasting-databricks ⚡

End-to-end MLOps pipeline for electricity-demand forecasting and anomaly detection. Ingest from EIA / ENTSO-E, land in a Medallion Delta Lake (Bronze / Silver / Gold), train three model families, promote to Unity Catalog Model Registry with the @staging alias, serve via FastAPI with a Streamlit dashboard on top.

Dataset: 17,854 hourly observations of California grid demand (CAISO, 2 years)
Winner: LightGBM MAPE 1.81 % · RMSE 700 MW · MAE 533 MW
Runner-up: PyTorch LSTM (168 h window) MAPE 2.84 %
Databricks Free Edition run produced MAPE 1.83 % — pipeline is portable, not environment-coupled
SHAP artefacts, Unity Catalog volume + registry, model signatures, drift monitoring via Evidently

Python PySpark LightGBM PyTorch Scikit-learn MLflow Delta Lake Databricks FastAPI Streamlit SHAP Evidently

2️⃣ energyscholar-rag 📚

Production-grade RAG over arXiv energy-forecasting papers. One code path runs against Groq / Anthropic / OpenAI / OpenRouter — switching provider is a .env one-liner. Hybrid retrieval (dense Qdrant + BM25 + Reciprocal Rank Fusion) then cross-encoder rerank, Claude-style strict-citation system prompt, RAGAS-gated evaluation on every PR.

Corpus: 17 real arXiv papers → 386 chunks in embedded Qdrant
Generator tested: llama-3.3-70b-versatile (Groq) + nvidia/nemotron-3-super-120b-a12b:free (OpenRouter)
Answers cite real pages: e.g. for "How does temperature affect day-ahead load forecasts?" the LLM pulled [2302.12168v2 pp. 3, 6, 7, 13, 18] — zero hallucinated citations
RAGAS context_precision 0.81 · answer_relevancy 0.996 on the golden set

Python Qdrant sentence-transformers cross-encoder RAGAS Langfuse FastAPI Streamlit

3️⃣ gridpulse-realtime-agent 🚨

Streaming + LLM agent layer that glues the first two projects into a live operational loop. Pluggable transport (Delta append stream by default, Kafka / Redpanda with one env var). PySpark Structured Streaming scores each micro-batch with the Project-1 anomaly detector from MLflow. Anomalies trigger a custom agent loop (not a framework) with function-calling tools.

5 tools wired: classify_severity, get_current_load, get_24h_forecast (→ Project 1), search_literature (→ Project 2), post_incident_report (→ Discord + Delta)
First live run (Nemotron-120B): 5 iterations, 4 tool calls, 103 s wall-clock, HTTP 204 to Discord ✓
The LLM-composed report cited the real 24-h forecast range (29,489–31,507 MW), the observed 58,000 MW, the 93 % deviation and the anomaly-score threshold — every number came from a tool call
Guardrails: max-iterations, token budget, tool timeouts; Langfuse tracing

Python PySpark Structured Streaming Delta Lake Kafka / Redpanda MLflow OpenAI-compatible function calling Groq OpenRouter Anthropic FastAPI Streamlit Discord webhooks

📊 Earlier portfolio

Project	Stack	Live
Telecom Quota Forecasting	Python, XGBoost, LightGBM, Prophet
Bank Marketing Analysis	PySpark, scikit-learn, XGBoost
Credit Choice Experiment	R, mlogit, caret
Medical Diagnosis Classification	Python, scikit-learn, imbalanced-learn
Gender Income Gap	R, fixest, glmnet, caret

Short description of each

Telecom Quota Forecasting

End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work. Quantile XGBoost targeting P90 + LightGBM with custom asymmetric loss (penalizes under-prediction 1.5×), DTW shape clustering, tier-based ensemble, and a pricing optimizer with property-based tests. ~93% P90 coverage on validation.

Bank Marketing Campaign Analysis

Analysis of 45k calls from a Portuguese bank to predict term-deposit subscription. Random Forest achieves ROC-AUC 0.7959 (with duration excluded to avoid leakage). Key business insight: previously-contacted clients convert at 63.8% vs 9.3% — 7× more likely. Includes a v2 branch that diagnoses and fixes a SMOTE-in-CV leakage bug via imblearn.Pipeline. Interactive Streamlit demo scored any client profile in real time.

Credit Choice Experiment

Discrete-choice analysis of how visual salience of credit terms in digital ads affects consumer decisions. Randomized experiment with 4 ad-design conditions. Conditional logit + mixed logit (mlogit) with unobserved heterogeneity via random coefficients. Simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once heterogeneity is allowed.

Medical Diagnosis Classification

Binary classification on the Wisconsin Breast Cancer dataset (569 records, 30 features) to detect malignant tumors. SVM achieves 97.6% accuracy, AUC 0.99 with GridSearchCV + 5-fold CV. Class-imbalance handling (under vs over-sampling comparison), feature selection via correlation (30 → 16 features).

Gender Income Gap in Small Commerce

Quantifying the gender income gap among ~5,000 small merchants in Latin America using transactional data. Fixed-effects regression with progressive controls (hours, category, zone, age), Ridge / LASSO, CART / MARS / KNN / Random Forest. Raw gap **~ 20.7%**, partially mediated by hours and category — but a meaningful hourly-productivity gap persists.

🧰 Tech Stack

Languages — Python R SQL Classical ML / Stats — scikit-learn XGBoost LightGBM Statsmodels mlogit fixest glmnet Deep Learning — PyTorch sentence-transformers cross-encoders LLM / Agents — Anthropic SDK OpenAI SDK (compatible) Groq OpenRouter function calling RAGAS Langfuse Qdrant Data Platforms — PySpark Spark Structured Streaming Databricks Unity Catalog MLflow Delta Lake Kafka/Redpanda Serving / UX — FastAPI Streamlit Plotly Prometheus Docker Viz — Matplotlib Seaborn ggplot2 Plotly Ops — Git GitHub Actions pytest ruff black

🎓 About me

Universidad de Chile — Industrial Engineering (distinción máxima)
Areas of interest: forecasting, discrete-choice / causal inference, production ML, LLM engineering
Based in Santiago, Chile · open to remote roles internationally

Provide feedback

Saved searches

Use saved searches to filter your results more quickly