🌐 English · Español
Senior Data Scientist · ML Engineer · LLM / AI Engineer
Industrial Engineer from Universidad de Chile (distinción máxima) currently at ClaroVTR as Efficiencies Engineer — building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~$30M CLP/month identified savings).
Based in Santiago, Chile. Comfortable in Spanish and English.
Three sibling projects designed as one coherent platform. Project 3 consumes Project 1's registered model via MLflow and Project 2's RAG endpoint as an agent tool — the code and the data flow are connected, not three unrelated demos.
| # | Project | What it proves | Headline result |
|---|---|---|---|
| 1 | energy-forecasting-databricks ⚡ | Classical ML + Databricks Unity Catalog · LightGBM vs LSTM vs Isolation Forest · SHAP · Medallion architecture | LightGBM MAPE 1.81% on 2 years of CAISO data; local vs Databricks ≡ 1.81 % vs 1.83 % (reproducible) |
| 2 | energyscholar-rag 📚 | LLM engineering · provider-agnostic (Groq/Claude/OpenAI/OpenRouter) · hybrid BM25+vector retrieval · cross-encoder rerank · RAGAS gated on CI | RAGAS context_precision 0.81, answer_relevancy 0.996 on arXiv energy papers |
| 3 | gridpulse-realtime-agent 🚨 | Real-time streaming · custom LLM agent with OpenAI-compatible function calling · integrates Projects 1 + 2 as tools | Agent composed a fully grounded incident report (no hallucinated figures) and posted to Discord HTTP 204 ✓ after 5 tool calls |
The story these three tell together: detect an anomaly in streaming telemetry → ask the classical forecaster what the expected value was → ask the RAG what the literature says → compose an incident report grounded in tool outputs → alert on-call. That's the system, not three isolated repos.
End-to-end MLOps pipeline for electricity-demand forecasting and anomaly detection. Ingest from EIA / ENTSO-E, land in a Medallion Delta Lake (Bronze / Silver / Gold), train three model families, promote to Unity Catalog Model Registry with the @staging alias, serve via FastAPI with a Streamlit dashboard on top.
- Dataset: 17,854 hourly observations of California grid demand (CAISO, 2 years)
- Winner: LightGBM MAPE 1.81 % · RMSE 700 MW · MAE 533 MW
- Runner-up: PyTorch LSTM (168 h window) MAPE 2.84 %
- Databricks Free Edition run produced MAPE 1.83 % — pipeline is portable, not environment-coupled
- SHAP artefacts, Unity Catalog volume + registry, model signatures, drift monitoring via Evidently
Python PySpark LightGBM PyTorch Scikit-learn MLflow Delta Lake Databricks FastAPI Streamlit SHAP Evidently
Production-grade RAG over arXiv energy-forecasting papers. One code path runs against Groq / Anthropic / OpenAI / OpenRouter — switching provider is a .env one-liner. Hybrid retrieval (dense Qdrant + BM25 + Reciprocal Rank Fusion) then cross-encoder rerank, Claude-style strict-citation system prompt, RAGAS-gated evaluation on every PR.
- Corpus: 17 real arXiv papers → 386 chunks in embedded Qdrant
- Generator tested:
llama-3.3-70b-versatile(Groq) +nvidia/nemotron-3-super-120b-a12b:free(OpenRouter) - Answers cite real pages: e.g. for "How does temperature affect day-ahead load forecasts?" the LLM pulled
[2302.12168v2 pp. 3, 6, 7, 13, 18]— zero hallucinated citations - RAGAS context_precision 0.81 · answer_relevancy 0.996 on the golden set
Python Qdrant sentence-transformers cross-encoder RAGAS Langfuse FastAPI Streamlit
Streaming + LLM agent layer that glues the first two projects into a live operational loop. Pluggable transport (Delta append stream by default, Kafka / Redpanda with one env var). PySpark Structured Streaming scores each micro-batch with the Project-1 anomaly detector from MLflow. Anomalies trigger a custom agent loop (not a framework) with function-calling tools.
- 5 tools wired:
classify_severity,get_current_load,get_24h_forecast(→ Project 1),search_literature(→ Project 2),post_incident_report(→ Discord + Delta) - First live run (Nemotron-120B): 5 iterations, 4 tool calls, 103 s wall-clock, HTTP 204 to Discord ✓
- The LLM-composed report cited the real 24-h forecast range (29,489–31,507 MW), the observed 58,000 MW, the 93 % deviation and the anomaly-score threshold — every number came from a tool call
- Guardrails: max-iterations, token budget, tool timeouts; Langfuse tracing
Python PySpark Structured Streaming Delta Lake Kafka / Redpanda MLflow OpenAI-compatible function calling Groq OpenRouter Anthropic FastAPI Streamlit Discord webhooks
| Project | Stack | Live |
|---|---|---|
| Telecom Quota Forecasting | Python, XGBoost, LightGBM, Prophet | |
| Bank Marketing Analysis | PySpark, scikit-learn, XGBoost | |
| Credit Choice Experiment | R, mlogit, caret | |
| Medical Diagnosis Classification | Python, scikit-learn, imbalanced-learn | |
| Gender Income Gap | R, fixest, glmnet, caret |
Short description of each
End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work. Quantile XGBoost targeting P90 + LightGBM with custom asymmetric loss (penalizes under-prediction 1.5×), DTW shape clustering, tier-based ensemble, and a pricing optimizer with property-based tests. ~93% P90 coverage on validation.
Analysis of 45k calls from a Portuguese bank to predict term-deposit subscription. Random Forest achieves ROC-AUC 0.7959 (with duration excluded to avoid leakage). Key business insight: previously-contacted clients convert at 63.8% vs 9.3% — 7× more likely. Includes a v2 branch that diagnoses and fixes a SMOTE-in-CV leakage bug via imblearn.Pipeline. Interactive Streamlit demo scored any client profile in real time.
Discrete-choice analysis of how visual salience of credit terms in digital ads affects consumer decisions. Randomized experiment with 4 ad-design conditions. Conditional logit + mixed logit (mlogit) with unobserved heterogeneity via random coefficients. Simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once heterogeneity is allowed.
Binary classification on the Wisconsin Breast Cancer dataset (569 records, 30 features) to detect malignant tumors. SVM achieves 97.6% accuracy, AUC 0.99 with GridSearchCV + 5-fold CV. Class-imbalance handling (under vs over-sampling comparison), feature selection via correlation (30 → 16 features).
Quantifying the gender income gap among ~5,000 small merchants in Latin America using transactional data. Fixed-effects regression with progressive controls (hours, category, zone, age), Ridge / LASSO, CART / MARS / KNN / Random Forest. Raw gap **~ 20.7%**, partially mediated by hours and category — but a meaningful hourly-productivity gap persists.
Languages — Python R SQL
Classical ML / Stats — scikit-learn XGBoost LightGBM Statsmodels mlogit fixest glmnet
Deep Learning — PyTorch sentence-transformers cross-encoders
LLM / Agents — Anthropic SDK OpenAI SDK (compatible) Groq OpenRouter function calling RAGAS Langfuse Qdrant
Data Platforms — PySpark Spark Structured Streaming Databricks Unity Catalog MLflow Delta Lake Kafka/Redpanda
Serving / UX — FastAPI Streamlit Plotly Prometheus Docker
Viz — Matplotlib Seaborn ggplot2 Plotly
Ops — Git GitHub Actions pytest ruff black
- Universidad de Chile — Industrial Engineering (distinción máxima)
- Areas of interest: forecasting, discrete-choice / causal inference, production ML, LLM engineering
- Based in Santiago, Chile · open to remote roles internationally