LLM inference researcher specializing in speculative decoding, MTP/EAGLE architectures, and MoE routing optimization. Builds self-hosted AI infrastructure with a focus on constrained NVIDIA GPUs (Tesla P40 sm_61, RTX 3050) and embedded-systems integration.
| Category | Technologies |
|---|---|
| LLM Inference | GGUF, EXL2, llama.cpp, Ollama, Speculative Decoding, EAGLE |
| GPU Optimization | Tesla P40, RTX 3050, CUDA sm_61, VRAM-constrained deployment |
| Research | Qwen3.6-27B, MoE Routing, MTP, EAGLE Algorithms |
| Embedded | Arduino R4 WiFi, ESP32, MQTT/Tailscale, HID |
- add-video-input-support-to-llamacpp-mtmd - Video input support for llama.cpp with webcam/file frame capture
- ai-gateway-in-prod-alternative-concrete-a-litellm - Alternatives to LiteLLM as the gateway for a local AI stack
- automated-exl2-conversion-validation-pipeline - Automated EXL2 conversion and validation for the Qwen3.5Moe architecture
- benchmark-4-agent-wrappers-on-qwen3627b-llamacpp - Comparative benchmark of four agent wrappers on Qwen3.6-27B with llama.cpp
- auto-quantization-pipeline-gguf - Automated GGUF quantization pipeline with hardware-aware optimization
- auto-vault-journal - Automated Obsidian vault updates via Claude Code session hooks
- nex2-mini-phase-twin-30b-lowvram-gguf-model - Low-VRAM GGUF model tuned to fit Tesla P40 constraints
- openclaw - Ollama gateway with Node.js backend for local AI agent access
- secure-llm-context-vault - Secure archive for LLM contexts and sensitive data
- ai-dashboard - Local AI monitoring dashboard with GPU metrics and security scanning
- megatool - OSINT toolkit with Flask web app and AI-powered photo analysis
- reddit-monitor - AI-focused subreddit monitoring with automated idea generation
- voice-dictate - Local, GPU-accelerated dictation with Whisper turbo, integrated with Claude Code
- web-access-layer-per-agenti-ai-locali - Centralized web access layer for local AI agents
- ai-home-assistant-hid-dashboard - Arduino R4 WiFi + ESP32 hardware dashboard with MQTT/Tailscale integration
- controller-termico-proattivo-esp32 - ESP32-based proactive fan controller driven by temperature sensors
- [digital-