robertlangdonn/robertlangdonn

Prasad Khake

I make LLMs run well on real, constrained hardware — on-device, edge, Apple Silicon — and build the products around them.

The recurring question in my work: what actually limits LLM inference on a machine you own, and how do those limits change as you scale? Background: hardware (e-paper boards shipped to 20+ countries), Rust systems tooling, and AI-native full-stack development.

On-device / inference (Apple's MLX)

  • Contributor to mlx-lm. Merged: #1349 — enables text-mode loading of Gemma 4 (gemma4_unified) checkpoints on MLX.
  • #1329 (approved) — root-caused why Mistral/Devstral (tekken-v13) models emit Ġ instead of spaces on Apple Silicon, and fixed the detokenizer routing. The writeup.
  • First merged contribution to vLLM on Apple Silicon (#382).
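For readers unfamiliar with the Ġ symptom in #1329: it is what byte-level BPE tokens typically look like when they are joined as text without being mapped back to bytes. The sketch below assumes the GPT-2 style byte-to-unicode table, in which the space byte is represented as Ġ. It is an illustration only, not the mlx-lm detokenizer or the actual fix, and the names `byte_to_unicode`, `decode_naive`, and `decode_byte_level` are hypothetical.

```python
def byte_to_unicode() -> dict[int, str]:
    """GPT-2 style table: map every byte 0..255 to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, etc.) are shifted to 256 and above.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = byte_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def decode_naive(pieces: list[str]) -> str:
    """Join token strings as-is: the placeholder characters leak through."""
    return "".join(pieces)


def decode_byte_level(pieces: list[str]) -> str:
    """Map each placeholder character back to its byte, then decode UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for piece in pieces for ch in piece)
    return raw.decode("utf-8", errors="replace")
```

With the token pieces `["Hello", "Ġworld"]`, `decode_naive` keeps the Ġ, while `decode_byte_level` restores the space. Under this assumption, a model whose tokens are routed to the wrong decoding path would show the first behavior.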

On Device — measuring the bottlenecks

Other work

  • paperd.ink — open-source ESP32 e-paper dev board, in makers' hands across 20+ countries.
  • vcfkit — genomics CLI in Rust; 4× faster than bcftools, single static binary.
  • Hacker Newspaper — comments-first mobile Hacker News reader.

Writing about on-device LLMs at prasadkhake.com · On Device.

📫 prasadkhake@gmail.com · LinkedIn