EdgeLLM

Run Large Language Models on iOS devices with just one line of code

let response = try await EdgeLLM.chat("Hello, world!")

Note: EdgeLLM is now fully functional and supports multiple models, including Qwen, Gemma, and Phi-3.5.

Quick Start

import EdgeLLM

// 1. Basic chat (uses default model)
let response = try await EdgeLLM.chat("Hello!")
print(response)

// 2. Choose a specific model
let response = try await EdgeLLM.chat("Hello!", model: .gemma2b)

// 3. Stream responses in real-time
for try await token in EdgeLLM.stream("Tell me a joke") {
    print(token, terminator: "")
}

// 4. System prompt
let response = try await EdgeLLM.chat(
    "Translate 'hello' to French",
    systemPrompt: "You are a professional translator."
)


Features

Dead Simple - Chat with LLMs in one line

iOS Optimized - Metal GPU acceleration for blazing speed

Privacy First - Everything runs on-device

Easy Install - Swift Package Manager ready

Streaming Support - Real-time token-by-token responses

System Prompts - Set context and personality

Multi-Turn Conversations - Full conversation history support

SwiftUI Ready - Observable wrapper for SwiftUI apps

Model Management - Download, cache, and manage models

Custom Models - Bring your own MLC-compiled models

Installation

Swift Package Manager

In Xcode:

File -> Add Package Dependencies

Enter URL: https://github.com/john-rocky/EdgeLLM

Select version and click "Add Package"

Or add to your Package.swift:

dependencies: [
    .package(url: "https://github.com/john-rocky/EdgeLLM", from: "1.0.0")
]

Supported Models

Model	Enum	Size	Best For
Qwen 0.6B	`.qwen06b`	~1.2 GB	Fastest responses, low-end devices
Gemma 2B	`.gemma2b`	~2.5 GB	Balanced speed & quality
Phi-3.5 Mini	`.phi3_mini`	~3.8 GB	Best quality, reasoning tasks

Models are automatically downloaded on first use (WiFi recommended).
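Because the three models differ by more than 2 GB in download size, it can help to pick one based on the free space actually available. The following is a minimal sketch of that selection logic, not part of EdgeLLM: `SketchModel`, `largestModelThatFits`, and the 500 MB reserve are hypothetical names and values, and the byte counts are simply the approximate sizes from the table above.

```swift
import Foundation

// Hypothetical stand-in for the library's model enum; sizes are the
// approximate download sizes from the table above, in bytes.
enum SketchModel: CaseIterable {
    case qwen06b, gemma2b, phi3Mini

    var approximateBytes: Int64 {
        switch self {
        case .qwen06b:  return 1_200_000_000
        case .gemma2b:  return 2_500_000_000
        case .phi3Mini: return 3_800_000_000
        }
    }
}

/// Returns the largest model whose download fits in `freeBytes`
/// after keeping `reserveBytes` untouched, or nil if none fits.
/// The default reserve is an assumed safety margin, not a library value.
func largestModelThatFits(freeBytes: Int64, reserveBytes: Int64 = 500_000_000) -> SketchModel? {
    let budget = freeBytes - reserveBytes
    return SketchModel.allCases
        .filter { $0.approximateBytes <= budget }
        .max { $0.approximateBytes < $1.approximateBytes }
}
```

In an app, the `freeBytes` argument could come from the `EdgeLLM.availableDiskSpace()` call shown under Model Management, assuming it reports bytes.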

Usage

Simplest Example

import EdgeLLM

// Chat in one line!
let response = try await EdgeLLM.chat("What's the weather like?")
print(response)

Streaming Responses

// Receive response token by token
for try await token in EdgeLLM.stream("Tell me a story") {
    print(token, terminator: "")
}
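If you also need the complete text once streaming finishes, the tokens can be accumulated while they arrive. This is a minimal sketch that works with any throwing async sequence of strings; `collect` and `fakeTokenStream` are hypothetical helpers, and the stand-in stream only assumes that `EdgeLLM.stream` yields `String` tokens, as the loop above suggests.

```swift
// Collects every token from any async sequence of strings into one string.
func collect<S: AsyncSequence>(_ tokens: S) async throws -> String where S.Element == String {
    var text = ""
    for try await token in tokens {
        text += token
    }
    return text
}

// Stand-in token source so the sketch runs without a model (hypothetical).
func fakeTokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for token in tokens { continuation.yield(token) }
        continuation.finish()
    }
}
```

Under that assumption, `try await collect(EdgeLLM.stream("Tell me a story"))` would return the whole story as a single string.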

System Prompts

// Set model behavior with a system prompt
let response = try await EdgeLLM.chat(
    "Explain recursion",
    systemPrompt: "You are a patient teacher. Use simple analogies."
)

// Or via Options for reuse
let options = EdgeLLM.Options(systemPrompt: "You are a pirate. Respond in pirate speak.")
let llm = try await EdgeLLM(model: .gemma2b, options: options)
let response = try await llm.chat("What is the meaning of life?")

Multi-Turn Conversations

let llm = try await EdgeLLM(model: .gemma2b)
let conversation = Conversation(systemPrompt: "You are a helpful assistant.")

// Full conversation history is maintained
let r1 = try await llm.chat("My name is Alice", in: conversation)
let r2 = try await llm.chat("What's my name?", in: conversation)
// r2 correctly answers "Alice"

// Stream within a conversation
for try await token in llm.stream("Tell me more", in: conversation) {
    print(token, terminator: "")
}

// View history
print(conversation.messages.count)

// Reset conversation (keeps system prompt)
conversation.clear()
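To make the history and reset behavior concrete, here is a minimal sketch of a conversation store with the same shape. It is not the library's `Conversation` type: `SketchConversation`, `Role`, `Message`, and `append` are hypothetical, and whether the real `messages` array includes the system prompt is an assumption.

```swift
// Hypothetical sketch of a conversation store; not the library's Conversation type.
enum Role { case system, user, assistant }

struct Message {
    let role: Role
    let content: String
}

final class SketchConversation {
    private(set) var messages: [Message] = []

    init(systemPrompt: String? = nil) {
        // Assumption: the system prompt is stored as the first message.
        if let prompt = systemPrompt {
            messages.append(Message(role: .system, content: prompt))
        }
    }

    func append(_ role: Role, _ content: String) {
        messages.append(Message(role: role, content: content))
    }

    // Drops user and assistant turns but keeps any system prompt.
    func clear() {
        messages.removeAll { $0.role != .system }
    }
}
```

Keeping every turn in one ordered array is what lets a later question such as "What's my name?" be answered from earlier messages.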

Generation Options

let options = EdgeLLM.Options(
    temperature: 0.3,          // More deterministic (default: 0.7)
    maxTokens: 500,            // Limit response length (default: 2048)
    topP: 0.9,                 // Nucleus sampling (default: 0.95)
    systemPrompt: "Be concise.",
    frequencyPenalty: 0.5,     // Reduce repetition
    presencePenalty: 0.3,      // Encourage topic diversity
    stopSequences: ["\n\n"]    // Stop at double newline
)
let response = try await EdgeLLM.chat("Summarize quantum computing", options: options)
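For intuition about `temperature` and `topP`, the sketch below shows the textbook versions of both techniques on a toy distribution. It illustrates the general idea only and makes no claim about how EdgeLLM or its runtime implements sampling; `softmax` and `nucleus` are hypothetical helper names.

```swift
import Foundation

/// Softmax over logits after dividing by `temperature` (must be > 0).
/// Lower temperatures concentrate probability on the highest logit.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxValue = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxValue) }  // subtract max for numerical stability
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Nucleus (top-p) filter: keeps the smallest set of most-probable tokens
/// whose cumulative probability reaches `topP`, then renormalizes.
func nucleus(_ probs: [Double], topP: Double) -> [Double] {
    let order = probs.indices.sorted { probs[$0] > probs[$1] }
    var kept = [Double](repeating: 0, count: probs.count)
    var cumulative = 0.0
    for index in order {
        kept[index] = probs[index]
        cumulative += probs[index]
        if cumulative >= topP { break }
    }
    let total = kept.reduce(0, +)
    return kept.map { $0 / total }
}
```

With probabilities `[0.5, 0.3, 0.15, 0.05]` and `topP: 0.9`, the least likely token is dropped and the remaining three are rescaled to sum to 1.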

SwiftUI Integration

import SwiftUI
import EdgeLLM

struct ChatView: View {
    @StateObject private var chat = EdgeLLMChat(
        model: .gemma2b,
        systemPrompt: "You are a helpful assistant."
    )
    @State private var input = ""

    var body: some View {
        VStack {
            // Download progress
            if chat.isDownloading {
                ProgressView("Downloading model...")
            }

            // Chat messages
            ScrollView {
                ForEach(chat.messages) { msg in
                    HStack {
                        if msg.role == .user { Spacer() }
                        Text(msg.content)
                            .padding(8)
                            .background(msg.role == .user ? Color.blue : Color.gray.opacity(0.2))
                            .cornerRadius(12)
                        if msg.role == .assistant { Spacer() }
                    }
                }
            }

            // Performance metrics
            if chat.isGenerating {
                Text("\(String(format: "%.1f", chat.tokensPerSecond)) tok/s")
                    .font(.caption)
            }

            // Input
            HStack {
                TextField("Message", text: $input)
                Button("Send") {
                    let text = input
                    input = ""
                    Task { await chat.send(text) }
                }
                .disabled(chat.isGenerating || !chat.isModelLoaded)
            }
        }
        .task { await chat.load() }
    }
}

Model Management

// Check if a model is already downloaded
if EdgeLLM.isDownloaded(.gemma2b) {
    print("Ready to use!")
}

// Pre-download a model
try await EdgeLLM.download(.phi3_mini) { progress in
    print(progress.statusDescription)
    // "Downloading params_shard_2.bin (3/8)"
}

// List downloaded models
let models = EdgeLLM.downloadedModels()
for model in models {
    print("\(model.model.displayName): \(model.sizeBytes / 1_000_000) MB")
}

// Check available space
let freeSpace = EdgeLLM.availableDiskSpace()
let cacheSize = EdgeLLM.totalCacheSize()

// Delete a model
try EdgeLLM.deleteModel(.phi3_mini)

Custom Models

Bring your own MLC-compiled models when using a custom MLCRuntime binary:

let config = EdgeLLM.ModelConfig(
    name: "My Custom Model",
    modelLib: "custom_model_lib_hash",
    huggingFaceURL: "https://huggingface.co/your-org/your-model-MLC",
    approximateSize: 2_000_000_000
)
let llm = try await EdgeLLM(config: config)
let response = try await llm.chat("Hello!")

Example Apps

Simple Chat

Basic chat interface in Examples/SimpleChat:

cd Examples/SimpleChat
open SimpleChat.xcodeproj

Streaming Chat with Performance Metrics

Advanced demo with real-time streaming and performance monitoring:

cd Examples/StreamingChat
open StreamingChat.xcodeproj

Features:

Real-time token streaming

Live performance metrics (tokens/sec, latency)

Model comparison (Qwen3, Gemma, Phi-3.5)

Requirements

iOS 14.0+ / macOS 14.0+ / visionOS 1.0+

Xcode 15.0+

4GB+ free storage for models

Recommended: iPhone 12 or newer (Neural Engine support)

Performance

On iPhone 15 Pro:

Initial load: 2-3 seconds

Token generation: 10-30 tokens/sec (model dependent)

Memory usage: 1-4GB depending on model
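These figures translate directly into wait times. At 10-30 tokens/sec, a 500-token reply takes roughly 17 to 50 seconds, plus the initial load on first use. The sketch below is plain arithmetic with hypothetical helper names (`estimatedSeconds`, `throughput`); it does not measure a real model.

```swift
/// Estimated seconds to produce `tokens` tokens: load time plus tokens / rate.
func estimatedSeconds(tokens: Int, tokensPerSecond: Double, loadSeconds: Double = 0) -> Double {
    precondition(tokensPerSecond > 0, "rate must be positive")
    return loadSeconds + Double(tokens) / tokensPerSecond
}

/// Tokens per second from a token count and elapsed wall-clock time.
/// Returns 0 when no time has elapsed, to avoid dividing by zero.
func throughput(tokens: Int, elapsedSeconds: Double) -> Double {
    elapsedSeconds > 0 ? Double(tokens) / elapsedSeconds : 0
}
```

The same `throughput` calculation can be applied to your own timings to check whether a device falls inside the range quoted above.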

Troubleshooting

Model Not Found

Models are downloaded automatically on first run (WiFi recommended).

Out of Memory

Try a smaller model like .qwen06b:

let response = try await EdgeLLM.chat("Hello", model: .qwen06b)

Manage Disk Space

// Check and clean up
let cacheSize = EdgeLLM.totalCacheSize()
try EdgeLLM.deleteAllModels()

License

Apache 2.0 License

Contributing

Pull requests are welcome!

Development Setup

Clone the repository

Set up git hooks to prevent large files:
git config core.hooksPath .githooks

Important: Large Files Policy

Never commit binary files (.xcframework, .zip, .mlmodel, etc.)

Maximum file size: 10MB

Large files should be uploaded to GitHub Releases

The pre-commit hook will block commits with large files

Links

Example App

Report Issues

Credits

EdgeLLM is built on top of the MLC-LLM project.
