Run Large Language Models on iOS devices with just one line of code
let response = try await EdgeLLM.chat("Hello, world!")

Note: EdgeLLM is now fully functional! It supports multiple models, including Qwen, Gemma, and Phi-3.
import EdgeLLM
// 1. Basic chat (uses default model)
let response = try await EdgeLLM.chat("Hello!")
print(response)
// 2. Choose a specific model
let response = try await EdgeLLM.chat("Hello!", model: .gemma2b)
// 3. Stream responses in real-time
for try await token in EdgeLLM.stream("Tell me a joke") {
    print(token, terminator: "")
}
// 4. System prompt
let response = try await EdgeLLM.chat(
    "Translate 'hello' to French",
    systemPrompt: "You are a professional translator."
)
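
The streaming call in step 3 prints tokens as they arrive, but an app often wants the complete text as well. The following is a minimal sketch of accumulating tokens from any async sequence of strings. `collectTokens` and `fakeTokenStream` are hypothetical helpers that are not part of EdgeLLM, and the stand-in stream exists only so the snippet runs without a model. It assumes `EdgeLLM.stream` yields `String` tokens, which the `print(token, terminator: "")` usage suggests but does not prove.

```swift
// Hypothetical helper (not part of EdgeLLM): concatenates every token
// produced by an async sequence of strings into one response string.
func collectTokens<S: AsyncSequence>(_ tokens: S) async throws -> String
where S.Element == String {
    var text = ""
    for try await token in tokens {
        text += token
    }
    return text
}

// Stand-in for EdgeLLM.stream(...) so the sketch runs without a model.
// This is an assumption made for illustration only.
func fakeTokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for token in tokens {
            continuation.yield(token)
        }
        continuation.finish()
    }
}
```

If the library's stream does yield strings, the helper could presumably be called as `try await collectTokens(EdgeLLM.stream("Tell me a joke"))`.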
- Dead Simple - Chat with LLMs in one line
- iOS Optimized - Metal GPU acceleration for blazing speed
- Privacy First - Everything runs on-device
- Easy Install - Swift Package Manager ready
- Streaming Support - Real-time token-by-token responses
- System Prompts - Set context and personality
- Multi-Turn Conversations - Full conversation history support
- SwiftUI Ready - Observable wrapper for SwiftUI apps
- Model Management - Download, cache, and manage models
- Custom Models - Bring your own MLC-compiled models
In Xcode:
- File -> Add Package Dependencies
- Enter URL: https://github.com/john-rocky/EdgeLLM
- Select version and click "Add Package"
Or add to your Package.swift:
dependencies: [
    .package(url: "https://github.com/john-rocky/EdgeLLM", from: "1.0.0")
]

| Model | Enum | Size | Best For |
|---|---|---|---|
| Qwen 0.6B | .qwen06b | ~1.2 GB | Fastest responses, low-end devices |
| Gemma 2B | .gemma2b | ~2.5 GB | Balanced speed & quality |
| Phi-3.5 Mini | .phi3_mini | ~3.8 GB | Best quality, reasoning tasks |

Models are automatically downloaded on first use (WiFi recommended).
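
Because the download sizes differ by a factor of three, it can help to pick a model based on how much storage is free. The sketch below is one way to do that with the approximate sizes from the table. `ModelChoice`, `largestModelThatFits`, and the 500 MB headroom default are hypothetical and chosen for illustration; they are not part of EdgeLLM.

```swift
// Hypothetical mirror of the model table (not an EdgeLLM type).
// Sizes are the approximate download sizes listed in the table.
enum ModelChoice: String, CaseIterable {
    case qwen06b, gemma2b, phi3Mini

    var approximateBytes: Int64 {
        switch self {
        case .qwen06b: return 1_200_000_000
        case .gemma2b: return 2_500_000_000
        case .phi3Mini: return 3_800_000_000
        }
    }
}

// Returns the largest model whose download fits in `freeBytes` while
// leaving `headroom` bytes spare, or nil if none fits.
// The headroom default is an assumption, not a documented requirement.
func largestModelThatFits(freeBytes: Int64, headroom: Int64 = 500_000_000) -> ModelChoice? {
    ModelChoice.allCases
        .filter { $0.approximateBytes + headroom <= freeBytes }
        .max { $0.approximateBytes < $1.approximateBytes }
}
```

The result would then be mapped to the corresponding `model:` argument, for example `.gemma2b`.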
import EdgeLLM
// Chat in one line!
let response = try await EdgeLLM.chat("What's the weather like?")
print(response)

// Receive response token by token
for try await token in EdgeLLM.stream("Tell me a story") {
    print(token, terminator: "")
}

// Set model behavior with a system prompt
let response = try await EdgeLLM.chat(
    "Explain recursion",
    systemPrompt: "You are a patient teacher. Use simple analogies."
)
// Or via Options for reuse
let options = EdgeLLM.Options(systemPrompt: "You are a pirate. Respond in pirate speak.")
let llm = try await EdgeLLM(model: .gemma2b, options: options)
let response = try await llm.chat("What is the meaning of life?")

let llm = try await EdgeLLM(model: .gemma2b)
let conversation = Conversation(systemPrompt: "You are a helpful assistant.")
// Full conversation history is maintained
let r1 = try await llm.chat("My name is Alice", in: conversation)
let r2 = try await llm.chat("What's my name?", in: conversation)
// r2 correctly answers "Alice"
// Stream within a conversation
for try await token in llm.stream("Tell me more", in: conversation) {
    print(token, terminator: "")
}
// View history
print(conversation.messages.count)
// Reset conversation (keeps system prompt)
conversation.clear()

let options = EdgeLLM.Options(
    temperature: 0.3, // More deterministic (default: 0.7)
    maxTokens: 500, // Limit response length (default: 2048)
    topP: 0.9, // Nucleus sampling (default: 0.95)
    systemPrompt: "Be concise.",
    frequencyPenalty: 0.5, // Reduce repetition
    presencePenalty: 0.3, // Encourage topic diversity
    stopSequences: ["\n\n"] // Stop at double newline
)
let response = try await EdgeLLM.chat("Summarize quantum computing", options: options)

import SwiftUI
import EdgeLLM
struct ChatView: View {
    @StateObject private var chat = EdgeLLMChat(
        model: .gemma2b,
        systemPrompt: "You are a helpful assistant."
    )
    @State private var input = ""
    var body: some View {
        VStack {
            // Download progress
            if chat.isDownloading {
                ProgressView("Downloading model...")
            }
            // Chat messages
            ScrollView {
                ForEach(chat.messages) { msg in
                    HStack {
                        if msg.role == .user { Spacer() }
                        Text(msg.content)
                            .padding(8)
                            .background(msg.role == .user ? Color.blue : Color.gray.opacity(0.2))
                            .cornerRadius(12)
                        if msg.role == .assistant { Spacer() }
                    }
                }
            }
            // Performance metrics
            if chat.isGenerating {
                Text("\(String(format: "%.1f", chat.tokensPerSecond)) tok/s")
                    .font(.caption)
            }
            // Input
            HStack {
                TextField("Message", text: $input)
                Button("Send") {
                    let text = input
                    input = ""
                    Task { await chat.send(text) }
                }
                .disabled(chat.isGenerating || !chat.isModelLoaded)
            }
        }
        .task { await chat.load() }
    }
}

// Check if a model is already downloaded
if EdgeLLM.isDownloaded(.gemma2b) {
    print("Ready to use!")
}
// Pre-download a model
try await EdgeLLM.download(.phi3_mini) { progress in
    print(progress.statusDescription) // "Downloading params_shard_2.bin (3/8)"
}
// List downloaded models
let models = EdgeLLM.downloadedModels()
for model in models {
    print("\(model.model.displayName): \(model.sizeBytes / 1_000_000) MB")
}
// Check available space
let freeSpace = EdgeLLM.availableDiskSpace()
let cacheSize = EdgeLLM.totalCacheSize()
// Delete a model
try EdgeLLM.deleteModel(.phi3_mini)

Bring your own MLC-compiled models when using a custom MLCRuntime binary:
let config = EdgeLLM.ModelConfig(
    name: "My Custom Model",
    modelLib: "custom_model_lib_hash",
    huggingFaceURL: "https://huggingface.co/your-org/your-model-MLC",
    approximateSize: 2_000_000_000
)
let llm = try await EdgeLLM(config: config)
let response = try await llm.chat("Hello!")

Basic chat interface in Examples/SimpleChat:
cd Examples/SimpleChat
open SimpleChat.xcodeproj

Advanced demo with real-time streaming and performance monitoring:
cd Examples/StreamingChat
open StreamingChat.xcodeproj

Features:
- Real-time token streaming
- Live performance metrics (tokens/sec, latency)
- Model comparison (Qwen3, Gemma, Phi-3.5)
- iOS 14.0+ / macOS 14.0+ / visionOS 1.0+
- Xcode 15.0+
- 4GB+ free storage for models
- Recommended: iPhone 12 or newer (Neural Engine support)
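
The storage requirement above can be checked before starting a download. EdgeLLM exposes `EdgeLLM.availableDiskSpace()`, but its return type is not shown here, so this sketch queries Foundation directly instead. `freeDiskSpace`, `hasRoomForModels`, and `requiredBytes` are hypothetical names, and reading "4GB" as 4,000,000,000 bytes is an assumption.

```swift
import Foundation

// Returns the free bytes on the volume containing `path`,
// or nil if the file system query fails.
func freeDiskSpace(atPath path: String = NSHomeDirectory()) -> Int64? {
    guard let attributes = try? FileManager.default.attributesOfFileSystem(forPath: path),
          let free = attributes[.systemFreeSize] as? NSNumber else {
        return nil
    }
    return free.int64Value
}

// 4 GB, matching the storage requirement in the list above
// (decimal gigabytes assumed).
let requiredBytes: Int64 = 4_000_000_000

// Treats an unknown amount of free space as "not enough".
func hasRoomForModels(freeBytes: Int64?) -> Bool {
    guard let freeBytes = freeBytes else { return false }
    return freeBytes >= requiredBytes
}
```

A caller might gate the download on `hasRoomForModels(freeBytes: freeDiskSpace())` and prompt the user to free space otherwise.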
On iPhone 15 Pro:
- Initial load: 2-3 seconds
- Token generation: 10-30 tokens/sec (model dependent)
- Memory usage: 1-4GB depending on model
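
To compare these figures with a specific device, token throughput can be measured by timing a stream. This is a rough sketch, not how the library computes its own `tokensPerSecond`: `ThroughputSample` and `measureThroughput` are hypothetical, the stream is assumed to yield `String` tokens, and the wall-clock timing includes any model loading or prompt processing that happens before the first token.

```swift
import Foundation

// Hypothetical result type for this sketch (not an EdgeLLM type).
struct ThroughputSample {
    let tokenCount: Int
    let seconds: TimeInterval

    // Tokens per second, or 0 when no time has elapsed.
    var tokensPerSecond: Double {
        seconds > 0 ? Double(tokenCount) / seconds : 0
    }
}

// Consumes the whole sequence, counting tokens and measuring
// wall-clock time from the first read until the stream finishes.
func measureThroughput<S: AsyncSequence>(_ tokens: S) async throws -> ThroughputSample
where S.Element == String {
    let start = Date()
    var count = 0
    for try await _ in tokens {
        count += 1
    }
    return ThroughputSample(tokenCount: count, seconds: Date().timeIntervalSince(start))
}
```

Averaging several runs with the same prompt and model should give a steadier number than a single measurement.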
Models are downloaded automatically on first run (WiFi recommended).
Try a smaller model like .qwen06b:
let response = try await EdgeLLM.chat("Hello", model: .qwen06b)

// Check and clean up
let cacheSize = EdgeLLM.totalCacheSize()
try EdgeLLM.deleteAllModels()

Apache 2.0 License
Pull requests are welcome!
- Clone the repository
- Set up git hooks to prevent large files:
git config core.hooksPath .githooks
- Never commit binary files (.xcframework, .zip, .mlmodel, etc.)
- Maximum file size: 10MB
- Large files should be uploaded to GitHub Releases
- The pre-commit hook will block commits with large files
EdgeLLM is built on top of the MLC-LLM project.
