A Rust-based CLI tool for AI-assisted coding using local LLMs with RAG (Retrieval-Augmented Generation) capabilities. SLM CLI indexes your codebase, retrieves relevant context for your questions, and provides intelligent coding assistance powered by Ollama.
- Local LLM Integration: Uses Ollama for both chat completions and embeddings; no cloud API keys required
- RAG-Enhanced Conversations: Automatically retrieves relevant code context for each query
- File Edit Support: LLM can suggest file edits that you can apply with a single command
- Persistent Index: Vector store is saved locally for fast startup on subsequent runs
- Smart Filtering: Automatically ignores `.git`, `target`, `node_modules`, lock files, and binary files
- Configurable: Customize the Ollama URL, models, and context retrieval settings
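The filtering rules could look roughly like the sketch below. The exact rules in SLM CLI are not documented here, so the extra lock-file names and the "NUL byte in the first 8 KiB means binary" heuristic are assumptions, and `should_ignore` / `looks_binary` are hypothetical helper names.

```rust
use std::path::Path;

/// Directory names skipped anywhere in a path (from the feature list above).
const IGNORED_DIRS: &[&str] = &[".git", "target", "node_modules"];

/// Lock files that do not end in `.lock` (assumption: the real list may differ).
const IGNORED_FILES: &[&str] = &["package-lock.json", "pnpm-lock.yaml"];

/// Returns true if any path component is an ignored directory,
/// or the file name looks like a lock file (hypothetical helper).
fn should_ignore(path: &Path) -> bool {
    let in_ignored_dir = path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |name| IGNORED_DIRS.contains(&name))
    });
    let is_lock_file = path
        .file_name()
        .and_then(|n| n.to_str())
        .map_or(false, |name| {
            // Covers Cargo.lock, yarn.lock, etc., plus the explicit list.
            name.ends_with(".lock") || IGNORED_FILES.contains(&name)
        });
    in_ignored_dir || is_lock_file
}

/// Heuristic binary check (assumption): a NUL byte in the first 8 KiB.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(8192).any(|&b| b == 0)
}
```

Checking path components rather than substrings avoids false positives such as a file named `targeting.rs`.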
SLM CLI follows a hexagonal (ports and adapters) / clean architecture approach with three distinct layers:
```mermaid
flowchart TB
    subgraph CLI["main.rs (Dependency Injection & CLI)"]
    end

    subgraph App["Application Layer"]
        Indexer["IndexerUseCase"]
        Chat["ChatUseCase"]
        FileEdit["FileEditUseCase"]
    end

    subgraph Core["Domain Layer"]
        Entities["Entities<br/>(Message, Document, CodeSnippet)"]
        Traits["Traits / Ports<br/>(LlmService, FileSystem, VectorStore)"]
    end

    subgraph Infra["Infrastructure Layer (Adapters)"]
        Ollama["OllamaAdapter"]
        Disk["DiskAdapter"]
        RAG["SimpleRagAdapter"]
    end

    CLI --> App
    App --> Core
    Infra -.->|implements| Traits
    App --> Infra
```
```
src/
├── main.rs              # CLI entry point, dependency injection
├── lib.rs               # Module organization and exports
├── domain/              # Core business logic (no external deps)
│   ├── mod.rs
│   ├── entities.rs      # Message, Document, CodeSnippet, DocumentMetadata
│   ├── error.rs         # Custom error types (AppError, LlmError, etc.)
│   └── traits.rs        # LlmService, FileSystem, VectorStore (ports)
├── infrastructure/      # Adapters (external implementations)
│   ├── mod.rs
│   ├── ollama.rs        # OllamaAdapter for LLM chat & embeddings
│   ├── disk.rs          # DiskAdapter for file operations
│   └── rag.rs           # SimpleRagAdapter with in-memory vector storage
└── application/         # Use cases (orchestration)
    ├── mod.rs
    ├── indexer.rs       # IndexerUseCase for file indexing
    ├── chat.rs          # ChatUseCase for RAG-enhanced conversations
    └── file_edit.rs     # FileEditUseCase for applying LLM edits
```
- Domain Layer: Contains pure business logic with no external dependencies. Defines traits (ports) that abstract external systems, and entities that represent core data structures.
- Infrastructure Layer: Contains adapters that implement domain traits. Each adapter wraps an external system (Ollama API, filesystem, vector store).
- Application Layer: Contains use cases that orchestrate business logic by combining domain entities with infrastructure adapters.
- Dependency Injection: `main.rs` assembles concrete implementations and injects them into use cases, which makes it easy to test use cases and swap implementations.
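As a rough illustration of constructor injection against a port (not the project's actual code), the sketch below uses a simplified, synchronous stand-in for the `LlmService` trait. Only the trait and use-case names come from this README; the `complete` and `ask` methods, the `String` error type, and `MockLlm` are assumptions. The real trait is likely async, as the adapter skeleton further down uses `async_trait`.

```rust
/// Port: a simplified, synchronous stand-in for the domain's `LlmService`.
/// Assumption: the real trait is async and has different methods/errors.
trait LlmService {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Use case that depends only on the port, never on a concrete adapter.
struct ChatUseCase<L: LlmService> {
    llm: L,
}

impl<L: LlmService> ChatUseCase<L> {
    /// Constructor injection: the caller (e.g. `main.rs`) picks the adapter.
    fn new(llm: L) -> Self {
        Self { llm }
    }

    /// Forwards a question to whichever LLM adapter was injected.
    fn ask(&self, question: &str) -> Result<String, String> {
        self.llm.complete(question)
    }
}

/// Hypothetical test double: returns a canned reply, so the use case
/// can be exercised without a running Ollama server.
struct MockLlm {
    reply: String,
}

impl LlmService for MockLlm {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        Ok(self.reply.clone())
    }
}
```

In production, `main.rs` would pass an `OllamaAdapter` instead of `MockLlm`, and `ChatUseCase` would not change. Generics are used here for brevity; `Box<dyn LlmService>` or `Arc<dyn LlmService>` would work equally well.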
Install Rust using rustup:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Ensure you have Rust 1.75 or later:

```bash
rustc --version
```

Install Ollama from ollama.ai or using your package manager:

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh
```

Start the Ollama service:

```bash
ollama serve
```

Pull the default models:

```bash
# Chat model (default: qwen2.5-coder:7b)
ollama pull qwen2.5-coder:7b

# Embedding model (default: nomic-embed-text)
ollama pull nomic-embed-text
```

Build from source:

```bash
# Clone the repository
git clone https://github.com/yourusername/slm-cli.git
cd slm-cli

# Build in release mode
cargo build --release

# The binary will be at ./target/release/slm-cli
```

Alternatively, install it with Cargo:

```bash
cargo install --path .
```

This installs `slm-cli` to `~/.cargo/bin/`, which should be in your `PATH`.
| Option | Short | Default | Description |
|---|---|---|---|
| `--ollama-url` | | `http://localhost:11434` | Ollama API base URL |
| `--chat-model` | | `qwen2.5-coder:7b` | Model for chat completions |
| `--embed-model` | | `nomic-embed-text` | Model for generating embeddings |
| `--project-dir` | `-p` | Current directory | Project directory to index/chat about |
| `--context-chunks` | | `5` | Number of context chunks to retrieve |
| `--force` | `-f` | `false` | Force re-indexing (for the `index` command) |
```bash
# Use a remote Ollama instance
slm-cli --ollama-url http://192.168.1.100:11434 chat

# Use different models
slm-cli --chat-model mistral --embed-model all-minilm chat

# Index a specific project
slm-cli -p /path/to/project index

# Retrieve more context chunks
slm-cli --context-chunks 10 chat
```

Before chatting, index your project to build the vector store:
```bash
cd /path/to/your/project
slm-cli index
```

Output:

```text
Indexing project files...
Processing: src/main.rs
Processing: src/lib.rs
...
✓ Indexed 42 files, 156 chunks in 12.3s
```
To force re-indexing (e.g., after code changes):

```bash
slm-cli index --force
```

Once the index is built, start a chat session:

```bash
slm-cli chat
```

This starts an interactive REPL where you can ask questions about your codebase:
```text
SLM CLI - Vibe Coding Agent
Project: /path/to/your/project
Index: 156 chunks loaded

You: How does the authentication middleware work?

A: Based on the codebase, the authentication middleware is implemented in
`src/middleware/auth.rs`. It uses JWT tokens to validate requests...

[Retrieved context from: src/middleware/auth.rs, src/models/user.rs]
```
For one-off questions without an interactive session:

```bash
slm-cli ask "What does the UserService do?"
```

During a chat session, you can use these special commands:
| Command | Description |
|---|---|
| `/clear` | Clear conversation history and start fresh |
| `/apply` | Apply file edits suggested in the last LLM response |
| `exit` or `quit` | End the chat session |
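A minimal sketch of how a REPL loop might classify each input line into these commands. The `ReplCommand` enum and `parse_input` function are hypothetical; this README does not show how SLM CLI dispatches commands internally, and treating blank input as a no-op is an assumption.

```rust
/// Commands recognized in the chat REPL (names from the table above).
#[derive(Debug, PartialEq)]
enum ReplCommand {
    Clear,
    Apply,
    Exit,
    /// Anything else is sent to the LLM as a question.
    Query(String),
    /// Blank input: nothing to do (assumption).
    Empty,
}

/// Classifies one line of user input (hypothetical helper).
fn parse_input(line: &str) -> ReplCommand {
    match line.trim() {
        "" => ReplCommand::Empty,
        "/clear" => ReplCommand::Clear,
        "/apply" => ReplCommand::Apply,
        "exit" | "quit" => ReplCommand::Exit,
        other => ReplCommand::Query(other.to_string()),
    }
}
```

A loop that reads lines from stdin would then `match` on the result: clear the history, apply pending edits, break out of the loop, or run a RAG query.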
The LLM can suggest file edits using a special code block syntax:

````markdown
```filepath:src/utils/helper.rs
pub fn new_helper_function() -> String {
    "Hello, World!".to_string()
}
```
````

When you see file edits in a response, use `/apply` to write them to disk:
````text
You: Add a helper function to src/utils/helper.rs

A: I'll add a helper function for you:

```filepath:src/utils/helper.rs
pub fn new_helper_function() -> String {
    "Hello, World!".to_string()
}
```

You: /apply
✓ Applied 1 file edit:
  - src/utils/helper.rs (created)
````
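The `filepath:` blocks can be pulled out of a response with plain line scanning. The sketch below is an assumption about how such parsing might work, not SLM CLI's actual parser; `FileEdit` and `parse_file_edits` are hypothetical names, and the sketch ignores edge cases such as nested fences. Unterminated blocks are dropped.

```rust
/// One edit extracted from an LLM response (hypothetical type).
#[derive(Debug, PartialEq)]
struct FileEdit {
    path: String,
    content: String,
}

/// Collects every `filepath:<path>` fenced block in `response`.
fn parse_file_edits(response: &str) -> Vec<FileEdit> {
    let mut edits = Vec::new();
    // Some((path, lines)) while inside a block, None otherwise.
    let mut current: Option<(String, Vec<&str>)> = None;

    for line in response.lines() {
        match current.take() {
            // Outside a block: look for an opening fence with a file path.
            None => {
                if let Some(path) = line.trim().strip_prefix("```filepath:") {
                    current = Some((path.trim().to_string(), Vec::new()));
                }
            }
            // Inside a block: a bare closing fence finishes the edit.
            Some((path, body)) if line.trim() == "```" => {
                edits.push(FileEdit { path, content: body.join("\n") + "\n" });
            }
            // Inside a block: keep accumulating content lines.
            Some((path, mut body)) => {
                body.push(line);
                current = Some((path, body));
            }
        }
    }
    edits
}
```

Applying an edit could then be a `std::fs::create_dir_all` on the parent directory followed by `std::fs::write(&edit.path, &edit.content)`.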
## How It Works
### RAG (Retrieval-Augmented Generation) Workflow
1. **Indexing Phase**: When you run `slm-cli index`, the tool:
- Scans all files in the project directory
- Filters out ignored paths (`.git`, `node_modules`, `target`, etc.)
- Chunks each file into smaller segments (default: 100 lines per chunk)
- Generates embeddings for each chunk using the embedding model
- Stores chunks and embeddings in the vector store
2. **Query Phase**: When you ask a question:
- Your question is converted to an embedding
- The vector store finds the most similar chunks using cosine similarity
- Retrieved chunks are injected into the system prompt as context
- The LLM generates a response with awareness of your codebase
3. **Response Phase**: The LLM response may include:
- Explanations referencing specific files and code
- Code suggestions with file edit blocks
- Follow-up questions or clarifications
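The query phase can be sketched in a few lines. This assumes embeddings are plain `Vec<f32>` values and uses a brute-force scan over every stored chunk, which is one plausible reading of "in-memory vector storage". `cosine_similarity`, `top_k`, `Chunk`, and `build_system_prompt` are hypothetical names, and the prompt wording is invented for illustration.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0.0 for zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// A stored chunk: its text and precomputed embedding (hypothetical type).
struct Chunk {
    content: String,
    embedding: Vec<f32>,
}

/// Returns the `k` chunks most similar to `query`, best first.
/// `k` plays the role of `--context-chunks`.
fn top_k<'a>(query: &[f32], chunks: &'a [Chunk], k: usize) -> Vec<&'a Chunk> {
    let mut scored: Vec<(f32, &Chunk)> = chunks
        .iter()
        .map(|c| (cosine_similarity(query, &c.embedding), c))
        .collect();
    // Descending by score; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, c)| c).collect()
}

/// Injects retrieved chunks into a system prompt (illustrative wording).
fn build_system_prompt(context: &[&Chunk]) -> String {
    let mut prompt = String::from("You are a coding assistant. Relevant code:\n");
    for chunk in context {
        prompt.push_str("---\n");
        prompt.push_str(&chunk.content);
        prompt.push('\n');
    }
    prompt
}
```

A linear scan is O(n) per query, which is usually fine for a single project's few thousand chunks; larger corpora would call for an approximate nearest-neighbor index.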
### File Chunking Strategy
Files are split into chunks to fit within embedding model context limits:
- **Chunk Size**: 100 lines per chunk (configurable)
- **Overlap**: 10 lines overlap between chunks for context continuity
- **Metadata**: Each chunk stores file path, line range, and language
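With 100-line chunks and 10 lines of overlap, each chunk starts 90 lines after the previous one. A sketch of that windowing follows, assuming 1-based inclusive line ranges to match the `start_line`/`end_line` fields in the persisted index. `LineChunk` and `chunk_lines` are hypothetical names, and the project's real chunker may handle the final window differently.

```rust
/// A chunk of a file with its 1-based, inclusive line range.
#[derive(Debug, PartialEq)]
struct LineChunk {
    start_line: usize,
    end_line: usize,
    text: String,
}

/// Splits `source` into windows of `size` lines, each overlapping the
/// previous one by `overlap` lines. Requires `overlap < size` so the
/// window always advances.
fn chunk_lines(source: &str, size: usize, overlap: usize) -> Vec<LineChunk> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than chunk size");
    let lines: Vec<&str> = source.lines().collect();
    let step = size - overlap; // 100 - 10 = 90 with the defaults above
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + size).min(lines.len());
        chunks.push(LineChunk {
            start_line: start + 1,
            end_line: end,
            text: lines[start..end].join("\n"),
        });
        if end == lines.len() {
            break; // the last window reached the end of the file
        }
        start += step;
    }
    chunks
}
```

For a 250-line file this yields the ranges 1-100, 91-190, and 181-250.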
### Vector Store Persistence
The vector store is saved to `.slm-index.json` in your project directory:
```json
{
  "documents": [
    {
      "id": "uuid-here",
      "content": "pub fn authenticate(...",
      "embedding": [0.123, -0.456, ...],
      "metadata": {
        "file_path": "src/auth.rs",
        "start_line": 1,
        "end_line": 100,
        "language": "rust"
      }
    }
  ]
}
```
This file is loaded on startup, so you don't need to re-index every time.
```bash
# Run all tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run a specific test
cargo test test_cosine_similarity

# Run clippy for linting
cargo clippy

# Run clippy with warnings as errors
cargo clippy -- -D warnings

# Format all code
cargo fmt

# Check formatting without making changes
cargo fmt -- --check
```

When adding new features, follow the hexagonal architecture:
- New Entity or Error Type: Add to `src/domain/entities.rs` or `src/domain/error.rs`
- New External Integration:
  - Define a trait (port) in `src/domain/traits.rs`
  - Implement an adapter in `src/infrastructure/`
  - Export from `src/infrastructure/mod.rs`
- New Use Case:
  - Create a new file in `src/application/`
  - Accept dependencies via constructor injection
  - Export from `src/application/mod.rs`
- New CLI Command:
  - Add to the `Commands` enum in `src/main.rs`
  - Implement the handler function
  - Wire up dependencies in `main()`
A skeleton for a new adapter (`MyTrait` and `do_something` are placeholders):

```rust
// src/infrastructure/my_adapter.rs
use crate::domain::{AppError, MyTrait};

pub struct MyAdapter {
    // fields
}

impl MyAdapter {
    pub fn new(/* deps */) -> Self {
        Self { /* ... */ }
    }
}

// Requires the `async-trait` crate as a dependency.
#[async_trait::async_trait]
impl MyTrait for MyAdapter {
    async fn do_something(&self) -> Result<(), AppError> {
        // implementation
        Ok(())
    }
}
```

This project is licensed under the MIT License. See the LICENSE file for details.
Happy Vibe Coding!