b-fontaine/slm-cli

SLM CLI - Vibe Coding Agent

A Rust-based CLI tool for AI-assisted coding using local LLMs with RAG (Retrieval-Augmented Generation) capabilities. SLM CLI indexes your codebase, retrieves relevant context for your questions, and provides intelligent coding assistance powered by Ollama.

Features

  • Local LLM Integration: Uses Ollama for both chat completions and embeddings, so no cloud API keys are required
  • RAG-Enhanced Conversations: Automatically retrieves relevant code context for each query
  • File Edit Support: LLM can suggest file edits that you can apply with a single command
  • Persistent Index: Vector store is saved locally for fast startup on subsequent runs
  • Smart Filtering: Automatically ignores .git, target, node_modules, lock files, and binary files
  • Configurable: Customize Ollama URL, models, and context retrieval settings

Architecture

SLM CLI follows a hexagonal (ports and adapters) / clean architecture approach with three distinct layers:

flowchart TB
    subgraph CLI["main.rs (Dependency Injection & CLI)"]
    end

    subgraph App["Application Layer"]
        Indexer["IndexerUseCase"]
        Chat["ChatUseCase"]
        FileEdit["FileEditUseCase"]
    end

    subgraph Core["Domain Layer"]
        Entities["Entities<br/>(Message, Document, CodeSnippet)"]
        Traits["Traits / Ports<br/>(LlmService, FileSystem, VectorStore)"]
    end

    subgraph Infra["Infrastructure Layer (Adapters)"]
        Ollama["OllamaAdapter"]
        Disk["DiskAdapter"]
        RAG["SimpleRagAdapter"]
    end

    CLI --> App
    App --> Core
    Infra -.->|implements| Traits
    App --> Infra

Directory Structure

src/
├── main.rs                    # CLI entry point, dependency injection
├── lib.rs                     # Module organization and exports
├── domain/                    # Core business logic (no external deps)
│   ├── mod.rs
│   ├── entities.rs            # Message, Document, CodeSnippet, DocumentMetadata
│   ├── error.rs               # Custom error types (AppError, LlmError, etc.)
│   └── traits.rs              # LlmService, FileSystem, VectorStore (ports)
├── infrastructure/            # Adapters (external implementations)
│   ├── mod.rs
│   ├── ollama.rs              # OllamaAdapter for LLM chat & embeddings
│   ├── disk.rs                # DiskAdapter for file operations
│   └── rag.rs                 # SimpleRagAdapter with in-memory vector storage
└── application/               # Use cases (orchestration)
    ├── mod.rs
    ├── indexer.rs             # IndexerUseCase for file indexing
    ├── chat.rs                # ChatUseCase for RAG-enhanced conversations
    └── file_edit.rs           # FileEditUseCase for applying LLM edits

Key Components

  • Domain Layer: Contains pure business logic with no external dependencies. Defines traits (ports) that abstract external systems and entities that represent core data structures.

  • Infrastructure Layer: Contains adapters that implement domain traits. Each adapter wraps an external system (Ollama API, filesystem, vector store).

  • Application Layer: Contains use cases that orchestrate business logic by combining domain entities with infrastructure adapters.

  • Dependency Injection: main.rs assembles concrete implementations and injects them into use cases, allowing easy testing and swapping of implementations.
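The wiring described above can be illustrated with a minimal sketch. This is not the project's actual code: the trait below is a simplified, synchronous stand-in for a port such as LlmService (its real signature is not shown in this README), and EchoLlm and AskUseCase are hypothetical names chosen for illustration.

```rust
/// Port: a simplified, synchronous stand-in for a domain trait such as `LlmService`.
/// The signature is an assumption made for this sketch.
pub trait LlmService {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Adapter: a hypothetical test double that echoes the prompt instead of calling Ollama.
pub struct EchoLlm;

impl LlmService for EchoLlm {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Use case: depends only on the port, never on a concrete adapter.
pub struct AskUseCase<L: LlmService> {
    llm: L,
}

impl<L: LlmService> AskUseCase<L> {
    /// Constructor injection: the caller decides which adapter to pass in.
    pub fn new(llm: L) -> Self {
        Self { llm }
    }

    /// Forwards the question to whichever adapter was injected.
    pub fn ask(&self, question: &str) -> Result<String, String> {
        self.llm.complete(question)
    }
}
```

Because AskUseCase only names the trait, a test can construct it with EchoLlm while a binary could pass an adapter that talks to a real service.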

Prerequisites

Rust Toolchain

Install Rust using rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Ensure you have Rust 1.75 or later:

rustc --version

Ollama

Install Ollama from ollama.ai or using your package manager:

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

Start the Ollama service:

ollama serve

Required Models

Pull the default models:

# Chat model (default: qwen2.5-coder:7b)
ollama pull qwen2.5-coder:7b

# Embedding model (default: nomic-embed-text)
ollama pull nomic-embed-text

Installation

From Source

# Clone the repository
git clone https://github.com/b-fontaine/slm-cli.git
cd slm-cli

# Build in release mode
cargo build --release

# The binary will be at ./target/release/slm-cli

Install Globally

cargo install --path .

This installs slm-cli to ~/.cargo/bin/, which should be in your PATH.

Configuration

Command-Line Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --ollama-url | | http://localhost:11434 | Ollama API base URL |
| --chat-model | | qwen2.5-coder:7b | Model for chat completions |
| --embed-model | | nomic-embed-text | Model for generating embeddings |
| --project-dir | -p | Current directory | Project directory to index/chat about |
| --context-chunks | | 5 | Number of context chunks to retrieve |
| --force | -f | false | Force re-indexing (for index command) |

Example Configurations

# Use a remote Ollama instance
slm-cli --ollama-url http://192.168.1.100:11434 chat

# Use different models
slm-cli --chat-model mistral --embed-model all-minilm chat

# Index a specific project
slm-cli -p /path/to/project index

# Retrieve more context chunks
slm-cli --context-chunks 10 chat

Usage Guide

Step 1: Index Your Project

Before chatting, index your project to build the vector store:

cd /path/to/your/project
slm-cli index

Output:

Indexing project files...
Processing: src/main.rs
Processing: src/lib.rs
...

✓ Indexed 42 files, 156 chunks in 12.3s

To force re-indexing (e.g., after code changes):

slm-cli index --force

Step 2: Start an Interactive Chat Session

slm-cli chat

This starts an interactive REPL where you can ask questions about your codebase:

SLM CLI - Vibe Coding Agent
Project: /path/to/your/project
Index: 156 chunks loaded

You: How does the authentication middleware work?
Assistant: Based on the codebase, the authentication middleware is implemented in
`src/middleware/auth.rs`. It uses JWT tokens to validate requests...

[Retrieved context from: src/middleware/auth.rs, src/models/user.rs]

Step 3: Ask Single Questions

For one-off questions without an interactive session:

slm-cli ask "What does the UserService do?"

Interactive Chat Commands

During a chat session, you can use these special commands:

| Command | Description |
| --- | --- |
| /clear | Clear conversation history and start fresh |
| /apply | Apply file edits suggested in the last LLM response |
| exit or quit | End the chat session |
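One way a REPL could classify input lines into these commands is sketched below. The ChatInput enum and parse_input function are hypothetical names, and exact matching after trimming is an assumption; the README does not say whether the real CLI is case-sensitive.

```rust
/// What the REPL should do with one line of user input.
#[derive(Debug, PartialEq)]
pub enum ChatInput<'a> {
    Clear,
    Apply,
    Exit,
    /// Anything that is not a special command is sent to the LLM.
    Message(&'a str),
}

/// Classifies a raw input line. Matching is exact after trimming whitespace.
pub fn parse_input(line: &str) -> ChatInput<'_> {
    match line.trim() {
        "/clear" => ChatInput::Clear,
        "/apply" => ChatInput::Apply,
        "exit" | "quit" => ChatInput::Exit,
        other => ChatInput::Message(other),
    }
}
```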

File Edit Feature

The LLM can suggest file edits using a special code block syntax:

```filepath:src/utils/helper.rs
pub fn new_helper_function() -> String {
    "Hello, World!".to_string()
}
```
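A parser for this block syntax might look like the following sketch. It is an assumption about how such blocks could be extracted, not the project's FileEditUseCase; FileEdit and parse_file_edits are hypothetical names, and dropping unterminated blocks is a design choice made here.

```rust
/// One suggested edit: a target path and the full replacement content.
#[derive(Debug, PartialEq)]
pub struct FileEdit {
    pub path: String,
    pub content: String,
}

/// Extracts every fenced block whose info string starts with `filepath:`.
/// Unterminated blocks are dropped rather than applied.
pub fn parse_file_edits(response: &str) -> Vec<FileEdit> {
    const FENCE: &str = "```";
    let mut edits = Vec::new();
    // (path, collected lines) for the block currently being read, if any.
    let mut current: Option<(String, Vec<&str>)> = None;

    for line in response.lines() {
        let trimmed = line.trim_end();
        match current.take() {
            // Inside a block: a bare fence closes it.
            Some((path, lines)) if trimmed == FENCE => {
                edits.push(FileEdit { path, content: lines.join("\n") });
            }
            // Inside a block: any other line is content.
            Some((path, mut lines)) => {
                lines.push(line);
                current = Some((path, lines));
            }
            // Outside a block: look for an opening `filepath:` fence.
            None => {
                if let Some(path) = trimmed
                    .strip_prefix(FENCE)
                    .and_then(|info| info.strip_prefix("filepath:"))
                {
                    current = Some((path.trim().to_string(), Vec::new()));
                }
            }
        }
    }
    edits
}
```

Ordinary code blocks (for example one tagged rust) are ignored, so only blocks that name a target file produce edits.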

When you see file edits in a response, use /apply to write them to disk:

You: Add a helper function to src/utils/helper.rs

Assistant: I'll add a helper function for you:

```filepath:src/utils/helper.rs
pub fn new_helper_function() -> String {
    "Hello, World!".to_string()
}
```

You: /apply
✓ Applied 1 file edit:

  • src/utils/helper.rs (created)

## How It Works

### RAG (Retrieval-Augmented Generation) Workflow

1. **Indexing Phase**: When you run `slm-cli index`, the tool:
   - Scans all files in the project directory
   - Filters out ignored paths (`.git`, `node_modules`, `target`, etc.)
   - Chunks each file into smaller segments (default: 100 lines per chunk)
   - Generates embeddings for each chunk using the embedding model
   - Stores chunks and embeddings in the vector store

2. **Query Phase**: When you ask a question:
   - Your question is converted to an embedding
   - The vector store finds the most similar chunks using cosine similarity
   - Retrieved chunks are injected into the system prompt as context
   - The LLM generates a response with awareness of your codebase

3. **Response Phase**: The LLM response may include:
   - Explanations referencing specific files and code
   - Code suggestions with file edit blocks
   - Follow-up questions or clarifications
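The similarity search in the query phase can be sketched as follows. This is an illustrative brute-force version that assumes embeddings are plain `f32` vectors of equal length; the function names are hypothetical and the project's SimpleRagAdapter may differ.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the indices of the `k` stored embeddings most similar to `query`,
/// best match first.
pub fn top_k(query: &[f32], embeddings: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = embeddings
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine_similarity(query, e)))
        .collect();
    // Sort by descending score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

With the default `--context-chunks` value, `k` would be 5; the chunks at the returned indices are what gets injected into the system prompt.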

### File Chunking Strategy

Files are split into chunks to fit within embedding model context limits:

- **Chunk Size**: 100 lines per chunk (configurable)
- **Overlap**: 10 lines overlap between chunks for context continuity
- **Metadata**: Each chunk stores file path, line range, and language
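A line-based chunker with overlap could be sketched like this. The `Chunk` struct and `chunk_lines` function are hypothetical; line numbers are assumed to be 1-based and inclusive, and language detection is left out.

```rust
/// A chunk of a file, with 1-based inclusive line numbers.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub start_line: usize,
    pub end_line: usize,
    pub content: String,
}

/// Splits `text` into chunks of up to `chunk_size` lines, where each chunk
/// repeats the last `overlap` lines of the previous one.
pub fn chunk_lines(text: &str, chunk_size: usize, overlap: usize) -> Vec<Chunk> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    let lines: Vec<&str> = text.lines().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + chunk_size).min(lines.len());
        chunks.push(Chunk {
            start_line: start + 1,
            end_line: end,
            content: lines[start..end].join("\n"),
        });
        if end == lines.len() {
            break; // the final chunk reached the end of the file
        }
        start += step;
    }
    chunks
}
```

With a chunk size of 100 and an overlap of 10, a 250-line file yields chunks covering lines 1-100, 91-190, and 181-250.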

### Vector Store Persistence

The vector store is saved to `.slm-index.json` in your project directory:

```json
{
  "documents": [
    {
      "id": "uuid-here",
      "content": "pub fn authenticate(...",
      "embedding": [0.123, -0.456, ...],
      "metadata": {
        "file_path": "src/auth.rs",
        "start_line": 1,
        "end_line": 100,
        "language": "rust"
      }
    }
  ]
}
```

This file is loaded on startup, so you don't need to re-index every time.

Contributing

Running Tests

# Run all tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run a specific test
cargo test test_cosine_similarity

Linting

# Run clippy for linting
cargo clippy

# Run clippy with warnings as errors
cargo clippy -- -D warnings

Code Formatting

# Format all code
cargo fmt

# Check formatting without making changes
cargo fmt -- --check

Adding New Features

When adding new features, follow the hexagonal architecture:

  1. New Entity or Error Type: Add to src/domain/entities.rs or src/domain/error.rs

  2. New External Integration:

    • Define a trait (port) in src/domain/traits.rs
    • Implement an adapter in src/infrastructure/
    • Export from src/infrastructure/mod.rs
  3. New Use Case:

    • Create a new file in src/application/
    • Accept dependencies via constructor injection
    • Export from src/application/mod.rs
  4. New CLI Command:

    • Add to the Commands enum in src/main.rs
    • Implement the handler function
    • Wire up dependencies in main()

Example: Adding a New Adapter

// src/infrastructure/my_adapter.rs
use crate::domain::{MyTrait, AppError};

pub struct MyAdapter {
    // fields
}

impl MyAdapter {
    pub fn new(/* deps */) -> Self {
        Self { /* ... */ }
    }
}

#[async_trait::async_trait]
impl MyTrait for MyAdapter {
    async fn do_something(&self) -> Result<(), AppError> {
        // implementation goes here
        Ok(())
    }
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Happy Vibe Coding! 🚀
