Skip to content

kmesiab/qwenvert

Repository files navigation

Qwenvert

Run Claude Code with a local LLM on your Mac. Keep your code private.

PyPI version License Python Version

Qwenvert lets you use Claude Code CLI with a completely local LLM (Qwen2.5-Coder) instead of Anthropic's API. Your code never leaves your machine.

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Claude Code │ --> │   Qwenvert   │ --> │ Local Qwen  │
│     CLI     │     │   (adapter)  │     │    Model    │
└─────────────┘     └──────────────┘     └─────────────┘
                         :8088              (via Ollama)

Why? Privacy. Security. Compliance. Zero inference costs. No internet required.


⚡ 5-Minute Quick Start

1. Install

Requirements:

  • Mac with M1/M2/M3 chip (8GB RAM minimum)
  • Python 3.9-3.12 (check: python3 --version)
  • Ollama or llama.cpp

Install from PyPI:

pip install qwenvert

Or install from source:

git clone https://github.com/kmesiab/qwenvert.git
cd qwenvert
pip install -e .

macOS Users (Python 3.11+): If you see an "externally-managed-environment" error, you have two options:

Option 1 (Recommended for development):

git clone https://github.com/kmesiab/qwenvert.git
cd qwenvert
make venv           # Creates .venv virtual environment
source .venv/bin/activate
make install-dev    # Installs qwenvert + dev dependencies

Option 2 (Recommended for end users):

pipx install qwenvert  # Installs in isolated environment
# Install pipx first if needed: brew install pipx

This is due to PEP 668 which protects system Python on modern macOS.

2. Setup (One Command - Zero Friction!)

qwenvert init

This will automatically (no prompts!):

  • ✅ Detect your hardware (chip, RAM, cooling)
  • ✅ Install llama-server binary if needed (~50MB)
  • ✅ Pick the best model for your Mac
  • ✅ Download the model from HuggingFace (~4GB)
  • ✅ Configure everything automatically

First run takes 2-5 minutes (downloads binaries & models). Subsequent runs are instant.

Example output:

Qwenvert Initialization

✓ Detected: M1 Pro, 16GB RAM, 16 GPU cores, Active cooling
✓ Selected: Qwen2.5 Coder 7B Q5
✓ Downloading from HuggingFace...
✓ Model downloaded: ~/.qwenvert/models/qwen25-coder-7b-q5.gguf (4.2GB)
✓ Configuration saved: ~/.config/qwenvert/config.yaml

Next step: qwenvert start

3. Start Qwenvert

qwenvert start

You'll see:

Starting Qwenvert

✓ Backend: Ollama with qwen2.5-coder:7b
✓ Backend server: http://localhost:11434 (healthy)
✓ Qwenvert adapter: http://localhost:8088
✓ Ready for Claude Code!

Configure Claude Code:
  export ANTHROPIC_BASE_URL=http://localhost:8088
  export ANTHROPIC_API_KEY=local-qwen
  export ANTHROPIC_MODEL=qwenvert-default

Leave this terminal running.

Missing Dependencies? If Ollama isn't installed, qwenvert will offer to install it automatically:

qwenvert start

You'll see:

======================================================================
  Missing Dependency: Ollama
======================================================================

Ollama is not installed (required for running local models)

To install Ollama using Homebrew:
  1. Run: brew install ollama
  2. Wait for installation to complete
  3. Run: qwenvert init

Learn more: https://ollama.ai

======================================================================

Would you like to install Ollama automatically using Homebrew? [Y/n]:

Non-interactive mode:

qwenvert start --auto-install

Automatically installs missing dependencies via Homebrew without prompting.

Note: Auto-installation only works for supported dependencies (Ollama, llama.cpp) when Homebrew is available.

4. Configure Claude Code (New Terminal)

export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default

claude

That's it! Claude Code now uses your local model. Your code stays on your machine.

What Just Happened?

Without qwenvert (default):

Claude Code → api.anthropic.com → Claude Sonnet/Opus
              (internet)           (cloud)
              💰 Costs money      ☁️ Code leaves machine

With qwenvert (configured):

Claude Code → localhost:8088 → Ollama → Qwen Model
              (no internet)     (local)  (your Mac)
              💰 Free            🔒 Code stays local

Claude Code doesn't know the difference - it just uses whatever ANTHROPIC_BASE_URL points to!


📖 How to Use

Basic Workflow

# Start qwenvert (terminal 1)
qwenvert start

# Use Claude Code (terminal 2)
export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default
claude

# When done, stop qwenvert
qwenvert stop

Make Environment Variables Permanent

Add to your ~/.zshrc or ~/.bashrc:

# Qwenvert - Local Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default

Then reload: source ~/.zshrc

Now claude will automatically use qwenvert!

Verify Claude Code is Using Qwenvert

After setting environment variables, verify the setup:

# Check environment variables are set
echo $ANTHROPIC_BASE_URL
# Should show: http://localhost:8088

echo $ANTHROPIC_API_KEY
# Should show: local-qwen

echo $ANTHROPIC_MODEL
# Should show: qwenvert-default

# Make sure qwenvert is running
curl http://localhost:8088/health
# Should return: {"status":"healthy","backend":"connected"}

# Test with Claude Code
claude
# In Claude Code, ask: "What model are you?"
# It should respond as Qwen2.5-Coder (though it might say Claude)

How to tell it's working:

  • ✅ Claude Code starts without asking for an API key
  • ✅ Responses come quickly (no network delay)
  • qwenvert monitor shows requests appearing
  • ✅ Works offline (disconnect wifi and try)

If it's NOT working:

  • ❌ "Invalid API key" error → Check ANTHROPIC_API_KEY=local-qwen
  • ❌ "Connection refused" → Check ANTHROPIC_BASE_URL and qwenvert is running
  • ❌ "Model not found" → Check ANTHROPIC_MODEL=qwenvert-default

🎯 Common Commands

Check Status

qwenvert status

Output:

Qwenvert Status

Configuration
  Model:              qwen2.5-coder-7b-q5
  Backend:            ollama
  Backend URL:        http://localhost:11434
  Adapter:            http://localhost:8088
  Context Length:     32,768 tokens

Server Health:
  Backend:  ✓ Running
  Adapter:  ✓ Running

Monitor Performance (Optional)

qwenvert monitor

Shows a live dashboard with:

  • Requests per second
  • Token generation speed
  • System resources (CPU, memory, temp)
  • Recent request history

OpenTelemetry Support: The monitor now uses OpenTelemetry-compliant metrics. Enable OTLP export for integration with observability platforms:

# Enable with local OTLP collector (secure)
export OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317
qwenvert monitor --enable-otel

See TELEMETRY_SECURITY.md for complete security details.

Press Ctrl+C to exit.

Binary Management Commands

Check llama-server installation:

qwenvert binary info

Output:

┌──────────────┬────────────────────────────────────────┐
│ Property     │ Value                                  │
├──────────────┼────────────────────────────────────────┤
│ Path         │ ~/.cache/qwenvert/bin/llama-server     │
│ Version      │ b3600                                  │
│ Source       │ downloaded                             │
│ Architecture │ arm64                                  │
│ Valid        │ ✓ Yes                                  │
└──────────────┴────────────────────────────────────────┘

List available versions:

qwenvert binary list

Install specific version:

qwenvert binary install --version b3600

Update to latest:

qwenvert binary update

Verify integrity:

qwenvert binary verify

Rollback to backup:

qwenvert binary rollback

Detect Available Backends

qwenvert backends

Shows which backends (MLX, llama.cpp, Ollama) are available on your system and recommends the fastest option.

Example output on Apple Silicon:

Available Backends:
✓ MLX v0.10.0 (recommended - fastest on Apple Silicon)
✓ llama.cpp b3600 (available)
✗ Ollama (not installed)

List Available Models

qwenvert models list

Output:

Available Models

ID                           Size    RAM    Context
qwen2.5-coder-7b-q4          4.1GB   8GB    32K
qwen2.5-coder-7b-q5          4.8GB   16GB   32K
qwen2.5-coder-14b-q4         8.5GB   16GB   32K
qwen2.5-coder-14b-q5         10GB    32GB   32K

Clean Up Downloaded Models

Remove downloaded model files to free disk space:

# Interactive selection
qwenvert models clean

# Remove specific model
qwenvert models clean --model-id qwen2.5-coder-7b-instruct-q4_k_m.gguf

# Remove all models (with confirmation)
qwenvert models clean --all

# Preview what would be deleted (dry run)
qwenvert models clean --dry-run

Example output:

Model Cleanup

Models disk usage: 12.3 GB
Available disk space: 45.2 GB

Downloaded models:

  1. qwen2.5-coder-7b-instruct-q4_k_m.gguf (4.2 GB)
  2. qwen2.5-coder-14b-instruct-q5_k_m.gguf (8.1 GB)
  3. All models
  4. Cancel

Enter number(s) separated by commas: 1

Models to be deleted:

Filename                                   Size
qwen2.5-coder-7b-instruct-q4_k_m.gguf     4.2 GB

Total space to free: 4.2 GB

Delete these models? [y/N]: y

✓ Cleanup complete! Deleted 1 model(s), freed 4.2 GB

Check Your Hardware

qwenvert hardware

Output:

Hardware Information

Chip:               M1 Pro
Total Memory:       16GB
GPU Cores:          16
Performance Cores:  8
Cooling:            Active (fan)
Recommended:        32K tokens context

📦 Dependencies & Auto-Installation

Required Dependencies

Qwenvert requires one of these backends to run:

  • Ollama (recommended) - Easy to install via Homebrew: brew install ollama
  • llama.cpp - Manual build required, see llama.cpp docs

Supported Auto-Install Dependencies

When you run qwenvert start, it automatically detects missing dependencies and offers to install them via Homebrew. The following dependencies support auto-installation:

Dependency Package Name Installation Command
Ollama ollama brew install ollama
llama.cpp llama.cpp (Not yet supported for auto-install)

Security Note: Auto-installation only works for whitelisted dependencies defined in ALLOWED_AUTO_INSTALL_DEPENDENCIES. This prevents accidental installation of arbitrary packages.

Auto-Install Modes

Interactive (default):

qwenvert start
# Prompts: "Would you like to install Ollama automatically using Homebrew? [Y/n]:"

Non-interactive (CI/automation):

qwenvert start --auto-install
# Automatically installs without prompting

Manual installation (traditional):

# Install Ollama manually
brew install ollama

# Then start qwenvert
qwenvert start

Checking Dependencies

To check if dependencies are installed, qwenvert automatically detects them when you run commands. You can also manually check:

which ollama        # Check if Ollama is in PATH
ollama --version    # Verify Ollama version

Adding More Dependencies

Currently, only Ollama and llama.cpp are supported as backends. Other dependencies (like Homebrew itself) require manual installation.

If you need support for additional backends, please open an issue.


🔧 Advanced Usage

Use a Specific Model

# List models
qwenvert models list

# Re-initialize with different model
qwenvert init --model qwen2.5-coder-14b-q5

# Restart
qwenvert stop
qwenvert start

Use llama.cpp Instead of Ollama

# Initialize with llama.cpp backend
qwenvert init --backend llamacpp

# Start (same command)
qwenvert start

Why llama.cpp?

  • More control over inference parameters
  • Slightly faster on some Macs
  • Lower memory overhead

Why Ollama? (default)

  • Easier to install
  • Better model management
  • More beginner-friendly

Custom Context Length

# Longer context = more memory
qwenvert init --context-length 65536  # 64K tokens

# Shorter context = less memory
qwenvert init --context-length 16384  # 16K tokens

Rule of thumb:

  • 8GB Mac: 16K max
  • 16GB Mac: 32K safe
  • 32GB+ Mac: 64K works

❓ Troubleshooting

"Connection refused" when starting Claude Code

Check if qwenvert is running:

curl http://localhost:8088/health

Should return:

{"status": "healthy", "backend": "connected"}

If not running:

qwenvert start

Model download fails

Problem: HuggingFace download interrupted

Solution:

# Try again (downloads resume automatically)
qwenvert init

# Or download manually and place in ~/.qwenvert/models/

Slow response times

Check memory usage:

qwenvert status

Solutions:

  1. Use smaller model:

    qwenvert init --model qwen2.5-coder-7b-q4
  2. Reduce context length:

    qwenvert init --context-length 16384
  3. Close other apps to free RAM

Expected speeds:

  • 8GB Mac: 15-20 tokens/sec
  • 16GB Mac: 25-35 tokens/sec
  • 32GB+ Mac: 30-40 tokens/sec

MacBook Air overheating

Enable thermal pacing:

Edit ~/.config/qwenvert/config.yaml:

thermal_pacing: true
thermal_threshold: 70  # Celsius

Or re-run init with thermal protection:

qwenvert init --thermal-pacing

Can't install - Python version error

Problem: Python 3.13 not supported yet

Solution: Use Python 3.12 or earlier

# Check version
python3 --version

# Install Python 3.12 via Homebrew
brew install python@3.12

# Use it
pip3.12 install -e .

Environment variables not persisting

Problem: Variables reset when you close terminal

Solution: Add to shell config

# Open your shell config
nano ~/.zshrc  # or ~/.bashrc for bash

# Add these lines
export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default

# Save and reload
source ~/.zshrc

"externally-managed-environment" error on install

Problem: pip install fails with error about externally managed environment

macOS Python 3.11+ Context: Apple now protects system Python to prevent breaking macOS tools. This is PEP 668 in action.

Solution 1 - Virtual Environment (Recommended for development):

# Clone the repository
git clone https://github.com/kmesiab/qwenvert.git
cd qwenvert

# Create and activate virtual environment
make venv
source .venv/bin/activate

# Install
make install-dev

Solution 2 - pipx (Recommended for end users):

# Install pipx if needed
brew install pipx

# Install qwenvert in isolated environment
pipx install qwenvert

Solution 3 - Disable protection (NOT recommended):

# This breaks the system protection - avoid unless you know what you're doing
pip install qwenvert --break-system-packages

Why virtual environments?

  • Isolated dependencies (won't conflict with other projects)
  • Easy to delete and recreate if something breaks
  • Standard Python best practice
  • Doesn't require disabling system protections

🔒 Privacy & Security

What Data Stays Local?

Everything. Qwenvert is designed for security-conscious developers.

Your code - Never sent to any server ✅ Prompts - Processed only on your Mac ✅ Responses - Generated locally ✅ Model weights - Stored in ~/.qwenvert/models/

How We Guarantee This

  1. Localhost-only binding - Adapter listens on 127.0.0.1 only (not accessible from network)
  2. No external calls - Code explicitly blocks external connections
  3. Telemetry security - All telemetry exporters disabled by default; OTLP endpoints validated to be localhost-only (see TELEMETRY_SECURITY.md)
  4. Test-proven - 23 dedicated security tests verify isolation and telemetry safety
  5. Transparent code - Full source available for audit

Perfect for:

  • HIPAA/SOC2 compliance
  • Proprietary code bases
  • Air-gapped development
  • Security research
  • Offline work

📊 Performance Expectations

What to Expect

Mac Type Model Speed Memory Context
8GB M1 (Air) 7B Q4 15-20 t/s ~4GB 16K tokens
16GB M1 Pro 7B Q5 25-35 t/s ~6GB 32K tokens
32GB M1 Max 14B Q5 20-30 t/s ~12GB 64K tokens

t/s = tokens per second

Compared to Cloud APIs

Feature Qwenvert Claude API
Speed 20-35 t/s 40-60 t/s
Latency ~0ms (local) 100-300ms (network)
Cost $0/month $15-300/month
Privacy 100% local Cloud
Offline ✅ Yes ❌ No
Code quality Good Excellent

Best for: Security/privacy-critical work, cost-sensitive projects, offline development

Not ideal for: Highest code quality, fastest possible responses


🎓 Understanding Qwenvert

What Is It?

Qwenvert is an HTTP adapter that sits between Claude Code CLI and your local LLM:

Claude Code → Qwenvert → Ollama/llama.cpp → Qwen Model

Not just config - It's a full translation layer:

  • Translates Anthropic API → Ollama/llama.cpp format
  • Converts responses back to Anthropic format
  • Handles streaming (Server-Sent Events)
  • Manages backend processes
  • Monitors performance

Why Not Use Ollama Directly?

Ollama has basic Anthropic API support, but:

  • ❌ Limited streaming support
  • ❌ Missing some API features
  • ❌ No thermal management
  • ❌ No hardware optimization
  • ❌ Can't switch backends easily

Qwenvert provides:

  • ✅ Full Anthropic Messages API
  • ✅ Works with Ollama or llama.cpp
  • ✅ Thermal monitoring for MacBook Air
  • ✅ Hardware-aware model selection
  • ✅ Easy to extend with new backends

Performance & Backend Comparison

Qwenvert supports three backends for running local LLMs: MLX (fastest on Apple Silicon), llama.cpp (fast and cross-platform), and Ollama (easiest setup).

Benchmark Results

Backend Throughput Performance vs Ollama Best For
MLX ~230 tok/s 1.5-2x faster than llama.cpp Apple Silicon (M1-M5), Python integration
llama.cpp ~150 tok/s 3-7x faster than Ollama Production, cross-platform
Ollama 20-40 tok/s Baseline Quick testing, simple setup

Benchmarks from vLLM-MLX (2026) and Comparative Study (2025)

Why MLX is Fastest on Apple Silicon

MLX (Apple's ML framework) is purpose-built for Apple Silicon and provides:

  • Native Metal GPU Acceleration: Direct access to M-series GPU/Neural Engine
  • Unified Memory Optimization: Efficient use of Apple's unified memory architecture
  • M5 Neural Accelerators: Only framework that leverages M5's new GPU Neural Accelerators (3.5-4x faster prefill)
  • Content-Based Prefix Caching: 28x speedup on repeated image queries (multimodal models)
  • Lower Latency: ~1.5-2x faster than llama.cpp on same hardware

MLX is automatically recommended on Apple Silicon if available.

Why llama.cpp is Faster than Ollama

llama.cpp provides direct Metal GPU acceleration for Apple Silicon, while Ollama adds a Go wrapper layer that introduces overhead:

  • Metal Acceleration: 2.4x speedup over CPU-only inference (source)
  • Optimized for Apple Silicon: Full GPU layer offload (-ngl 99)
  • Continuous Batching: Better throughput for multiple requests
  • Lower Memory Overhead: Direct model access without wrapper

Apple Silicon Performance by Model

Mac Model RAM Model Size MLX Throughput llama.cpp Throughput Expected Response Time
M1 Air 8GB 1.5B Q4 45-60 tok/s 30-40 tok/s <1 second
M1 Pro 16GB 7B Q4 63-70 tok/s 28-35 tok/s 1-2 seconds
M2 Max 32GB 14B Q4 48-55 tok/s 22-30 tok/s 2-3 seconds
M3 Pro/Max 18GB+ 7B Q4 65-75 tok/s 28-35 tok/s 1-2 seconds
M4 Max 48GB+ 7B Q4 525 tok/s 150 tok/s <1 second
M5 Pro/Max 24GB+ 7B Q4 800+ tok/s* 150 tok/s <1 second

*M5 performance based on Apple's official benchmarks with MLX as canonical runtime Performance data from vLLM-MLX research, llama.cpp benchmarks, and Apple ML Research

Choosing a Backend

Use MLX (fastest) if:

  • ✅ You're on Apple Silicon (M1-M5)
  • ✅ You want maximum performance (1.5-2x faster than llama.cpp)
  • ✅ You want native Metal GPU acceleration
  • ✅ You need multimodal support (vision models)

Use llama.cpp (cross-platform) if:

  • ✅ You want great performance (3-7x faster than Ollama)
  • ✅ You need cross-platform support
  • ✅ You're comfortable with command-line tools

Use Ollama (easiest) if:

  • ✅ You prefer simpler setup (one-line install)
  • ✅ You already have Ollama installed
  • ✅ Performance is not critical

To switch backends:

qwenvert init --backend llamacpp  # Use llama.cpp (default, fastest production backend)
qwenvert init --backend ollama    # Use Ollama

MLX Backend (Experimental - Not Yet User-Selectable)

The MLX backend infrastructure is implemented but not yet available via CLI. MLX requires router/launcher integration for in-process execution. Once complete, it will provide 1.5-2x faster inference on Apple Silicon (M1-M5) compared to llama.cpp.

Current status:

  • ✅ Backend detection and installation
  • ✅ Model registry (5 MLX models)
  • ❌ Router integration (blocked by in-process execution model)
  • ❌ CLI selection (disabled until router complete)

For production use, stick with llama.cpp or Ollama backends.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Claude Code CLI                         │
└────────────────────────┬────────────────────────────────────┘
                         │
                    POST /v1/messages
                         │
┌────────────────────────▼────────────────────────────────────┐
│                 Qwenvert HTTP Adapter                        │
│                     (localhost:8088)                         │
│  • Validates requests                                        │
│  • Translates Anthropic → Backend format                    │
│  • Handles streaming (SSE)                                   │
│  • Monitors performance                                      │
└────────────────────────┬────────────────────────────────────┘
                         │
                Backend-specific API
                         │
┌────────────────────────▼────────────────────────────────────┐
│              Ollama or llama.cpp Server                      │
│                  (localhost:11434 or :8080)                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                  ┌──────▼───────┐
                  │  Qwen Model  │
                  │    (GGUF)    │
                  └──────────────┘

🚀 Next Steps

After Installation

  1. Optimize for your use case:

    • Heavy coding? Use Q5 quantization for better quality
    • Low RAM? Use Q4 quantization to save memory
    • Need speed? Use llama.cpp backend
  2. Set up convenience aliases:

    # Add to ~/.zshrc
    alias qw-start='qwenvert start'
    alias qw-stop='qwenvert stop'
    alias qw-status='qwenvert status'
  3. Monitor performance:

    qwenvert monitor
  4. Read advanced docs:


💡 Tips & Best Practices

For Best Performance

  1. Close other apps when running inference
  2. Use appropriate model size for your RAM
  3. Monitor temperature on MacBook Air (use qwenvert monitor)
  4. Don't use Rosetta - qwenvert is native Apple Silicon

For Best Code Quality

  1. Use Q5 quantization if you have 16GB+ RAM
  2. Give it more context - longer prompts = better results
  3. Be specific in your prompts (same as with Claude)
  4. Iterate - local models benefit from refinement

For Development

  1. Keep qwenvert running in a dedicated terminal
  2. Check logs if something seems wrong: qwenvert status
  3. Update models periodically - new versions improve quality
  4. Share feedback - open issues for bugs/improvements

📊 Performance Benchmarks

Measure qwenvert performance on your Mac:

# Start qwenvert
qwenvert start

# Run benchmarks (separate terminal)
make benchmark

What it tests:

  • Different prompt lengths (short, medium, long)
  • Streaming vs non-streaming
  • Different token limits (50, 100, 200)
  • Code generation tasks

Metrics:

  • Latency (ms)
  • Throughput (tokens/sec)
  • Time to first token (TTFT)
  • Success rate

Example output:

┌────────────────┬─────────┬──────┬─────────┬────────┬─────────┬────────┐
│ Benchmark      │ Backend │ Quant│ Latency │ Tokens │ Speed   │ Status │
├────────────────┼─────────┼──────┼─────────┼────────┼─────────┼────────┤
│ prompt_short   │ ollama  │ Q4_K │ 1234ms  │ 5      │ 4.1 t/s │   ✓    │
│ prompt_medium  │ ollama  │ Q4_K │ 2456ms  │ 89     │ 36.2t/s │   ✓    │
└────────────────┴─────────┴──────┴─────────┴────────┴─────────┴────────┘

Summary:
  Average latency: 1845ms
  Average throughput: 32.4 tokens/sec

Results saved to benchmarks/results/ for tracking over time.

See benchmarks/README.md for details.


🤝 Contributing

We welcome contributions! Areas where help is needed:

  • Model support - Add Qwen3-Coder, other model families
  • Backend support - vLLM, TensorRT-LLM integration
  • MLX enhancements - Continuous batching, multimodal support
  • Performance - Optimization for specific Mac models
  • Testing - More edge cases, hardware configurations
  • Documentation - Tutorials, examples, translations

See CONTRIBUTING.md for guidelines.


📚 More Documentation


🙏 Acknowledgments

  • Qwen Team (Alibaba) - Excellent Qwen2.5-Coder models
  • Apple ML Team - Metal acceleration, unified memory
  • llama.cpp community - High-performance inference engine
  • Ollama team - Making local LLMs accessible
  • Anthropic - Claude Code CLI and Messages API

📝 License

Apache 2.0 License - see LICENSE


⚠️ Limitations & Disclaimers

Known Limitations

  • Mac only - Designed for M1/M2/M3 Macs (Intel/Windows not supported)
  • Python 3.9-3.12 - Python 3.13 not yet compatible
  • Large downloads - Models are 4-10GB (one-time download)
  • Code quality - Good, but not as good as Claude Opus/Sonnet
  • First run slow - Model loading takes 10-30 seconds

Not Affiliated

Qwenvert is an independent project and is not affiliated with, endorsed by, or supported by Anthropic. Claude Code is a trademark of Anthropic.


📖 Research & Methodology

This project implements research-backed development practices for AI agent collaboration:

Repository-Level Instructions

Our AGENTS.md file follows findings from:

"Repository-Level Instructions Enhance AI Assistant Completion and Efficiency" Li et al., 2025. arXiv:2601.20404 https://arxiv.org/abs/2601.20404

Key findings from the research:

  • 28.64% reduction in AI agent task completion time
  • 16.58% reduction in token usage
  • Repository-level instructions significantly improve code generation accuracy

How we apply it:

  • Structured project conventions in AGENTS.md
  • Security-critical rules documented upfront
  • File modification requirements clearly specified
  • Specialized agent catalog with use cases

This approach makes qwenvert development more efficient and maintainable when working with AI coding assistants like Claude Code.


Questions? Issues? Feedback?

Open an issue: https://github.com/kmesiab/qwenvert/issues


Built with care for the Mac M1 community 🚀

About

One click to configure Claude Code to work with local Qwen models

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages