Offline AI-Powered Repository Intelligence Platform
RepoMind AI turns a GitHub URL, ZIP, or local repository into an evidence-backed intelligence workspace: architecture maps, dependency views, security findings, CTO/recruiter reports, technical debt summaries, and cited repository chat.
It runs locally. The only generation model is:
${FORGE_MODELS}/qwen-judge
No model routing. No cloud LLM fallback. No mock answers.
RepoMind AI analyzes real repositories and retains the analysis results after the source checkout is deleted.
- Ingest a GitHub repository, ZIP archive, or local folder.
- Parse Python, JavaScript, TypeScript, JSX, and TSX with AST extraction.
- Build repository summaries, route maps, dependency graphs, security findings, and score evidence.
- Index code with BAAI/bge-small-en-v1.5 embeddings in ChromaDB.
- Answer repository questions with retrieval, reranking, citations, Mermaid diagrams, and qwen-judge inference.
- Generate CTO, recruiter, security, technical debt, roadmap, and project status reports.
- Persist metadata, vector indexes, and reports.
- Delete cloned repository contents after analysis to avoid repository accumulation.
flowchart LR
Frontend["Frontend Workspace"] --> Backend["FastAPI Backend"]
Backend --> Analysis["Analysis Engine"]
Analysis --> Vector["Chroma Vector Store"]
Analysis --> LLM["qwen-judge Local LLM"]
Backend --> Reports["Report Artifacts"]
Vector --> Chat["Cited Repository Chat"]
LLM --> Reports
LLM --> Chat
flowchart LR
Ingestion["Repository Ingestion Service"] --> AST["AST Analysis Service"]
AST --> Dependency["Dependency Engine"]
AST --> Security["Security Engine"]
AST --> RAG["RAG Engine"]
Dependency --> Report["Report Engine"]
Security --> Report
RAG --> Report
RAG --> Chat["Repository Answer Engine"]
flowchart TD
Source["GitHub / ZIP / Local Path"] --> Filter["Ignore generated folders"]
Filter --> Parse["Tree-sitter + Python AST parsing"]
Parse --> Extract["Functions, classes, methods, imports, routes, models"]
Extract --> Graph["Dependency graph"]
Extract --> Security["Bandit + Semgrep + custom rules"]
Graph --> Scores["Evidence-backed scores"]
Security --> Scores
Scores --> Reports["Markdown reports"]
flowchart LR
Question["Repository question"] --> Embed["BGE-small embedding"]
Embed --> Chroma["ChromaDB search"]
Chroma --> Rerank["Lexical + path-aware reranking"]
Rerank --> Evidence["Cited evidence chunks"]
Evidence --> Qwen["qwen-judge"]
Qwen --> Answer["Structured answer + diagram + risks"]
flowchart LR
Clone["Clone/import repository"] --> Analyze["Analyze + index"]
Analyze --> Persist["Persist metadata, vectors, reports"]
Persist --> Delete["Delete repository contents"]
Delete --> Retain["Retain reports and indexes"]
Retain --> Cleanup["Scheduled retention cleanup"]
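The lifecycle above can be sketched in Python. This is a minimal illustration under stated assumptions, not RepoMind AI's actual implementation: the function names and the directory layout are hypothetical, and only the AUTO_DELETE_AFTER_ANALYSIS and RETENTION_MINUTES setting names come from this README.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def auto_delete_enabled(env=os.environ) -> bool:
    """Read AUTO_DELETE_AFTER_ANALYSIS; defaulting to true is an assumption."""
    return env.get("AUTO_DELETE_AFTER_ANALYSIS", "true").strip().lower() == "true"


def delete_checkout(checkout_dir: Path, env=os.environ) -> bool:
    """Remove the cloned source tree once metadata, vectors, and reports are persisted."""
    if not auto_delete_enabled(env) or not checkout_dir.exists():
        return False
    shutil.rmtree(checkout_dir)
    return True


def expired_artifacts(artifact_dir: Path, retention_minutes: int,
                      now: Optional[float] = None) -> list:
    """List retained artifacts last modified before the retention window started."""
    now = time.time() if now is None else now
    cutoff = now - retention_minutes * 60
    return sorted(p for p in artifact_dir.iterdir() if p.stat().st_mtime < cutoff)
```

A scheduled job could call expired_artifacts with the configured retention value and remove whatever it returns. Keeping the two steps separate means the source tree can go immediately while reports and indexes age out on their own schedule.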
The Architecture tab is designed as the hero surface:
- Executive Architecture: business-system view with no files.
- Service Architecture: ingestion, AST analysis, dependency, security, RAG, and reporting services.
- Module Architecture: collapsed module groups with optional expansion.
- Implementation Architecture: files, routes, symbols, and imports only when debugging.
The graph uses React Flow, ELK auto-layout, Dagre fallback, animated edges, minimap, zoom, search, focus, fullscreen, service icons, and hover impact cards.
The dependency view groups code by architectural layer:
- Frontend
- API
- Business Logic
- Analysis
- RAG
- Storage
- LLM
This avoids file-level hairballs while still exposing critical paths and module details.
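One way to picture the layer grouping is to collapse file-to-file import edges into weighted layer-to-layer edges. The sketch below is illustrative only: the layer names come from the list above, while the path prefixes and function names are hypothetical and do not describe RepoMind AI's real layout.

```python
from collections import Counter

# Layer names come from the README; the path prefixes are hypothetical examples.
LAYER_PREFIXES = [
    ("frontend/", "Frontend"),
    ("backend/api/", "API"),
    ("backend/services/", "Business Logic"),
    ("backend/analysis/", "Analysis"),
    ("backend/rag/", "RAG"),
    ("backend/storage/", "Storage"),
    ("backend/llm/", "LLM"),
]


def layer_of(path: str) -> str:
    """Map a file path to its architectural layer by the first matching prefix."""
    for prefix, layer in LAYER_PREFIXES:
        if path.startswith(prefix):
            return layer
    return "Other"


def collapse_to_layers(file_edges):
    """Collapse (source file, imported file) edges into weighted layer edges."""
    counts = Counter()
    for src, dst in file_edges:
        a, b = layer_of(src), layer_of(dst)
        if a != b:  # edges inside a single layer are dropped from the layered view
            counts[(a, b)] += 1
    return dict(counts)
```

With this shape, hundreds of file edges reduce to a handful of layer edges, and the edge weight still indicates how tightly two layers are coupled.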
Example question:
How does authentication work?
RepoMind AI answers with:
- Direct answer
- Architecture impact
- Critical files
- Rendered Mermaid diagram
- Risks
- Improvements
- Citations
If evidence does not exist, the product says so. For RepoMindAI itself, authentication is correctly reported as not implemented.
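The answer sections and the no-evidence behavior can be modeled roughly as follows. This is a sketch, not the product's schema: the class names, field names, and the answer_or_decline helper are hypothetical, and generate stands in for whatever call reaches the local model.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    path: str
    start_line: int
    end_line: int


@dataclass
class RepositoryAnswer:
    # One field per answer section listed above.
    direct_answer: str
    architecture_impact: str = ""
    critical_files: list = field(default_factory=list)
    mermaid_diagram: str = ""
    risks: list = field(default_factory=list)
    improvements: list = field(default_factory=list)
    citations: list = field(default_factory=list)


def answer_or_decline(question, evidence, generate):
    """Call the model only when retrieval returned evidence; otherwise say so."""
    if not evidence:
        return RepositoryAnswer(
            direct_answer=f"No evidence found in the repository for: {question}"
        )
    return generate(question, evidence)
```

Checking for evidence before generation keeps the model from being asked to answer a question the repository cannot support.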
Security analysis combines:
- Bandit findings
- Semgrep findings
- Custom repository rules
- Evidence-backed scoring
- Positive and negative score contributors
| Tool | Strength | RepoMind AI difference |
|---|---|---|
| Sourcegraph | Enterprise-scale code search | RepoMind AI focuses on local repository intelligence, generated reviews, diagrams, and offline model inference. |
| Cursor | IDE-native code assistance | RepoMind AI is a repository-level audit and showcase surface, not an editor autocomplete loop. |
| GitHub code search | Fast symbol/text lookup | RepoMind AI adds AST extraction, vector retrieval, reports, security scoring, architecture maps, and cited answers. |
- Local-first qwen-judge inference.
- Real BGE embeddings and ChromaDB vector search.
- AST parsing for Python, TypeScript, JavaScript, TSX, and JSX.
- Architecture diagrams at executive, service, module, and implementation levels.
- Layered dependency graph with search, minimap, zoom, and focus.
- Cited repository chat with answer-quality guardrails.
- Security audit with Bandit, Semgrep, and custom rules.
- Evidence-backed security, production, maintainability, CTO, and recruiter scores.
- Generated CTO review, recruiter review, roadmap, security report, technical debt report, and project status.
- Post-analysis repository deletion with retained metadata, vectors, and generated reports.
How does authentication work?
Where are API routes defined?
What services talk to the database?
How is repository cleanup handled?
What would prevent this project from production deployment?
Which files are most important for the RAG pipeline?
RepoMind AI extracts implementation facts instead of relying on regex-only source scans. It captures imports, exports, classes, functions, methods, routes, environment variables, and database models, then turns those facts into dependency and architecture evidence.
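To illustrate the difference from a regex scan, here is a minimal sketch that uses Python's standard ast module. The extract_facts function and its output shape are hypothetical, it only inspects top-level definitions, and it does not cover routes, environment variables, database models, or the non-Python languages.

```python
import ast


def extract_facts(source: str) -> dict:
    """Collect top-level imports, classes, functions, and methods from Python source."""
    tree = ast.parse(source)
    facts = {"imports": [], "classes": [], "functions": [], "methods": []}
    for node in tree.body:
        if isinstance(node, ast.Import):
            facts["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            facts["imports"].append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            facts["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            facts["classes"].append(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    facts["methods"].append(f"{node.name}.{item.name}")
    return facts
```

Because the parser works on syntax rather than text, a string literal that happens to contain "def fake(): pass" is not reported as a function, which a naive pattern match would get wrong.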
Repository chunks are embedded with BAAI/bge-small-en-v1.5 and stored in ChromaDB. Retrieval combines vector search, lexical reranking, topic-specific boosts, pinned evidence paths, and citation metadata.
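The reranking step can be approximated in plain Python. The sketch below assumes the vector search has already returned candidates with a similarity between 0 and 1. The rerank function, the candidate dictionary shape, and both weights are hypothetical, and topic boosts and pinned paths are left out.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens; a deliberately simple lexical model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(question, candidates, lexical_weight=0.3, path_weight=0.2):
    """Re-order vector hits by blending similarity with lexical and path overlap.

    Each candidate is a dict with 'path', 'text', and 'similarity'.
    The weights are illustrative assumptions, not tuned values.
    """
    q = tokenize(question)
    scored = []
    for c in candidates:
        # Fraction of question tokens that also appear in the chunk text.
        lexical = len(q & tokenize(c["text"])) / max(len(q), 1)
        # Path-aware boost: any question token appearing in the file path.
        path_hit = 1.0 if q & tokenize(c["path"]) else 0.0
        score = c["similarity"] + lexical_weight * lexical + path_weight * path_hit
        scored.append({**c, "score": round(score, 4)})
    return sorted(scored, key=lambda c: c["score"], reverse=True)
```

The path boost lets a chunk from a file such as an API routes module outrank a slightly more similar but unrelated chunk when the question mentions routes.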
The UI intentionally separates abstraction levels:
- executive and service views show systems and services;
- module view groups code ownership areas;
- implementation view exposes concrete files and symbols.
This prevents the common failure mode where architecture diagrams become unreadable file graphs.
Findings from Bandit, Semgrep, and custom rules are normalized into one security score. Scores include positive contributors, negative contributors, and a calculation explanation.
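A scoring scheme of this kind might look like the sketch below. It assumes findings have already been mapped from each tool's native output into a common shape with tool, rule, and severity keys. The base score, severity penalties, and function name are hypothetical and are not RepoMind AI's actual formula.

```python
BASE_SCORE = 70  # hypothetical starting point before contributors are applied
SEVERITY_PENALTY = {"low": 2, "medium": 5, "high": 12}  # hypothetical weights


def security_score(findings, positive_signals):
    """Return a 0-100 score together with the contributors that explain it.

    findings: dicts with 'tool', 'rule', and 'severity' (low, medium, high).
    positive_signals: (reason, points) pairs, such as a pinned lockfile.
    """
    negatives = [
        {"reason": f"{f['tool']}:{f['rule']}", "points": -SEVERITY_PENALTY[f["severity"]]}
        for f in findings
    ]
    positives = [{"reason": reason, "points": points} for reason, points in positive_signals]
    gained = sum(c["points"] for c in positives)
    lost = -sum(c["points"] for c in negatives)
    score = max(0, min(100, BASE_SCORE + gained - lost))
    return {
        "score": score,
        "positive": positives,
        "negative": negatives,
        "explanation": (
            f"{BASE_SCORE} base + {gained} positive - {lost} negative "
            f"= {score} (clamped to 0-100)"
        ),
    }
```

Returning the contributors alongside the number is what makes the score auditable: every point gained or lost traces back to a named finding or signal.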
Benchmarks were run on real repositories with ingestion, analysis, BGE embeddings, ChromaDB indexing, qwen-judge report generation, qwen-judge explainers, and cleanup verification.
| Repository | Analysis | Indexing | Report generation | Files | Indexed chunks | Retrieval | Cleanup |
|---|---|---|---|---|---|---|---|
| FastAPI | 214.669s | 34.913s | 75.228s | 2,748 | 10,862 | auth/routing/db strong | passed |
| Flask | 71.975s | 1.940s | 67.421s | 231 | 857 | auth/db strong, routing partial | passed |
| Next.js | 200.799s | 92.848s | 65.757s | 25,024 | 50,996 | auth/routing/db strong | passed |
| RepoMindAI | 84.715s | 9.770s | 73.348s | 66 | 220 | auth/routing/db partial | passed |
Full details: BENCHMARK_RESULTS.md.
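The indexing columns can be compared across repositories by dividing indexed chunks by indexing time. The snippet below copies those two columns from the table above; the function name is hypothetical, and the ratio is a rough derived figure that does not separate fixed startup cost from per-chunk cost.

```python
# (repository, indexing seconds, indexed chunks), copied from the benchmark table.
ROWS = [
    ("FastAPI", 34.913, 10862),
    ("Flask", 1.940, 857),
    ("Next.js", 92.848, 50996),
    ("RepoMindAI", 9.770, 220),
]


def indexing_throughput(rows):
    """Return chunks indexed per second for each repository, rounded to 0.1."""
    return {name: round(chunks / seconds, 1) for name, seconds, chunks in rows}


if __name__ == "__main__":
    for name, rate in indexing_throughput(ROWS).items():
        print(f"{name}: {rate} chunks/s")
```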
| View | Screenshot |
|---|---|
| Dashboard | screenshots/dashboard-overview.png |
| Architecture | screenshots/architecture-view.png |
| Dependencies | screenshots/dependency-view.png |
| Security | screenshots/security-view.png |
| Repository Chat | screenshots/repository-chat.png |
- Python 3.11+
- Node.js 18+
- Local model at ${FORGE_MODELS}/qwen-judge
- CUDA-capable GPU recommended for qwen-judge
cd ${PROJECT_ROOT}
make setup
PYTHONPATH=backend .venv/bin/uvicorn repomind.main:app --host 0.0.0.0 --port 8000
cd ${PROJECT_ROOT}/frontend
npm install
npm run build
cp -R .next/static .next/standalone/.next/static
HOSTNAME=0.0.0.0 PORT=3000 node .next/standalone/server.js
Open:
http://localhost:3000
Repository cleanup is controlled by environment settings:
AUTO_DELETE_AFTER_ANALYSIS=true
RETENTION_MINUTES=1440
PYTHONPATH=backend .venv/bin/ruff check backend tests scripts
PYTHONPATH=backend .venv/bin/pytest tests/api/test_api.py::test_health_endpoint tests/unit/test_utils_and_parsing.py -q
cd frontend && npm run build
RepoMind AI is a strong local AI engineering showcase, but it is not presented as a fully managed SaaS product.
Known release gaps:
- Next.js/PostCSS audit advisories remain until a safe framework upgrade.
- qwen-judge report generation is accurate but slow: roughly 66 to 75 seconds per repository in the benchmarks above.
- Browser test coverage should be expanded beyond screenshot generation.
- Public setup still assumes the local Forge model path.
See GITHUB_RELEASE_CHECKLIST.md and PRODUCT_REVIEW.md.
- Stream ingestion and report-generation progress.
- Add Playwright regression tests for key UI surfaces.
- Improve report post-processing for even tighter executive prose.
- Add benchmark trend history.
- Reduce large-repo indexing latency.
- Package model-path configuration for easier public setup.
Contributions should preserve the core constraint: repository intelligence must be evidence-backed and generated from the configured local model only.
Useful contribution areas:
- UI polish and accessibility
- Retrieval quality tests
- Benchmark harness improvements
- Security rule coverage
- Documentation and examples
MIT




