Skip to content

SUSRUTHVATTIKUTI/CodeRAG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CodeR-RAG

Retrieval-Augmented Generation for Code Intelligence

CodeR-RAG is an end-to-end retrieval-augmented system built on top of CodeR, a state-of-the-art code embedding model trained using a massive synthetic dataset (CodeR-Pile). This framework brings high-accuracy code retrieval, multi-language code understanding, and task-aware augmentation directly to LLM-based developer tools.


Features

State-of-the-Art Code Retrieval

Powered by CodeR embeddings, optimized for:

  • Text2Code (e.g., natural language → code examples)
  • Code2Text (summaries, explanations, API usage)
  • Code2Code (similar code, translation, refinement)
  • Hybrid retrieval (bug fixing, optimization, style, security)

Full RAG Pipeline

The system includes:

  • Document ingestion + code parsing
  • Embedding generation (instruction-aware InfoNCE)
  • Vector storage (FAISS / LanceDB / Milvus)
  • Query routing + retrieval
  • Augmentation formatting
  • LLM generation modules

Multi-language Support

CodeR-RAG supports retrieval across 20 programming languages, including: Python, Java, Go, C++, TypeScript, Rust, C#, PHP…

Synthetic Data Extensions

Includes tools and prompts to extend CodeR-Pile using the DRU principle:

  • Diversity (multiple task types)
  • Reliability (LLM and GPT-filtered)
  • Usability (instruction-aware queries)

Modular and Production-Ready

Easily swap:

  • Embedding models (CodeR or custom)
  • LLMs for generation
  • Vector DB backends
  • Chunking & parsing strategies

About

CodeR-RAG is a lightweight RAG system powered by CodeR embeddings for fast, accurate code search and context retrieval. It includes a simple indexing pipeline, retrieval API, and optional DRU-based synthetic data tools.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages