RAG CLI

An advanced RAG (Retrieval-Augmented Generation) command-line tool for indexing documents and answering questions using hybrid search with semantic and lexical retrieval.

Features

Document Processing: Convert documents to structured format using Docling (supports PDF, DOCX, PPTX, XLSX, and more)
Intelligent Chunking: Semantic chunking that preserves document structure and context
Hybrid Search: Combines dense (semantic) and sparse (lexical/BM25) embeddings for superior retrieval
Reranking: Optional reranking with BGE models for improved relevance
Query Rewriting: Automatic query optimization for better search results
RAG with OpenAI: Generate answers using retrieved context with GPT-4 or other OpenAI models
Streaming Responses: Real-time streaming output for chat completions

Architecture

The tool uses a multi-stage pipeline:

Document Conversion (Docling)
- Converts documents to structured JSON format
- Preserves semantic structure (headings, tables, lists)
- Extracts text, tables, and images with provenance
Chunking (Custom chunker)
- Semantic chunking based on document structure
- Maintains heading hierarchy for context
- Preserves tables and code blocks intact
- Configurable chunk sizes with overlap
Indexing (Pinecone)
- Dense embeddings (OpenAI text-embedding-3-small)
- Sparse embeddings
- Dual-index strategy for hybrid search
- Metadata storage (source, headings, chunk text)
Retrieval (Hybrid search)
- Parallel dense + sparse search
- Reciprocal Rank Fusion (RRF) for merging results
- Optional reranking with BGE models
- Score-based filtering
Generation (OpenAI)
- Context-aware prompts with retrieved chunks
- Support for streaming and non-streaming responses
- Configurable models and parameters
- Source citation tracking

Prerequisites

Node.js >= 18.0.0
Docling server (running locally or remotely)
Pinecone account with two indexes (dense + sparse)
OpenAI API key

Installation

# Clone the repository
git clone <repository-url>
cd rag-example

# Install dependencies
npm install

# Build the project
npm run build

Configuration

See .env.example for a template.

Pinecone Setup

You need to create two Pinecone indexes:

Dense Index (for semantic embeddings)
- Dimension: 1536 (for text-embedding-3-small)
- Metric: cosine
- Spec: Serverless (recommended) or Pod-based
Sparse Index
- Metric: dotproduct
- Spec: Serverless (recommended)

Usage

The CLI provides three main commands: index, search, and query.

Indexing Documents

Index documents from a directory or file:

# Index all documents in a directory
npm run dev index --source ./docs

# Index a single file
npm run dev index --source ./docs/my-document.pdf

# Use a namespace for organization
npm run dev index --source ./docs --namespace my-project

# Verbose mode for detailed progress
npm run dev index --source ./docs --verbose

Supported formats: .pdf, .docx, .pptx, .xlsx, .md, .html, .txt, .asciidoc, .xml

Options:

-s, --source <path>: Source directory or file (default: ./docs)
-n, --namespace <name>: Pinecone namespace (optional)
-v, --verbose: Show detailed progress information

Note: Indexing clears existing data in the namespace before adding new documents.

Searching Documents

Search indexed documents using hybrid search:

# Basic search
npm run dev search "machine learning algorithms"

# Limit results
npm run dev search "neural networks" --top-k 5

# Search in a specific namespace
npm run dev search "data pipelines" --namespace my-project

# Disable reranking
npm run dev search "python examples" --no-rerank

# Verbose mode
npm run dev search "optimization techniques" --verbose

Options:

-k, --top-k <number>: Number of results to return (default: 10)
-n, --namespace <name>: Pinecone namespace to search in
--no-rerank: Disable reranking (enabled by default)
--rerank-model <model>: Reranking model to use (default: bge-reranker-v2-m3)
-v, --verbose: Show additional details

Querying with RAG

Ask questions and get AI-generated answers based on your documents:

# Basic query
npm run dev query "What are the main benefits of hybrid search?"

# Use a specific model
npm run dev query "Explain gradient descent" --model gpt-4o

# Adjust temperature for creativity
npm run dev query "Generate ideas for ML projects" --temperature 0.7

# Filter by minimum relevance score
npm run dev query "What is backpropagation?" --min-score 0.5

# Disable query rewriting
npm run dev query "List all algorithms" --no-rewrite

# Non-streaming output
npm run dev query "Summarize the document" --no-stream

# Verbose mode (shows rewritten query and token usage)
npm run dev query "How does attention work?" --verbose

Options:

-k, --top-k <number>: Maximum number of context chunks (default: 5)
-n, --namespace <name>: Pinecone namespace to search in
-m, --model <name>: OpenAI model to use (default: gpt-4o)
-t, --temperature <number>: Response temperature 0-2 (default: 0.3)
--min-score <number>: Minimum relevance score 0-1 (default: 0.3)
--no-rewrite: Skip query rewriting optimization
--no-stream: Disable streaming output
-v, --verbose: Show query details and token usage

Examples

Example 1: Index and Query Technical Documentation

# Index your technical docs
npm run dev index --source ./technical-docs --namespace tech-docs

# Ask a question
npm run dev query "How do I configure authentication?" --namespace tech-docs

Example 2: Research Papers with High Relevance Threshold

# Index research papers
npm run dev index --source ./papers --namespace research

# Query with strict relevance filtering
npm run dev query "What are recent advances in transformer architectures?" \
  --namespace research \
  --min-score 0.6 \
  --top-k 10

Example 3: Creative Writing with Higher Temperature

# Index creative writing examples
npm run dev index --source ./writing-samples

# Generate creative responses
npm run dev query "Write a short story about AI" \
  --temperature 1.2 \
  --model gpt-4o

Development

# Run in development mode
npm run dev <command>

# Build for production
npm run build

# Run production build
npm start <command>

Project Structure

rag-example/
├── src/
│   ├── commands/
│   │   ├── index.ts       # Indexing command
│   │   ├── query.ts       # RAG query command
│   │   └── search.ts      # Search command
│   ├── services/
│   │   ├── docling.ts     # Document conversion
│   │   ├── openai.ts      # OpenAI integration
│   │   └── pinecone.ts    # Vector storage and search
│   ├── utils/
│   │   └── chunker.ts     # Document chunking logic
│   ├── config.ts          # Configuration management
│   └── index.ts           # CLI entry point
├── .env                   # Environment variables (create this)
├── .env.example           # Environment template
├── package.json
├── tsconfig.json
└── README.md

How It Works

Chunking Strategy

The chunker uses a hierarchical approach:

Heading-based sections: Documents are split by headings (h1, h2, etc.)
Size-based splitting: Large sections are recursively split
Context preservation: Heading hierarchy is maintained in metadata
Special handling: Tables and code blocks stay intact

Hybrid Search

The tool implements a sophisticated hybrid search strategy:

Dense search: Uses OpenAI embeddings for semantic similarity
Sparse search: Uses BM25-style lexical matching
Fusion: Combines results using Reciprocal Rank Fusion (RRF)
Reranking: Optional reranking with BGE models for final ordering

RAG Pipeline

Query rewriting: Optimizes the user's question for better retrieval
Hybrid retrieval: Fetches relevant chunks using hybrid search
Filtering: Removes low-relevance results based on score threshold
Context building: Constructs a prompt with retrieved chunks
Generation: GPT generates an answer with source citations

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src		src
.cursorignore		.cursorignore
.cursorrules		.cursorrules
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG CLI

Features

Architecture

Prerequisites

Installation

Configuration

Pinecone Setup

Usage

Indexing Documents

Searching Documents

Querying with RAG

Examples

Example 1: Index and Query Technical Documentation

Example 2: Research Papers with High Relevance Threshold

Example 3: Creative Writing with Higher Temperature

Development

Project Structure

How It Works

Chunking Strategy

Hybrid Search

RAG Pipeline

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RAG CLI

Features

Architecture

Prerequisites

Installation

Configuration

Pinecone Setup

Usage

Indexing Documents

Searching Documents

Querying with RAG

Examples

Example 1: Index and Query Technical Documentation

Example 2: Research Papers with High Relevance Threshold

Example 3: Creative Writing with Higher Temperature

Development

Project Structure

How It Works

Chunking Strategy

Hybrid Search

RAG Pipeline

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages