sristy17/IRIS

IRIS — Full-Stack Search Engine (BM25)

A full-stack search engine built from scratch. It implements web crawling, inverted indexing, and BM25 ranking, served by a FastAPI backend and a React frontend.


Key Features

  • Web Crawler

    • Multi-seed crawling
    • Domain-restricted traversal
    • HTML parsing & text extraction (BeautifulSoup)
    • Noise filtering (scripts, nav, Wikipedia special pages)
  • Search Engine Core

    • Inverted index construction
    • BM25 ranking algorithm (relevance scoring)
    • Tokenization + stopword filtering
    • Multi-term query handling
  • Backend API

    • FastAPI-based REST service
    • /search?q= endpoint
    • JSON responses with ranked results
    • CORS-enabled for frontend integration
  • Frontend UI

    • React + TypeScript (Vite)
    • Search interface with real-time results
    • Snippets + relevance scores
    • Keyboard + button search support
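
The Search Engine Core items above (tokenization with stopword filtering, an inverted index, and BM25 scoring over multi-term queries) can be sketched in a few functions. This is a minimal illustration, not the project's actual code: the stopword list, the function names, and the defaults `k1=1.5` and `b=0.75` are assumptions, and the IDF uses the non-negative variant with `+ 1` inside the logarithm.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the project's real list is not shown in this README.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Return (inverted index, document lengths) for a {doc_id: text} mapping."""
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    lengths = {}               # doc_id -> number of indexed tokens
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.5, b=0.75):
    """Sum per-term BM25 scores and return (doc_id, score) pairs, best first."""
    n_docs = len(lengths)
    avgdl = sum(lengths.values()) / n_docs if n_docs else 0.0
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        df = len(postings)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        for doc_id, tf in postings.items():
            norm = k1 * (1 - b + b * lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the index maps each term to its postings, a query only touches documents that contain at least one query term, which keeps scoring proportional to the postings size rather than the corpus size.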

Architecture

Crawler → Documents → Indexer → Inverted Index
                                      ↓
                                 Query Engine (BM25)
                                      ↓
                                   FastAPI API
                                      ↓
                              React Frontend UI
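
The hand-off from the query engine to the API layer in the diagram above might look like the following sketch, which turns ranked `(doc_id, score)` pairs into a JSON-serializable payload with snippets. The field names (`query`, `count`, `results`, `url`, `score`, `snippet`) and both function names are assumptions for illustration; the README does not specify the response schema. In the real project a FastAPI route handler would return such a payload.

```python
import json


def make_snippet(text, terms, width=80):
    """Return up to `width` characters of `text` around the first matching term."""
    lowered = text.lower()
    hits = [pos for pos in (lowered.find(t.lower()) for t in terms) if pos != -1]
    if not hits:
        return text[:width]  # no term found: fall back to the start of the text
    start = max(0, min(hits) - width // 2)
    return text[start:start + width]


def build_response(query, ranked, docs, limit=10):
    """Build the response body from ranked (doc_id, score) pairs."""
    terms = query.split()
    results = [
        {
            "url": doc_id,
            "score": round(score, 4),
            "snippet": make_snippet(docs[doc_id], terms),
        }
        for doc_id, score in ranked[:limit]
    ]
    return {"query": query, "count": len(results), "results": results}


if __name__ == "__main__":
    docs = {"https://example.org/a": "Okapi BM25 is a ranking function used by search engines."}
    ranked = [("https://example.org/a", 1.23456)]
    print(json.dumps(build_response("bm25", ranked, docs), indent=2))
```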

Tech Stack

  • Languages: Python, TypeScript
  • Backend: FastAPI
  • Frontend: React (Vite)
  • Parsing: BeautifulSoup, Requests
  • IR Model: BM25 (Okapi)
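
For reference, the standard Okapi BM25 score of a document $D$ for a query $Q = \{q_1, \dots, q_n\}$ is given below. Here $f(q_i, D)$ is the frequency of term $q_i$ in $D$, $|D|$ is the document length in tokens, $\mathrm{avgdl}$ is the average document length in the corpus, $N$ is the number of documents, and $n(q_i)$ is the number of documents containing $q_i$. The README does not state which IDF variant or which values of $k_1$ and $b$ IRIS uses; the non-negative IDF shown here and the common choices $k_1 \in [1.2, 2.0]$, $b = 0.75$ are assumptions.

$$
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
$$

$$
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$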

Highlights

  • Built a search engine from scratch using information retrieval concepts
  • Implemented inverted indexing and BM25 ranking algorithm
  • Designed a REST API using FastAPI for query processing
  • Developed a React + TypeScript frontend for real-time search
  • Optimized crawler with domain filtering and noise reduction
  • Handled CORS, async API integration, and full-stack communication
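
The crawler's domain filtering and noise reduction could be approximated as follows. The project uses BeautifulSoup for parsing; this sketch swaps in the standard library's `html.parser` so it runs without dependencies. The set of noise tags, the list of Wikipedia namespace prefixes, and all names here are assumptions, and links inside noise elements such as `nav` are deliberately ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed noise elements and Wikipedia namespaces; the project's lists may differ.
NOISE_TAGS = {"script", "style", "nav", "header", "footer"}
SPECIAL_PREFIXES = ("Special:", "Talk:", "File:", "Help:", "Category:")


class PageParser(HTMLParser):
    """Collect visible text and outgoing links, skipping noise elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside a noise element

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1
        elif tag == "a" and self._skip_depth == 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def allowed(url, allowed_domain):
    """Keep same-domain http(s) links that are not Wikipedia special pages."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.netloc != allowed_domain:
        return False
    page = parsed.path.rsplit("/", 1)[-1]
    return not page.startswith(SPECIAL_PREFIXES)


def extract(html, base_url, allowed_domain):
    """Return (clean text, crawlable links) for one fetched page."""
    parser = PageParser(base_url)
    parser.feed(html)
    links = [u for u in parser.links if allowed(u, allowed_domain)]
    return " ".join(parser.text_parts), links
```

A crawl loop would call `extract` on each fetched page, index the returned text, and enqueue only the returned links, which keeps traversal inside the seed domain.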

About

IRIS (Information Retrieval & Indexing System) is a production-style search engine built from scratch, featuring web crawling, inverted indexing, and BM25-based ranking. It focuses on efficient information retrieval, fast query processing, and explainable search results.
