Notebook Similarity Checker for Plagiarism Detection
- Compares Jupyter notebooks for similarity using sentence-transformer embeddings
- Supports comparison of only markdown, only code, or both
- Visualizes similarity matrix as a heatmap
- Command-line interface for flexible usage
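The markdown-only, code-only, or combined comparison listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `CELL_TYPES`, `extract_text`, and `load_notebook` are hypothetical, and the only thing relied on is the standard notebook JSON layout (a `cells` list whose entries carry `cell_type` and `source`).

```python
import json
from pathlib import Path

# Hypothetical mapping from the CLI mode to the cell types that are kept.
CELL_TYPES = {
    "all": {"markdown", "code"},
    "markdown": {"markdown"},
    "code": {"code"},
}


def extract_text(notebook: dict, mode: str = "all") -> str:
    """Join the source of every cell whose type matches the chosen mode."""
    wanted = CELL_TYPES[mode]
    parts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") not in wanted:
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        parts.append(source)
    return "\n".join(parts)


def load_notebook(path: Path) -> dict:
    """Read an .ipynb file, which is plain JSON on disk."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The resulting string per notebook is what would then be passed to the embedding model; raw cells and cell outputs are ignored in this sketch.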
- Clone the repository:

      git clone https://github.com/shakha-de/simity.git
      cd simity

- Install dependencies (Python >=3.11 recommended) using uv:

      uv sync

  or use pip, old but still gold:

      pip install -r requirements.txt
    python main.py <NOTEBOOK_ROOT_PATH> [--mode all|markdown|code] [--window]

- <NOTEBOOK_ROOT_PATH>: Root directory to search for Jupyter notebooks (.ipynb)
- --mode: Comparison mode (default: all)
  - all: Compare both code and markdown
  - markdown: Compare only markdown cells
  - code: Compare only code cells
- --window: Show the heatmap in a dedicated window (blocks execution)
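For readers who want to see how such a command line can be declared, here is a hedged sketch using the standard library's `argparse`. The function name `build_parser` and the help strings are assumptions for illustration; the actual `main.py` may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Compare Jupyter notebooks for similarity.",
    )
    # Positional argument: where to look for .ipynb files.
    parser.add_argument(
        "notebook_root",
        metavar="NOTEBOOK_ROOT_PATH",
        help="root directory to search for Jupyter notebooks (.ipynb)",
    )
    # Restrict the comparison to one cell type, or use both (the default).
    parser.add_argument(
        "--mode",
        choices=["all", "markdown", "code"],
        default="all",
        help="comparison mode (default: all)",
    )
    # Boolean flag: present means True, absent means False.
    parser.add_argument(
        "--window",
        action="store_true",
        help="show the heatmap in a dedicated window (blocks execution)",
    )
    return parser
```

Calling `build_parser().parse_args(["Exercise-1", "--mode", "markdown", "--window"])` yields a namespace with `notebook_root`, `mode`, and `window` attributes.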
    python main.py Exercise-1 --mode markdown --window

- Lists pairs of submissions with high similarity (>85%)
- Shows a heatmap of all similarities
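The pair listing described above can be illustrated with a small, dependency-free sketch. It assumes that similarity means cosine similarity between embedding vectors and that the 85% cutoff corresponds to a score above 0.85; both are assumptions, as are the names `cosine_similarity` and `flag_similar_pairs`. The short vectors in the usage example stand in for real sentence-transformer embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def flag_similar_pairs(embeddings, threshold=0.85):
    """Return (name_a, name_b, score) for every pair scoring above threshold.

    `embeddings` maps a submission name to its vector; results are sorted
    from most to least similar.
    """
    flagged = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > threshold:
            flagged.append((name_a, name_b, score))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical submissions).
toy = {"alice": [1.0, 0.0], "bob": [1.0, 0.1], "carol": [0.0, 1.0]}
pairs = flag_similar_pairs(toy)
```

With these toy vectors only the alice/bob pair exceeds the threshold; the full matrix of scores is what the heatmap would visualize.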
- Python >=3.11
- numpy
- pandas
- seaborn
- matplotlib
- scikit-learn
- sentence-transformers
- uv (for installation)
License: MIT