Automated Academic PDF Organization & Search — Powered by AI
Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)
LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and leverages Google Gemini AI as an intelligent fallback — then renames files using citation standards, categorizes them, and provides full-text search through a modern web interface.
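The Problem:
Researchers accumulate hundreds of PDFs with cryptic, uninformative filenames that make a collection hard to browse.
The Solution:
LitOrganizer automatically renames them using APA 7th edition conventions, sorts them into folders, and makes their full text searchable.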
Automatically detects DOIs in the PDF text and queries 7+ academic APIs simultaneously for accurate metadata.
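To illustrate the DOI detection step, here is a minimal Python sketch. It assumes the PDF text has already been extracted and uses a common, permissive regex for modern DOIs; `find_dois` and its cleanup rules are hypothetical and not taken from LitOrganizer's source.

```python
import re

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
# It is deliberately permissive and may swallow trailing punctuation, which _clean() trims.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def _clean(doi: str) -> str:
    """Trim sentence punctuation and unbalanced closing parentheses from a raw match."""
    doi = doi.rstrip(".,;:")
    while doi.endswith(")") and doi.count(")") > doi.count("("):
        doi = doi[:-1].rstrip(".,;:")
    return doi


def find_dois(text: str) -> list[str]:
    """Return unique DOIs in order of first appearance (DOIs compare case-insensitively)."""
    found: dict[str, str] = {}
    for match in DOI_PATTERN.finditer(text):
        doi = _clean(match.group(0))
        found.setdefault(doi.lower(), doi)
    return list(found.values())


print(find_dois("See https://doi.org/10.1016/j.softx.2025.102198. Also (doi:10.1000/xyz123)."))
# ['10.1016/j.softx.2025.102198', '10.1000/xyz123']
```

The parenthesis check keeps DOIs such as `10.1016/S0140-6736(20)30183-5` intact while still dropping a closing bracket that belongs to the surrounding sentence.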
When DOI extraction fails, Gemini AI reads the PDF content, extracts the title, authors, and year, and then validates the result via Crossref. A real-time AI status panel shows extraction progress.
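The validation step can be pictured as a Crossref title search followed by a similarity check. The sketch below uses Crossref's public REST endpoint (`/works` with the `query.bibliographic` parameter); the helper names, the title normalization, and the 0.9 similarity threshold are assumptions, not LitOrganizer's actual rules.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_search_url(title: str, rows: int = 3) -> str:
    """Build a Crossref bibliographic search URL for a candidate title."""
    query = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    return f"{CROSSREF_WORKS}?{query}"


def fetch_crossref_items(title: str, timeout: float = 10.0) -> list[dict]:
    """Network call: return the 'items' list from Crossref's search response."""
    with urllib.request.urlopen(crossref_search_url(title), timeout=timeout) as resp:
        return json.load(resp)["message"]["items"]


def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join("".join(c.lower() if c.isalnum() else " " for c in text).split())


def validate_title(ai_title: str, items: list[dict], threshold: float = 0.9) -> Optional[str]:
    """Return the DOI of the closest Crossref title if it is similar enough, else None."""
    best_doi, best_score = None, 0.0
    for item in items:
        for candidate in item.get("title", []):  # Crossref returns titles as a list
            score = SequenceMatcher(None, _normalize(ai_title), _normalize(candidate)).ratio()
            if score > best_score:
                best_doi, best_score = item.get("DOI"), score
    return best_doi if best_score >= threshold else None
```

In use, `validate_title(ai_title, fetch_crossref_items(ai_title))` yields either a DOI (which can then go through the normal API lookup) or `None`, meaning the AI result stays unverified.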
Files are renamed using the APA 7th edition format, with automatic folder categorization by journal, author, year, or subject.
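A sketch of how a citation-style filename and a category folder could be derived from metadata. The `Author (Year) Title.pdf` pattern is a hypothetical layout built from the APA 7 in-text author rules (one author as-is, two joined by `&`, three or more shortened to `et al.`); LitOrganizer's exact format may differ, and the metadata keys are assumptions.

```python
import re
from pathlib import Path
from typing import Optional

# Characters invalid in Windows filenames ("/" is invalid everywhere), plus control chars.
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
CATEGORY_KEYS = {"journal", "author", "year", "subject"}


def _clean(text: str) -> str:
    """Drop forbidden characters and collapse runs of whitespace."""
    return " ".join(FORBIDDEN.sub("", text).split())


def apa_authors(surnames: list[str]) -> str:
    """APA 7 in-text rule: 1 author -> Name, 2 -> A & B, 3 or more -> A et al."""
    if not surnames:
        return "Unknown"  # placeholder; APA itself would move the title into this slot
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return f"{surnames[0]} & {surnames[1]}"
    return f"{surnames[0]} et al."


def apa_filename(surnames: list[str], year: Optional[int], title: str, max_title: int = 120) -> str:
    """Hypothetical 'Author (Year) Title.pdf' pattern."""
    authors = _clean(apa_authors(surnames))
    year_part = str(year) if year else "n.d."  # APA uses "n.d." when no date is known
    title_part = _clean(title)[:max_title].rstrip(" .")  # Windows rejects trailing dots/spaces
    return f"{authors} ({year_part}) {title_part}.pdf"


def category_dir(root: Path, meta: dict, by: str) -> Path:
    """Sub-folder for one of the four categorization keys; missing values go to 'Unknown'."""
    if by not in CATEGORY_KEYS:
        raise ValueError(f"unsupported category: {by!r}")
    value = meta.get(by)
    return root / (_clean(str(value)) if value else "Unknown")


print(apa_filename(["Şahin", "Kara", "Dirsehan"], 2025, "LitOrganizer: Automating the process of data extraction"))
# Şahin et al. (2025) LitOrganizer Automating the process of data extraction.pdf
```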
Full-text search across your entire PDF collection, with regex support and export of results to Word/Excel.
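The search can be sketched as a regex scan over per-page text. This assumes text is extracted with PyMuPDF (listed in the tech stack) via `page.get_text()`; `search_corpus`, `Hit`, and the context window size are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    file: str
    page: int      # 1-based page number
    snippet: str   # match plus surrounding context, whitespace-collapsed


def load_pages(pdf_path: str) -> list[str]:
    """Extract per-page text with PyMuPDF (imported lazily so the search works without it)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def search_corpus(pages_by_file: dict[str, list[str]], pattern: str,
                  ignore_case: bool = True, context: int = 40) -> list[Hit]:
    """Run a regex over every page of every file and collect hits with context."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits: list[Hit] = []
    for file, pages in pages_by_file.items():
        for page_no, text in enumerate(pages, start=1):
            for m in regex.finditer(text):
                start, end = max(0, m.start() - context), min(len(text), m.end() + context)
                hits.append(Hit(file, page_no, " ".join(text[start:end].split())))
    return hits


corpus = {"a.pdf": ["Deep learning for text.", "Nothing here."], "b.pdf": ["On deep-learning models"]}
for hit in search_corpus(corpus, r"deep[ -]learning"):
    print(hit.file, hit.page, hit.snippet)
```

For export, the hits could be converted with `pandas.DataFrame([dataclasses.asdict(h) for h in hits])` and written with `.to_excel(...)` (which uses openpyxl); this is a suggestion, not the app's own export code.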
LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:
flowchart LR
A["📄 PDF File"] --> B{"DOI Found?"}
B -- Yes --> C["🔗 Query Academic APIs"]
C --> D["✅ Named Article/"]
B -- No --> E{"Gemini AI\nEnabled?"}
E -- Yes --> F["🤖 AI Extraction\n(Title, Authors, Year)"]
F --> G{"Validated via\nCrossref?"}
G -- Yes --> D
G -- No --> H["📁 AI Named Content/\n(if separate folder)"]
E -- No --> I["❓ Unnamed Article/"]
G -- Fail --> I
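The decision flow in the chart can be expressed as a small routing function. This is a minimal sketch, not LitOrganizer's code: the callables (`find_doi`, `query_apis`, `gemini_extract`, `crossref_validate`) are hypothetical stand-ins for the real stages. Two behaviors are interpretations: a DOI whose API lookup fails falls through to the AI stage, and unvalidated AI results go to `Named Article/` unless a separate AI folder is enabled (the v2.0.0 changelog says AI-named files default to `Named Article/`).

```python
from typing import Callable, Optional

NAMED, AI_NAMED, UNNAMED = "Named Article", "AI Named Content", "Unnamed Article"
Metadata = dict


def route_pdf(
    text: str,
    find_doi: Callable[[str], Optional[str]],
    query_apis: Callable[[str], Optional[Metadata]],
    gemini_extract: Optional[Callable[[str], Optional[Metadata]]],  # None = Gemini disabled
    crossref_validate: Callable[[Metadata], bool],
    separate_ai_folder: bool = False,
) -> tuple[str, Optional[Metadata]]:
    """Return (destination folder, metadata) following the flowchart's branches."""
    doi = find_doi(text)
    if doi:
        meta = query_apis(doi)
        if meta:  # DOI found and the APIs returned metadata
            return NAMED, meta
        # Assumption: a failed API lookup falls through to the AI stage.
    if gemini_extract is None:  # "Gemini AI Enabled?" -> No
        return UNNAMED, None
    meta = gemini_extract(text)
    if not meta:  # AI extraction failed
        return UNNAMED, None
    if crossref_validate(meta):  # "Validated via Crossref?" -> Yes
        return NAMED, meta
    # Not validated: separate AI folder if enabled, otherwise Named Article (assumed default).
    return (AI_NAMED if separate_ai_folder else NAMED), meta
```

Passing the stages in as functions keeps the routing logic testable with stubs, independent of network access or an API key.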
Output directory structure:
your_pdf_folder/
├── Named Article/ ← DOI + API verified or Gemini AI validated
├── AI Named Content/ ← Gemini AI named (optional separate folder)
├── Unnamed Article/ ← No metadata found
└── backups/ ← Original file backups (if enabled)
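Moving a file into this layout, with an optional backup copy, could look like the sketch below. The folder names follow the tree above; `place_file` and the ` (1)` suffix used to avoid overwriting duplicates are assumptions about behavior the README does not specify.

```python
import shutil
from pathlib import Path


def place_file(pdf: Path, root: Path, folder: str, new_name: str, backup: bool = True) -> Path:
    """Move `pdf` to root/folder/new_name, first copying the original to root/backups if asked."""
    if backup:
        backup_dir = root / "backups"
        backup_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(pdf, backup_dir / pdf.name)  # copy2 also preserves timestamps
    dest_dir = root / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / new_name
    stem, suffix = Path(new_name).stem, Path(new_name).suffix
    counter = 1
    while dest.exists():  # never overwrite an earlier file with the same name
        dest = dest_dir / f"{stem} ({counter}){suffix}"
        counter += 1
    shutil.move(str(pdf), str(dest))
    return dest
```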
The launcher scripts handle everything automatically — Python check, virtual environment, dependencies, and server startup.
🪟 Windows
- Download or clone the repository
- Double-click start_litorganizer.bat
- Your browser opens automatically at http://localhost:5000
🍎 macOS
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"
Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal
Note: If downloaded as ZIP, remove quarantine first:
xattr -cr .
🐧 Linux
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh
🛠 Manual Installation
# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows
# Install & run
pip install -r requirements.txt
python litorganizer.py
🐳 Docker
# Quick start — mount your PDF folder and open http://localhost:5000
docker run -d -p 5000:5000 -v $(pwd)/pdfs:/app/pdf bcankara/litorganizer:v2
Or with Docker Compose:
# docker-compose.yml is included in the repo
docker compose up -d
Open your browser at http://localhost:5000
To persist your API key settings, also mount the config volume:
docker run -d -p 5000:5000 \
-v $(pwd)/pdfs:/app/pdf \
-v $(pwd)/config:/app/config \
bcankara/litorganizer:v2
⌨️ Command Line Mode
python litorganizer.py -d /path/to/pdfs --create-references
Run python litorganizer.py --help to see all available options.
API settings can be managed on the Settings page or by editing config/api_keys.json.
| API | Status | Requires |
|---|---|---|
| Crossref | ✅ Enabled | — |
| OpenAlex | ✅ Enabled | |
| DataCite | ✅ Enabled | — |
| Europe PMC | ✅ Enabled | — |
| Semantic Scholar | ✅ Enabled | — |
| Scopus | ⬚ Optional | API Key |
| Unpaywall | ⬚ Optional | |
| Google Gemini AI | ⬚ Optional | API Key |
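The README states that several APIs are queried simultaneously. A minimal way to do that in Python is a thread pool over the key-free services from the table; the endpoint URLs are the services' public DOI lookups as commonly documented, while `endpoint_urls`, `query_all`, and the error handling are hypothetical. The optional services (Scopus, Unpaywall) are left out.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


def endpoint_urls(doi: str) -> dict[str, str]:
    """DOI lookup URLs for the key-free services in the table above."""
    d = urllib.parse.quote(doi, safe="/")
    epmc = urllib.parse.urlencode({"query": f"DOI:{doi}", "format": "json"})
    return {
        "Crossref": f"https://api.crossref.org/works/{d}",
        "OpenAlex": f"https://api.openalex.org/works/doi:{d}",
        "DataCite": f"https://api.datacite.org/dois/{d}",
        "Europe PMC": f"https://www.ebi.ac.uk/europepmc/webservices/rest/search?{epmc}",
        "Semantic Scholar": f"https://api.semanticscholar.org/graph/v1/paper/DOI:{d}",
    }


def fetch_json(url: str, timeout: float = 10.0) -> Optional[Any]:
    """GET a JSON document; return None on network, HTTP, or decoding errors."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):  # URLError, HTTPError, and timeouts are OSErrors
        return None


def query_all(doi: str, fetch: Callable[[str], Optional[Any]] = fetch_json) -> dict[str, Optional[Any]]:
    """Query every service concurrently; results are keyed by service name."""
    urls = endpoint_urls(doi)
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        results = list(pool.map(fetch, urls.values()))
    return dict(zip(urls, results))
```

Threads suit this workload because each request spends most of its time waiting on the network; the `fetch` parameter lets a stub replace real HTTP calls in tests.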
🤖 Enable Gemini AI
- Open the Settings page in LitOrganizer
- Toggle Google Gemini Flash on
- Enter your free API key from Google AI Studio
- Save — Gemini AI will be used as fallback when DOI extraction fails
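An AI fallback of this kind needs a prompt and a robust parser for the model's reply. The sketch below covers only those two pieces; the prompt wording and the `title`/`authors`/`year` JSON schema are assumptions, and the actual API call is left as a comment.

```python
import json
import re
from typing import Optional

PROMPT_TEMPLATE = (
    "Extract bibliographic metadata from the first pages of this academic paper. "
    'Reply with JSON only: {{"title": str, "authors": [str], "year": int}}.\n\n{text}'
)
# Models often wrap JSON in a fenced block such as `{3}json ... `{3}; capture the inside.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def build_prompt(text: str, max_chars: int = 8000) -> str:
    """Truncate the extracted PDF text to keep the request small."""
    return PROMPT_TEMPLATE.format(text=text[:max_chars])


# Sending the prompt is not shown. With the google-generativeai SDK it might look like:
#   reply = genai.GenerativeModel("gemini-2.0-flash").generate_content(prompt).text


def parse_metadata(reply: str) -> Optional[dict]:
    """Parse the model reply; return None unless at least a non-empty title is present."""
    m = FENCE.search(reply)
    payload = m.group(1) if m else reply.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    title = data.get("title")
    if not isinstance(title, str) or not title.strip():
        return None
    authors = data.get("authors")
    if not isinstance(authors, list) or not all(isinstance(a, str) for a in authors):
        authors = []
    try:
        year = int(data.get("year"))
    except (TypeError, ValueError):
        year = None
    return {"title": title.strip(), "authors": authors, "year": year}
```

A parsed result would still go through Crossref validation before the file is treated as verified.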
For detailed usage instructions, see the User Guide, which covers:
| Topic | Description |
|---|---|
| 🔄 Naming Pipeline | How metadata is extracted and files are renamed |
| 🤖 Gemini AI Setup | Configuration and usage of the AI fallback |
| 🔎 Keyword Search | Regex examples and export options |
| 📁 Output Structure | How files are organized into folders |
| ⚙️ API Reference | Available APIs and configuration |
💡 In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.
| Layer | Technologies |
|---|---|
| Backend | Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber |
| AI | Google Gemini 2.0 Flash API |
| Frontend | Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog |
| Data Export | pandas · openpyxl · python-docx |
- Modern web interface with real-time updates
- DOI fallback with Crossref title search
- Google Gemini AI integration
- Native OS folder picker
- Built-in usage guide
- Full-text search with Word/Excel export
- Batch export in BibTeX / RIS format
- Docker support
- Dark mode
If you use LitOrganizer in your research, please cite:
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198
BibTeX
@article{sahin2025litorganizer,
title = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
author = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
journal = {SoftwareX},
volume = {30},
pages = {102198},
year = {2025},
publisher = {Elsevier},
doi = {10.1016/j.softx.2025.102198}
}
APA 7th Edition
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198
RIS
TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  - 
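Batch export to RIS (listed among the features) amounts to writing records like the one above: one tag per line in the form `XX  - value`, closed by `ER  - `. A minimal sketch, with hypothetical metadata keys:

```python
def to_ris(meta: dict) -> str:
    """Render one journal-article record as RIS text ("XX  - value" per line)."""
    fields = [("TY", "JOUR"), ("TI", meta["title"])]
    fields += [("AU", author) for author in meta.get("authors", [])]  # one AU line per author
    optional = (("JO", "journal"), ("VL", "volume"), ("SP", "pages"),
                ("PY", "year"), ("SN", "issn"), ("DO", "doi"), ("UR", "url"))
    for tag, key in optional:
        if meta.get(key):  # skip missing or empty values
            fields.append((tag, str(meta[key])))
    fields.append(("ER", ""))  # end-of-record marker
    return "\n".join(f"{tag}  - {value}" for tag, value in fields)


def to_ris_batch(records: list[dict]) -> str:
    """Concatenate several records into one .ris file body, separated by blank lines."""
    return "\n\n".join(to_ris(r) for r in records) + "\n"


print(to_ris({"title": "LitOrganizer", "authors": ["Şahin, Alperen"], "journal": "SoftwareX",
              "volume": 30, "pages": "102198", "year": 2025,
              "doi": "10.1016/j.softx.2025.102198"}))
```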
v2.0.0 — AI-Powered Web Application (Latest)
Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.
Added:
- Google Gemini AI integration with real-time status panel
- Modern web interface with Tailwind CSS
- WebSocket-powered live progress tracking with circular progress rings
- Native OS folder picker with quick access shortcuts
- Multi-stage DOI fallback pipeline
- Global activity panel & completion modal
- Comprehensive usage guide page
- Search export to Word/Excel with highlights
Fixed:
- Backup system file copy scope issue
- Cross-platform path separator in "Open Folder"
- Statistics persistence across page navigation
- Progress ring synchronization
Changed:
- Architecture: PyQt5 → Flask + Socket.IO
- Default AI-named files go to Named Article/ (configurable)
- Native OS dialog replaces drag-and-drop zone
- Python requirement broadened to 3.10+
Removed:
- PyQt5 desktop GUI & modules/gui/ directory
- --gui CLI argument
- Drag & drop directory selection
- Heuristic regex-based content extraction
v1.x — Desktop Application (Legacy)
- PyQt5-based desktop GUI with tabbed interface
- Basic progress bar
- Local-only operation
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch → git checkout -b feature/AmazingFeature
3. Commit your changes → git commit -m 'Add AmazingFeature'
4. Push to the branch → git push origin feature/AmazingFeature
5. Open a Pull Request



