LitOrganizer (bcankara/LitOrganizer) is a tool for researchers, academics, and students that organizes PDF literature collections automatically. It extracts metadata from academic papers, renames files according to citation standards, sorts them into a logical directory structure, and provides full-text search.

Automated Academic PDF Organization & Search — Powered by AI

Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)


📌 What is LitOrganizer?

LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and leverages Google Gemini AI as an intelligent fallback — then renames files using citation standards, categorizes them, and provides full-text search through a modern web interface.

The Problem: Researchers accumulate hundreds of PDFs with cryptic filenames like 1234567.pdf, paper_final_v3.pdf, or download(2).pdf. Finding the right paper becomes a nightmare.

The Solution: LitOrganizer automatically renames them to (Smith, 2024) - Machine Learning in Healthcare.pdf and organizes them into folders by journal, author, or year.

📸 Screenshots

PDF Processing: real-time progress with Gemini AI panel
Statistics Dashboard: performance and accuracy analytics
Processing Complete: summary with success rate
Full-Text Search: search across all PDFs with export

✨ Key Features

🔍 Smart Metadata Extraction

Automatically detects DOIs from PDF text and queries 7+ academic APIs simultaneously for accurate metadata:

Crossref · OpenAlex · DataCite · Europe PMC · Semantic Scholar · Scopus · Unpaywall
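As an illustration of the DOI detection step, the sketch below scans extracted PDF text for a DOI-like string. It is not LitOrganizer's own code: the regular expression follows the commonly used Crossref-style pattern for modern DOIs, and the function name `find_doi` and the trailing-punctuation cleanup are assumptions made for this example.

```python
import re
from typing import Optional

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    doi = match.group(0).rstrip(".,;")
    # Drop a closing parenthesis that has no opening partner inside the DOI.
    if doi.endswith(")") and "(" not in doi:
        doi = doi[:-1].rstrip(".,;")
    return doi


print(find_doi("Available at https://doi.org/10.1016/j.softx.2025.102198."))
```

A DOI found this way could then be passed to any of the services listed above; PDFs with no match would fall through to the AI fallback.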

🤖 Google Gemini AI Fallback

When DOI extraction fails, Gemini AI reads the PDF content and extracts title, authors, and year — then validates via Crossref.

Real-time AI status panel shows extraction progress.
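One way to picture the Crossref validation step is as a bibliographic query followed by a title comparison. The sketch below is an assumption about how such a check could look, not the project's implementation: it builds a Crossref REST API query URL (using the `query.bibliographic`, `query.author`, and `rows` parameters) and compares titles with `difflib`. The helper names and the 0.9 similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_crossref_query(title: str, first_author: str, rows: int = 3) -> str:
    """Build a Crossref works query URL for an AI-extracted title and author."""
    params = {
        "query.bibliographic": title,
        "query.author": first_author,
        "rows": rows,
    }
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


def titles_match(ai_title: str, crossref_title: str, threshold: float = 0.9) -> bool:
    """Compare titles after lowercasing and collapsing whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    ratio = SequenceMatcher(None, normalize(ai_title), normalize(crossref_title)).ratio()
    # The threshold is an illustrative value, not one taken from LitOrganizer.
    return ratio >= threshold


print(build_crossref_query("Machine Learning in Healthcare", "Smith"))
print(titles_match("Machine Learning in  Healthcare", "machine learning in healthcare"))
```

The sketch stops short of the network request, so it runs offline; fetching the URL and reading the returned titles is left out.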

📝 Citation-Based Renaming

Files are renamed with an APA 7th edition style in-text citation, followed by the title:

(Author, Year) - Title.pdf

Automatic folder categorization: journal · author · year · subject
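The naming pattern can be sketched as a small function. This is a minimal illustration, not the tool's actual logic: the handling of multiple authors ("&" for two, "et al." for three or more, following APA in-text convention), the "Unknown" placeholder, the title length limit, and the character cleanup are assumptions.

```python
import re


def citation_filename(authors: list[str], year: int, title: str,
                      max_title_len: int = 120) -> str:
    """Build '(Author, Year) - Title.pdf' from family names, year, and title."""
    if not authors:
        label = "Unknown"
    elif len(authors) == 1:
        label = authors[0]
    elif len(authors) == 2:
        label = f"{authors[0]} & {authors[1]}"
    else:
        label = f"{authors[0]} et al."
    # Replace characters that are not allowed in Windows filenames.
    safe_title = re.sub(r'[<>:"/\\|?*]', " ", title)
    # Collapse whitespace, cap the length, and trim trailing dots or spaces.
    safe_title = " ".join(safe_title.split())[:max_title_len].rstrip(" .")
    return f"({label}, {year}) - {safe_title}.pdf"


print(citation_filename(["Smith"], 2024, "Machine Learning in Healthcare"))
```

Cleaning the title before building the name keeps the result valid on Windows, macOS, and Linux alike.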

🔎 Full-Text Search

Search across your entire PDF collection with:

Exact match & regex support
Sentence-level context highlighting
Export results to Word or Excel
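To make the search behavior concrete, here is a rough sketch of exact and regex matching with sentence-level context over text that has already been extracted from a PDF. The sentence splitter is deliberately naive, and the function name, the case-insensitive matching, and the `**...**` highlight markers are assumptions for illustration only.

```python
import re


def search_sentences(text: str, query: str, use_regex: bool = False) -> list[str]:
    """Return the sentences that contain the query, with matches highlighted."""
    # In exact mode the query is escaped so it is matched literally.
    pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        pattern.sub(lambda m: f"**{m.group(0)}**", sentence)
        for sentence in sentences
        if pattern.search(sentence)
    ]


sample = "Deep learning is popular. We used random forests. Learning rates matter!"
print(search_sentences(sample, "learning"))
print(search_sentences(sample, r"random\s+forests?", use_regex=True))
```

Returning whole sentences rather than fixed character windows keeps each hit readable on its own when results are exported.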

📊 Real-Time Web Interface

WebSocket-powered live progress with animated rings
Native OS folder picker dialog
Statistics dashboard with performance metrics

📋 Reference Generation

Auto-generated bibliography of all processed papers
Publication analytics by author, journal & year
Detailed error diagnostics for problematic files
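A bibliography entry can be assembled from the same metadata used for renaming. The sketch below formats a journal article in APA 7th edition style as plain text. It is a simplified illustration with hypothetical function names: it does not handle hyphenated given names, works with more than 20 authors, or italics.

```python
def initials(given_names: str) -> str:
    """Turn 'Burak Can' into 'B. C.'."""
    return " ".join(f"{part[0]}." for part in given_names.split())


def format_authors(authors: list[tuple[str, str]]) -> str:
    """Format (family, given) pairs as 'Family, I., Family, I., & Family, I.'."""
    names = [f"{family}, {initials(given)}" for family, given in authors]
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", & " + names[-1]


def apa_reference(authors: list[tuple[str, str]], year: int, title: str,
                  journal: str, volume: int, pages: str, doi: str) -> str:
    """Assemble a plain-text APA 7 style journal reference."""
    return (f"{format_authors(authors)} ({year}). {title}. "
            f"{journal}, {volume}, {pages}. https://doi.org/{doi}")


print(apa_reference(
    [("Şahin", "Alperen"), ("Kara", "Burak Can"), ("Dirsehan", "Taşkın")],
    2025,
    "LitOrganizer: Automating the process of data extraction and organization "
    "for scientific literature reviews",
    "SoftwareX", 30, "102198", "10.1016/j.softx.2025.102198",
))
```

The example input is the LitOrganizer paper itself, so the output can be compared with the reference given in the Citation section.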

🔬 How It Works

LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:

flowchart LR
    A["📄 PDF File"] --> B{"DOI Found?"}
    B -- Yes --> C["🔗 Query Academic APIs"]
    C --> D["✅ Named Article/"]
    B -- No --> E{"Gemini AI\nEnabled?"}
    E -- Yes --> F["🤖 AI Extraction\n(Title, Authors, Year)"]
    F --> G{"Validated via\nCrossref?"}
    G -- Yes --> D
    G -- No --> H["📁 AI Named Content/\n(if separate folder)"]
    E -- No --> I["❓ Unnamed Article/"]
    G -- Fail --> I

Output directory structure:

your_pdf_folder/
├── Named Article/          ← DOI + API verified or Gemini AI validated
├── AI Named Content/       ← Gemini AI named (optional separate folder)
├── Unnamed Article/        ← No metadata found
└── backups/                ← Original file backups (if enabled)
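The flowchart and the folder layout above can be read as a small decision function. The sketch below is one interpretation, not the project's code: the parameter names are hypothetical, the "Fail" branch is taken to mean that AI extraction returned nothing usable, and unvalidated AI results are assumed to land in Named Article/ unless the separate-folder option is on.

```python
from enum import Enum


class Folder(str, Enum):
    NAMED = "Named Article"
    AI_NAMED = "AI Named Content"
    UNNAMED = "Unnamed Article"


def route(*, doi_found: bool, gemini_enabled: bool, ai_extracted: bool,
          crossref_validated: bool, separate_ai_folder: bool = False) -> Folder:
    """Pick the output folder for one PDF, following the flowchart."""
    if doi_found:
        # DOI path: metadata comes from the academic APIs.
        return Folder.NAMED
    if not gemini_enabled or not ai_extracted:
        # No AI fallback available, or the fallback produced no metadata.
        return Folder.UNNAMED
    if crossref_validated:
        return Folder.NAMED
    # AI-named but unvalidated: separate folder only if configured (assumption).
    return Folder.AI_NAMED if separate_ai_folder else Folder.NAMED


print(route(doi_found=False, gemini_enabled=True, ai_extracted=True,
            crossref_validated=False, separate_ai_folder=True).value)
```

Keyword-only arguments make each call site spell out which branch of the pipeline is being exercised.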

🚀 Quick Start

The launcher scripts handle everything automatically — Python check, virtual environment, dependencies, and server startup.

🪟 Windows

Download or clone the repository
Double-click start_litorganizer.bat
Browser opens automatically at http://localhost:5000

🍎 macOS

git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"

Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal

Note: If downloaded as ZIP, remove quarantine first: xattr -cr .

🐧 Linux

git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh

🛠 Manual Installation

# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer

# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

# Install & run
pip install -r requirements.txt
python litorganizer.py

🐳 Docker

# Quick start — mount your PDF folder and open http://localhost:5000
docker run -d -p 5000:5000 -v "$(pwd)/pdfs:/app/pdf" bcankara/litorganizer:v2

Or with Docker Compose:

# docker-compose.yml is included in the repo
docker compose up -d

Open your browser at http://localhost:5000

To persist your API key settings, also mount the config volume:

docker run -d -p 5000:5000 \
  -v "$(pwd)/pdfs:/app/pdf" \
  -v "$(pwd)/config:/app/config" \
  bcankara/litorganizer:v2

⌨️ Command Line Mode

python litorganizer.py -d /path/to/pdfs --create-references

Run python litorganizer.py --help for all available options.

⚙️ Configuration

API settings can be managed on the Settings page or by editing config/api_keys.json.

API	Status	Requires
Crossref	✅ Enabled	—
OpenAlex	✅ Enabled	Email
DataCite	✅ Enabled	—
Europe PMC	✅ Enabled	—
Semantic Scholar	✅ Enabled	—
Scopus	⬚ Optional	API Key
Unpaywall	⬚ Optional	Email
Google Gemini AI	⬚ Optional	API Key

🤖 Enable Gemini AI

Open the Settings page in LitOrganizer
Toggle Google Gemini Flash on
Enter your free API key from Google AI Studio
Save — Gemini AI will be used as fallback when DOI extraction fails

📖 Documentation

For detailed usage instructions, see the User Guide which covers:

Topic	Description
🔄 Naming Pipeline	How metadata is extracted and files are renamed
🤖 Gemini AI Setup	Configuration and usage of the AI fallback
🔎 Keyword Search	Regex examples and export options
📁 Output Structure	How files are organized into folders
⚙️ API Reference	Available APIs and configuration

💡 In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.

🛠️ Tech Stack

Layer	Technologies
Backend	Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber
AI	Google Gemini Flash 2.0 API
Frontend	Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog
Data Export	pandas · openpyxl · python-docx

🗺️ Roadmap

📄 Citation

If you use LitOrganizer in your research, please cite:

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198

BibTeX

@article{sahin2025litorganizer,
  title     = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
  author    = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
  journal   = {SoftwareX},
  volume    = {30},
  pages     = {102198},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.softx.2025.102198}
}

APA 7th Edition

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198

RIS

TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  -

📋 Changelog

v2.0.0 — AI-Powered Web Application (Latest)

Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.

✅ Added

Google Gemini AI integration with real-time status panel
Modern web interface with Tailwind CSS
WebSocket-powered live progress tracking with circular progress rings
Native OS folder picker with quick access shortcuts
Multi-stage DOI fallback pipeline
Global activity panel & completion modal
Comprehensive usage guide page
Search export to Word/Excel with highlights

🔧 Fixed

Backup system file copy scope issue
Cross-platform path separator in "Open Folder"
Statistics persistence across page navigation
Progress ring synchronization

🔄 Changed

Architecture: PyQt5 → Flask + Socket.IO
Default AI-named files go to Named Article/ (configurable)
Native OS dialog replaces drag-and-drop zone
Python requirement broadened to 3.10+

🗑️ Removed

PyQt5 desktop GUI & modules/gui/ directory
--gui CLI argument
Drag & drop directory selection
Heuristic regex-based content extraction

v1.x — Desktop Application (Legacy)

PyQt5-based desktop GUI with tabbed interface
Basic progress bar
Local-only operation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch    →  git checkout -b feature/AmazingFeature
3. Commit your changes           →  git commit -m 'Add AmazingFeature'
4. Push to the branch            →  git push origin feature/AmazingFeature
5. Open a Pull Request

📬 Contact & Support

Made with ❤️ for the academic community
