
bcankara/LitOrganizer


LitOrganizer



Automated Academic PDF Organization & Search — Powered by AI




Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)




📌 What is LitOrganizer?

LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and leverages Google Gemini AI as an intelligent fallback — then renames files using citation standards, categorizes them, and provides full-text search through a modern web interface.

The Problem: Researchers accumulate hundreds of PDFs with cryptic filenames like 1234567.pdf, paper_final_v3.pdf, or download(2).pdf. Finding the right paper becomes a nightmare.

The Solution: LitOrganizer automatically renames them to (Smith, 2024) - Machine Learning in Healthcare.pdf and organizes them into folders by journal, author, or year.


📸 Screenshots

  • PDF Processing: real-time progress with Gemini AI panel
  • Statistics Dashboard: performance and accuracy analytics
  • Processing Complete: summary with success rate
  • Full-Text Search: search across all PDFs with export

✨ Key Features

🔍 Smart Metadata Extraction

Automatically detects DOIs from PDF text and queries 7+ academic APIs simultaneously for accurate metadata:

Crossref · OpenAlex · DataCite · Europe PMC · Semantic Scholar · Scopus · Unpaywall
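As a rough illustration of the DOI detection step, the sketch below finds a DOI-like string in text that has already been extracted from a PDF and builds the Crossref REST API URL for it. The regular expression is a commonly used approximation, not necessarily the pattern LitOrganizer uses, and the function names `find_doi` and `crossref_url` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import quote

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for illustration, not LitOrganizer's actual pattern.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence rather than the DOI.
    return match.group(0).rstrip(".,;:)]}")


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI (no request is sent here)."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


doi = find_doi("Available at https://doi.org/10.1016/j.softx.2025.102198.")
print(doi, crossref_url(doi))
```

Stripping trailing punctuation is a heuristic: a small number of valid DOIs end in such characters, so a real implementation may need to confirm the candidate against an API before trusting it.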

🤖 Google Gemini AI Fallback

When DOI extraction fails, Gemini AI reads the PDF content and extracts title, authors, and year — then validates via Crossref.

Real-time AI status panel shows extraction progress.
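The validation criteria are not spelled out here. One plausible approach, shown below purely as an assumption, is to compare the AI-extracted title with the title returned by Crossref and accept the match only when the two are nearly identical. The helper names and the 0.9 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation and whitespace so cosmetic differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(ai_title: str, registry_title: str, threshold: float = 0.9) -> bool:
    """Return True when the two titles are similar enough to be treated as the same paper."""
    ratio = SequenceMatcher(
        None, normalize_title(ai_title), normalize_title(registry_title)
    ).ratio()
    return ratio >= threshold


print(titles_match("Machine Learning in Healthcare", "Machine learning in healthcare."))
```

A stricter check could also compare the publication year and first author before accepting the Crossref record.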

📝 Citation-Based Renaming

Files are renamed using an APA 7th edition style in-text citation followed by the title:

(Author, Year) - Title.pdf

Automatic folder categorization: journal · author · year · subject
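The naming pattern above can be sketched as a small function. This is a minimal illustration, assuming only the first author's surname is used and that characters invalid in file names are simply dropped; `citation_filename` and its `max_title_len` parameter are hypothetical and not part of LitOrganizer's API.

```python
import re


def citation_filename(first_author_surname: str, year: int, title: str,
                      max_title_len: int = 120) -> str:
    """Build a '(Author, Year) - Title.pdf' file name from basic metadata."""
    # Drop characters that are not allowed in file names on common operating systems.
    clean_title = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title)
    # Collapse runs of whitespace and remove a trailing period.
    clean_title = re.sub(r"\s+", " ", clean_title).strip().rstrip(".")
    # Keep the name within a conservative length (the limit is an assumption).
    clean_title = clean_title[:max_title_len].rstrip()
    return f"({first_author_surname}, {year}) - {clean_title}.pdf"


print(citation_filename("Smith", 2024, "Machine Learning in Healthcare"))
# (Smith, 2024) - Machine Learning in Healthcare.pdf
```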

🔎 Full-Text Search

Search across your entire PDF collection with:

  • Exact match & regex support
  • Sentence-level context highlighting
  • Export results to Word or Excel
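To make the search behaviour concrete, here is a minimal sketch of exact-match and regex search with sentence-level context. It works on plain text that has already been extracted from a PDF. The naive sentence splitter, the case-insensitive matching, and the bracket-style highlighting are assumptions for illustration, and `search_sentences` is a hypothetical name.

```python
import re
from typing import List


def search_sentences(text: str, query: str, use_regex: bool = False) -> List[str]:
    """Return every sentence containing the query, with each match wrapped in brackets."""
    # In exact-match mode the query is escaped so that characters like "." are literal.
    pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        if pattern.search(sentence):
            hits.append(pattern.sub(lambda m: f"[{m.group(0)}]", sentence))
    return hits


sample = "Deep learning improves diagnosis. We used a random forest. Learning rates were tuned."
print(search_sentences(sample, "learning"))
print(search_sentences(sample, r"random\s+forest", use_regex=True))
```

The splitter will mis-handle abbreviations such as "et al."; a production implementation would likely need a more careful tokenizer.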

📊 Real-Time Web Interface

  • WebSocket-powered live progress with animated rings
  • Native OS folder picker dialog
  • Statistics dashboard with performance metrics

📋 Reference Generation

  • Auto-generated bibliography of all processed papers
  • Publication analytics by author, journal & year
  • Detailed error diagnostics for problematic files
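Publication analytics of this kind amount to counting metadata fields across processed papers. The sketch below shows one way to do that with the standard library. The record layout (`authors`, `journal`, `year` keys) and the sample values are hypothetical and do not reflect LitOrganizer's internal data model.

```python
from collections import Counter
from typing import Any, Dict, List


def count_by(records: List[Dict[str, Any]], field: str) -> Counter:
    """Count how often each value of a metadata field occurs across the records."""
    counts: Counter = Counter()
    for record in records:
        value = record.get(field)
        if value is None:
            continue
        # A field such as "authors" holds several values per paper; count each one.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


# Hypothetical records for illustration only.
records = [
    {"authors": ["Smith", "Lee"], "journal": "Journal A", "year": 2024},
    {"authors": ["Smith"], "journal": "Journal B", "year": 2024},
    {"authors": ["Kim"], "journal": "Journal A", "year": 2023},
]
print(count_by(records, "year").most_common())
print(count_by(records, "authors").most_common(1))
```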

🔬 How It Works

LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:

flowchart LR
    A["📄 PDF File"] --> B{"DOI Found?"}
    B -- Yes --> C["🔗 Query Academic APIs"]
    C --> D["✅ Named Article/"]
    B -- No --> E{"Gemini AI\nEnabled?"}
    E -- Yes --> F["🤖 AI Extraction\n(Title, Authors, Year)"]
    F --> G{"Validated via\nCrossref?"}
    G -- Yes --> D
    G -- No --> H["📁 AI Named Content/\n(if separate folder)"]
    E -- No --> I["❓ Unnamed Article/"]
    G -- Fail --> I

Output directory structure:

your_pdf_folder/
├── Named Article/          ← DOI + API verified or Gemini AI validated
├── AI Named Content/       ← Gemini AI named (optional separate folder)
├── Unnamed Article/        ← No metadata found
└── backups/                ← Original file backups (if enabled)
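The routing shown in the flowchart and directory tree can be summarized as a single decision function. This is one reading of the diagram, not the project's actual code: it assumes that AI-named files that fail Crossref validation go to AI Named Content/ only when the separate-folder option is on, and that files for which the AI step yields no usable metadata are treated as unnamed. The function and parameter names are hypothetical.

```python
def destination_folder(doi_metadata_found: bool,
                       gemini_enabled: bool = False,
                       ai_extracted: bool = False,
                       crossref_validated: bool = False,
                       separate_ai_folder: bool = False) -> str:
    """Pick the output folder for one PDF, following the pipeline diagram."""
    if doi_metadata_found:
        # DOI detected and metadata returned by an academic API.
        return "Named Article"
    if not gemini_enabled or not ai_extracted:
        # No fallback available, or the fallback produced nothing usable (assumption).
        return "Unnamed Article"
    if crossref_validated:
        return "Named Article"
    # AI-named but not validated: use the separate folder only if that option is enabled.
    return "AI Named Content" if separate_ai_folder else "Named Article"


print(destination_folder(doi_metadata_found=True))
print(destination_folder(False, gemini_enabled=True, ai_extracted=True,
                         separate_ai_folder=True))
```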

🚀 Quick Start

The launcher scripts handle everything automatically — Python check, virtual environment, dependencies, and server startup.

🪟 Windows
  1. Download or clone the repository
  2. Double-click start_litorganizer.bat
  3. Browser opens automatically at http://localhost:5000
🍎 macOS
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"

Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal

Note: If downloaded as ZIP, remove quarantine first: xattr -cr .

🐧 Linux
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh
🛠 Manual Installation
# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer

# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

# Install & run
pip install -r requirements.txt
python litorganizer.py
🐳 Docker
# Quick start — mount your PDF folder and open http://localhost:5000
docker run -d -p 5000:5000 -v "$(pwd)/pdfs:/app/pdf" bcankara/litorganizer:v2

Or with Docker Compose:

# docker-compose.yml is included in the repo
docker compose up -d

Open your browser at http://localhost:5000

To persist your API key settings, also mount the config volume:

docker run -d -p 5000:5000 \
  -v "$(pwd)/pdfs:/app/pdf" \
  -v "$(pwd)/config:/app/config" \
  bcankara/litorganizer:v2
⌨️ Command Line Mode
python litorganizer.py -d /path/to/pdfs --create-references

Run python litorganizer.py --help for all available options.


⚙️ Configuration

API settings can be managed on the Settings page or by editing config/api_keys.json.

| API | Status | Requires |
|---|---|---|
| Crossref | ✅ Enabled | - |
| OpenAlex | ✅ Enabled | Email |
| DataCite | ✅ Enabled | - |
| Europe PMC | ✅ Enabled | - |
| Semantic Scholar | ✅ Enabled | - |
| Scopus | ⬚ Optional | API Key |
| Unpaywall | ⬚ Optional | Email |
| Google Gemini AI | ⬚ Optional | API Key |

🤖 Enable Gemini AI
  1. Open the Settings page in LitOrganizer
  2. Toggle Google Gemini Flash on
  3. Enter your free API key from Google AI Studio
  4. Save — Gemini AI will be used as fallback when DOI extraction fails

📖 Documentation

For detailed usage instructions, see the User Guide which covers:

| Topic | Description |
|---|---|
| 🔄 Naming Pipeline | How metadata is extracted and files are renamed |
| 🤖 Gemini AI Setup | Configuration and usage of the AI fallback |
| 🔎 Keyword Search | Regex examples and export options |
| 📁 Output Structure | How files are organized into folders |
| ⚙️ API Reference | Available APIs and configuration |

💡 In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.


🛠️ Tech Stack

| Layer | Technologies |
|---|---|
| Backend | Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber |
| AI | Google Gemini Flash 2.0 API |
| Frontend | Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog |
| Data Export | pandas · openpyxl · python-docx |

🗺️ Roadmap

  • Modern web interface with real-time updates
  • DOI fallback with Crossref title search
  • Google Gemini AI integration
  • Native OS folder picker
  • Built-in usage guide
  • Full-text search with Word/Excel export
  • Batch export in BibTeX / RIS format
  • Docker support
  • Dark mode

📄 Citation

If you use LitOrganizer in your research, please cite:

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198

BibTeX
@article{sahin2025litorganizer,
  title     = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
  author    = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
  journal   = {SoftwareX},
  volume    = {30},
  pages     = {102198},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.softx.2025.102198}
}
APA 7th Edition
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198
RIS
TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  -

📋 Changelog

v2.0.0 — AI-Powered Web Application (Latest)

Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.

✅ Added

  • Google Gemini AI integration with real-time status panel
  • Modern web interface with Tailwind CSS
  • WebSocket-powered live progress tracking with circular progress rings
  • Native OS folder picker with quick access shortcuts
  • Multi-stage DOI fallback pipeline
  • Global activity panel & completion modal
  • Comprehensive usage guide page
  • Search export to Word/Excel with highlights

🔧 Fixed

  • Backup system file copy scope issue
  • Cross-platform path separator in "Open Folder"
  • Statistics persistence across page navigation
  • Progress ring synchronization

🔄 Changed

  • Architecture: PyQt5 → Flask + Socket.IO
  • Default AI-named files go to Named Article/ (configurable)
  • Native OS dialog replaces drag-and-drop zone
  • Python requirement broadened to 3.10+

🗑️ Removed

  • PyQt5 desktop GUI & modules/gui/ directory
  • --gui CLI argument
  • Drag & drop directory selection
  • Heuristic regex-based content extraction
v1.x — Desktop Application (Legacy)
  • PyQt5-based desktop GUI with tabbed interface
  • Basic progress bar
  • Local-only operation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch    →  git checkout -b feature/AmazingFeature
3. Commit your changes           →  git commit -m 'Add AmazingFeature'
4. Push to the branch            →  git push origin feature/AmazingFeature
5. Open a Pull Request

📬 Contact & Support

For bug reports and questions, use the Issues and Discussions tabs of the GitHub repository.


Made with ❤️ for the academic community
