Intelligent duplicate download detection and prevention system for browsers and file systems
ReDUCE is a comprehensive solution that automatically detects and prevents redundant file downloads across your system, saving bandwidth, storage space, and time through intelligent metadata tracking and cross-platform file monitoring.
- Overview
- System Architecture
- Components
- Features
- Prerequisites
- Quick Start
- How It Works
- Configuration
- API Reference
- Project Structure
- Troubleshooting
- Contributing
ReDUCE is an integrated system that prevents duplicate downloads by tracking file metadata across your browser and file system. It consists of three interconnected components:
- Browser Extension - Monitors and intercepts download requests
- Metadata Server - Centralized duplicate detection and record management
- CLI Utility Tool - File system monitoring and metadata synchronization
Users often download the same file more than once, either by accident or without realizing they already have a copy. This leads to:
- Wasted bandwidth
- Redundant storage consumption
- Cluttered download folders
- Time spent managing duplicate files
ReDUCE automatically:
- Detects duplicate downloads before they start
- Prevents redundant file transfers
- Tracks download metadata intelligently
- Synchronizes with file system changes
- Provides statistics on saved bandwidth and storage
graph TB
subgraph Browser["Browser Environment"]
BE[Browser Extension]
end
subgraph Server["Local Server"]
MS[Metadata Server<br/>Flask API :5050]
DB[(SQLite Database<br/>downloads.db)]
end
subgraph FileSystem["File System"]
FS[Monitored Files]
CLI[CLI Utility Tool<br/>File Monitor]
end
BE -->|1. Download Request<br/>+ Metadata| MS
MS -->|2. Check Duplicates| DB
MS -->|3. Action Response<br/>Cancel/Proceed| BE
MS -->|4. Store Record| DB
CLI -->|5. Monitor Files| FS
CLI -->|6. Sync Deletions| MS
FS -.->|File Changes| CLI
style BE fill:#4A90E2
style MS fill:#50C878
style DB fill:#FFB347
style CLI fill:#9B59B6
style FS fill:#E8E8E8
- Download Initiation: User starts a download in the browser
- Metadata Extraction: Extension extracts file metadata (URL, size, content type, hash)
- Duplicate Check: Server queries database for matching files
- Decision: Server responds with action (cancel if duplicate, proceed if new)
- File Monitoring: CLI tool watches file system for changes
- Synchronization: Deleted files trigger database cleanup
Browser Extension
Location: reduce-New-Extension/
A Chromium-based browser extension (Manifest V3) that:
- Monitors download events in real-time
- Extracts comprehensive metadata before downloads start
- Communicates with the Metadata Server via REST API
- Cancels duplicate downloads automatically
- Provides a popup interface for viewing download history
Technologies: JavaScript (ES6 Modules), Chrome Extensions API, Service Workers
Detailed documentation: see the reduce-New-Extension/ directory.
Metadata Server
Location: reduce-Internal-Metadata-Server/
A Flask-based REST API server that:
- Manages SQLite database of download records
- Implements duplicate detection algorithms
- Provides RESTful endpoints for download management
- Tracks statistics (bandwidth saved, duplicates prevented)
- Handles device-specific download records
Technologies: Python 3, Flask, SQLite
Detailed documentation: see reduce-Internal-Metadata-Server/README.md.
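A minimal sketch of the duplicate check behind POST /process_download and POST /delete_record follows, using only Python's sqlite3 module. The table layout, the column names, and the exact way filename and content-length are joined before hashing are assumptions for illustration; the real schema lives in model.py and may differ. The function reads the same payload fields shown in the API Reference.

```python
import hashlib
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS downloads (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_hash TEXT NOT NULL,      -- SHA-1 of filename + content-length
    partial_hash TEXT NOT NULL,   -- also stored on the file (ADS/xattr)
    device_id TEXT NOT NULL,
    url TEXT
)
"""


def file_hash(filename, content_length):
    # Assumption: the two values are concatenated as text before hashing.
    return hashlib.sha1(f"{filename}{content_length}".encode("utf-8")).hexdigest()


def process_download(conn, payload):
    """Return {"action": 1} for a duplicate; otherwise record it and return {"action": 0}."""
    data = payload["data"]
    name = data["downloadFileNameDomainUrlDetails"]["downloadFileName"]
    length = data["fetched_complete_metadata"].get("content-length", "")
    device = data["device_info"]["device_id"]
    key = file_hash(name, length)
    duplicate = conn.execute(
        "SELECT 1 FROM downloads WHERE file_hash = ? AND device_id = ?",
        (key, device),
    ).fetchone()
    if duplicate:
        return {"action": 1}
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO downloads (file_hash, partial_hash, device_id, url) "
            "VALUES (?, ?, ?, ?)",
            (key, data["partial_hash"], device, data["download_meta_data"].get("url")),
        )
    return {"action": 0}


def delete_record(conn, partial_hash):
    """Forget a file once it has been deleted from disk; returns rows removed."""
    with conn:
        cursor = conn.execute("DELETE FROM downloads WHERE partial_hash = ?", (partial_hash,))
    return cursor.rowcount
```

Keying the lookup on both the hash and device_id is what makes tracking per device in this sketch: the same file recorded for another device is not treated as a duplicate.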
CLI Utility Tool
Location: reduce-CLI-Utility-Tool/
Two command-line utilities for file system monitoring and download management:
File Monitoring Service (file-monitoring-service/): cross-platform file system monitoring that:
- Monitors files with ReDUCE metadata
- Uses platform-specific metadata storage (ADS on Windows, xattr on Linux/macOS)
- Automatically syncs file deletions with the server
- Collects device information for tracking
- Runs as a background service
CLI Download Wrapper (cli-download-wrapper/): a command-line interface for download commands that:
- Wraps wget and curl commands
- Executes Python and Bash download scripts
- Extracts download metadata automatically
- Communicates with the server for duplicate checking
Technologies: Python 3, Watchdog, pyxattr (Linux/macOS)
Detailed documentation: see reduce-CLI-Utility-Tool/README.md.
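The file monitor collects device information, and the API payload carries device_id, device_name, current_user and mac_address. The sketch below gathers those fields with the Python standard library. How ReDUCE actually derives device_id is not documented here, so the UUID-from-MAC scheme and the function name collect_device_info are hypothetical.

```python
import getpass
import platform
import uuid


def collect_device_info() -> dict:
    """Gather the device_info fields sent with each download request."""
    # uuid.getnode() returns the MAC address as a 48-bit integer; if none can
    # be read it falls back to a random number, so it may not be a real MAC.
    node = uuid.getnode()
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    try:
        user = getpass.getuser()
    except (KeyError, OSError):
        # No login name available (for example in some containers).
        user = "unknown"
    return {
        # Hypothetical scheme: a name-based UUID derived from the MAC address.
        "device_id": str(uuid.uuid5(uuid.NAMESPACE_OID, mac)),
        "device_name": platform.node(),
        "current_user": user,
        "mac_address": mac,
    }
```

Deriving the ID from the MAC address keeps it stable across restarts without storing any state, at the cost of changing if the network adapter changes.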
- Intelligent Duplicate Detection
  - Hash-based file identification
  - Metadata comparison (filename, size, URL, content-type)
  - Per-device tracking
- Cross-Platform Support
  - Windows (Alternate Data Streams)
  - Linux (Extended Attributes)
  - macOS (Extended Attributes)
- Real-Time Monitoring
  - File system event tracking
  - Instant download interception
  - Automatic database synchronization
- Statistics & Insights
  - Bandwidth-saved tracking
  - Storage optimization metrics
  - Download history analytics
- Extensible Architecture
  - RESTful API for integrations
  - Modular component design
  - Clear separation of concerns
Browser Extension:
- Chrome, Edge, Brave, or any other Chromium-based browser
- Developer mode enabled (for manual installation)
Metadata Server:
- Python: 3.7 or higher
- pip: Python package manager
- Dependencies:
  - Flask
  - See reduce-Internal-Metadata-Server/requirements.txt for the full list
CLI Utility Tool:
- Python: 3.7 or higher
- pip: Python package manager
- Dependencies:
  - requests
  - watchdog
  - pyxattr (Linux/macOS only)
  - pyinstaller (for building standalone executables)
Start the Metadata Server:
# Navigate to server directory
cd reduce-Internal-Metadata-Server
# Install dependencies
pip install -r requirements.txt
# Start the server (runs on port 5050)
python main.py
The server will start at http://127.0.0.1:5050 and initialize the SQLite database.
Start the File Monitoring Service:
# Navigate to file monitoring service
cd reduce-CLI-Utility-Tool/file-monitoring-service
# Install dependencies
pip install -r requirements.txt
# Run the file monitor
python file_monitor.py
The file monitoring service will:
- Initialize a cache from existing files that carry ReDUCE metadata
- Start monitoring the file system (C:\ on Windows, ~/ on Linux/macOS)
- Begin syncing with the Metadata Server
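One plausible reason for building the cache up front: once a file is deleted, its ADS or xattr disappears with it, so the partial hash that POST /delete_record needs must already be known. The sketch below illustrates that idea with a simple polling scan from the standard library. It is a stand-in, not the watchdog-based implementation in file_monitor.py, and DeletionTracker, scan, run and the read_hash callback are hypothetical names.

```python
import os
import time
from typing import Callable, Dict, Optional

# read_hash(path) returns the stored ReDUCE hash, or None if the file has none.
ReadHash = Callable[[str], Optional[str]]


def scan(root: str, read_hash: ReadHash) -> Dict[str, str]:
    """Map every file under root that carries ReDUCE metadata to its stored hash."""
    cache: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            value = read_hash(path)
            if value is not None:
                cache[path] = value
    return cache


class DeletionTracker:
    def __init__(self, root: str, read_hash: ReadHash) -> None:
        self.root = root
        self.read_hash = read_hash
        # Initial cache built from files that already have metadata.
        self.cache = scan(root, read_hash)

    def poll(self) -> Dict[str, str]:
        """Return {path: hash} for tracked files that no longer exist, then update the cache."""
        fresh = scan(self.root, self.read_hash)
        deleted = {
            path: value
            for path, value in self.cache.items()
            if path not in fresh and not os.path.exists(path)
        }
        self.cache = fresh
        return deleted


def run(tracker: DeletionTracker, on_deleted: Callable[[str], None], interval: float = 5.0) -> None:
    """Poll forever; on_deleted would POST the hash to /delete_record."""
    while True:
        for partial_hash in tracker.poll().values():
            on_deleted(partial_hash)
        time.sleep(interval)
```

Rescanning an entire drive on every poll is far more expensive than receiving file system events, which is why an event-based library such as watchdog is the better fit in practice.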
Load the Browser Extension:
- Open your Chromium-based browser
- Navigate to chrome://extensions/ (or edge://extensions/ for Edge)
- Enable Developer mode (toggle in the top-right corner)
- Click "Load unpacked"
- Select the reduce-New-Extension folder
- The extension icon should appear in your toolbar
Test the setup:
- Try downloading a file from the internet
- The download should proceed normally
- Try downloading the same file again
- The extension should detect the duplicate and cancel the download
- Check the extension popup to view download history
sequenceDiagram
participant User
participant Browser
participant Extension
participant Server
participant Database
participant FileSystem
participant CLI
User->>Browser: Initiates Download
Browser->>Extension: onDeterminingFilename Event
Extension->>Extension: Extract Metadata<br/>(URL, size, hash, etc.)
Extension->>Extension: Get Device Info
Extension->>Server: POST /process_download
Server->>Database: Query for Duplicates
alt Duplicate Found
Database-->>Server: Match Found
Server-->>Extension: Action: 1 (Cancel)
Extension->>Browser: Cancel Download
Extension->>User: Notification: Duplicate Detected
else No Duplicate
Database-->>Server: No Match
Server->>Database: Insert New Record
Server-->>Extension: Action: 0 (Proceed)
Extension->>Browser: Allow Download
Browser->>FileSystem: Save File
end
CLI->>FileSystem: Monitor Changes
alt File Deleted
FileSystem-->>CLI: File Deleted Event
CLI->>Server: POST /delete_record
Server->>Database: Remove Record
end
- Metadata Hashing: Files are identified by a SHA-1 hash of (filename + content-length)
- Partial Hash Verification: An additional hash is stored in the file's own metadata for verification
- Device Tracking: Downloads are tracked per device using the MAC address and a device ID
- Platform-Specific Metadata:
  - Windows: Alternate Data Streams (:file_hash_check_parts)
  - Linux/macOS: Extended Attributes (user.file_hash_check_parts)
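The hashing and metadata-storage steps above can be sketched in Python as follows. The stream and attribute names come from this section; the exact string fed to SHA-1 (plain concatenation here) is an assumption, and the macOS branch assumes an installed xattr module that provides setxattr(path, name, value). The function names are hypothetical.

```python
import hashlib
import os
import sys

ADS_STREAM = "file_hash_check_parts"       # Windows alternate data stream name
XATTR_NAME = "user.file_hash_check_parts"  # Linux/macOS extended attribute name


def metadata_hash(filename: str, content_length: int) -> str:
    """SHA-1 identifier of a download; plain concatenation is an assumption."""
    return hashlib.sha1(f"{filename}{content_length}".encode("utf-8")).hexdigest()


def write_hash_parts(path: str, value: str) -> None:
    """Store value in the file's own metadata (ADS on Windows, xattr elsewhere)."""
    if sys.platform == "win32":
        # On NTFS, "name:stream" opens an alternate data stream of the file.
        with open(f"{path}:{ADS_STREAM}", "w", encoding="utf-8") as stream:
            stream.write(value)
    elif hasattr(os, "setxattr"):
        # os.setxattr is only available on Linux.
        os.setxattr(path, XATTR_NAME, value.encode("utf-8"))
    else:
        # Assumption: an installed xattr module providing setxattr(path, name, value).
        import xattr

        xattr.setxattr(path, XATTR_NAME, value.encode("utf-8"))
```

Storing the value on the file rather than only in the database lets the monitor recognise ReDUCE-tracked files on disk, but both ADS and xattrs can be silently dropped when a file is copied to a file system that does not support them.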
File: reduce-Internal-Metadata-Server/main.py
# Default configuration
PORT = 5050
DATABASE = 'downloads.db'
DEBUG_MODE = True
Change Port:
if __name__ == '__main__':
    app.run(port=5050, debug=True)  # Change port here
File: reduce-CLI-Utility-Tool/file-monitoring-service/file_monitor.py
# Default monitored paths
# Windows: C:\
# Linux/macOS: ~/
# Server endpoint
SERVER_URL = "http://127.0.0.1:5050"
Change Monitored Path (Lines 312-315):
if is_windows():
    path_to_monitor = "C:\\"  # Change for Windows
else:
    path_to_monitor = os.path.expanduser("~")  # Change for Linux/macOS
File: reduce-New-Extension/utils/config.js (if it exists); otherwise, change the URL in the code
The extension connects to http://127.0.0.1:5050 by default.
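If editing source files for every change is inconvenient, the file monitor's defaults could be resolved in one place. In this sketch, REDUCE_SERVER_URL and REDUCE_MONITOR_PATH are hypothetical environment variables that ReDUCE does not document; without them the functions fall back to the defaults shown in this section.

```python
import os
import sys

DEFAULT_SERVER_URL = "http://127.0.0.1:5050"


def server_url() -> str:
    # REDUCE_SERVER_URL is a hypothetical override, not a documented setting.
    return os.environ.get("REDUCE_SERVER_URL", DEFAULT_SERVER_URL).rstrip("/")


def path_to_monitor() -> str:
    # REDUCE_MONITOR_PATH is likewise hypothetical.
    override = os.environ.get("REDUCE_MONITOR_PATH")
    if override:
        return os.path.expanduser(override)
    # Same defaults as file_monitor.py: C:\ on Windows, the home directory elsewhere.
    return "C:\\" if sys.platform == "win32" else os.path.expanduser("~")
```

Pointing the monitor at a narrower directory such as a downloads folder also sidesteps the permission errors described under Troubleshooting.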
| Endpoint | Method | Description |
|---|---|---|
| /process_download | POST | Process a new download request and check for duplicates |
| /delete_record | POST | Delete a download record by partial hash |
| /get_all_downloads | GET | Retrieve all download records |
| /cancelled_download_stats | GET | Get statistics on cancelled downloads |
| /completed_download_stats | GET | Get statistics on completed downloads |
| /device_info | GET | Get current device information |
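The statistics endpoints report on cancelled and completed downloads, but their response format is not documented in this README. The sketch below only illustrates how bandwidth-saved figures could be derived from stored records; DownloadRecord, its fields, and cancelled_download_stats as a Python function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DownloadRecord:
    # Hypothetical shape of a stored record.
    filename: str
    content_length: int  # bytes, from the Content-Length header
    cancelled: bool      # True if ReDUCE cancelled it as a duplicate


def cancelled_download_stats(records: Iterable[DownloadRecord]) -> dict:
    """Count prevented duplicates and the bytes they would have transferred."""
    cancelled = [r for r in records if r.cancelled]
    saved = sum(r.content_length for r in cancelled)
    return {
        "duplicates_prevented": len(cancelled),
        "bandwidth_saved_bytes": saved,
        "bandwidth_saved_mib": round(saved / (1024 * 1024), 2),
    }
```

For example, two cancelled downloads of 1 MiB and 2 MiB would report 2 duplicates prevented and 3.0 MiB saved. The figure is only as accurate as the Content-Length header, which some servers omit.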
Request:
POST http://127.0.0.1:5050/process_download
Content-Type: application/json
{
"id": "download-id",
"data": {
"download_meta_data": {
"url": "https://example.com/file.zip",
"finalUrl": "https://cdn.example.com/file.zip",
"referrer": "https://example.com"
},
"fetched_complete_metadata": {
"content-length": "1048576",
"content-type": "application/zip",
"etag": "abc123"
},
"downloadFileNameDomainUrlDetails": {
"downloadFileName": "file.zip",
"domain": "cdn.example.com"
},
"partial_hash": "abc123def456",
"device_info": {
"device_id": "device-uuid",
"device_name": "laptop",
"current_user": "user",
"mac_address": "00:11:22:33:44:55"
}
}
}
Response:
{
"action": 0
}
The action value is 0 to proceed with the download or 1 to cancel it.
Full API documentation is provided in the Metadata Server documentation.
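A client such as the CLI download wrapper has to send the payload shown above. This sketch builds that payload and posts it with urllib from the Python standard library. The helper names build_payload and should_cancel are hypothetical, and setting finalUrl to the request URL (no redirect tracking) is a simplification.

```python
import json
import urllib.request
from urllib.parse import urlparse

SERVER_URL = "http://127.0.0.1:5050"


def build_payload(download_id, url, filename, content_length, content_type,
                  partial_hash, device_info, referrer=""):
    """Assemble a /process_download body in the shape shown above."""
    return {
        "id": download_id,
        "data": {
            # Simplification: finalUrl is set to url (no redirect tracking).
            "download_meta_data": {"url": url, "finalUrl": url, "referrer": referrer},
            "fetched_complete_metadata": {
                "content-length": str(content_length),  # a string, as in the example
                "content-type": content_type,
            },
            "downloadFileNameDomainUrlDetails": {
                "downloadFileName": filename,
                "domain": urlparse(url).netloc,
            },
            "partial_hash": partial_hash,
            "device_info": device_info,
        },
    }


def should_cancel(payload, base_url=SERVER_URL, timeout=5.0) -> bool:
    """POST the payload; True means the server answered action 1 (duplicate)."""
    request = urllib.request.Request(
        f"{base_url}/process_download",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("action") == 1
```

A wrapper would call should_cancel before invoking wget or curl and skip the transfer when it returns True.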
ReDUCE/
├── README.md                          # This file
├── .gitignore
│
├── reduce-New-Extension/              # Browser Extension
│   ├── manifest.json                  # Extension manifest (V3)
│   ├── background/
│   │   ├── index.js                   # Service worker entry
│   │   └── observer.js                # Download event observer
│   ├── content/
│   │   └── content.js                 # Content scripts
│   ├── pages/
│   │   └── popup.html                 # Extension popup UI
│   ├── utils/                         # Utility functions
│   └── icons/                         # Extension icons
│
├── reduce-Internal-Metadata-Server/   # Flask API Server
│   ├── main.py                        # Flask application & routes
│   ├── model.py                       # Database models & queries
│   ├── a.py                           # Additional utilities
│   ├── requirements.txt               # Python dependencies
│   ├── downloads.db                   # SQLite database (created at runtime)
│   └── README.md                      # Server documentation
│
└── reduce-CLI-Utility-Tool/           # CLI Utilities
    ├── README.md                      # Overview of both tools
    ├── file-monitoring-service/       # File system monitor
    │   ├── file_monitor.py            # Main monitoring script
    │   ├── metadata_checker.py        # Metadata utility
    │   ├── requirements.txt           # Dependencies
    │   └── README.md                  # Monitoring docs
    └── cli-download-wrapper/          # Download command wrapper
        ├── reduce.py                  # CLI entry point
        ├── handlers/                  # Command handlers
        ├── download_logic/            # Processing logic
        ├── utils/                     # Utilities
        ├── requirements.txt           # Dependencies
        └── README.md                  # CLI wrapper docs
Symptom: Extension can't connect to server, CLI tool shows connection errors
Solutions:
- Verify the server is running: python main.py in the server directory
- Check that the server is listening on port 5050
- Ensure no firewall is blocking port 5050
- Check the server logs for errors
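To check whether anything is listening on port 5050 without involving the browser, a plain TCP connection attempt is enough. This is a generic sketch; server_is_listening is a hypothetical helper, not part of the CLI tool.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 5050, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

A True result only shows that some process accepted the connection; if the extension still fails, the server logs remain the place to look.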
Symptom: Downloads aren't being intercepted
Solutions:
- Verify the extension is installed and enabled in chrome://extensions/
- Check that the extension popup shows no errors
- Reload the extension (click the reload button on the extensions page)
- Check the browser console for errors (F12 → Console)
- Ensure server is running and accessible
Symptom: ModuleNotFoundError: No module named 'xattr'
Solution:
pip install pyxattr
Symptom: Access denied when monitoring certain directories
Solutions:
- Windows: Run as Administrator
- Linux/macOS: Use sudo or change the monitored path to your user directory
Symptom: sqlite3.OperationalError: database is locked
Solutions:
- Ensure only one instance of the server is running
- Close any database browser tools that might have the DB open
- Restart the server
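If lock errors persist, it may help to check how connections are opened. Python's sqlite3.connect already waits up to 5 seconds for a lock by default; the sketch below raises that timeout and enables SQLite's write-ahead logging (WAL) so reads and a write can proceed concurrently. Whether main.py or model.py open the database this way is not stated here, so treat open_db as a hypothetical helper.

```python
import sqlite3


def open_db(path: str = "downloads.db") -> sqlite3.Connection:
    """Open the database with a longer lock timeout and WAL journaling."""
    # Wait up to 10 s for another connection's lock instead of the default 5 s.
    conn = sqlite3.connect(path, timeout=10.0)
    # WAL is a persistent property of the database file once set.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

WAL mode creates downloads.db-wal and downloads.db-shm files next to the database, and it is not suitable for databases on network file systems.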
Symptom: Duplicate downloads aren't being cancelled
Solutions:
- Check server logs to see if the download was detected as a duplicate
- Verify the original file still has metadata attached
- Check whether the file hash matches (matching is tracked per device)
- Ensure the extension has permission to cancel downloads
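To verify that the original file still has metadata attached, read the stream or attribute back. The names below come from How It Works; read_hash_parts and the script name check_metadata.py are hypothetical (the repository's metadata_checker.py is described as a metadata utility and may already serve this purpose), and the macOS branch assumes an installed xattr module with getxattr(path, name).

```python
import os
import sys
from typing import Optional

ADS_STREAM = "file_hash_check_parts"       # Windows alternate data stream name
XATTR_NAME = "user.file_hash_check_parts"  # Linux/macOS extended attribute name


def read_hash_parts(path: str) -> Optional[str]:
    """Return the ReDUCE hash stored on path, or None if none is attached."""
    try:
        if sys.platform == "win32":
            with open(f"{path}:{ADS_STREAM}", encoding="utf-8") as stream:
                return stream.read()
        if hasattr(os, "getxattr"):  # Linux
            return os.getxattr(path, XATTR_NAME).decode("utf-8")
        import xattr  # assumption: module providing getxattr(path, name)

        return xattr.getxattr(path, XATTR_NAME).decode("utf-8")
    except OSError:
        # Missing file, missing stream/attribute, or no xattr support on this file system.
        return None


if __name__ == "__main__":
    # Usage: python check_metadata.py FILE [FILE ...]
    for target in sys.argv[1:]:
        value = read_hash_parts(target)
        print(f"{target}: {value if value is not None else 'no ReDUCE metadata'}")
```

If this prints "no ReDUCE metadata" for a file you expected to be tracked, the metadata was probably lost (for example by copying the file to another file system), which would explain why the duplicate was not detected.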
We welcome contributions! Here's how you can help:
- Use the GitHub Issues tab
- Provide detailed error messages and logs
- Include your OS and Python/browser versions
- Describe steps to reproduce the issue
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Test thoroughly across platforms
- Commit with clear messages (git commit -m 'Add amazing feature')
- Push to your fork (git push origin feature/amazing-feature)
- Open a Pull Request
- Python: Follow PEP 8 guidelines
- JavaScript: Use ES6+ features, consistent naming
- Add comments for complex logic
- Update documentation for new features
License information to be determined
Project Maintainers: ReDUCE Development Team
Contributors: See contributor list on GitHub
For questions, issues, or feature requests, please:
- Open an issue on GitHub
- Check the component-specific README files
- Review the troubleshooting section
Built with ❤️ for efficient download management
ReDUCE: Because you shouldn't download the same file twice