A powerful asynchronous website crawler and link checker that helps you identify broken links and orphaned pages and analyze your website's link structure.
Created by Farhan Ansari
- 🔄 Asynchronous crawling for faster performance
- 🌐 Cross-platform support (Windows, macOS, Linux)
- 🎨 Beautiful terminal output with color coding
- 📊 Link analysis and reporting
- 🔍 Smart caching system for efficient crawling
- 🛡️ Rate limiting and robots.txt compliance
- 📝 CSV reports for broken and all links
- 🔒 SSL/TLS support
- 🎯 Configurable crawl depth and page limits
- Python 3.8 or higher
- pip (Python package installer)
- Clone the repository:

```bash
git clone https://github.com/fxrhan/LinkGuardian.git
cd LinkGuardian
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Basic crawl:

```bash
python linkcheck.py --url https://example.com
```

Crawl with custom options:

```bash
python linkcheck.py --url https://example.com --workers 20 --rate 0.5 --max-pages 200 --max-depth 4 --timeout 15 --check-external --ignore-robots
```

| Argument | Default | Description |
|---|---|---|
| `--url` | (required) | Base URL to crawl (must start with `http://` or `https://`) |
| `--workers` | 10 | Number of concurrent workers |
| `--rate` | 0.5 | Seconds to wait between requests per worker |
| `--max-pages` | 100 | Maximum number of pages to crawl |
| `--max-depth` | 3 | Maximum crawl depth |
| `--timeout` | 30 | Request timeout in seconds |
| `--cache-dir` | | Custom directory for cache files |
| `--output-dir` | | Custom directory for CSV reports |
| `--ignore-robots` | | Ignore robots.txt rules |
| `--no-verify-ssl` | | Disable SSL certificate verification (for self-signed certs) |
| `--check-external` | | Verify external links via HEAD requests (not crawled further) |
| `--version` | | Print version and exit |

Because each worker waits `--rate` seconds between its own requests, total load on the target site peaks at roughly `workers / rate` requests per second (about 40 req/s in the example above).
The tool creates a `.linkguardian` directory in your home folder with the following structure:

```
~/.linkguardian/
├── cache/    # Cache files for each domain
├── logs/     # Log files
└── output/   # Crawl results
    └── {domain}_{timestamp}/
        ├── broken_links.csv
        └── all_links.csv
```
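When scripting against these outputs, a small helper can locate the most recent run for a domain. This is a convenience sketch, not part of the tool: the `latest_run` name is hypothetical, and it assumes the timestamp suffix sorts lexicographically in chronological order.

```python
from pathlib import Path
from typing import Optional

def latest_run(domain: str) -> Optional[Path]:
    """Return the newest {domain}_{timestamp} output directory, if any."""
    output_root = Path.home() / ".linkguardian" / "output"
    # Assumes a zero-padded timestamp suffix (e.g. 20240101_120000),
    # so lexicographic order matches chronological order.
    runs = sorted(output_root.glob(f"{domain}_*"))
    return runs[-1] if runs else None

run = latest_run("example.com")
if run is not None:
    print(run / "broken_links.csv")
```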
The tool implements a smart caching system (sketched after this list) that:
- Stores visited pages and checked links
- Handles JSON serialization of complex data types
- Automatically manages cache files per domain
- Preserves crawl progress between sessions
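A minimal sketch of what such a cache can look like, assuming one JSON file per domain (the file layout and key names are illustrative, not the tool's actual schema). Note the set-to-list conversion, since JSON has no native set type:

```python
import json
from pathlib import Path

CACHE_DIR = Path.home() / ".linkguardian" / "cache"

def save_cache(domain: str, visited: set, checked: dict) -> None:
    """Persist crawl state; sets become sorted lists for JSON."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    state = {"visited": sorted(visited), "checked": checked}
    (CACHE_DIR / f"{domain}.json").write_text(json.dumps(state))

def load_cache(domain: str):
    """Restore crawl state from a previous session, if present."""
    path = CACHE_DIR / f"{domain}.json"
    if not path.exists():
        return set(), {}
    state = json.loads(path.read_text())
    return set(state["visited"]), state["checked"]
```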
The tool includes comprehensive error handling (illustrated after this list) for:
- Network connectivity issues
- SSL/TLS certificate problems
- Timeout errors
- HTTP errors
- JSON serialization errors
- Platform-specific path issues
- Keyboard interrupts
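As an illustration of per-request handling, assuming an `aiohttp`-based fetcher (the library choice and the `safe_fetch` name are assumptions, not the tool's actual code):

```python
import asyncio

import aiohttp

async def safe_fetch(session: aiohttp.ClientSession, url: str):
    """Fetch a URL and return (status, error_category)."""
    try:
        async with session.get(url) as resp:
            return resp.status, None
    except asyncio.TimeoutError:
        return None, "timeout"
    except aiohttp.ClientSSLError:          # certificate problems
        return None, "ssl"
    except aiohttp.ClientConnectionError:   # DNS failures, refused connections
        return None, "connection"
    except aiohttp.ClientError:             # any other client-side error
        return None, "http"
```

KeyboardInterrupt is deliberately left uncaught in a helper like this, so a top-level handler can flush the cache before exiting and preserve crawl progress.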
The `broken_links.csv` report lists every broken link found, with the following columns (see the reading example after the list):
- Broken Link URL
- Source Page URL
- Status Code
- Error Category
- Timestamp
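The report is plain CSV, so it can be post-processed with the standard library. The header spellings below are taken from the column list above and may differ slightly from the actual file:

```python
import csv

with open("broken_links.csv", newline="") as f:
    for row in csv.DictReader(f):
        # One line per broken link: "status: URL (found on source page)"
        print(f'{row["Status Code"]}: {row["Broken Link URL"]} '
              f'(found on {row["Source Page URL"]})')
```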
The `all_links.csv` report lists every discovered link, with the following columns (see the example after the list):
- Link URL
- Source Page URL
- Status Code
- Link Type (Internal/External)
- Depth
- Is Orphaned
- Timestamp
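The `Link Type` and `Is Orphaned` columns make it easy to pull out orphaned internal pages; the value encodings (`Internal`, `True`) are assumptions inferred from the column names:

```python
import csv

with open("all_links.csv", newline="") as f:
    orphans = [
        row["Link URL"]
        for row in csv.DictReader(f)
        if row["Link Type"] == "Internal" and row["Is Orphaned"] == "True"
    ]
print(f"Found {len(orphans)} orphaned internal pages")
```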
The tool categorizes errors into the following types (one possible mapping is sketched after the list):
- Connection errors
- Timeout errors
- SSL/TLS errors
- HTTP errors
- Parsing errors
- Validation errors
- Unknown errors
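One straightforward way to implement such a taxonomy is an ordered mapping from exception type to category, with an unknown fallback. The concrete exception classes below are illustrative stand-ins, not the tool's actual list:

```python
import asyncio
import ssl

# Ordered most-specific first; the first matching type wins.
ERROR_CATEGORIES = [
    (asyncio.TimeoutError, "timeout"),
    (ssl.SSLError, "ssl"),
    (ConnectionError, "connection"),
    (ValueError, "validation"),
]

def classify_error(exc: BaseException) -> str:
    """Map an exception to one of the report's error categories."""
    for exc_type, category in ERROR_CATEGORIES:
        if isinstance(exc, exc_type):
            return category
    return "unknown"
```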
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions, please open an issue on the GitHub repository.