Wikipedia Rabbit Hole Explorer

License: MIT

An interactive Python terminal application that explores Wikipedia's knowledge graph via random discovery, category filtering, and real-time hyperlink web scraping.


✦ Features

  • Random Discovery: Browse any Wikipedia article chosen uniformly at random from across the entire encyclopedia.
  • Curated Discovery: Filter article selection by a user-chosen subject category, enabling focused exploration of a domain.
  • Rabbit Hole Traversal: Starting from a random article, each subsequent article is randomly selected from the hyperlinks found inside the current article. This creates a chain of semantically related pages.
  • Browser Integration: Press [o] at any prompt to open the currently viewed Wikipedia article in your system's default web browser.
  • Rich Terminal UI: The display layer uses raw ANSI escape sequences for 24-bit color and includes a Braille character spinner animation during HTTP requests.
  • Smart Parsing: The scraper automatically strips citation noise from the text and filters out non-article meta-pages.
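
The Rabbit Hole and Smart Parsing features above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the names `strip_citations`, `is_article_href`, `pick_next`, and the `META_PREFIXES` list are hypothetical, and the list of meta-page namespaces is deliberately incomplete.

```python
import random
import re
from urllib.parse import unquote

# Matches bracketed citation markers such as [1], [23], or [citation needed].
# (Assumed pattern; the project's real regex may differ.)
CITATION_RE = re.compile(r"\[(?:\d+|citation needed)\]")

# Namespace prefixes of non-article pages (illustrative, incomplete list).
META_PREFIXES = (
    "Special:", "Help:", "File:", "Category:",
    "Talk:", "Wikipedia:", "Template:", "Portal:",
)


def strip_citations(text: str) -> str:
    """Remove citation markers from scraped paragraph text."""
    return CITATION_RE.sub("", text)


def is_article_href(href: str) -> bool:
    """Return True for internal links that point at an ordinary article."""
    if not href.startswith("/wiki/"):
        return False
    # Drop any in-page fragment and decode percent-escapes before checking.
    slug = unquote(href[len("/wiki/"):]).split("#")[0]
    return bool(slug) and not slug.startswith(META_PREFIXES)


def pick_next(hrefs, rng=random):
    """Pick the next hop uniformly at random from the article links."""
    candidates = [h for h in hrefs if is_article_href(h)]
    return rng.choice(candidates) if candidates else None
```

In a traversal loop, `hrefs` would be the `href` attributes collected from the current page's anchors, and a `None` result signals a dead end where the chain should stop or restart.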

✦ Architecture

The application is structured into three clearly separated layers:

  • Scraping Layer: Handles network requests using connection pooling and parses HTML.
  • Display Layer: Handles terminal rendering and dynamic text wrapping based on terminal width.
  • Mode Layer: Manages the three independent interactive mode loops.
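
As a rough illustration of what the Display Layer does, the sketch below wraps text to the current terminal width, emits a 24-bit ANSI foreground color, and cycles Braille spinner frames. The function names, the `margin` default, and the exact spinner frames are assumptions made for this example, not taken from the project.

```python
import shutil
import textwrap

RESET = "\x1b[0m"

# A common Braille spinner sequence (assumed; the project may use other frames).
SPINNER_FRAMES = "⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"


def colorize(text: str, r: int, g: int, b: int) -> str:
    """Wrap text in a 24-bit (truecolor) ANSI foreground sequence."""
    return f"\x1b[38;2;{r};{g};{b}m{text}{RESET}"


def wrap_to_terminal(text: str, margin: int = 4, width=None) -> str:
    """Word-wrap text to the terminal width minus a margin."""
    if width is None:
        # Falls back to 80 columns when the size cannot be detected.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    usable = max(20, width - margin)
    return textwrap.fill(text, width=usable)


def spinner_frame(tick: int) -> str:
    """Return the spinner character for a given animation tick."""
    return SPINNER_FRAMES[tick % len(SPINNER_FRAMES)]
```

A caller might print `colorize(spinner_frame(i), 120, 200, 255)` with a carriage return on each tick while a request is in flight, then print `wrap_to_terminal(summary)` once the page has loaded.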

✦ Tech Stack & Dependencies

The project relies heavily on the Python Standard Library to remain lightweight, alongside two core external dependencies for web scraping.

| Library | Source | Purpose |
| --- | --- | --- |
| requests | PyPI | HTTP GET, session management, redirect following |
| beautifulsoup4 | PyPI | HTML parsing, CSS selector queries, DOM navigation |
| re | Built-in | Regex to strip citation markers from scraped text |
| textwrap | Built-in | Dynamic word-wrapping based on terminal dimensions |
| urllib.parse | Built-in | Unquoting and parsing Wikipedia URL slugs for clean titles |
| webbrowser | Built-in | Hooking into the OS to open live articles in a native browser |
| shutil | Built-in | Detecting real-time terminal column width for rendering |
| random | Built-in | Uniform random selection of articles and hyperlinks |
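
To make the roles of `urllib.parse` and `webbrowser` concrete, here is a small sketch of how a clean title could be derived from an article URL and how the same URL could be handed to the default browser. The helper names `title_from_url` and `open_in_browser` are hypothetical, and the `/wiki/` path layout is assumed.

```python
import webbrowser
from urllib.parse import unquote, urlparse


def title_from_url(url: str) -> str:
    """Turn a Wikipedia article URL into a human-readable title."""
    path = urlparse(url).path
    # Keep everything after "/wiki/" so titles containing "/" survive.
    slug = path.split("/wiki/", 1)[-1]
    return unquote(slug).replace("_", " ")


def open_in_browser(url: str) -> bool:
    """Ask the OS to open the URL; returns False if no browser was found."""
    return webbrowser.open(url)
```

For example, `title_from_url("https://en.wikipedia.org/wiki/Alan_Turing")` yields `Alan Turing`, and percent-encoded slugs are decoded as UTF-8.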

✦ Screenshots

Figure 1: Main Menu showing the three modes

Figure 2: Rabbit Hole mode showing traversal depth


✦ Installation & Usage

  1. Clone the repository:
    git clone https://github.com/swechchhapatel/wikipedia-explorer.git
    cd wikipedia-explorer
  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Run the application:
    python wiki_explorer.py

✦ Contribution

Contributions are welcome and appreciated!

If you'd like to improve this project, please follow these steps:

  1. Fork the repository
  2. Create a new branch
    git checkout -b feature/your-feature-name
  3. Make changes and commit
    git commit -m "Add: your message"
  4. Push to your branch and open a Pull Request
  • Feel free to improve features, the terminal UI, or performance.

Made with ❤️
