This document provides a detailed guide to setting up, running, and understanding the Codeforces Problem Scraper program, a Python script that uses Selenium to extract problem statements and metadata from Codeforces.
- Extracts problem titles and statements from Codeforces.
- Saves problem data in a text file (problem_statement.txt).
- Stores metadata (e.g., problem title) in a JSON file (metadata.json).
- Designed to run in Google Colab using a virtual display.
The setup_virtual_display function uses pyvirtualdisplay to create a virtual screen for headless browsing. This is needed in Google Colab, where no graphical interface is available.
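A minimal sketch of what setup_virtual_display might look like, assuming pyvirtualdisplay with its default Xvfb backend. The screen size parameters and their defaults are illustrative assumptions, not taken from the original script. The import sits inside the function so the module can still be loaded where pyvirtualdisplay is not installed.

```python
def setup_virtual_display(width: int = 1920, height: int = 1080):
    """Start an invisible (Xvfb-backed) screen and return the Display object.

    width/height are illustrative defaults, not values from the original script.
    """
    # Imported lazily so this module loads even without pyvirtualdisplay.
    from pyvirtualdisplay import Display

    display = Display(visible=False, size=(width, height))
    display.start()  # launches Xvfb and points $DISPLAY at it
    return display   # caller is responsible for display.stop()
```

The returned object is kept so the program can call display.stop() during cleanup.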
The init_firefox_driver function sets up the Firefox web driver in headless mode (no visible browser window).
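One possible shape for init_firefox_driver, sketched against the Selenium 4 Python API. The implicit-wait parameter is a hypothetical addition (not from the original script) that makes element lookups retry for a few seconds while the page loads.

```python
def init_firefox_driver(implicit_wait_s: float = 10.0):
    """Create a headless Firefox WebDriver (Selenium 4 API).

    implicit_wait_s is a hypothetical tuning knob, not part of the original script.
    """
    # Imported lazily so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    # Retry find_element calls for up to implicit_wait_s seconds.
    driver.implicitly_wait(implicit_wait_s)
    return driver
```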
The scrape_problem function:
- Opens the Codeforces problem URL.
- Extracts the problem title and statement using Selenium.
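The scraping step might be sketched as follows. The CSS selectors are assumptions about the current Codeforces page layout (a `.problem-statement` block whose `.header .title` holds the title) and should be checked against a live page. The statement here is simply the full text of that block, which on Codeforces also includes the limits and samples. Selenium's `By.CSS_SELECTOR` constant is the plain string `"css selector"`, so the function uses the string directly and can be exercised with a stand-in driver.

```python
# Selenium's By.CSS_SELECTOR is the string "css selector".
CSS = "css selector"

# Assumed Codeforces page structure; verify against the live page.
TITLE_SELECTOR = ".problem-statement .header .title"
STATEMENT_SELECTOR = ".problem-statement"


def scrape_problem(driver, url: str) -> dict:
    """Open a Codeforces problem page and return its title and statement text."""
    driver.get(url)
    title = driver.find_element(CSS, TITLE_SELECTOR).text.strip()
    # Whole statement block as rendered text (header, legend, I/O, samples).
    statement = driver.find_element(CSS, STATEMENT_SELECTOR).text.strip()
    return {"title": title, "statement": statement}
```

Keeping the selectors as module-level constants means a layout change on Codeforces only requires editing two lines.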
Two sub-functions save the scraped data:
- store_problem_statement: Saves the problem statement in a text file.
- store_metadata: Saves the problem title in a JSON file.
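The two storage helpers need only the standard library. A minimal sketch, assuming the default file names listed earlier and a JSON object with a single `"title"` key (the key name is an assumption):

```python
import json
from pathlib import Path


def store_problem_statement(statement: str, path: str = "problem_statement.txt") -> None:
    """Write the statement text as UTF-8 so non-ASCII symbols survive."""
    Path(path).write_text(statement, encoding="utf-8")


def store_metadata(title: str, path: str = "metadata.json") -> None:
    """Write the problem title as a small JSON object, e.g. {"title": "..."}."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented or mathematical characters readable.
        json.dump({"title": title}, f, ensure_ascii=False, indent=2)
```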
The program:
- Starts a virtual display.
- Initializes the web driver.
- Scrapes the problem from the provided URL.
- Saves the scraped data.
- Closes the driver and stops the virtual display.
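The five steps can be wired together so that cleanup always runs, even if scraping fails. In this sketch the steps are passed in as callables (a hypothetical design, not the original script's structure) so the control flow stands on its own; in practice they would be setup_virtual_display, init_firefox_driver, scrape_problem, and a function that calls both storage helpers.

```python
from typing import Any, Callable


def run_scraper(
    url: str,
    start_display: Callable[[], Any],
    init_driver: Callable[[], Any],
    scrape: Callable[[Any, str], dict],
    save: Callable[[dict], None],
) -> dict:
    """Run display -> driver -> scrape -> save, then always release resources."""
    display = start_display()            # 1. start the virtual display
    try:
        driver = init_driver()           # 2. initialize the web driver
        try:
            data = scrape(driver, url)   # 3. scrape the problem page
            save(data)                   # 4. save the scraped data
            return data
        finally:
            driver.quit()                # 5a. close the browser
    finally:
        display.stop()                   # 5b. stop the virtual display
```

The nested try/finally blocks release resources in reverse order of acquisition, so a failed scrape does not leave stray Firefox or Xvfb processes running in the Colab session.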
This script uses:
- Selenium for web automation.
- pyvirtualdisplay and Xvfb for virtual display management.
- Firefox as the browser for scraping.
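Before running the scraper in a fresh Colab runtime, a quick preflight check can report which of these dependencies are missing. This is a hypothetical helper, not part of the original script; it assumes the Xvfb and Firefox executables are named `Xvfb` and `firefox` on the PATH.

```python
import importlib.util
import shutil

PY_PACKAGES = ("selenium", "pyvirtualdisplay")
# Assumed executable names on a Linux/Colab PATH.
BINARIES = ("Xvfb", "firefox")


def missing_dependencies() -> list[str]:
    """Return a human-readable list of dependencies that could not be found."""
    missing = [
        f"python package '{name}'"
        for name in PY_PACKAGES
        if importlib.util.find_spec(name) is None
    ]
    missing += [
        f"executable '{name}'" for name in BINARIES if shutil.which(name) is None
    ]
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything is in place; otherwise it names what still needs installing.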