ChatbotGDG

Project Submission repository

Codeforces Problem Scraper

Overview

This project is a web scraper that extracts problems, metadata, and editorials from Codeforces contests. The scraped data is organized and saved in a structured format for easy reference.

The scraper performs the following tasks:

- Retrieves links to contests from the Codeforces contests page.
- Extracts problem statements, input/output examples, and metadata (such as tags, time limits, and memory limits) from individual problems.
- Scrapes and saves editorials (if available) for each problem.

Features

- Scrape Contest Links: Extracts links to contests from the Codeforces contests page.
- Scrape Problem Statements: Downloads problem statements and input/output examples and stores them in .txt files.
- Save Metadata: Extracts metadata such as time limits, memory limits, and tags and saves it in .json files.
- Scrape Editorials: Retrieves editorials (if available) and saves them as .txt files.
- Organized Storage: Problems and editorials are saved in their respective folders with descriptive filenames.
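
The storage layout described above can be illustrated with a minimal sketch. It is not taken from `scrap.py`: the helper name `save_problem`, its parameters, and the metadata keys are assumptions made for illustration. It only shows one plausible way to write a statement to a .txt file and its metadata to a .json file with the same base name.

```python
import json
from pathlib import Path


def save_problem(out_dir, name, statement, metadata):
    """Write <name>.txt (statement) and <name>.json (metadata) under out_dir.

    Hypothetical helper: the real scraper may name and lay out files differently.
    """
    folder = Path(out_dir)
    # Create the target folder (for example "problems") if it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)

    statement_path = folder / f"{name}.txt"
    metadata_path = folder / f"{name}.json"

    # Plain-text statement, including any input/output examples.
    statement_path.write_text(statement, encoding="utf-8")
    # Metadata as indented JSON so it stays readable in a text editor.
    metadata_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return statement_path, metadata_path
```

A call such as `save_problem("problems", "1A_Theatre_Square", text, {"time_limit": "1 second", "tags": ["math"]})` would then produce a matching .txt and .json pair in the `problems` folder.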

Explanation of Functions in `scrap.py`

sanitize_filename(filename):
- Cleans up a string to make it safe for use as a filename by replacing invalid characters with underscores.
scrape_problem(Link):
- Navigates to a problem's page on Codeforces.
- Scrapes the problem title, time and memory limits, tags, and problem statement with test cases.
- Saves the problem statement as a .txt file and metadata as a .json file.
get_editorial_link(e):
- Finds the editorial link for a given problem (if available).
- Returns the URL of the editorial or 0 if no editorial exists.
scrape_editorial(q, title):
- Navigates to the editorial page using the link provided.
- Scrapes the content of the editorial and saves it as a .txt file.
get_contests_link(l):
- Navigates to the Codeforces contests page.
- Extracts and returns a list of contest links based on the filters applied.
Main Logic:
- Retrieves contest links using get_contests_link().
- Iterates through contests, extracting problems and their metadata using scrape_problem().
- Retrieves and saves editorials for problems using get_editorial_link() and scrape_editorial().
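
Two of the behaviours described above are small enough to sketch in isolation. The code below is an illustration under stated assumptions, not the code in `scrap.py`: the exact set of characters replaced by `sanitize_filename` is assumed, `save_editorial_if_any` is a hypothetical helper, and the first argument of `scrape_editorial` is assumed to be the editorial link. The only detail taken from the descriptions is that `get_editorial_link` returns 0 when no editorial exists.

```python
import re


def sanitize_filename(filename):
    """Replace characters that are invalid in filenames with underscores.

    Assumed character set: \\ / : * ? " < > | (the real function may differ).
    """
    return re.sub(r'[\\/:*?"<>|]', "_", filename).strip()


def save_editorial_if_any(link, title, scrape_editorial):
    """Call scrape_editorial only when a link was found.

    Hypothetical helper: `link` is the value returned by get_editorial_link,
    which is documented to be 0 when no editorial exists.
    """
    if link == 0:
        return False
    scrape_editorial(link, sanitize_filename(title))
    return True
```

For example, `sanitize_filename("A. Theatre Square: part 1/2?")` returns `"A. Theatre Square_ part 1_2_"`, which is safe to use as a base name for the .txt and .json files.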
