4MaxR/Customs-github


Egyptian Customs Scraper Toolkit

A public, portfolio-friendly Python scraping project for collecting and structuring publicly accessible Egyptian Customs tariff and legislation pages from customs.gov.eg.

This repository is intentionally code-first: it includes scraper source, helper utilities, lightweight checks, and a tiny sanitized sample output. Full scraped datasets, generated JSON exports, logs, archives, browser caches, and local settings are excluded.

Features

  • Scrape Egyptian Customs tariff / HS code pages by chapter.
  • Scrape legislation and circular listings from public Egyptian Customs pages.
  • Extract and enrich tariff details with Playwright-powered browser automation.
  • Provide helper scripts for AJAX inspection, pagination checks, ID extraction, and PDF/HTML matching workflows.
  • Include lightweight test and debugging scripts for validating scraper structure and page behavior.
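As a rough illustration of the chapter-by-chapter, browser-driven approach listed above, here is a minimal sketch. It is not the repository's code: the URL template, `chapter_codes`, and `fetch_chapter_html` are hypothetical names chosen for this example, the real page paths on customs.gov.eg may differ, and the 01 to 97 chapter range is an assumption based on the standard HS nomenclature.

```python
from typing import Iterator

# Hypothetical URL template for illustration only; the real tariff
# page paths on the site may be different.
CHAPTER_URL = "https://customs.gov.eg/example/tariff?chapter={code}"


def chapter_codes(first: int = 1, last: int = 97) -> Iterator[str]:
    """Yield two-digit chapter codes such as "01", "02", ..., "97"."""
    for number in range(first, last + 1):
        yield f"{number:02d}"


def fetch_chapter_html(code: str, timeout_ms: int = 30_000) -> str:
    """Load one chapter page in headless Chromium and return its HTML."""
    # Imported lazily so chapter_codes() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(CHAPTER_URL.format(code=code), timeout=timeout_ms)
            return page.content()
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

Launching a fresh browser per chapter keeps the sketch simple. A real scraper would more likely reuse one browser across chapters and pause between requests.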

Project Structure

src/           Scraper and helper source files
tests/         Lightweight checks and test scripts
sample_data/   Tiny sanitized example output
web/           Reserved for optional public-safe demos

Setup

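Create and activate a virtual environment (the activation command below is for Windows PowerShell; on macOS or Linux, use source .venv/bin/activate instead):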

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Install Python dependencies:

pip install -r requirements.txt

Install Playwright browser binaries:

playwright install

Example Usage

Run the scrapers from the repository root:

python .\src\scrape_all_chapters.py
python .\src\scrape_customs.py
python .\src\scrape_legislations.py

Some scripts may create local output such as JSON files, per-chapter data folders, or logs. These outputs are ignored by Git and are not part of the public repository.

Output

Generated scraper outputs are local-only by default. The repository includes only sample_data/sample_output.json, a tiny sanitized sample that demonstrates the expected shape of output records without publishing the full scraped dataset.
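To see the shape of the sample records without assuming any particular field names, a small inspection script along these lines may help. This is a sketch, not part of the repository: `load_records` and `key_counts` are hypothetical helpers, and the script assumes only that the file holds either a single JSON object or a list of JSON objects.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any


def load_records(path: Path) -> list[dict[str, Any]]:
    """Load a JSON file and normalise it to a list of dict records."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        data = [data]  # treat a single top-level value as one record
    # Keep only objects; skip anything that is not a JSON object.
    return [item for item in data if isinstance(item, dict)]


def key_counts(records: list[dict[str, Any]]) -> Counter:
    """Count how many records contain each top-level key."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    sample = Path("sample_data/sample_output.json")
    if sample.exists():
        for key, count in key_counts(load_records(sample)).most_common():
            print(f"{key}: present in {count} record(s)")
```

Run it from the repository root so the relative path resolves. Keys that appear in only some records show up with lower counts, which is a quick way to spot optional fields.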

Source Attribution

The scraper references publicly accessible pages from the Egyptian Customs website (customs.gov.eg).

Legal Disclaimer

This is an unofficial educational and research project intended as a data-engineering and web-scraping portfolio showcase. It is not an official data source, government service, or commercial customs platform.

The scraper utilities interact with publicly accessible pages of the Egyptian Customs website: https://customs.gov.eg

This repository intentionally excludes:

  • Full scraped datasets
  • Generated exports
  • Logs
  • Archives
  • Cached data

Only source code, helper utilities, lightweight tests, and minimal example outputs are included.

Users are solely responsible for ensuring compliance with all applicable laws, website terms of use, robots.txt policies, rate limits, and data usage requirements. Please use respectful request rates and avoid disrupting public services or infrastructure.
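One way to act on the robots.txt and rate-limit points above is a small gate that checks each URL against parsed robots rules and enforces a minimum delay between requests. The sketch below uses only the standard library. `USER_AGENT`, `build_robots`, `PoliteGate`, and the two-second default are hypothetical choices for illustration, not values taken from this repository or from the site's policies.

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser

USER_AGENT = "customs-research-bot"  # hypothetical user-agent string


def build_robots(lines: list[str]) -> RobotFileParser:
    """Build a parser from robots.txt lines that were already fetched."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


class PoliteGate:
    """Allow a request only if robots rules permit it, spacing requests out."""

    def __init__(self, robots: RobotFileParser, min_delay_s: float = 2.0):
        self.robots = robots
        self.min_delay_s = min_delay_s
        self._last: Optional[float] = None  # monotonic time of last allowed request

    def wait_for(self, url: str) -> bool:
        """Return False if disallowed; otherwise sleep as needed and return True."""
        if not self.robots.can_fetch(USER_AGENT, url):
            return False
        if self._last is not None:
            remaining = self.min_delay_s - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
        return True
```

In practice the rules can be loaded straight from the site with `RobotFileParser.set_url(...)` followed by `read()`, and `wait_for(url)` called before every page load.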

Non-Affiliation Statement

This project is not affiliated with, endorsed by, sponsored by, or officially connected to the Egyptian Customs Authority or any government entity.

No government logos, official branding, or complete scraped databases are included in this repository.

License

Released under the MIT License. See the LICENSE file for details.
