SGBList — siberguvenlik.gov.tr blocklist generator

Fetches the full address index from https://siberguvenlik.gov.tr/api/address/index, keeps a database of every record with its original date, and regenerates time-windowed blocklists on every run.

Runs entirely on GitHub Actions — no server needed. A scheduled workflow fetches the data hourly and commits the refreshed lists back to the repo, so your firewall can pull them straight from raw GitHub URLs. A Docker setup is also included for optional self-hosting, but it is not required.

Output files (in `data/`)

Domains and IPs are kept in separate files:

Window	Domains	IPs
All time	`full-domains.txt`	`full-ips.txt`
Last 30 days	`days-30-domains.txt`	`days-30-ips.txt`
Last 60 days	`days-60-domains.txt`	`days-60-ips.txt`
Last 90 days	`days-90-domains.txt`	`days-90-ips.txt`
Last 120 days	`days-120-domains.txt`	`days-120-ips.txt`

Each file is one bare domain or IP per line — no quotes, no surrounding whitespace, LF line endings — sorted (domains alphabetically, IPs numerically) for clean diffs. Point your firewall (pfSense / OPNsense / MikroTik / ipset, etc.) at whichever files you need. (database.jsonl and _state.json are internal bookkeeping; the firewall should ignore them.)

data/database.jsonl is the source of truth (id, url, type, date). The .txt lists are derived from it plus the current clock on every run, so ageing is automatic: a record that turns 31 days old drops out of days-30-* but is still present in days-60-*, days-90-*, days-120-* and full-*. No entry is ever lost from the wider windows.

How it decides what to do

First run (no full-* lists yet): state is wiped and a full crawl of every page begins, with a randomised MIN_DELAY..MAX_DELAY pause between pages so the API is not hammered. The crawl is resumable and checkpointed. The GitHub Actions workflow is set to 5–12 s/page (≈2 days to seed everything); the code/Docker default is the more conservative 10–50 s.
After the full crawl completes: each run does a fast incremental update — only the newest pages are fetched until a record we already have is reached — then the lists are regenerated. The workflow runs this once per hour.
Periodically (every FULL_RESYNC_DAYS, default 7): instead of an incremental update, a fresh full crawl runs to detect entries the source has removed (see below).

Removals (delisting)

The incremental update only ever adds; it cannot see that the source dropped an entry. To catch removals, every FULL_RESYNC_DAYS (default 7) the fetcher re-crawls the whole index. Each record is stamped with the pass that last saw it, and when a full pass finishes, any record not seen during it is treated as removed at the source:

it is deleted from the database, so it disappears from all lists (full-* and every days-* window) on the next regeneration, and

it is appended to data/removed.log — one tab-separated line per removal:

2026-06-08T17:34:38	REMOVED	removed-x.com	type=domain	id=3	added=2026-06-01 10:00:00

Set FULL_RESYNC_DAYS=0 to disable periodic re-syncs (removals then won't be detected). removed.log is append-only history; the firewall ignores it.

⚠️ The index has ~475k records (~23,760 pages), including historical / backdated entries — the full crawl walks every page to the end, so nothing old is missed. On GitHub Actions this happens automatically across many runs: each run crawls for up to ~5 h, commits its progress, and the next run resumes where it left off until the crawl is complete. At the configured 5–12 s/page the initial seed takes ~2 days; after that, every hourly run is just a quick incremental update.

User-Agent

Requests use an ordinary browser User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36

Override it with the USER_AGENT environment variable (in update-lists.yml or docker-compose.yml).

Optional: self-host with Docker

Not needed if you use GitHub Actions. Provided for running on your own server.

docker compose up -d --build
docker compose logs -f          # watch progress

Lists appear under ./data. Override pacing/behaviour via environment variables in docker-compose.yml:

Variable	Default	Meaning
`MIN_DELAY` / `MAX_DELAY`	`10` / `50`	seconds between pages during the full crawl
`INC_MIN_DELAY` / `INC_MAX_DELAY`	`3` / `10`	seconds between pages during incremental
`TIME_BUDGET_SECONDS`	`3300`	checkpoint the full crawl after this long
`FULL_RESYNC_DAYS`	`7`	re-crawl everything this often to detect removals (0 = off)
`INCREMENTAL_MAX_PAGES`	`200`	safety cap for the incremental update
`DATA_DIR`	`/data`	output directory
`USER_AGENT`	(above)	request User-Agent

To re-crawl from scratch, just empty full-domains.txt and full-ips.txt (or delete the contents of data/): the next run detects empty full lists, wipes state and starts a fresh full crawl.

GitHub Actions (the main way this runs)

.github/workflows/update-lists.yml runs every hour, executes the fetcher against the committed data/ state, and commits the refreshed lists back to the repo. The first runs perform the resumable full crawl (~5 h budget per run); once it is complete, each hourly run is a fast incremental update.

Your firewall can then consume the lists directly, e.g.:

https://raw.githubusercontent.com/<owner>/<repo>/main/data/days-30-domains.txt
https://raw.githubusercontent.com/<owner>/<repo>/main/data/full-ips.txt

💡 Make the repo public. GitHub Actions is unlimited & free for public repos; the multi-day initial crawl would otherwise burn through a private repo's monthly Actions minutes. Public also makes the raw list URLs easy for a firewall to fetch. Scheduled workflows are paused after 60 days of repo inactivity, but the hourly bot commits count as activity, so it stays alive.

Local run (no Docker)

pip install -r scraper/requirements.txt
DATA_DIR=data python scraper/fetch.py

Name		Name	Last commit message	Last commit date
Latest commit History 137 Commits
.github/workflows		.github/workflows
data		data
scraper		scraper
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SGBList — siberguvenlik.gov.tr blocklist generator

Output files (in `data/`)

How it decides what to do

Removals (delisting)

User-Agent

Optional: self-host with Docker

GitHub Actions (the main way this runs)

Local run (no Docker)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SGBList — siberguvenlik.gov.tr blocklist generator

Output files (in data/)

How it decides what to do

Removals (delisting)

User-Agent

Optional: self-host with Docker

GitHub Actions (the main way this runs)

Local run (no Docker)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Output files (in `data/`)

Packages