Crate Digger indexes monthly SoundCloud mixtapes and the tracks inside them.
The first version is intentionally local and boring: it stores everything in SQLite, uses SoundCloud's public oEmbed endpoint for mix metadata, extracts 1001Tracklists links from SoundCloud descriptions when present, and imports tracklists from pasted text.
1001Tracklists is great for manual lookup. Crate Digger treats it as an optional source that must be used politely:
- Read robots.txt before fetching.
- Use the project user-agent instead of Python's default user-agent.
- Respect the site's crawl delay.
- Stop when the site serves a JavaScript or bot-protection challenge.
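The politeness rules above can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not Crate Digger's actual fetch code: the USER_AGENT value, the fetch_policy helper, and the sample robots.txt text are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; the project's real value may differ.
USER_AGENT = "crate-digger-example"

# Made-up robots.txt content, used so the sketch runs without network access.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def fetch_policy(robots_txt, url, user_agent=USER_AGENT):
    """Return (allowed, delay_seconds) for url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    # crawl_delay() returns None when no Crawl-delay rule applies to this agent.
    delay = parser.crawl_delay(user_agent) or 0
    return allowed, delay


if __name__ == "__main__":
    print(fetch_policy(SAMPLE_ROBOTS, "https://example.com/tracklist/abc.html"))
```

In a real fetch loop, the caller would skip the URL when allowed is false and wait at least delay seconds between requests.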
If normal page HTML is available, import it directly:
python3 -m uv run crate-digger import-1001-tracklist 1 \
"https://www.1001tracklists.com/tracklist/15vbkbst/the-aston-shuffle-only-100s-april-2026-2026-04-28.html"
If the page is challenged, use the dependable manual workflow:
- Add the SoundCloud mix URL.
- Let Crate Digger capture the title, description, and 1001Tracklists link.
- Copy the visible tracklist text from 1001Tracklists into a .txt file.
- Import that file into the indexed mixtape with the source URL attached.
python3 -m uv run crate-digger import-tracklist 1 tracklists/only100s-2026-04.txt \
--tracklist-url "https://www.1001tracklists.com/tracklist/15vbkbst/the-aston-shuffle-only-100s-april-2026-2026-04-28.html"
That gives you fast local search and keeps the index from depending on bypassing bot protection.
For a human-supervised local import, use the assisted browser flow. It opens a visible browser, waits for you to confirm that the tracklist is visible, then reads the rendered page:
python3 -m uv run playwright install chromium
python3 -m uv run crate-digger import-1001-assisted 1 \
"https://www.1001tracklists.com/tracklist/15vbkbst/the-aston-shuffle-only-100s-april-2026-2026-04-28.html" \
--replace
Use --auto-read to skip the terminal confirmation when the visible browser
loads the tracklist normally:
python3 -m uv run crate-digger import-1001-assisted 1 \
"https://www.1001tracklists.com/tracklist/15vbkbst/the-aston-shuffle-only-100s-april-2026-2026-04-28.html" \
--replace \
--auto-read
This command is intended for local use, not GitHub Actions.
This repo uses uv for its Python environment.
Install dependencies and create .venv:
python3 -m uv sync
Run the CLI from the managed environment:
python3 -m uv run crate-digger --help
If you want uv available as a normal shell command, add your user Python bin
directory to your shell profile:
echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.zshrc
Then open a new terminal and use uv run ... directly.
Run tests:
python3 -m uv run python -m unittest discover -s tests
Crate Digger includes a small Django app for browsing the indexed data and
editing it through Django admin. The Django models point at the existing
crate-digger.sqlite3 tables, so the CLI and web UI share the same local
working database.
Create the Django admin/auth tables in your local SQLite database:
python3 -m uv run python manage.py migrate
Start the site:
python3 -m uv run python manage.py runserver
Then open http://127.0.0.1:8000/.
To use /admin/, create a local admin user:
python3 -m uv run python manage.py createsuperuser
Crate Digger can index tracks from a public SoundCloud profile likes page as a separate local collection. This is useful as a standalone CLI flow today, and it keeps the fetch/normalize/classify pieces reusable for a future Django management command or on-demand UI lookup.
The simplest path is fully automated when you know the profile URL:
python3 -m uv run crate-digger index-soundcloud-likes \
--likes-url "https://soundcloud.com/spencer-guy-817516400/likes" \
--limit 100
If a likes page requires a signed-in session, the same command can fall back to a human-assisted browser workflow.
Start Chrome with a debugging port and sign into SoundCloud:
open -na "Google Chrome" --args \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.crate-digger-chrome-profile"
Then index the visible likes:
python3 -m uv run crate-digger index-soundcloud-likes \
--source browser \
--cdp-url http://127.0.0.1:9222 \
--limit 100 \
--scroll-pages 10 \
--detail-pages
The command stores each liked track's SoundCloud URL, title, uploader, artwork,
duration, upload time, description, tags, genre, and an initial kind
classification. kind is mixtape when the duration is at least 20 minutes or
the title includes mix-like words such as mix, mixtape, radio, episode,
podcast, tape, set, or live; otherwise it is track.
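The classification rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: the classify_kind name is hypothetical, durations are assumed to be in seconds, and "includes" is interpreted as a whole-word match so that a title such as "Settle" does not count as containing "set".

```python
import re

# Mix-like words listed in the description above.
MIX_WORDS = {"mix", "mixtape", "radio", "episode", "podcast", "tape", "set", "live"}
MIXTAPE_MIN_SECONDS = 20 * 60  # "at least 20 minutes"


def classify_kind(title, duration_seconds):
    """Return "mixtape" or "track" for one liked SoundCloud upload."""
    if duration_seconds >= MIXTAPE_MIN_SECONDS:
        return "mixtape"
    # Assumption: compare whole lowercase words rather than substrings.
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    if words & MIX_WORDS:
        return "mixtape"
    return "track"


if __name__ == "__main__":
    for title, seconds in [("XXX Radio #12", 300), ("Some Song", 200)]:
        print(title, "->", classify_kind(title, seconds))
```

Under this reading, a short upload titled "Some Song (Extended Mix)" would also be classified as a mixtape, which is one reason the description calls kind an initial classification.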
Exported liked-track files live next to the mixtape exports:
- data/soundcloud_likes.csv
- data/soundcloud_likes.json
- data/soundcloud-likes.md
Inspect indexed likes locally:
python3 -m uv run crate-digger soundcloud-likes --limit 25
python3 -m uv run crate-digger soundcloud-likes --kind mixtape
python3 -m uv run crate-digger soundcloud-likes --query "tech house"
Repeatable, markdown-defined workflows live in
workflows/. The first workflow,
workflows/enrich-mixtape.md, describes the
post-GitHub-Action enrichment path: pull the latest indexed data, import a
tracklist if needed, enrich tracks with Beatport metadata, export data/, and
prepare a final approval summary.
The workflow is designed for a main invoker agent that delegates bounded tasks
to simple markdown-defined sub-agents in workflows/agents/.
The practical goal is to keep the human loop to two or three useful checkpoints:
browser/source readiness, ambiguous match review, and final commit/push
approval.
Crate Digger uses SQLite as a local working cache and exports reviewable files for GitHub:
- crate-digger.sqlite3 is ignored because it is a generated binary database.
- data/mixtapes.csv and data/tracks.csv are stable, diffable exports.
- data/track_metadata.csv stores optional confirmed external metadata.
- data/mixtapes.json, data/tracks.json, and data/track_metadata.json are convenient for scripts.
- data/index.md is a quick GitHub-friendly summary.
- data/latest-mixtapes.md is a GitHub-friendly latest releases report.
Export the current SQLite index:
python3 -m uv run crate-digger export --output data
Load the tracked exports back into SQLite:
python3 -m uv run crate-digger load-export --input data
The GitHub Actions workflow in .github/workflows/index.yml rebuilds the
SQLite database from the tracked data/ exports, checks only for newer
SoundCloud uploads, exports data/, and commits changes when newly indexed
data appears.
For incremental SoundCloud checks, use --stop-at-existing. This keeps
scheduled runs focused on new uploads instead of walking the entire archive
every time.
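The idea behind --stop-at-existing can be sketched as an early-exit loop. This is a hypothetical illustration, not the CLI's implementation: the new_uploads helper is invented for the example, and it assumes the archive listing arrives newest first.

```python
from typing import Iterable, Iterator, Set


def new_uploads(urls_newest_first: Iterable[str], indexed: Set[str]) -> Iterator[str]:
    """Yield upload URLs until the first one that is already indexed."""
    for url in urls_newest_first:
        if url in indexed:
            # Everything after this point is assumed to be older and already known.
            break
        yield url


if __name__ == "__main__":
    listing = ["mix-may", "mix-april", "mix-march"]
    already_indexed = {"mix-april", "mix-march"}
    print(list(new_uploads(listing, already_indexed)))
```

The trade-off is that an older upload missing from the index is never noticed by this loop, so a full walk is still needed occasionally.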
python3 -m uv run crate-digger add \
"https://soundcloud.com/example/monthly-mix-may-2026" \
--month 2026-05 \
--series "Monthly Mixtape"
If you already have a copied tracklist:
python3 -m uv run crate-digger add \
"https://soundcloud.com/example/monthly-mix-may-2026" \
--month 2026-05 \
--series "Monthly Mixtape" \
--tracklist-file tracklists/2026-05.txt
If you want to skip SoundCloud lookup and enter metadata yourself:
python3 -m uv run crate-digger add \
"https://soundcloud.com/example/monthly-mix-may-2026" \
--offline \
--title "May 2026 Monthly Mix" \
--month 2026-05 \
--series "Monthly Mixtape" \
--tracklist-url "https://1001.tl/example" \
--tracklist-file tracklists/2026-05.txt
Or import tracks later:
python3 -m uv run crate-digger import-tracklist 1 tracklists/2026-05.txt
Replace existing tracks and record the source URL:
python3 -m uv run crate-digger import-tracklist 1 tracklists/2026-05.txt \
--replace \
--tracklist-url "https://1001.tl/example"
Summarize the local database:
python3 -m uv run crate-digger stats
Search across indexed mixes and tracks:
python3 -m uv run crate-digger search "Four Tet"
Search only tracks:
python3 -m uv run crate-digger search "Jayda G" --type tracks
List indexed tracks:
python3 -m uv run crate-digger tracks --year 2021 --limit 20
Show one mix:
python3 -m uv run crate-digger show 1
List mixes:
python3 -m uv run crate-digger list
Filter mix listings:
python3 -m uv run crate-digger list --year 2023 --with-tracks
python3 -m uv run crate-digger list --without-tracks --desc --limit 10
Batch import a SoundCloud profile's monthly mixes:
python3 -m uv run crate-digger batch-add-soundcloud-page \
"https://soundcloud.com/itsonly100s/tracks" \
--series "Only 100s" \
--uploader "Only 100s" \
--monthly-only
The same workflow can index The Magician's Magic Tape archive:
python3 -m uv run crate-digger batch-add-soundcloud-page \
"https://soundcloud.com/themagician/tracks" \
--series "Magic Tape" \
--uploader "The Magician" \
--match "^Magic Tape [0-9]+"
Mau P's weekly XXX Radio archive uses the same pattern:
python3 -m uv run crate-digger batch-add-soundcloud-page \
"https://soundcloud.com/realmaup/tracks" \
--series "XXX Radio" \
--uploader "Mau P" \
--match "^XXX Radio #[0-9]+"
Rebūke's weekly ERA archive can be indexed from SoundCloud too:
python3 -m uv run crate-digger batch-add-soundcloud-page \
"https://soundcloud.com/rebukemusic/tracks" \
--series "ERA" \
--uploader "Rebūke" \
--match "^ERA [0-9]+"
The SoundCloud batch importer tries SoundCloud's public JSON endpoint first and
falls back to the static page HTML when needed. --monthly-only keeps yearly
recaps and other non-monthly uploads out of the local index.
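To preview which uploads a --match pattern would keep, the patterns from the commands above can be tried against sample titles. This sketch assumes the pattern is applied to the upload title with Python's re.search; the filter_titles helper and the sample titles are hypothetical.

```python
import re
from typing import Iterable, List


def filter_titles(titles: Iterable[str], pattern: str) -> List[str]:
    """Keep the titles that the regular expression matches."""
    regex = re.compile(pattern)
    return [title for title in titles if regex.search(title)]


if __name__ == "__main__":
    sample_titles = [
        "Magic Tape 101",
        "Best of Magic Tape",      # rejected: does not start with "Magic Tape"
        "Magic Tape One Hundred",  # rejected: no digits after the series name
    ]
    print(filter_titles(sample_titles, r"^Magic Tape [0-9]+"))
```

Anchoring with ^ is what keeps recaps and remixes that merely mention the series name out of the result.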
MixesDB imports are still available as an optional ad hoc fallback, but they are not part of the normal refresh or enrichment workflow because recent archive coverage has been inconsistent.
Import available Only 100s tracklists from MixesDB:
python3 -m uv run crate-digger import-mixesdb-category
Import available Magic Tape tracklists from MixesDB:
python3 -m uv run crate-digger import-mixesdb-category \
"https://www.mixesdb.com/w/Category:Magic_Tape" \
--series "Magic Tape" \
--uploader "The Magician"
Import available XXX Radio tracklists from MixesDB:
python3 -m uv run crate-digger import-mixesdb-category \
"https://www.mixesdb.com/w/Category:XXX_Radio" \
--series "XXX Radio" \
--uploader "Mau P" \
--title-match "XXX Radio"
Import available Rebūke ERA source pages from MixesDB:
python3 -m uv run crate-digger import-mixesdb-category \
"https://www.mixesdb.com/w/Category%3AReb%C5%ABke" \
--series "ERA" \
--uploader "Rebūke" \
--title-match "ERA"
The importer is safe to rerun; existing tracks are ignored. Use --add-missing
when a MixesDB page exists but the upload did not appear in the SoundCloud
archive response.
Enrich tracks with confirmed Beatport pages in a visible browser:
python3 -m uv run crate-digger enrich-beatport-assisted --limit 10
The command opens Beatport searches one track at a time. Navigate to the exact track page, return to the terminal, and press Enter to save BPM, key, genre, label, release metadata, and the confirmed Beatport URL when the page exposes those fields.
By default, it uses Beatport's visible search UI instead of jumping straight to query-param URLs. That keeps the flow closer to a normal manual session. If the search control cannot be found automatically, the command asks you to click into Beatport's search box so it can type and submit the query through the focused field.
For an even more manual start, let the browser open blank and navigate to Beatport yourself before the script touches anything:
python3 -m uv run crate-digger enrich-beatport-assisted --mixtape-id 637 --manual-start
You can also attach to a real Chrome instance that you launch with a debugging port:
open -na "Google Chrome" --args \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.crate-digger-chrome-profile"
python3 -m uv run crate-digger enrich-beatport-assisted \
--mixtape-id 637 \
--manual-start \
--cdp-url http://127.0.0.1:9222 \
--auto-first-result
If the right match is visible in Beatport search results, use numbered result
selection instead of opening a track page. The closest row is selected
automatically when title and artist matching produce one clear winner; add
--manual-result-choice to pick the row yourself.
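One way to picture the "one clear winner" rule is a similarity score with a required margin over the runner-up. The sketch below uses difflib from the standard library; the scoring method, both thresholds, and the pick_clear_winner name are assumptions for illustration and may not match how Crate Digger actually ranks rows.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def pick_clear_winner(
    wanted: str,
    rows: Sequence[str],
    min_score: float = 0.8,   # hypothetical threshold
    min_margin: float = 0.1,  # hypothetical lead over the runner-up
) -> Optional[int]:
    """Return the index of the best row, or None when the choice is ambiguous."""
    if not rows:
        return None
    scores = [similarity(wanted, row) for row in rows]
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    runner_up = scores[ranked[1]] if len(ranked) > 1 else 0.0
    if scores[best] >= min_score and scores[best] - runner_up >= min_margin:
        return best
    return None


if __name__ == "__main__":
    rows = ["Artist One - First Track", "Someone Else - Different Song"]
    print(pick_clear_winner("Artist One - First Track", rows))
```

Returning None for ambiguous cases corresponds to handing the decision back to the person at the terminal.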
python3 -m uv run crate-digger enrich-beatport-assisted \
--track-id 5100 \
--manual-start \
--cdp-url http://127.0.0.1:9222 \
--choose-result
Useful filters:
python3 -m uv run crate-digger enrich-beatport-assisted --series "XXX Radio" --limit 5
python3 -m uv run crate-digger enrich-beatport-assisted --mixtape-id 12
python3 -m uv run crate-digger enrich-beatport-assisted --track-id 345
If Beatport challenges the automated browser, use the normal-browser manual flow. It copies each search query to your clipboard, then asks you to paste the confirmed Beatport track URL:
python3 -m uv run crate-digger enrich-beatport-manual --mixtape-id 637 --limit 3
Use --url-only when you only want to save confirmed Beatport links without
typing BPM, key, genre, label, or release fields.
The parser accepts common tracklist lines:
01. [00:00] Artist One - First Track
02. [04:35] Artist Two - Second Track
03. Unknown ID
Lines beginning with # are ignored. URLs are skipped.
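A parser for lines in this shape might look roughly like the sketch below. It is not the project's parser: the TrackLine type, the parse_tracklist name, and the regular expression are hypothetical, and "URLs are skipped" is interpreted here as skipping lines that consist of a URL.

```python
import re
from typing import List, NamedTuple, Optional


class TrackLine(NamedTuple):
    position: Optional[int]
    timestamp: Optional[str]
    artist: Optional[str]
    title: str


# Optional "01." position, optional "[00:00]" timestamp, then the remaining text.
LINE_RE = re.compile(r"^(?:(?P<pos>\d+)\.\s+)?(?:\[(?P<ts>[0-9:]+)\]\s+)?(?P<rest>.+)$")


def parse_tracklist(text: str) -> List[TrackLine]:
    tracks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.lower().startswith(("http://", "https://")):
            continue  # assumption: a line that is just a URL is skipped
        match = LINE_RE.match(line)
        if match is None:
            continue
        rest = match.group("rest")
        artist, sep, title = rest.partition(" - ")
        if not sep:
            # No "Artist - Title" separator, e.g. "Unknown ID".
            artist, title = None, rest
        pos = match.group("pos")
        tracks.append(TrackLine(int(pos) if pos else None, match.group("ts"), artist, title))
    return tracks


if __name__ == "__main__":
    sample = "01. [00:00] Artist One - First Track\n03. Unknown ID\n"
    for track in parse_tracklist(sample):
        print(track)
```

Splitting on the first " - " keeps titles that contain their own hyphens intact, at the cost of mis-splitting artist names that include " - ".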
- Add a tiny web UI for browsing by month, artist, and track.
- Add CSV export for Rekordbox, Serato notes, or personal archives.
- Add optional MusicBrainz/Discogs lookup for normalized artist/title metadata.
- Add a review queue for unknown IDs and duplicate track detection.