A data engineering toolkit to extract metadata and replays from api.faforever.com and load them into a data lake like BigQuery. The intention is to reconstruct (part of) the Forged Alliance Forever database as a public BigQuery dataset.
Using this toolkit, I've scraped the API and created a dataset of all game models and some associated models (player, gamePlayerStats, mapVersion, etc).
It lets you make stuff like this:

At the time of this writing, there are three public ways to use this dataset:
- A simple Datastudio Dashboard for quick browsing
- A Kaggle dataset where I've flattened, filtered and documented two CSVs
- A publicly accessible BigQuery dataset for your own queries (← the good stuff is here)
The toolkit includes utilities to extract, transform and load FAF metadata and replay data. Here's a demo session using faf.extract and faf.transform to create a BigQuery table:
An overview of all utilities:
- faf.extract: Scrapes models from api.faforever.com, storing them as JSON files on disk.
- faf.transform: Transforms extracted JSON files into JSONL files ready for loading into a data lake.
- faf.parse: Parses a downloaded .fafreplay file into a .pickle; this speeds up subsequent dumps of the replay.
- faf.dump: Dumps the content of a replay (raw .fafreplay or pre-parsed .pickle) into a JSONL file to be loaded into the lake.
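To make the JSON-to-JSONL step concrete, here is a minimal sketch of the kind of work a transform pass does. It is not the toolkit's actual code: the function name `json_pages_to_jsonl` is hypothetical, and it assumes each extracted file is a JSON object with a top-level `data` list whose items carry `id`, `type` and `attributes` keys (JSON:API style). Adjust it to whatever layout your extracted files actually have.

```python
import json
from pathlib import Path


def json_pages_to_jsonl(input_dir, output_path):
    """Flatten extracted JSON pages into one JSONL file and return the row count.

    Assumption (not taken from the toolkit): every page is an object with a
    top-level "data" list, and every item has "id", "type" and "attributes".
    """
    rows = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # Sort the pages so the output order is deterministic.
        for page in sorted(Path(input_dir).glob("*.json")):
            with open(page, encoding="utf-8") as f:
                document = json.load(f)
            for item in document.get("data", []):
                # Lift the attributes to the top level; "id" and "type" are
                # written last so they cannot be overwritten by an attribute.
                row = {**item.get("attributes", {}),
                       "id": item["id"],
                       "type": item.get("type")}
                # One JSON object per line is what makes the file JSONL.
                out.write(json.dumps(row) + "\n")
                rows += 1
    return rows
```

Calling `json_pages_to_jsonl("extracted/", "games.jsonl")` (both paths are placeholders) would produce a newline-delimited file, which is a format BigQuery can load directly.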
This is a bit of a fork/rewrite of fafalytics, another project of mine with a much larger scope (not just scraping the API, but also downloading and analysing the binary replay files). I now think it's better to approach this with three smaller-scoped projects: one for data engineering, one for dataviz and analytics, and one for ML.