Merged
29 changes: 29 additions & 0 deletions .github/actions/setup-python-poetry/action.yml
@@ -0,0 +1,29 @@
name: Setup Python and Poetry
description: Set up Python with pip caching, install Poetry, and install project dependencies

inputs:
  python-version:
    description: Python version to use
    required: false
    default: "3.12"
  poetry-install-args:
    description: Extra arguments for poetry install (e.g. --no-root --with dev)
    required: false
    default: ""

runs:
  using: composite
  steps:
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
        cache: pip

    - name: Install Poetry
      run: pipx install poetry
      shell: bash

    - name: Install dependencies
      run: poetry install ${{ inputs.poetry-install-args }}
      shell: bash
78 changes: 55 additions & 23 deletions .github/workflows/theseus-engine.yml
@@ -6,39 +6,71 @@ on:
workflow_dispatch:

jobs:
analyze_codebase:
discover-repos:
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
outputs:
repos: ${{ steps.extract.outputs.repos }}
steps:
- uses: actions/checkout@v4
- id: extract
run: |
REPOS=$(python -c '
import json
with open("theseus.config.json") as f:
config = json.load(f)
names = [r["name"] for r in config.get("repositories", [])]
print(json.dumps(names))
')
echo "repos=$REPOS" >> "$GITHUB_OUTPUT"

analyze:
needs: discover-repos
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
repo: ${{ fromJSON(needs.discover-repos.outputs.repos) }}
steps:
- name: Checkout
uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
token: ${{ secrets.GITHUB_TOKEN }}
fetch-depth: 0

- name: Setup python 3.12
uses: actions/setup-python@v5
- name: Setup Python and Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: "3.12"
poetry-install-args: --no-interaction --no-root

- name: Install poetry
run: pipx install poetry
- name: Run pipeline for ${{ matrix.repo }}
run: poetry run python scripts/run_pipeline.py --repo ${{ matrix.repo }} --update-survivor
timeout-minutes: 120

- name: Install dependencies
run: poetry install --no-interaction --no-root
- name: Upload data artifacts
if: success()
uses: actions/upload-artifact@v4
with:
name: data-${{ matrix.repo }}
path: |
data/raw/${{ matrix.repo }}_data.json
data/processed/${{ matrix.repo }}_graph.json

- name: Run theseus data pipeline (snapshots → survivor → cleanup)
run: |
# Analyse new snapshot periods, refresh survivor fossils, and clean/minify
# all data payloads. Genesis (historical fossil) is left untouched
# during monthly cron runs.
poetry run python scripts/run_pipeline.py --update-survivor
create-pr:
needs: analyze
if: success()
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
steps:
- uses: actions/checkout@v4

- name: Create pull request for data updates
if: success()
uses: peter-evans/create-pull-request@b1ddad2c994a25fbc81a28b3ec0e368bb2021c50 # v6.0.0
- name: Download all artifacts
uses: actions/download-artifact@v4
with:
pattern: data-*
merge-multiple: true
path: data

- name: Create pull request
uses: peter-evans/create-pull-request@b1ddad2c994a25fbc81a28b3ec0e368bb2021c50
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: "chore: update theseus persistence data across all repos"
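
The `discover-repos` job above passes an inline `python -c` program to a shell substitution. As a hedged sketch only, the same logic could live in a standalone module; the module name `list_repos.py` and the `main()`, `repo_names()`, and `write_output()` helpers are hypothetical and not part of this PR. It appends `repos=<JSON>` to the file named by `GITHUB_OUTPUT`, which is how the step exposes `steps.extract.outputs.repos` to the matrix.

```python
"""Hypothetical standalone version of the discover-repos extraction step."""
import json
import os


def repo_names(config: dict) -> list[str]:
    # Same rule as the inline step: every entry's "name", and an empty
    # list when the config has no "repositories" key.
    return [r["name"] for r in config.get("repositories", [])]


def write_output(key: str, value: str) -> None:
    # GitHub Actions collects step outputs from the file named by GITHUB_OUTPUT.
    path = os.environ.get("GITHUB_OUTPUT")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{key}={value}\n")
    else:
        # Outside Actions (e.g. a local dry run), print the line instead.
        print(f"{key}={value}")


def main(config_path: str = "theseus.config.json") -> None:
    with open(config_path, encoding="utf-8") as f:
        names = repo_names(json.load(f))
    # json.dumps emits a single line, which fromJSON() in the matrix can parse.
    write_output("repos", json.dumps(names))
```

Called from the repository root, `main()` reads `theseus.config.json` by default, matching the inline step.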
@@ -15,16 +15,14 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
- name: Setup Python and Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: "3.12"
poetry-install-args: --with dev

- name: Install Poetry
run: pipx install poetry

- name: Install dependencies
run: poetry install --with dev
- name: Run linter
run: poetry run pylint scripts/ --output-format=colorized
continue-on-error: true

- name: Run tests
run: poetry run pytest tests/ -v --tb=short
1 change: 1 addition & 0 deletions .gitignore
@@ -213,3 +213,4 @@ __marimo__/
.dev.vars*
!.dev.vars.example
!.env.example
presentation/
2 changes: 1 addition & 1 deletion app.js
@@ -277,7 +277,7 @@ class TheseusVisualizer {
}
this.repoDescription.textContent = repoInfo.description || "";

const response = await fetch(`data/${repoInfo.file}`, { signal });
const response = await fetch(`data/processed/${repoInfo.name}_graph.json`, { signal });
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const rawData = await response.json();

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
51 changes: 35 additions & 16 deletions docs/CONFIGURATION.md
@@ -6,13 +6,11 @@ The Ship of Theseus engine operates centrally off a single file: `theseus.config

```json
{
"$schema": "./schema.json",
"dataDir": "./data",
"repositories": [
{
"name": "react",
"repo": "facebook/react",
"displayName": "React",
"description": "A JavaScript library for building user interfaces",
"milestones": [
{ "date": "2013-05", "title": "Open Source", "description": "React is released." }
@@ -24,17 +22,16 @@ The Ship of Theseus engine operates centrally off a single file: `theseus.config

### Global Settings

* `dataDir` *(string)*: The relative path to the directory where the engine will save output JSONs. Usually `"./data"`. This config also controls the Javascript engine, so the frontend needs this accurate to know where to fetch data.
* `dataDir` *(string)*: The relative path to the directory where the engine saves output JSONs. Usually `"./data"`. The frontend uses this to know where to fetch data.

### Repositories Array

The `repositories` array takes objects consisting of the following key attributes:

| Key | Type | Description | Example |
| :--- | :---: | :--- | :--- |
| `name` | *String* | A safe, unique identifier. Used for the JSON filename (`{name}_data.json`). Must be snake_case or kebab-case. | `"django"` |
| `repo` | *String* | The GitHub repository namespace (the URL ending). The engine automatically strips trailing slashes and resolves this to `https://github.com/namespace/repo.git`. | `"django/django"` |
| `displayName` | *String* | The aesthetic name rendered on UI Cards. | `"Django"` |
| `name` | *String* | A safe, unique identifier. Used as the repo slug (`--repo NAME`) and as the data filenames — `data/raw/{name}_data.json` (raw with blame metadata) and `data/processed/{name}_graph.json` (graph for frontend). Must be kebab-case. | `"django"` |
| `repo` | *String* | The GitHub repository namespace. The engine resolves this to `https://github.com/owner/repo.git`. | `"django/django"` |
| `description` | *String* | A short UI subheading clarifying what the project is. | `"The web framework for perfectionists with deadlines."` |
| `milestones` | *Array* | An optional list of significant events to display on the timeline. | `[{"date": "2024-01", "title": "Launch"}]` |
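
The table says `name` must be kebab-case and that `repo` is resolved to `https://github.com/owner/repo.git`. A minimal validator under those rules might look like this. The regular expressions are assumptions, as is treating the template's `YYYY-MM` milestone date as a hard requirement; `validate_repository` and `clone_url` are hypothetical helpers, and the engine may check entries differently or not at all.

```python
import re

# Assumed rules: lowercase kebab-case names, "owner/repo" namespaces, and
# milestone dates in the "YYYY-MM" form shown in the template.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
REPO_RE = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")
DATE_RE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])")


def clone_url(repo: str) -> str:
    # "django/django" -> "https://github.com/django/django.git"
    return f"https://github.com/{repo}.git"


def validate_repository(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passed."""
    problems = []
    name = entry.get("name", "")
    if not NAME_RE.fullmatch(name):
        problems.append(f"name {name!r} is not kebab-case")
    repo = entry.get("repo", "")
    if not REPO_RE.fullmatch(repo):
        problems.append(f"repo {repo!r} is not in owner/repo form")
    # milestones is optional, so a missing key means nothing to check.
    for i, milestone in enumerate(entry.get("milestones", [])):
        date = milestone.get("date", "")
        if not DATE_RE.fullmatch(date):
            problems.append(f"milestone {i}: date {date!r} is not YYYY-MM")
    return problems
```

Returning messages instead of raising lets every problem in an entry be reported in one pass.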

@@ -53,17 +50,39 @@ The `milestones` array contains objects with the following properties:

---

## Modifying Configurations
## Adding a New Repository

### Adding a new target
To begin visualizing a new repository, append it to the `repositories` array.
Paste this template into the `repositories` array in `theseus.config.json`:

1. Add your object to `theseus.config.json`
2. Locally run `poetry run python scripts/analyse_repository.py`
3. The engine will clone the repo into `./temp_repos/` (which can be over `1GB` for massive codebases, so ensure disk space).
4. Local data processing will generate `data/{your_repo}_data.json`.
5. Run `poetry run python scripts/add_fossils.py` to fill in the Genesis/Survivor line references.
6. Check your `index.html` file to see the newly generated visual graph!
```json
{
"name": "REPO-NAME",
"description": "Short description displayed on the dashboard",
"repo": "OWNER/REPO-SLUG",
"milestones": [
{
"date": "YYYY-MM",
"title": "Brief milestone title",
"description": "Optional longer description"
}
]
}
```

Then run the pipeline to generate the data:

```bash
poetry run python scripts/run_pipeline.py --repo REPO-NAME
```
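
In CI, the workflow runs this script once per configured repository through a matrix and adds `--update-survivor`. To approximate that locally, a sketch could loop over the names. The helpers `pipeline_commands` and `run_all` are hypothetical; only the `--repo` and `--update-survivor` flags come from this PR.

```python
import subprocess

# Command prefix used by the workflow; the flags below come from this PR.
BASE_CMD = ["poetry", "run", "python", "scripts/run_pipeline.py"]


def pipeline_commands(names: list[str], update_survivor: bool = False) -> list[list[str]]:
    cmds = []
    for name in names:
        cmd = BASE_CMD + ["--repo", name]  # new list, BASE_CMD is untouched
        if update_survivor:
            cmd.append("--update-survivor")
        cmds.append(cmd)
    return cmds


def run_all(names: list[str], update_survivor: bool = False) -> list[str]:
    """Run each repository in turn and return the names whose run failed.

    Continuing past failures loosely mirrors `fail-fast: false` in the matrix.
    """
    failed = []
    for name, cmd in zip(names, pipeline_commands(names, update_survivor)):
        if subprocess.run(cmd).returncode != 0:
            failed.append(name)
    return failed
```

Unlike the matrix, this runs repositories sequentially, so a full local refresh takes as long as all runs combined.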

This single command clones the repository, runs quarterly/monthly snapshot analysis, discovers both genesis and survivor fossils, and writes two files:
- `data/raw/{name}_data.json` — master data with per-file blame metadata (pipeline state)
- `data/processed/{name}_graph.json` — cleaned graph data for the frontend (only `snapshot_date` + `composition` per entry)
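
The processed file is described as keeping only `snapshot_date` and `composition` per entry, and the previous workflow comment mentions minifying the data payloads. Assuming the snapshots can be handled as a list of dicts (the raw schema is not shown in this diff), the reduction could be sketched as follows; `clean_entries` and `minify` are hypothetical names, not the pipeline's functions.

```python
import json

# The two fields the processed graph file keeps per entry.
GRAPH_KEYS = ("snapshot_date", "composition")


def clean_entries(entries: list[dict]) -> list[dict]:
    # Keep only GRAPH_KEYS from each snapshot; anything else (such as
    # per-file blame metadata) stays in the raw file only.
    return [{k: entry[k] for k in GRAPH_KEYS if k in entry} for entry in entries]


def minify(data: object) -> str:
    # Compact separators drop the spaces json.dumps inserts by default.
    return json.dumps(data, separators=(",", ":"))
```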

The frontend auto-discovers the new data from `data/processed/` — no additional changes needed.

> [!NOTE]
> Data filenames are derived from `name`: `data/raw/{name}_data.json` and `data/processed/{name}_graph.json`. There is no `file` field to maintain.
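
Since both filenames are derived from `name`, a small helper can keep the two paths in one place. This is a hypothetical sketch rather than the pipeline's own code; it builds the same `data/processed/{name}_graph.json` path that `app.js` now fetches, and it accepts the `"./data"` form used for `dataDir`.

```python
from pathlib import PurePosixPath


def data_paths(name: str, data_dir: str = "data") -> dict[str, str]:
    # PurePosixPath collapses "./data" to "data" and always joins with "/".
    root = PurePosixPath(data_dir)
    return {
        "raw": str(root / "raw" / f"{name}_data.json"),
        "processed": str(root / "processed" / f"{name}_graph.json"),
    }
```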

> [!CAUTION]
> Avoid modifying the output data within `data/` manually. Doing so will corrupt the incremental snapshot logic, forcing the pipeline to wipe out the cache and restart checking out massive commit trees from scratch.
> Avoid modifying the output data within `data/` manually. Doing so can corrupt the incremental snapshot cache, forcing a full re-clone and re-analysis.