Merged
11 changes: 5 additions & 6 deletions .github/workflows/theseus-engine.yml
@@ -52,7 +52,9 @@ jobs:
- name: Run pipeline for ${{ matrix.repo }}
id: pipeline
continue-on-error: true
run: poetry run python scripts/run_pipeline.py --repo "${{ matrix.repo }}" --update-survivor
env:
REPO_NAME: ${{ matrix.repo }}
run: poetry run python -m scripts.run_pipeline --repo "$REPO_NAME" --update-survivor
timeout-minutes: 120

- name: Push data to shared branch
@@ -149,13 +151,10 @@ jobs:
fi
echo "Shared branch has orphaned history. Rebasing onto main..."
SAVE_DIR=$(mktemp -d)
cp -r data/* "$SAVE_DIR"/ 2>/dev/null || true
cp -a data/. "$SAVE_DIR"/ 2>/dev/null || true
git checkout origin/main
# Full reset: remove everything from index and all untracked/ignored files
git rm -rf --cached . >/dev/null 2>&1 || true
git clean -fdx >/dev/null 2>&1 || true
mkdir -p data/raw data/processed data/.status
cp -r "$SAVE_DIR"/* data/ 2>/dev/null || true
cp -a "$SAVE_DIR"/. data/ 2>/dev/null || true
rm -rf "$SAVE_DIR"
git add data/
git -c user.name="github-actions[bot]" \
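Note on the `cp` change in the hunk above: `data/*` is expanded by the shell, which by default skips names that start with a dot, so a directory such as `data/.status` would not be copied, whereas `data/.` copies the directory's entire contents. The Python sketch below mimics the two behaviours on a throwaway directory. The helper names and the `marker` file are hypothetical, and `shutil.copytree` only approximates `cp -a` (it does not reproduce every attribute that `-a` preserves).

```python
import glob
import os
import shutil
import tempfile


def copy_star(src, dst):
    # Approximates `cp -r src/* dst/`: like the shell, glob's "*" skips dotfiles.
    for path in glob.glob(os.path.join(glob.escape(src), "*")):
        target = os.path.join(dst, os.path.basename(path))
        if os.path.isdir(path):
            shutil.copytree(path, target, symlinks=True, dirs_exist_ok=True)
        else:
            shutil.copy2(path, target)


def copy_dot(src, dst):
    # Approximates `cp -a src/. dst/`: copies everything inside src, hidden or not.
    shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout modelled on the directories the workflow creates.
        src = os.path.join(root, "data")
        os.makedirs(os.path.join(src, "raw"))
        os.makedirs(os.path.join(src, ".status"))
        with open(os.path.join(src, ".status", "marker"), "w") as fh:
            fh.write("done\n")

        star_dst = os.path.join(root, "star")
        dot_dst = os.path.join(root, "dot")
        os.makedirs(star_dst)
        os.makedirs(dot_dst)

        copy_star(src, star_dst)
        copy_dot(src, dot_dst)
        return sorted(os.listdir(star_dst)), sorted(os.listdir(dot_dst))


if __name__ == "__main__":
    star, dot = demo()
    print("cp -r data/* :", star)
    print("cp -a data/. :", dot)
```

In this sketch only the `data/.` style copy carries `.status` across, which is presumably why the workflow switched forms.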
2 changes: 1 addition & 1 deletion docs/CONFIGURATION.md
@@ -72,7 +72,7 @@ Paste this template into the `repositories` array in `theseus.config.json`:
Then run the pipeline to generate the data:

```bash
-python scripts/run_pipeline.py --repo REPO-NAME
+python -m scripts.run_pipeline --repo REPO-NAME
```

This single command clones the repository, runs quarterly/monthly snapshot analysis, discovers both genesis and survivor fossils, and writes two files:
14 changes: 9 additions & 5 deletions scripts/run_pipeline.py
@@ -1,6 +1,10 @@
"""
Unified orchestration script for the Theseus data pipeline.

+Usage::
+
+    python -m scripts.run_pipeline [--repo NAME] [--reprocess YYYY-MM] [--update-survivor]
+
Runs all three stages in sequence on one or more repositories:

1. **Analyse** (snapshot generation via ``analyse_repository``)
@@ -30,10 +34,10 @@
import sys
import time

-import _path_guard # noqa: F401 # pylint: disable=unused-import
+import scripts._path_guard # noqa: F401 # pylint: disable=unused-import

-from _utils import load_config
-from cleanup_data import cleanup_data as run_cleanup
+from scripts._utils import load_config
+from scripts.cleanup_data import cleanup_data as run_cleanup

logger = logging.getLogger(__name__)

@@ -78,7 +82,7 @@ def run_pipeline(

# ── Stage 1: Analyse ──────────────────────────────────────────────
logger.info("═══ STAGE 1: Snapshot analysis ═══")
-from analyse_repository import (
+from scripts.analyse_repository import (
process_repository,
)

@@ -98,7 +102,7 @@ def run_pipeline(
had_failures = True

# ── Stage 2: Fossils ───────────────────────────────────────────────
-from add_fossils import backfill_fossils, update_survivor_fossils
+from scripts.add_fossils import backfill_fossils, update_survivor_fossils

repo_urls = {
r["name"]: f"https://github.com/{r['repo']}.git"
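Note on the import changes in `scripts/run_pipeline.py`: they go together with the switch to `python -m scripts.run_pipeline`. When a file is run by path, Python puts the file's own directory first on `sys.path`, so sibling modules import as `_utils`; with `-m`, the current working directory comes first, so the same modules import as `scripts._utils`. The sketch below builds a toy `scripts` package in a temporary directory (its contents are hypothetical stand-ins, not the project's real modules) and runs it both ways. What the real `_path_guard` module does is not shown in this diff, so it is left out.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical toy package that only mirrors the import style of the diff.
        pkg = Path(root) / "scripts"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "_utils.py").write_text("def load_config():\n    return 'ok'\n")
        (pkg / "run_pipeline.py").write_text(
            "from scripts._utils import load_config\nprint(load_config())\n"
        )

        # Drop variables that would alter how sys.path is initialised.
        env = {
            k: v
            for k, v in os.environ.items()
            if k not in ("PYTHONPATH", "PYTHONSAFEPATH")
        }

        # `-m`: the working directory is on sys.path, so `scripts` resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "scripts.run_pipeline"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        # By path: sys.path[0] is root/scripts, so `scripts._utils` is not found.
        as_script = subprocess.run(
            [sys.executable, "scripts/run_pipeline.py"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return as_module.returncode, as_module.stdout.strip(), as_script.returncode


if __name__ == "__main__":
    print(run_both_ways())
```

With package-qualified imports, the `-m` invocation succeeds while the by-path invocation fails at the import, which is consistent with the workflow and `docs/CONFIGURATION.md` being updated in the same change.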