#42 updated the github workflow and threshold #43

Merged
Asifdotexe merged 1 commit into main from 42-fix-action-failure-issue on Jun 3, 2026

@Asifdotexe Asifdotexe commented Jun 3, 2026

Owner

TL;DR Added raw data for all 5 repositories (react, numpy, langchain, zed, and claude code), updated the GitHub Actions workflow, and updated the incremental blame threshold in the analysis script.

Summary by CodeRabbit

Release Notes

  • Chores
    • Enhanced CI/CD workflow with concurrency control and improved error handling
    • Updated data pipeline to push results to shared branch with automatic retry and backoff logic
    • Refined verification mechanism with dynamic thresholds for improved accuracy across different repository sizes

@Asifdotexe Asifdotexe self-assigned this Jun 3, 2026
@Asifdotexe Asifdotexe added the bug Something isn't working label Jun 3, 2026
@Asifdotexe Asifdotexe linked an issue Jun 3, 2026 that may be closed by this pull request
@Asifdotexe Asifdotexe added the enhancement New feature or request label Jun 3, 2026
coderabbitai Bot commented Jun 3, 2026

📝 Walkthrough

The PR restructures the GitHub Actions workflow to persist aggregated repository analysis data to a shared branch instead of uploading artifacts, introduces concurrency controls and error resilience, and updates the repository analysis script to use dynamic rather than fixed thresholds for line-count validation.

Changes

Workflow concurrency and branch-based data flow

| Layer / File(s) | Summary |
|---|---|
| Concurrency policy and error handling (.github/workflows/theseus-engine.yml) | Adds workflow-level concurrency grouping under "theseus-data-engine" to prevent concurrent runs, and enables continue-on-error: true on the analyze pipeline step so that downstream steps still execute if the pipeline fails. |
| Branch-based data aggregation and push (.github/workflows/theseus-engine.yml) | Introduces a new "Push data to shared branch" step with a shell script that writes status.json, ensures the chore/monthly-data-update branch exists (creating an orphan branch if needed), copies raw and processed data artifacts into organized directories, and commits and pushes with automatic retries and increasing backoff. |
| PR creation condition update (.github/workflows/theseus-engine.yml) | Changes the create-pr job condition from if: success() to if: !cancelled(), so the PR is created even when an upstream job fails, as long as the run was not cancelled. All PR creation steps are retained and now consume data from the new shared-branch mechanism. |

Dynamic line-count validation threshold

| Layer / File(s) | Summary |
|---|---|
| Dynamic threshold logic and documentation (scripts/analyse_repository.py) | Updates the _verify_line_count_guard function docstring to explain the new dynamic thresholds (1% for repos under 50k lines, 5% for larger repos) and the rationale for the relaxation, and modifies the comparison logic to compute the threshold dynamically from the disk total line count rather than applying a fixed 1% threshold. |
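
The threshold rule summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/analyse_repository.py: the function names, the constants, and the choice to compare a blame-derived total against the on-disk total as a relative deviation are all assumptions made for the example. Only the 1% / 5% split at 50k lines comes from the summary.

```python
# Hypothetical constants; only the values (50k, 1%, 5%) come from the PR summary.
SMALL_REPO_LINE_LIMIT = 50_000
SMALL_REPO_TOLERANCE = 0.01
LARGE_REPO_TOLERANCE = 0.05


def dynamic_threshold(disk_total_lines: int) -> float:
    """Return the allowed relative deviation for a repo of the given size."""
    if disk_total_lines < SMALL_REPO_LINE_LIMIT:
        return SMALL_REPO_TOLERANCE
    return LARGE_REPO_TOLERANCE


def line_count_within_guard(disk_total_lines: int, blame_total_lines: int) -> bool:
    """Check whether the blame-derived total is close enough to the disk total.

    Assumption: the guard measures |blame - disk| / disk. The real
    _verify_line_count_guard may define the comparison differently.
    """
    if disk_total_lines <= 0:
        # Avoid division by zero; an empty repo only matches an empty blame total.
        return blame_total_lines == 0
    deviation = abs(blame_total_lines - disk_total_lines) / disk_total_lines
    return deviation <= dynamic_threshold(disk_total_lines)
```

Under these assumptions a 100,000-line repository tolerates a 4,000-line mismatch, while a 10,000-line repository is still held to the stricter 1% bound.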

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Asifdotexe/Theseus#41: Modifies the same workflow file's artifact handling and PR creation job flow—directly related to the branch-push and condition changes in this PR.
  • Asifdotexe/Theseus#8: Updates the same workflow and calls scripts/analyse_repository.py in the CI pipeline, overlapping with the analyze/publish behavior extended here.
  • Asifdotexe/Theseus#23: Modifies .github/workflows/theseus-engine.yml to create PRs from action-generated data via dedicated branch flow, directly overlapping with this PR's new branch-based push and PR creation pattern.

Poem

🐰 A rabbit hops through CI flows,
Where data streams now branch and grow,
With gentle thresholds, less so strict,
And failures handled, not evict—
The shared branch blooms, concurrency thrives! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: updates to the GitHub workflow and to the threshold logic in the analyse_repository.py script. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
.github/workflows/theseus-engine.yml (3)

57-57: ⚡ Quick win

Status determination should check both raw and graph files.

The status is set to "failure" only if the raw file is missing, but according to the pipeline, both data/raw/{repo}_data.json and data/processed/{repo}_graph.json should be generated for a successful run. Consider checking both files to accurately determine success.

💡 Suggested improvement
-          STATUS="success"
-          RAW_FILE="data/raw/${{ matrix.repo }}_data.json"
-          GRAPH_FILE="data/processed/${{ matrix.repo }}_graph.json"
-          [ -f "$RAW_FILE" ] || STATUS="failure"
+          RAW_FILE="data/raw/${{ matrix.repo }}_data.json"
+          GRAPH_FILE="data/processed/${{ matrix.repo }}_graph.json"
+          
+          if [ -f "$RAW_FILE" ] && [ -f "$GRAPH_FILE" ]; then
+            STATUS="success"
+          else
+            STATUS="failure"
+          fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/theseus-engine.yml at line 57, The status check only
verifies RAW_FILE; update the conditional that sets STATUS (the line using
STATUS="failure" when [ -f "$RAW_FILE" ] fails) to verify both the raw and graph
outputs exist (e.g., ensure [ -f "$RAW_FILE" ] && [ -f "$GRAPH_FILE" ] or invert
with OR to set failure if either is missing). Locate the variables RAW_FILE and
GRAPH_FILE and change the test so STATUS="failure" is assigned when either file
is absent, ensuring both data/raw/{repo}_data.json and
data/processed/{repo}_graph.json are required for success.

98-109: 🏗️ Heavy lift

Consider the implications of concurrent branch pushes from matrix jobs.

Multiple matrix jobs will attempt to push to chore/monthly-data-update concurrently, creating a race condition. While the retry/rebase logic should handle this for a small number of concurrent jobs, it may be fragile at scale:

  • Each job gets 5 attempts with linear backoff (5, 10, 15, 20, 25 seconds)
  • If many repositories are processed simultaneously (10+), some pushes may fail after exhausting retries
  • The rebase on line 106 assumes no conflicts, but doesn't explicitly handle rebase failures

The current approach works for typical cases where each job modifies different files. However, for improved reliability, consider refactoring to:

  1. Have analyze jobs save artifacts
  2. Add a single aggregation job that collects all artifacts and pushes once

This would eliminate the race condition entirely.
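
If the retry loop is kept, the rebase-failure handling mentioned above could look roughly like this sketch. It is not the workflow's shell script: push_with_retry and its callable parameters are hypothetical names, and the git operations are passed in as functions so the control flow can be read (and tested) without a repository. The 5-attempt count and 5-second linear step follow the numbers quoted in this comment.

```python
import time
from typing import Callable


def push_with_retry(
    push: Callable[[], bool],
    rebase: Callable[[], bool],
    abort_rebase: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to push; on rejection wait, rebase onto the remote, and retry.

    push and rebase return True on success. In a real job they would wrap
    git commands (an assumption, not shown here).
    """
    for attempt in range(1, attempts + 1):
        if push():
            return True
        if attempt == attempts:
            break  # no point in waiting after the final attempt
        sleep(base_delay * attempt)  # linear backoff: 5, 10, 15, 20 seconds
        if not rebase():
            # Leave the working tree clean instead of stuck mid-rebase,
            # then fail fast so the conflict is visible in the job log.
            abort_rebase()
            return False
    return False
```

Failing fast on a rebase conflict is one of the two options the comment names; retrying after the abort would be the other, at the cost of repeating a conflict that is unlikely to resolve itself.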

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/theseus-engine.yml around lines 98 - 109, The current push
loop to branch chore/monthly-data-update (the for attempt in 1 2 3 4 5; do ...
git rebase origin/chore/monthly-data-update) is racy when many matrix jobs run
concurrently; instead have the analyzer matrix jobs persist their results as
workflow artifacts and add a single aggregation job that downloads all
artifacts, merges/combines them, and performs the git add/commit/push once
(removing concurrent pushes), and if you keep the retry loop also explicitly
handle rebase failures by checking rebase exit status, running git rebase
--abort on conflict, and retrying with exponential backoff or failing fast so
conflicts aren’t left in a broken state.

180-210: ⚖️ Poor tradeoff

Consider how validation failures affect PR creation when all analyses fail.

The validation step exits with an error if no processed graph files exist (line 189). This means:

  • If all repository analyses fail → status files exist but no graph files → validation fails → no PR created
  • If some analyses succeed → at least one graph file exists → validation passes → PR created

This behavior may be intentional (only merge when there's actual data), but it means maintainers won't be notified via PR when all analyses fail. Consider whether you want to:

  1. Create a PR even with all failures to notify maintainers, or
  2. Add a separate notification mechanism for complete failures (e.g., GitHub Actions summary, issue creation)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/theseus-engine.yml around lines 180 - 210, The validation
step ("Validate graph files") currently calls sys.exit(1) when no processed
files are found (the check on files and the sys.exit(1) for the empty list),
which prevents a PR when all analyses fail; change the script so it does not
exit with a non-zero code when files is empty—instead print a clear warning like
"No processed files found to validate." and exit 0 (or set a GitHub Actions
output/flag indicating no data) so the workflow can continue to create a
PR/notify maintainers; keep the existing error handling for individual file
validation (errors variable and non-zero exit when errors > 0) intact.
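
The exit-code behavior described in the prompt above can be sketched as a small function. This is an illustration under assumptions, not the workflow's embedded script: validate_graph_files is a hypothetical name, the "*_graph.json" glob is inferred from the data/processed/{repo}_graph.json path, and "valid" is taken to mean only "parses as JSON" because the actual per-file checks are not shown in this review.

```python
import json
import sys
from pathlib import Path


def validate_graph_files(processed_dir: Path) -> int:
    """Return the exit code the validation step would use.

    0 when there is nothing to validate (warn only) or all files are valid,
    1 when at least one graph file fails validation.
    """
    files = sorted(processed_dir.glob("*_graph.json"))
    if not files:
        # Warn instead of failing so later steps (PR creation, notification)
        # still run when every analysis failed.
        print("No processed files found to validate.", file=sys.stderr)
        return 0

    errors = 0
    for path in files:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            print(f"Invalid graph file {path.name}: {exc}", file=sys.stderr)
            errors += 1
    return 1 if errors else 0
```

A caller would end with sys.exit(validate_graph_files(Path("data/processed"))), which keeps the non-zero exit for broken files while letting the empty case through.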
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/theseus-engine.yml:
- Around line 147-178: The YAML step's embedded Python heredoc using python <<
'PYEOF' with f-strings (containing {repo}, {passed}, {total}) is breaking YAML
parsing; replace the step with a proper multi-line run block so the YAML parser
sees a literal block: use run: | followed by an indented python - <<'PYEOF' ...
PYEOF (keep the single-quoted delimiter to avoid shell interpolation) and keep
your f-strings and file write to "pr-body.md" unchanged; ensure the heredoc
delimiter 'PYEOF' is flush-left and the entire block is indented correctly under
run: so YAML tools accept it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97070ebf-48e9-481a-9183-74a33c174577

📥 Commits

Reviewing files that changed from the base of the PR and between be35901 and bf846f2.

📒 Files selected for processing (6)
  • .github/workflows/theseus-engine.yml
  • data/raw/claude-code_data.json
  • data/raw/langchain_data.json
  • data/raw/numpy_data.json
  • data/raw/react_data.json
  • scripts/analyse_repository.py

Comment thread .github/workflows/theseus-engine.yml
@Asifdotexe Asifdotexe merged commit 49124a4 into main Jun 3, 2026
2 checks passed
Labels

bug Something isn't working enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Fix action failure issue

1 participant