fix: eval worktree CWD, fallback patch discovery, Triton wrapper dete… by iraj465 · Pull Request #118 · AMD-AGI/GEAK

iraj465 · 2026-04-08T16:40:37Z

Title: fix: eval worktree CWD and fallback patch discovery

Body:

Two fixes for the post-round evaluation pipeline:

commandment.py: Add `cd "${GEAK_WORK_DIR}" &&` before `exec python3` in all run.sh variants. Without this, the eval worktree's Python CWD was wrong, so `open('kernel.py')` and harness imports resolved from the wrong directory. This caused FULL_BENCHMARK verification to test the unpatched baseline kernel instead of the optimized one.
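For reviewers who want to see the failure mode in isolation, here is a minimal sketch (not GEAK code) of why the working directory matters. It assumes two hypothetical directories, `baseline` and `eval_worktree`, each holding its own `kernel.py`; the helper names are made up for illustration. A relative `open('kernel.py')` in the child process reads whichever copy sits in the CWD the process was started in, which is the effect the `cd` in run.sh is meant to control.

```python
import os
import subprocess
import sys
import tempfile


def read_kernel_from(cwd: str) -> str:
    """Run a child Python process in `cwd` and return what open('kernel.py') reads."""
    result = subprocess.run(
        [sys.executable, "-c", "print(open('kernel.py').read(), end='')"],
        cwd=cwd,  # plays the role of `cd "${GEAK_WORK_DIR}" &&` in run.sh
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def demo() -> tuple[str, str]:
    """Create two directories with different kernel.py files and read each one."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical directory names, chosen only for this illustration.
        baseline = os.path.join(root, "baseline")
        worktree = os.path.join(root, "eval_worktree")
        for path, body in ((baseline, "# baseline kernel"), (worktree, "# optimized kernel")):
            os.makedirs(path)
            with open(os.path.join(path, "kernel.py"), "w") as fh:
                fh.write(body)
        # Same command, different CWD, different file contents.
        return read_kernel_from(baseline), read_kernel_from(worktree)


if __name__ == "__main__":
    print(demo())
```

The same command returns different file contents depending only on the starting directory, so a verification step launched from the wrong directory can silently benchmark the wrong kernel.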

evaluation.py: When no per-task best_results.json exists, check for best_patch.diff at the round root. Handles the case where dispatch_tasks fails and the orchestrator LLM creates patches directly.
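The fallback order described above could be sketched roughly as follows. This is not the code in evaluation.py: the function name `discover_round_patches` and the assumed layout (`<round_dir>/<task>/best_results.json`, with `best_patch.diff` directly under the round directory) are hypothetical and only illustrate the lookup order.

```python
import tempfile
from pathlib import Path


def discover_round_patches(round_dir: Path) -> list[Path]:
    """Return per-task result files, or the round-root patch as a fallback."""
    # Assumed layout: <round_dir>/<task_name>/best_results.json
    per_task = sorted(round_dir.glob("*/best_results.json"))
    if per_task:
        return per_task
    # No per-task results (e.g. dispatch failed): look for a patch at the round root.
    fallback = round_dir / "best_patch.diff"
    return [fallback] if fallback.is_file() else []


def demo() -> list[str]:
    """Build a round directory that only has a root-level patch and run discovery."""
    with tempfile.TemporaryDirectory() as tmp:
        round_dir = Path(tmp)
        (round_dir / "best_patch.diff").write_text("--- a/kernel.py\n+++ b/kernel.py\n")
        return [p.name for p in discover_round_patches(round_dir)]


if __name__ == "__main__":
    print(demo())
```

Per-task results take priority when they exist, so the fallback only changes behavior for rounds that previously produced nothing to evaluate.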

Evidence: In AKA benchmark runs, GEAK internally reported verified_speedup=1.0x for refk_identity while AKA's independent re-evaluation measured 4.46x (baseline=0.0174ms, optimized=0.0039ms). The CWD fix ensures both measurements agree by running the correct kernel in the eval worktree.
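As a quick check of the figures above, assuming speedup is defined as baseline latency divided by optimized latency (the helper below is illustrative, not part of GEAK or AKA):

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup as the ratio of baseline latency to optimized latency."""
    if optimized_ms <= 0:
        raise ValueError("optimized latency must be positive")
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    # Latencies reported for refk_identity in the AKA re-evaluation.
    print(f"{speedup(0.0174, 0.0039):.2f}x")  # 4.46x
```

A reported 1.0x is what this ratio gives when both measurements run the same (baseline) kernel, which is consistent with the CWD problem described above.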

Tested on: 7 AKA Triton kernels completed with standard evaluation flow, all compile=true, correct=true.

Two fixes for the post-round evaluation pipeline:

1. commandment.py: Add `cd "${GEAK_WORK_DIR}" &&` before `exec python3` in all run.sh variants. Without this, the eval worktree's Python CWD was wrong, so `open('kernel.py')` and harness imports resolved from the wrong directory during FULL_BENCHMARK verification. This caused verified_speedup to always show ~1.0x even when the agent achieved real speedups (e.g. 4.46x measured independently by AKA).
2. evaluation.py: When no per-task best_results.json exists, check for best_patch.diff at the round root. Handles the case where dispatch_tasks fails and the orchestrator LLM creates patches directly.

Made-with: Cursor

iraj465 · 2026-04-08T16:52:46Z

This is still WIP. Some aspects of the code quality could be better; I'm open to suggestions.

iraj465 requested review from Umangatamd and sdubagun-amd April 8, 2026 16:40

iraj465 force-pushed the fix/eval-worktree-cwd-and-patches branch from d1aa1f8 to 7c422bd Compare April 8, 2026 16:41

iraj465 added the bug and WIP labels Apr 8, 2026

Umangatamd force-pushed the fix/eval-worktree-cwd-and-patches branch from 7c422bd to 7053e00 Compare May 3, 2026 07:15

Umangatamd force-pushed the main branch from 70dd063 to 5ef30d0 Compare May 4, 2026 06:43

iraj465 wants to merge 1 commit into main from fix/eval-worktree-cwd-and-patches
