Backfill citation goal-start trajectory summary #807

Open
huangruiteng wants to merge 1 commit into main from codex/case-analysis-goalstart-trajectory

@huangruiteng

Owner

Summary

  • Backfill a public-safe trajectory_public_summary block for the citation-check goal-start bridge-timeout run.
  • Refresh the generated case-analysis Markdown coverage tables from compact JSON.
  • Update the focused case-analysis smoke expectations from 5 to 6 trajectory summaries and assert the new goal-start summary counters.
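A minimal sketch of the kind of check the smoke update describes, assuming a hypothetical compact JSON shape in which each case may carry a trajectory_public_summary object. The `cases` key, the `case_id` values, and the counter names are illustrative assumptions, not taken from the actual file, and the real smoke script may be structured differently.

```python
import json

# Hypothetical compact JSON; the real file's layout is not shown in this PR.
SAMPLE = json.loads("""
{
  "cases": [
    {"case_id": "example-a", "trajectory_public_summary": {"step_count": 12}},
    {"case_id": "example-b"},
    {"case_id": "example-goal-start",
     "trajectory_public_summary": {"step_count": 3, "timeout_count": 1}}
  ]
}
""")


def count_trajectory_summaries(analysis: dict) -> int:
    """Count cases that carry a trajectory_public_summary block."""
    return sum(
        1
        for case in analysis.get("cases", [])
        if "trajectory_public_summary" in case
    )


def summary_for(analysis: dict, case_id: str) -> dict:
    """Return the summary block for one case; raise KeyError if absent."""
    for case in analysis.get("cases", []):
        if case.get("case_id") == case_id:
            return case["trajectory_public_summary"]
    raise KeyError(case_id)
```

In a smoke script, the expected count would be compared against `count_trajectory_summaries(...)` (6 after this change, per the summary above), and the new goal-start counters would be asserted on the dictionary returned by `summary_for(...)`.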

Validation

  • python3 examples/benchmark-case-analysis-smoke.py
  • python3 -m json.tool docs/research/long-horizon-agent-benchmarks/benchmark-case-analysis.json
  • git diff --check -- docs/research/long-horizon-agent-benchmarks/benchmark-case-analysis.json docs/research/long-horizon-agent-benchmarks/benchmark-case-analysis.md examples/benchmark-case-analysis-smoke.py
  • loopx check --scan-path docs/research/long-horizon-agent-benchmarks/benchmark-case-analysis.json --scan-path docs/research/long-horizon-agent-benchmarks/benchmark-case-analysis.md --scan-path examples/benchmark-case-analysis-smoke.py

Boundary

  • Used only compact/public trajectory counters from the local public trace artifact.
  • Did not read or commit raw task text, raw logs, raw trajectory bodies, verifier output, credentials, uploads, or submissions.
  • No benchmark jobs were launched, and no runner or scoring behavior was changed.
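One way a boundary like this could be enforced mechanically is sketched below, assuming a summary is a flat mapping of counter names to integers or boolean flags. The forbidden key names are hypothetical placeholders; this PR does not list the real field names, and it does not say whether such a check exists in the repository.

```python
# Hypothetical field names; the PR does not list the real forbidden keys.
FORBIDDEN_KEYS = {
    "task_text",
    "raw_log",
    "trajectory_body",
    "verifier_output",
    "credentials",
}


def is_public_safe(summary: dict) -> bool:
    """Return True if the summary holds only integer counters or boolean
    flags and none of the forbidden keys."""
    for key, value in summary.items():
        if key in FORBIDDEN_KEYS:
            return False
        # bool is a subclass of int, so boolean flags pass this check too.
        if not isinstance(value, int):
            return False
    return True
```

Rejecting every non-integer value is deliberately strict: free-text fields are where raw task text or log excerpts would most plausibly leak into a public summary.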
