feat(run): library entrypoint + thin CLI for the e2e pipeline #24
Draft
pradeepvrd wants to merge 1 commit into
pradeepvrd commented Jun 20, 2026
The e2e pipeline used to be driven by running `pkg/evaluator/evaluate.py` directly (positional args + env overrides) via `scripts/entrypoint.sh`. This adds a library entrypoint `devops_bench/run.py` (`run_benchmark` + `BenchmarkConfig`), a thin argparse CLI `devops_bench/cli.py` (`python -m devops_bench`), and `scripts/entrypoint_harness.sh` + `Dockerfile.harness`. No new deps.

**Behavior changes**

- Configuration is typed through `BenchmarkConfig.from_env()`, with CLI flags overriding env and primary/secondary env names unified (`PROJECT_ID`/`GCP_PROJECT_ID`, `CLUSTER_NAME`/`GKE_CLUSTER_NAME`).
- `--no-infra` / `--no-teardown` are reflected into the env before the harness is built, keeping the harness the single env-read site.
- The run returns a typed `BenchmarkResult` (results, run dir, `results.json` path); exit codes are 0 success / 1 task failure / 2 config error.

**Bugs fixed**

- Required config (project/cluster when infra is enabled) is validated up front and raises `ConfigError` instead of failing partway through the run.
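For reviewers, here is a minimal sketch of the config resolution described above: primary env name first, secondary alias as fallback, CLI flags applied on top, and required fields checked before any work starts. It is not the code in `devops_bench/run.py`; the `env` parameter, the `**cli` overrides, the `infra` field, and the `SKIP_INFRA` variable name are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


class ConfigError(Exception):
    """Raised for missing or invalid configuration (maps to exit code 2)."""


def _first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    # Primary name first, secondary alias second: the first non-empty value wins.
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


@dataclass(frozen=True)
class BenchmarkConfig:
    project_id: Optional[str] = None
    cluster_name: Optional[str] = None
    infra: bool = True  # hypothetical field: False when infra provisioning is skipped

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None, **cli: object) -> "BenchmarkConfig":
        env = os.environ if env is None else env
        cfg = cls(
            project_id=_first_set(env, "PROJECT_ID", "GCP_PROJECT_ID"),
            cluster_name=_first_set(env, "CLUSTER_NAME", "GKE_CLUSTER_NAME"),
            infra=env.get("SKIP_INFRA", "") != "1",  # hypothetical env name
        )
        # CLI flags override env; a value of None means "flag not given".
        # replace() raises TypeError for unknown field names.
        cfg = replace(cfg, **{k: v for k, v in cli.items() if v is not None})
        cfg.validate()
        return cfg

    def validate(self) -> None:
        # Fail up front instead of partway through the run.
        if not self.infra:
            return
        missing = [name for name in ("project_id", "cluster_name") if not getattr(self, name)]
        if missing:
            raise ConfigError("missing required config: " + ", ".join(missing))
```

With this shape, `BenchmarkConfig.from_env({"GCP_PROJECT_ID": "p", "GKE_CLUSTER_NAME": "c"}, cluster_name="override")` picks up the project from the secondary name and the cluster from the CLI, while `BenchmarkConfig.from_env({})` raises `ConfigError` before anything is provisioned.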
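The "reflect flags into the env, then build the harness" ordering can be sketched as below. This is an illustration only: `Harness`, `reflect_flags_into_env`, `build_harness`, and the `SKIP_INFRA` / `SKIP_TEARDOWN` variable names are hypothetical, since the PR does not say which variables the harness actually reads.

```python
import argparse
import os
from typing import List, MutableMapping, Optional

# Hypothetical variable names: the PR does not name the env vars the harness reads.
INFRA_ENV = "SKIP_INFRA"
TEARDOWN_ENV = "SKIP_TEARDOWN"


class Harness:
    """Hypothetical stand-in: reads its switches from the env once, at construction."""

    def __init__(self, env: MutableMapping[str, str]) -> None:
        self.skip_infra = env.get(INFRA_ENV) == "1"
        self.skip_teardown = env.get(TEARDOWN_ENV) == "1"


def reflect_flags_into_env(args: argparse.Namespace, env: MutableMapping[str, str]) -> None:
    # Only set variables for flags that were passed; otherwise leave the env untouched.
    if args.no_infra:
        env[INFRA_ENV] = "1"
    if args.no_teardown:
        env[TEARDOWN_ENV] = "1"


def build_harness(argv: Optional[List[str]] = None,
                  env: Optional[MutableMapping[str, str]] = None) -> Harness:
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="devops_bench")
    parser.add_argument("--no-infra", action="store_true")
    parser.add_argument("--no-teardown", action="store_true")
    args = parser.parse_args(argv)
    # Order matters: write the flags into the env first, then build the harness,
    # so the harness remains the single place that reads the env.
    reflect_flags_into_env(args, env)
    return Harness(env)
```

The design choice this illustrates is that the CLI never passes `--no-infra` / `--no-teardown` to the harness directly; it only mutates the env, so a run started from `scripts/entrypoint_harness.sh` with env vars alone behaves the same as one started with flags.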
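A rough sketch of how a typed result and the 0 / 1 / 2 exit codes could fit together. The three `BenchmarkResult` fields mirror the PR description (results, run dir, `results.json` path), but the per-task record shape, the `passed` key, the `ok` property, and `exit_code_for` are assumptions for illustration, not the PR's API.

```python
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, List


class ConfigError(Exception):
    """Missing or invalid configuration."""


@dataclass(frozen=True)
class BenchmarkResult:
    results: List[Dict[str, Any]]  # per-task records; shape is assumed
    run_dir: Path
    results_path: Path             # path to results.json

    @property
    def ok(self) -> bool:
        # Hypothetical rule: every task record must have passed=True.
        # Note that an empty result list counts as success here.
        return all(r.get("passed", False) for r in self.results)


EXIT_OK, EXIT_TASK_FAILURE, EXIT_CONFIG_ERROR = 0, 1, 2


def exit_code_for(run: Callable[[], BenchmarkResult]) -> int:
    """Map a run to exit codes: 0 success, 1 task failure, 2 config error."""
    try:
        result = run()
    except ConfigError as exc:
        print(f"config error: {exc}", file=sys.stderr)
        return EXIT_CONFIG_ERROR
    return EXIT_OK if result.ok else EXIT_TASK_FAILURE
```

A CLI `__main__` built this way would end with something like `raise SystemExit(exit_code_for(lambda: run_benchmark(cfg)))`, where the exact `run_benchmark` call signature is assumed. Keeping the exit-code mapping outside the library means library callers get the `BenchmarkResult` or the `ConfigError` directly and never see a `SystemExit`.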