perf: add DD-061 benchmark harness#133

Open
larimonious wants to merge 2 commits into main from perf/dd061-benchmark-harness

Conversation

@larimonious

Contributor

Summary

  • Adds the DD-061 PR 2 benchmark harness at scripts/bench/run-benchmarks.py.
  • Adds representative perf fixtures under examples/perf/ for plaintext, JSON, route params, compute loop, templates/partials/loops, CLI compute, and optional PostgreSQL routes.
  • Updates DD-061 to mark PR 1/2 complete and recommend the template AST cache as PR 3.

Notes

  • Default runs are database-free; DB routes are opt-in with --include-db and DATABASE_URL.
  • Results are written to target/perf-bench/ as JSON and Markdown.
  • Greptile local review got several rounds of harness cleanup; final rerun timed out after dispatch, so this is pushed for the PR-side check rather than continuing the tiny robot treadmill.
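
The opt-in rule for database routes can be sketched roughly as below. This is a minimal illustration, not the harness's actual code: the helper names `parse_args` and `db_routes_enabled` are hypothetical, and the real script may report a missing DATABASE_URL differently (for example by exiting with an error instead of silently skipping).

```python
import argparse
import os


def parse_args(argv=None):
    """Parse only the flag relevant to this sketch."""
    parser = argparse.ArgumentParser(description="benchmark harness sketch")
    parser.add_argument(
        "--include-db",
        action="store_true",
        help="also benchmark the optional PostgreSQL routes",
    )
    return parser.parse_args(argv)


def db_routes_enabled(args, environ=None):
    """DB routes run only when the flag is set AND DATABASE_URL is non-empty."""
    environ = os.environ if environ is None else environ
    return bool(args.include_db and environ.get("DATABASE_URL"))
```

Requiring both the flag and the environment variable keeps the default run database-free even on machines where DATABASE_URL happens to be exported.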

Verification

  • python3 AST parse for scripts/bench/run-benchmarks.py
  • ./target/dev-release/ntnt validate examples/perf
  • ./target/dev-release/ntnt lint examples/perf --strict (suggestions only: missing contracts in perf fixture functions)
  • python3 scripts/bench/run-benchmarks.py --quick --duration 1
  • python3 scripts/bench/run-benchmarks.py --duration 500ms (confirms that unsupported duration units are rejected)
  • git diff --check
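
For readers curious what the duration check might look like, here is a hedged sketch. The set of accepted units is an assumption: the PR only shows that `1` is accepted and `500ms` is rejected, so this version accepts a bare integer (seconds) or an integer with an `s`, `m`, or `h` suffix, which are the units wrk itself understands. The function name `parse_duration_seconds` is hypothetical.

```python
import re

# Assumed grammar: digits, optionally followed by one of s/m/h.
_DURATION_RE = re.compile(r"(\d+)([smh]?)")
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}


def parse_duration_seconds(text):
    """Return the duration in whole seconds, or raise ValueError."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        # Covers sub-second units such as "500ms" as well as garbage input.
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    seconds = int(value) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return seconds
```

Rejecting unknown units up front avoids passing a value to wrk that the urllib fallback would interpret differently.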

@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Contributor

Greptile Summary

This PR implements the DD-061 benchmark harness (PR 2 of the performance roadmap): a Python script at scripts/bench/run-benchmarks.py that builds the dev-release binary, starts a representative ntnt HTTP server, runs routes with wrk or a sequential urllib fallback, captures interpreter-only CLI timings, and writes JSON + Markdown results to target/perf-bench/.

  • scripts/bench/run-benchmarks.py — 415-line harness with wrk/urllib runners, duration validation, process-group cleanup, git context capture, and Markdown report generation. DB routes are opt-in via --include-db + DATABASE_URL.
  • examples/perf/ — new fixture directory with server.tnt (plaintext, JSON, route-param, compute, template, and optional PostgreSQL routes), compute_cli.tnt for interpreter-only timing, HTML templates, and a README.
  • design-docs/dd-061-interpreter-performance-roadmap.md — marks PR 1 and PR 2 complete and advances the recommendation to PR 3 (automatic template AST cache).
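
As an illustration of the sequential urllib fallback mentioned above, the following sketch issues requests one at a time until a deadline and reports throughput. It is an assumption about the general shape, not the harness's implementation: `fetch_status` and `run_sequential` are hypothetical names, and the `fetch` parameter exists only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Issue one GET, drain the body, and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def run_sequential(url, duration_s, fetch=fetch_status):
    """Send requests back to back for duration_s seconds and summarize."""
    started = time.monotonic()
    deadline = started + duration_s
    requests = 0
    errors = 0
    while time.monotonic() < deadline:
        try:
            status = fetch(url)
        except (urllib.error.URLError, OSError):
            # urlopen raises HTTPError (a URLError) for 4xx/5xx responses.
            errors += 1
            continue
        if 200 <= status < 300:
            requests += 1
        else:
            errors += 1
    elapsed = time.monotonic() - started
    rps = requests / elapsed if elapsed > 0 else 0.0
    return {"requests": requests, "errors": errors,
            "elapsed_s": elapsed, "rps": rps}
```

Because the loop is single-threaded, its numbers are not comparable with wrk's concurrent results; they are mainly useful for tracking relative changes on machines without wrk.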

Confidence Score: 5/5

Safe to merge — adds opt-in benchmark tooling and fixtures with no effect on CI or production paths.

All changes are additive: a new Python harness, ntnt fixture files, HTML templates, and a doc update. Nothing touches production code paths, CI gates, or existing tests. The harness is guarded behind an explicit invocation and writes results only to target/perf-bench/.

No files require special attention; the two minor logic gaps noted are in the benchmark reporting path only.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/bench/run-benchmarks.py | New 415-line benchmark harness. wrk/urllib runner, CLI timing, JSON+Markdown output. Two minor logic edge cases: trailing slash in base URL doubles path separators; urllib loop marks a zero-request run as ok. |
| examples/perf/server.tnt | New HTTP fixture server with plaintext, JSON, route-param, compute, template, and optional DB routes. O(n²) array construction in make_rows is noted in-code as a known limitation; rows are built once at startup so per-request timings are unaffected. |
| examples/perf/compute_cli.tnt | New interpreter-only CLI benchmark fixture. Simple, deterministic compute loop; no issues. |
| examples/perf/views/page.html | New template exercising layout, partial include, and for-loop with empty fallback. Correct template syntax. |
| examples/perf/README.md | New documentation covering quick/full suite invocations, optional DB routes, and benchmarked route inventory. Accurate and consistent with the harness. |
| design-docs/dd-061-interpreter-performance-roadmap.md | Marks PR 1 and PR 2 checklist items as complete and updates the current recommendation to target PR 3 (template AST cache). Documentation-only change. |
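
The two edge cases flagged for run-benchmarks.py could be guarded along the following lines. This is only a sketch of one possible fix: `join_url` and `run_status` are hypothetical helpers, the port in the usage note is made up, and the status labels are illustrative rather than the harness's real vocabulary.

```python
def join_url(base_url, path):
    """Join a base URL and a route path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def run_status(requests_completed, errors):
    """Classify a run; a run that completed no requests is never 'ok'."""
    if requests_completed == 0:
        return "no-data"
    return "ok" if errors == 0 else "degraded"
```

With these helpers, `join_url("http://127.0.0.1:8080/", "/json")` yields `http://127.0.0.1:8080/json`, and a run with zero completed requests is reported as `no-data` instead of passing silently.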

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant Harness as run-benchmarks.py
    participant Cargo
    participant Server as ntnt server.tnt
    participant WRK as wrk / urllib
    participant CLI as ntnt compute_cli.tnt
    participant FS as target/perf-bench/

    User->>Harness: python3 run-benchmarks.py [--quick] [--include-db]
    Harness->>Cargo: cargo build --profile dev-release
    Cargo-->>Harness: ntnt binary
    Harness->>Server: Popen(ntnt run server.tnt) + wait_for_server
    Server-->>Harness: HTTP 2xx on /
    loop for each HTTP benchmark
        Harness->>WRK: run route (wrk -d Ns or urllib loop)
        WRK-->>Harness: RPS / latency result
    end
    Harness->>Server: SIGTERM (killpg)
    Harness->>CLI: ntnt run compute_cli.tnt x N runs
    CLI-->>Harness: elapsed_ms, stdout
    Harness->>FS: write ntnt-perf-stamp.json
    Harness->>FS: write ntnt-perf-stamp.md
    Harness-->>User: paths to JSON + Markdown output
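
The server lifecycle in the diagram (start, wait for a 2xx on /, then SIGTERM via killpg) might be sketched as follows. This is a POSIX-only illustration under stated assumptions: the function names `start_in_own_group` and `stop_group`, the timeouts, and the SIGKILL escalation are not taken from the PR; only `wait_for_server` and the use of killpg appear in the review text.

```python
import os
import signal
import subprocess
import sys
import time
import urllib.error
import urllib.request


def start_in_own_group(command):
    """Start the child as leader of a new session and process group (POSIX)."""
    return subprocess.Popen(command, start_new_session=True)


def wait_for_server(url, timeout_s=10.0, interval_s=0.1):
    """Poll url until it answers with a 2xx status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet
        time.sleep(interval_s)
    return False


def stop_group(process, grace_s=5.0):
    """SIGTERM the child's whole group, escalating to SIGKILL after grace_s."""
    if process.poll() is not None:
        return process.returncode
    try:
        # The child leads its own group, so its pid is also the group id.
        os.killpg(process.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group exited between poll() and killpg()
    try:
        return process.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        os.killpg(process.pid, signal.SIGKILL)
        return process.wait()
```

Signalling the group rather than the single pid means any helper processes the server spawned are stopped too, so a failed run does not leave a listener bound to the benchmark port.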

Reviews (2): Last reviewed commit: "fix: address benchmark review diagnostic..."

Comment thread scripts/bench/run-benchmarks.py Outdated
Comment thread scripts/bench/run-benchmarks.py Outdated
Comment thread examples/perf/server.tnt
