Skip to content

ci: add PR comment reporting weight changes vs master#1454

Open
dmoka wants to merge 13 commits into
masterfrom
feat/add-weight-comparison-task
Open

ci: add PR comment reporting weight changes vs master#1454
dmoka wants to merge 13 commits into
masterfrom
feat/add-weight-comparison-task

Conversation

@dmoka
Copy link
Copy Markdown
Contributor

@dmoka dmoka commented May 13, 2026

Description

Adds a new workflow that diffs runtime/hydradx/src/weights/ against master on every PR and posts a sticky comment summarising per-extrinsic changes in ref_time, proof_size, and DB reads/writes. Changes exceeding ±10% are flagged so big regressions are hard to miss.

The weight check runs in 2 places:

  • after every commit
  • when full benchmark weight is generated for pallet(s)

Example result

Weight Diff Report

⚠️ 4 pallet(s) have changes exceeding ±10% threshold

8 extrinsic(s) changed across 5 pallet(s). New: 0. Removed: 0.

frame_system

Extrinsic RefTime Proof Size Reads Writes
remark ⚠️ +106.0% (27.3M → 56.2M) ⚠️ +100.0% (0 → 1024)

pallet_balances

Extrinsic RefTime Proof Size Reads Writes
transfer_allow_death ⚠️ +25.0% (6196 → 7745) 2 → 3 (+1)
transfer_keep_alive 1 → 2 (+1)

pallet_hsm

Extrinsic RefTime Proof Size Reads Writes
sell ⚠️ +120.0% (794.1M → 1.7B) ⚠️ +30.0% (26820 → 34866)
buy ⚠️ -15.0% (789.6M → 671.2M)
calculate_sell +3.0% (614.1M → 632.5M)

pallet_stableswap

Extrinsic RefTime Proof Size Reads Writes
router_execution_sell ⚠️ +42.0% (1.2B → 1.8B)

pallet_omnipool

Extrinsic RefTime Proof Size Reads Writes
add_liquidity -7.0% (349.3M → 324.8M)

Threshold: ±10%. Base Weight::from_parts(ref_time, proof_size) compared; per-unit components ignored.

Adds a new workflow that diffs `runtime/hydradx/src/weights/` against
master on every PR and posts a sticky comment summarising per-extrinsic
changes in ref_time, proof_size, and DB reads/writes. Changes exceeding
±10% are flagged so big regressions are hard to miss.

Two interchangeable diff implementations are included so we can pick the
preferred one to keep:

- `scripts/weight-diff/` — POSIX awk parser + bash orchestrator (no
  runtime deps beyond bash + awk on the runner).
- `scripts/weight-diff-py/weight_diff.py` — single-file Python reference
  with inline self-tests; produces byte-identical output.

The workflow runs on `ubuntu-latest` independently of the existing
`benchmark-check` job: it doesn't need the Rust toolchain or the
self-hosted runner, and shows up as the "Weight diff" check on each PR.

Refs #1393.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Runtime version has not been increased.

dmoka and others added 11 commits May 13, 2026 11:24
DO NOT MERGE. Will be reverted once we confirm the weight-diff workflow
posts the expected sticky comment.

Covers the matrix:
- ref_time +120% / -15% / +3% / +106% / +42% / -7%
- proof_size +30% / 0 -> 1024 / +25%
- DB reads +1 (combined with proof change, and standalone)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the four top-N sections (RefTime up/down, Proof Size, DB
reads/writes) with one section per affected pallet. Each section shows
all changed extrinsics in that pallet with five columns: Extrinsic,
RefTime, Proof Size, Reads, Writes. Cells render `—` when the metric
didn't change.

Pallet headers carry `⚠️` if any extrinsic in the pallet exceeds the
±10% threshold; pallets are ordered with warned ones first
(alphabetical), then non-warned (alphabetical). Banner now counts
pallets with significant changes rather than individual extrinsics.

Both the bash and Python implementations are updated; outputs remain
byte-identical and the Python self-tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cell-level warnings carry the signal; the duplicate emoji on the
section header was visual noise. Pallets with warnings still float to
the top of the report via the existing sort order.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promotes the 10% threshold to a named DEFAULT_THRESHOLD constant at the
top of each script. Removes the --top flag, which became dead code when
the report was regrouped per pallet (top-N pagination no longer
applies).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes the parallel bash + awk implementation now that we've settled
on the Python one. Renames `scripts/weight-diff-py/` to
`scripts/weight-diff/` (the `-py` suffix only existed to differentiate
from the bash directory) and updates the workflow to invoke
`python3 scripts/weight-diff/weight_diff.py`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The standalone weight-diff workflow runs on every PR push, but the
bot's "Update pallets weights" commit (made with GITHUB_TOKEN) cannot
retrigger workflows — so a full benchmark run otherwise leaves the
sticky comment stale. Add an inline weight-diff step set to the
benchmark-check job: it materialises master's weights, runs the same
Python diff, and posts to the same `weight-diff` sticky header, which
correctly overwrites the standalone workflow's last comment.

Only runs when benchmark_type == 'full' so quick-mode PR runs are not
affected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Quick benchmark at commit e9725b5 has been executed successfully.
View results

@dmoka dmoka marked this pull request as ready for review May 15, 2026 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant