Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 23 additions & 4 deletions .github/workflows/comment-perf.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,32 @@ jobs:
uses: actions/download-artifact@v4
with:
name: benchmark-report
path: benchmark-report.txt
path: benchmark-report.md
github-token: ${{ secrets.ROCKSBOT_CHISEL_PR_COMMENTER }}
run-id: ${{ github.event.workflow_run.id }}

- name: Get PR number
id: get-pr-number
env:
GH_TOKEN: ${{ secrets.ROCKSBOT_CHISEL_PR_COMMENTER }}
Comment thread
upils marked this conversation as resolved.
# The branch name is considered an untrusted input value (under the
# contributor's control), so store it in a variable to avoid shell
# injection.
# In the unlikely case where multiple PRs, on the same branch, from the
# same author exists, the most recent one is selected.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we are silently injecting wrong data to the PR?

Can we come up with a better plan here? As mentioned before, this is not looking great.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we are silently injecting wrong data to the PR?

Sorry it's not clear which part of the highlighted comments you're referring to?:

  • Shell injection? No the comments are informing the maintainer how that is being avoided (so that no future edit unknowingly introduces a vulnerability by inlining third-party controlled data).
  • Selecting the most recent PR? That is best effort and realistically should not happen, but if someone had malicious intent they may attempt to automate a separate PR to be selected instead (which could technically be done to produce a single result in the same way, so long as the timing is right AFAIK).

So it's preventative to the extent that is possible. The exploit is avoided, but technically getting the workflow to target a different PR is viable AFAIK. In this case it's just the posting of a comment of benchmark report to a PR... not great if it's the wrong PR but not necessarily bad either?

The extra care and verbosity in commentary however is preventative towards future maintainers introducing other workflows that may introduce more risk if manipulated.

At least I assume that referencing and trusting a vetted workflow in the repo is far more likely (by said future maintainer/reviewer) than them investing time to understand and be confident in the details, so it's best to account for those risks and surface extra context for awareness (as digging into history of discussion/decisions like in this PR via git blame is far more time intensive).

Copy link
Copy Markdown

@polarathene polarathene Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we come up with a better plan here?

The solution is quite terse (commentary / context aside).

I'm not sure how you want to reduce that further without introducing the security risks?

You can certainly add more complexity, such as adding labels for filter include/exclude criteria, or using manual workflow triggers and reviewers providing the input (again manually).

As the task itself is just to post a comment with text (which the contributor has full control of due to pull_request to a PR), you can choose to keep it simple.

  • The proposal here is only to prevent a bad actor from controlling the PR number (and avoiding earlier accidents that would have compromised the repo).
  • If in future a workflow requires a similar untrusted to trusted context, but the consequences are worse, you would go through all this again to enable it securely. The same mistakes can happen and not be caught by review.

Overview - Workflow flow and challenges faced

Let's just quickly go over the whole scenario:

  1. pull_request workflow (untrusted context) produces output (a benchmark report) that should be posted as a comment on that PR.
  2. Automating a comment requires a trusted context (and thus a separate workflow) to grant permission via secrets / token.
  3. The trusted workflow (in this case workflow_run, which is triggered upon the pull_request workflow completing) lacks the context to identify the correct PR with certainity (only applies when PR source is from a repo fork).
  • This is a fault of the GitHub team, which has done nothing to resolve the concern over the years they've been aware of it. If you'd like me to direct you to advice from the Github Security Labs team on this particular workflow workaround, let me know.
  • Therefore you must reconstruct this information as best you can with the information you do have available. However you must be very careful when that information could be exploited by a third-party.
  1. Query for PRs at the repo with constraints on context available to the workflow. Ideally this is a single PR, but in the odd event it is not and is genuine, the first result will represent the most recently updated PR (which should be correct, assuming no malicious activity to game that).

Beyond that you can add untrusted input from the pull_request workflow to provide the PR number as additional validation. Just be mindful that the attacker can manipulate pull_request as they please, so while it can help avoid extremely rare false-positives, it adds nothing to the preventative security aspect.

Overview - Query for the PR number via gh pr list

A GraphQL query using the available context for the workflow to be as precise as possible at narrowing down the PR number:

  • head:${{ github.event.workflow_run.head_branch }}
    • PR branch name defined by the contributor.
    • Default via GitHub web UI is patch-1.
    • A PR author can raise multiple PRs with this same branch name, and may have already done so in the past, and could do so in future (notable while a workflow run is in progress).
  • sort:updated-desc
    • In that unlikely event the author has existing PRs to the project with the same branch and one of those were updated instead, this constraint prevents mistakenly selecting the newer PR by creation date.
  • author:${{ github.event.workflow_run.head_repository.owner.login }}
    • As head only filters by branch name (eg: the common patch-1), constraining by author is important.
  • ${{ github.event.workflow_run.head_sha }}
    • IIRC, this only ensures that commit SHA that triggered the workflow from the PR exists in any of the PRs. Thus it could technically match more than one PR.

gh pr list is then invoked with those search parameters and a JQ query to extract the PR number from the results or trigger an error to bail early if that search failed.

It is effectively this:

gh pr list --repo "${{ github.repository }}" --state all --search "${QUERY_PR}" \  
  --json number --jq "${JQ_FILTER}" >> "${GITHUB_OUTPUT}"

Inlined QUERY_PR + JQ_FILTER for clarity:

gh pr list \
  --repo "${{ github.repository }}" \
  --state all \
  --search '
    head:${{ github.event.workflow_run.head_branch }}
    sort:updated-desc
    author:${{ github.event.workflow_run.head_repository.owner.login }}
    ${{ github.event.workflow_run.head_sha }}
  ' \
  --json number --jq '
    .[0]
    | if (.number == null) then error("Cannot find PR number") end
    | "number=\(.number)"
  ' \
  >> "${GITHUB_OUTPUT}"

NOTE: As detailed in comments, ENV are used not only to make that expanded version more compact to be easier to digest, but to prevent an attacker from performing shell-injection in the run (_since the ${{ ... }} are interpolated prior to parsing the YAML and executing run.

Copy link
Copy Markdown

@polarathene polarathene Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: Please clarify the direction you want to take:

  • Only allow benchmarks to be posted to be posted by members that create branches on the repo.
  • Take preventative measures to minimize PR number manipulation.
  • Do the bare minimum, permit PR number manipulation.

If you want to pursue the preventative approach:

  • What exactly doesn't sit well with the step in question?
  • What would a more ideal approach look like to you given the constraints (requiring a workaround due to GH not providing what is needed)?
    • The noise can be shifted away from the workflow to live somewhere else
    • You could minimize the logic with pull_request_target, which may feel good to look at as a diff but more prone to compromising security (which this team unfortunately lacks timely access to the needed security expertise for this to even be worth entertaining?).
- name: Get PR number
  env:
    GH_TOKEN: # ...
    QUERY_PR: # ...
    JQ_FILTER: # ...
  run: gh pr list --repo "${{ github.repository }}" --state all --search "${QUERY_PR}" --json number --jq "${JQ_FILTER}" >> "${GITHUB_OUTPUT}"

That isn't that complicated to grok, especially with the detailed breakdown provided.

It should not require further modification, but if that were necessary, you can always choose to communicate very clearly inline for reviewers to be very careful here (with git blame leading to this PR which is abundant in context).


Original comment

As mentioned before, this is not looking great.

You could:

  • Revert to the original proposal if that sits better with you?
  • Decide that posting a benchmark report comment in PRs should be exclusive to branches within the source repo only.
  • Accept the workaround provided it's documented clear and sufficiently (_advisable, given risks caught here may be missed in future_).
  • Hope someone else eventually chimes in with something "better" (I've invested far more time than I'd like on the topic, I'm doubtful unless you've got access to a specialist or convince GH to resolve it).

Or continue to express concerns (1, 2), whilst ignoring my attempt to engage 🤷‍♂️ (I realize I'm incredibly verbose this 2nd time around, but hopefully it's structured in a way that's easier to digest?)

I understand your time is limited, so that you may only be able to glance over the PR diff itself (and maybe skim the discussion for some context).

I hope my replies this time make that skimming easier, such that you're able to ask more meaningful questions regarding how the concern could be approached better, or what tradeoffs you're willing to make instead.

Original Proposed change

The original PR commit effectively added these changes:

# ‎.github/workflows/performance.yaml
- name: Save PR number
  run: echo "${{ github.event.pull_request.number }}" > pr-number

# ‎.github/workflows/comment-perf.yaml
- name: Get PR number
  id: get-pr-number
  run: echo "number=$(cat ./pr-number)" >> "${GITHUB_OUTPUT}"

Quite simple approach, it works good enough?

Broader context view of the integration (NOTE: content below from the referenced commit has been slightly adjusted to fit the current PR review feedback):

    • Added "Save PR number" step.
    • Additionally upload the pr-number file.

‎.github/workflows/performance.yaml: (pull_request, untrusted context)

- name: Run benchmark
  # ...

- name: Save PR number
  run: echo "${{ github.event.pull_request.number }}" > pr-number

- name: Upload result
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-report
    path: |
      benchmark-report.md
      pr-number
    retention-days: 1
    • Added "Read PR number" step.
    • Removed path in "Download comment" (extract to CWD).
    • Swapped ${{ github.event.workflow_run.pull_requests[0].number }} for ${{ steps.get-pr-number.outputs.number }}.

‎.github/workflows/comment-perf.yaml: (workflow_run, untrusted context)

- name: Download comment
  uses: actions/download-artifact@v4
  with:
    name: benchmark-report
    github-token: ${{ secrets.ROCKSBOT_CHISEL_PR_COMMENTER }}
    run-id: ${{ github.event.workflow_run.id }}

- name: Get PR number
  id: get-pr-number
  run: echo "number=$(cat ./pr-number)" >> "${GITHUB_OUTPUT}"

- name: Post message to PR
  uses: mshick/add-pr-comment@dd126dd8c253650d181ad9538d8b4fa218fc31e8 # v2
  with:
    message-path: benchmark-report.md
    issue: ${{ steps.get-pr-number.outputs.number }}

My original feedback was mistakenly aimed at >> "${GITHUB_ENV}", which detailed and exploit, but since the environment wasn't being updated there was no issue here (beyond contributor manipulating PR number that the comment gets posted to).

The PR then tried to better defend against the PR number manipulation, but introduced a vulnerability that would have compromised the workflow (shell injection, enabling RCE). Which has since been resolved 👍


Diff - Single step to find PR number

For context, the only difference from the above snippets (in this comment) was to replace the "Save PR number" + "Get PR number" steps with:

‎.github/workflows/comment-perf.yaml:

- name: Get PR number
  id: get-pr-number
  env:
    GH_TOKEN: ${{ secrets.ROCKSBOT_CHISEL_PR_COMMENTER }}
    # Search constraints - Isolate the PR as best as possible, select number of first PR result.
    # SECURITY: Contributor controlled context expressions must evaluate here, not in `run`
    #           to avoid shell injection vulnerability.
    QUERY_PR: "head:${{ github.event.workflow_run.head_branch }} sort:updated-desc author:${{ github.event.workflow_run.head_repository.owner.login }} ${{ github.event.workflow_run.head_sha }}"
    # Filters and formats the JSON into a `key=value` string with basic error handling:
    JQ_FILTER: >-
      .[0]
      | if (.number == null) then error("Cannot find PR number") end
      | "number=\(.number)"
  run: |
    gh pr list --repo "${{ github.repository }}" --state all --search "${QUERY_PR}" \ 
    --json number --jq "${JQ_FILTER}" >> "${GITHUB_OUTPUT}"

No need to upload a potentially manipulated PR number from the contributor pull_request workflow 👍

NOTE: As noted in this feedback (end of comment), GH_TOKEN could be GH_TOKEN: ${{ github.token }} instead, you'd just need to scope the workflow token permissions like this:

permissions:
  contents: read
  pull-requests: read

That verbosity just better mitigates manipulation of the PR number as opposed to blindly trusting it. I'd love to know a better way but unless GH bothers to resolve it properly and include that in the workflow_run context when PRs are from forks, there's not much you can do. You can of course hide this verbosity away in a separate file to invoke as an action (or composite action if folding multiple steps).

QUERY_PR: "head:${{ github.event.workflow_run.head_branch }} sort:updated-desc author:${{ github.event.workflow_run.head_repository.owner.login }} ${{ github.event.workflow_run.head_sha }}"
# Filters and formats the JSON into a `key=value` string with basic error handling.
JQ_FILTER: >-
.[0]
| if (.number == null) then error("Cannot find PR number") end
| "number=\(.number)"
run: |
gh pr list --repo "${{ github.repository }}" --state all --search "${QUERY_PR}" \
Comment thread
upils marked this conversation as resolved.
--json number --jq "${JQ_FILTER}" >> "${GITHUB_OUTPUT}"

- name: Post message to PR
uses: mshick/add-pr-comment@dd126dd8c253650d181ad9538d8b4fa218fc31e8
uses: mshick/add-pr-comment@dd126dd8c253650d181ad9538d8b4fa218fc31e8 # v2
with:
message-path: benchmark-report.txt
issue: ${{ github.event.workflow_run.pull_requests[0].number }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for context of this removal for anyone interested, github.event.workflow_run.pull_requests is only present when the PR is not a branch from a fork of the repo IIRC, so it's an unreliable source.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing something the context for this PR... how come we have the PR at hand and not the number?

This whole workflow is conditioning on github.event.workflow_run.event == 'pull_request'.

I realize that this is all being done in the name of security, but the amout of fiddling with stuff that supposedly should be simple really makes it feel like we'll see more serious issues soon.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing something the context for this PR... how come we have the PR at hand and not the number?

This workflow is triggered by the other one and is receiving a github.event.workflow_run payload. However github.event.workflow_run.pull_requests is empty when triggered from a fork. This issue was raised to GitHub multiple times, but AFAIK they never really answered nor addressed it. So we have to work around that and resort to a more convoluted method to get the PR number.

message-path: benchmark-report.md
issue: ${{ steps.get-pr-number.outputs.number }}
repo-token: ${{ secrets.ROCKSBOT_CHISEL_PR_COMMENTER }}
6 changes: 2 additions & 4 deletions .github/workflows/performance.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -65,14 +65,12 @@ jobs:
- name: Run benchmark
id: benchmark
run: |
msg_file="$(mktemp)"
echo "msg_file=$msg_file" >> $GITHUB_OUTPUT
chmod +x base head
hyperfine --export-markdown "$msg_file" "./base info --release ./chisel-releases 'python3.12_core'" -n "BASE" "./head info --release ./chisel-releases 'python3.12_core'" -n "HEAD"
hyperfine --export-markdown benchmark-report.md "./base info --release ./chisel-releases 'python3.12_core'" -n "BASE" "./head info --release ./chisel-releases 'python3.12_core'" -n "HEAD"

- name: Upload result
uses: actions/upload-artifact@v4
with:
name: benchmark-report
path: ${{ steps.benchmark.outputs.msg_file }}
path: benchmark-report.md
retention-days: 1
Loading