
Migrate to OSDC runners and containerize the GPU workflows#179

Open
huydhn wants to merge 4 commits into main from migrate-h100-to-osdc-runners

@huydhn huydhn commented May 29, 2026

Migrates this repo off the AWS H100/A100 runners onto OSDC (ARC) runners, and rewrites the GPU workflows to run their workloads inside a job-level container: block. OSDC/ARC runners are ephemeral pods with no Docker daemon, so the old docker run … docker exec pattern cannot work on them. The container pattern matches pytorch/pytorch _linux-test.yml (test-osdc) and pytorch/helion.

1. Runner label migration (arc.yaml mapping + mt- prefix)

Old                OSDC label
linux.aws.a100     mt-l-x86iavx512-11-125-a100
linux.aws.h100     mt-l-x86iamx-22-225-h100
linux.aws.h100.4   mt-l-x86iamx-88-900-h100-4
linux.aws.h100.8   mt-l-bx86iamx-176-1800-h100-8
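
The mapping above can also be written as a small lookup, for example when scripting the rename across workflow files. This is an illustrative sketch only: the names OLD_TO_OSDC and to_osdc_label are hypothetical and do not appear in this PR; the label strings are copied from the mapping above.

```python
# Hypothetical helper (not part of this PR): old AWS label -> OSDC label.
# The label strings are copied from the mapping in the PR description.
OLD_TO_OSDC = {
    "linux.aws.a100": "mt-l-x86iavx512-11-125-a100",
    "linux.aws.h100": "mt-l-x86iamx-22-225-h100",
    "linux.aws.h100.4": "mt-l-x86iamx-88-900-h100-4",
    "linux.aws.h100.8": "mt-l-bx86iamx-176-1800-h100-8",
}


def to_osdc_label(label: str) -> str:
    """Return the OSDC label for an old AWS label; pass other labels through."""
    return OLD_TO_OSDC.get(label, label)
```

Labels that are out of scope for this migration, such as linux.dgx.b200, pass through unchanged.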

2. Containerization

  • vllm-ci-test.yml / vllm-profiling.yml — add an ubuntu-latest resolve-image pre-job (docker manifest inspect needs a daemon the pod lacks) that picks the latest available vLLM CI image and passes it to a container: (--gpus all). Drop the docker run/docker exec wrapper + /tmp/workspace bind mount; run scripts natively. run_vllm_profiling.sh now uses $GITHUB_WORKSPACE. Profiling assumes the upload IAM role via OIDC.
  • pytorch-bisect.yaml — the mt- path builds PyTorch inside pytorch/pytorch:2.12.0-cuda13.0-cudnn9-devel (conditional container:); linux.dgx.b200 keeps the bare-host path. CUDA_HOME points at the image CUDA on the container path.
  • vllm-benchmark.yml / sglang-benchmark.yml: generate_vllm_benchmark_matrix.py now emits a per-entry device-name; set-parameters resolves the per-device image up front and enriches the matrix with container-image + container-options; every device runs in a container: with native execution; all devices assume the upload role via OIDC.

⚠️ Needs CI validation (no OSDC/host runner available locally)

These were verified for YAML validity + the matrix/enrichment pipeline end-to-end, but the runtime behavior must be confirmed on real runners:

  • rocm / hpu container options are best-effort (/dev/kfd+/dev/dri, Habana runtime) — no verified evidence they work as job-level container.options on these pods.
  • --shm-size (4g / 32g) may be silently capped on ARC pods; large multi-GPU NCCL runs may need attention.
  • The per-model S3 “already benchmarked” dedup in the benchmarks was dropped (it needs the runtime GPU device-type, which is unknown before the container starts) — the resolved commit is now benchmarked unconditionally.
  • The bisect CUDA-devel image tag (2.12.0-cuda13.0-cudnn9-devel) and tritonparse build scripts working in-container are unverified.
  • OIDC token retrieval from inside the container for the S3 upload role is untested.
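
One way to check the --shm-size concern on a real runner is to compare the size of /dev/shm inside the job container against the requested value. The sketch below is not part of this PR; the function names are hypothetical, and it assumes the 4g / 32g values are interpreted as GiB.

```python
import shutil


def shm_total_bytes(path: str = "/dev/shm") -> int:
    """Total size in bytes of the filesystem mounted at `path`."""
    return shutil.disk_usage(path).total


def shm_is_capped(requested_gib: int, actual_bytes: int) -> bool:
    """True when the mounted size is smaller than what --shm-size requested."""
    return actual_bytes < requested_gib * 1024**3
```

Inside the container, a step could call shm_is_capped(32, shm_total_bytes()) and fail early, before a multi-GPU NCCL run hits the limit.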

Out of scope (not OSDC)

flash_attention.yml (b200 DGX), inductor.yml / tritonbench*.yml (AWS g5 / b200, reusable workflow) — left untouched.

🤖 Generated with Claude Code

Switch all linux.aws.h100* and linux.aws.a100 runner labels to their
OSDC/ARC equivalents. Labels follow the mapping in pytorch/pytorch
.github/arc.yaml, with the mt- (Meta multi-tenant) prefix that OSDC
production runners use:

  linux.aws.a100   -> mt-l-x86iavx512-11-125-a100   (1 GPU)
  linux.aws.h100   -> mt-l-x86iamx-22-225-h100       (1 GPU)
  linux.aws.h100.4 -> mt-l-x86iamx-88-900-h100-4     (4 GPU)
  linux.aws.h100.8 -> mt-l-bx86iamx-176-1800-h100-8  (8 GPU)

Files:
- generate_vllm_benchmark_matrix.py: TP_TO_RUNNER_MAPPING and
  RUNNER_TO_PLATFORM_MAPPING get the full label rename. In PLATFORM_SKIPS
  the skip tokens become the bare GPU-type 'h100'/'a100' so they remain a
  substring of the OSDC names, preserving the 'skip the whole family'
  behavior the substring matcher relies on (matters for h100, which has
  1/4/8-GPU variants).
- vllm-ci-test.yml, vllm-profiling.yml, pytorch-bisect.yaml: runs-on /
  runner choice updated.
- test fixture: expected runner values updated to the OSDC names.

The matrix output is unchanged except for the runner label strings
(verified: every model<->runner pairing is identical after the rename).
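
The PLATFORM_SKIPS change can be illustrated with a minimal substring matcher. This is a sketch under assumptions: is_skipped is a hypothetical stand-in, and the real matcher and data layout in generate_vllm_benchmark_matrix.py may differ.

```python
# Hypothetical stand-in for the substring matcher described above.
def is_skipped(runner: str, skip_tokens: list[str]) -> bool:
    """True when any skip token occurs as a substring of the runner label."""
    return any(token in runner for token in skip_tokens)


# The three H100 variants from the label mapping.
H100_RUNNERS = [
    "mt-l-x86iamx-22-225-h100",
    "mt-l-x86iamx-88-900-h100-4",
    "mt-l-bx86iamx-176-1800-h100-8",
]

# The old full label is no longer a substring of any OSDC name,
# so it would silently stop skipping the H100 family.
old_token_hits = [is_skipped(r, ["linux.aws.h100"]) for r in H100_RUNNERS]

# The bare GPU type still matches the 1-, 4- and 8-GPU variants.
new_token_hits = [is_skipped(r, ["h100"]) for r in H100_RUNNERS]
```

The bare token stays specific to its family: 'h100' does not occur in the A100 label mt-l-x86iavx512-11-125-a100.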
@huydhn huydhn force-pushed the migrate-h100-to-osdc-runners branch from 8d498d9 to a7a54a0 Compare May 29, 2026 19:18
@huydhn huydhn changed the title Migrate linux.aws.h100 runners to OSDC (ARC) runners Migrate linux.aws.h100/a100 runners to OSDC (ARC) runners May 29, 2026
OSDC/ARC runners are ephemeral pods with no Docker daemon, so the old
'docker run --gpus all + docker exec' pattern cannot work on them. Run the
vLLM CI image via the job-level container: key with options '--gpus all'
instead (the GPU is injected by the runner pod), matching pytorch/pytorch
_linux-test.yml (test-osdc) and pytorch/helion.

- vllm-ci-test.yml / vllm-profiling.yml: add an ubuntu-latest 'resolve-image'
  pre-job that runs 'docker manifest inspect' (needs a daemon the pod lacks)
  to pick the latest available vLLM CI image and pass it down as the
  container image. Drop the GPU_FLAG/docker run/docker exec wrapper and the
  /tmp/workspace bind-mount; run the scripts directly in the container.
- vllm-profiling.yml: assume the upload IAM role via OIDC before the S3
  upload (ephemeral pods have no host instance role); pass the resolved
  vLLM commit through as S3_HEAD_SHA.
- run_vllm_profiling.sh: use $GITHUB_WORKSPACE instead of the hardcoded
  /tmp/workspace bind-mount path.
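
The resolve-image lookup walks back from the newest commit until it finds one with a published image. A Python sketch of that logic follows; resolve_image and image_exists are hypothetical names, and in the workflow the existence check is 'docker manifest inspect' run from a shell loop.

```python
from typing import Callable, Optional, Sequence


def resolve_image(
    commits: Sequence[str],
    image_prefix: str,
    image_exists: Callable[[str], bool],
    max_lookback: int = 100,
) -> Optional[str]:
    """Return the image for the newest commit that has one, or None.

    `commits` is ordered newest first (HEAD, HEAD~1, ...). `image_exists`
    stands in for `docker manifest inspect`.
    """
    for sha in commits[:max_lookback]:
        image = f"{image_prefix}:{sha}"
        if image_exists(image):
            return image
    return None
```

Returning None when nothing is found is a choice made for this sketch; a caller can then fail fast rather than start a job with an image tag that does not exist.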
@huydhn huydhn temporarily deployed to pytorch-x-vllm May 29, 2026 19:50 — with GitHub Actions Inactive
Comment on lines 33 to +82
@@ -52,11 +44,12 @@ jobs:
ref: ${{ inputs.vllm_branch || 'main' }}
fetch-depth: 0

- name: Set Docker registry
shell: bash
- name: Resolve the latest available vLLM CI image
id: resolve
working-directory: vllm
env:
HEAD_BRANCH: ${{ inputs.vllm_branch || 'main' }}
DEVICE_NAME: ${{ matrix.device-name }}
HEAD_SHA: ${{ inputs.vllm_commit || '' }}
run: |
set -eux

@@ -67,67 +60,59 @@ jobs:
DOCKER_IMAGE_PREFIX=public.ecr.aws/q9t5s3a7/vllm-ci-test-repo
fi

DOCKER_IMAGE_SUFFIX=""
if [[ "${DEVICE_NAME}" == "rocm" ]]; then
DOCKER_IMAGE_PREFIX=docker.io/rocm/vllm-ci
elif [[ "${DEVICE_NAME}" == "cpu" ]]; then
DOCKER_IMAGE_SUFFIX=-cpu
fi
echo "DOCKER_IMAGE_PREFIX=$DOCKER_IMAGE_PREFIX" >> $GITHUB_ENV
echo "DOCKER_IMAGE_SUFFIX=$DOCKER_IMAGE_SUFFIX" >> $GITHUB_ENV

- name: Check for available Docker image
working-directory: vllm
env:
HEAD_BRANCH: ${{ inputs.vllm_branch || 'main' }}
HEAD_SHA: ${{ inputs.vllm_commit || '' }}
run: |
set -eux

if [[ -z "${HEAD_SHA}" ]]; then
# Looking back the latest 100 commits is enough
for i in {0..99}
do
for i in {0..99}; do
# Check if the image is there, if it doesn't then check an older one
# because the commit is too recent
HEAD_SHA=$(git rev-parse --verify HEAD~${i})
DOCKER_IMAGE="${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}${DOCKER_IMAGE_SUFFIX}"

# No Docker image available yet because the commit is too recent
DOCKER_IMAGE="${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}"
if docker manifest inspect "${DOCKER_IMAGE}"; then
break
fi
done
fi

echo "HEAD_SHA=$HEAD_SHA" >> $GITHUB_ENV
echo "docker-image=${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}" >> "${GITHUB_OUTPUT}"

- name: Setup CUDA GPU_FLAG for docker run
if: matrix.device-name == 'cuda'
test:
name: Run vLLM tests
needs: resolve-image
if: ${{ !github.event.pull_request.head.repo.fork && github.repository_owner == 'pytorch' }}
strategy:
Comment on lines 33 to +80
@@ -124,98 +60,86 @@ jobs:
DOCKER_IMAGE_PREFIX=public.ecr.aws/q9t5s3a7/vllm-ci-test-repo
fi

DOCKER_IMAGE_SUFFIX=""
if [[ "${DEVICE_NAME}" == "rocm" ]]; then
DOCKER_IMAGE_PREFIX=docker.io/rocm/vllm-ci
elif [[ "${DEVICE_NAME}" == "cpu" ]]; then
DOCKER_IMAGE_SUFFIX=-cpu
fi
echo "DOCKER_IMAGE_PREFIX=$DOCKER_IMAGE_PREFIX" >> $GITHUB_ENV
echo "DOCKER_IMAGE_SUFFIX=$DOCKER_IMAGE_SUFFIX" >> $GITHUB_ENV

- name: Check for last commit
working-directory: vllm-profiling/vllm
env:
HEAD_BRANCH: ${{ inputs.vllm_branch || 'main' }}
HEAD_SHA: ${{ inputs.vllm_commit || '' }}
run: |
set -eux

if [[ -z "${HEAD_SHA}" ]]; then
for i in {0..99}
do
for i in {0..99}; do
HEAD_SHA=$(git rev-parse --verify HEAD~${i})
DOCKER_IMAGE="${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}${DOCKER_IMAGE_SUFFIX}"

DOCKER_IMAGE="${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}"
# Docker image available for this commit, then exit
if docker manifest inspect "${DOCKER_IMAGE}"; then
break
fi
done
fi

echo "HEAD_SHA=$HEAD_SHA" >> $GITHUB_ENV
echo "docker-image=${DOCKER_IMAGE_PREFIX}:${HEAD_SHA}" >> "${GITHUB_OUTPUT}"
echo "head-sha=${HEAD_SHA}" >> "${GITHUB_OUTPUT}"
echo "### Run profiling on [${HEAD_SHA}](https://github.com/vllm-project/vllm/commit/${HEAD_SHA})" >> "${GITHUB_STEP_SUMMARY}"

- name: Setup CUDA GPU_FLAG for docker run
if: env.DEVICE_NAME == 'cuda'
profiling:
name: Run vLLM profiling
needs: resolve-image
…ainer

The mt- runner is an ephemeral OSDC pod with no host CUDA toolchain, so build
PyTorch inside pytorch/pytorch:2.12.0-cuda13.0-cudnn9-devel (--gpus all) when
the mt- runner is selected; linux.dgx.b200 keeps the existing bare-host path
(conditional container via fromJSON('null')). CUDA_HOME points at the image's
/usr/local/cuda on the container path (run.sh requires it non-empty). Add a
git safe.directory step for the root-owned in-container checkout.
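
The conditional container decision can be sketched in Python. The workflow itself expresses it as a GitHub Actions expression using fromJSON('null'); container_for is a hypothetical name, and the 'mt-' prefix test is an assumption about how the runner is detected.

```python
from typing import Optional

# Image tag and option copied from the commit message above.
BISECT_IMAGE = "pytorch/pytorch:2.12.0-cuda13.0-cudnn9-devel"


def container_for(runner: str) -> Optional[dict]:
    """Container config for mt- (OSDC) runners; None keeps the bare-host path."""
    if runner.startswith("mt-"):
        return {"image": BISECT_IMAGE, "options": "--gpus all"}
    return None
```

A None result corresponds to the null container value, so linux.dgx.b200 keeps running directly on the host.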
@huydhn huydhn temporarily deployed to pytorch-x-vllm May 29, 2026 20:06 — with GitHub Actions Inactive
Run every matrix device inside a job-level container: instead of the old
'docker run + docker exec' pattern, since OSDC/ARC pods have no Docker daemon.

- generate_vllm_benchmark_matrix.py: emit a per-entry 'device-name' so the
  workflow can resolve the container image up front (regenerated the test
  fixture, which also clears pre-existing config drift).
- set-parameters: resolve the upstream image on ubuntu-latest (which has a
  daemon) via 'docker manifest inspect', then enrich every matrix entry with
  container-image + device-appropriate container-options. sglang resolves per
  image suffix (cuda / -cu128-b200 / -rocm630-mi30x) and skips non-cuda/rocm
  devices instead of failing the whole matrix.
- benchmarks job: add container: { image, options }, drop the device probe
  (device-name comes from the matrix) while keeping the runtime DEVICE_TYPE
  detection, run the benchmark script natively, and assume the upload IAM
  role via OIDC for all devices (no host instance role inside a pod).
  chown is made sudo-optional for the in-container root user.

Flagged for CI validation: the per-model S3 'already benchmarked' dedup is
dropped (needs the runtime device-type before the container exists); the
rocm/hpu container options are best-effort; and --shm-size may be capped on
ARC pods.
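
The matrix enrichment step can be sketched as follows. This is not the code from set-parameters: enrich_matrix and CONTAINER_OPTIONS are hypothetical names, and only '--gpus all' and the /dev/kfd + /dev/dri devices are named in this PR, so the exact option strings are illustrative.

```python
# Assumed per-device option strings (illustrative, see the note above).
CONTAINER_OPTIONS = {
    "cuda": "--gpus all",
    "rocm": "--device /dev/kfd --device /dev/dri",
}


def enrich_matrix(entries: list[dict], images: dict[str, str]) -> list[dict]:
    """Add container-image and container-options to each matrix entry.

    Entries whose device has no resolved image or no known options are
    dropped instead of failing the whole matrix.
    """
    enriched = []
    for entry in entries:
        device = entry["device-name"]
        if device not in images or device not in CONTAINER_OPTIONS:
            continue  # skip this device, keep the rest of the matrix
        enriched.append({
            **entry,
            "container-image": images[device],
            "container-options": CONTAINER_OPTIONS[device],
        })
    return enriched
```

Each surviving entry then carries everything the benchmarks job needs for its container: { image, options } block.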
@huydhn huydhn changed the title Migrate linux.aws.h100/a100 runners to OSDC (ARC) runners Migrate to OSDC runners and containerize the GPU workflows May 29, 2026
@huydhn huydhn temporarily deployed to pytorch-x-vllm May 29, 2026 20:33 — with GitHub Actions Inactive