feat: integrate vLLM artifact transfer #451

Open
zhengluo-nv wants to merge 5 commits into main from zheluo/artifact-transfer-pr-4-vllm-loader

Conversation

@zhengluo-nv zhengluo-nv commented Jun 23, 2026

Overview

This PR integrates cache artifact transfer into the vLLM ModelExpress loader so a target replica can reuse runtime-generated JIT caches from an already-warm source replica before vLLM starts model initialization.

The implementation covers three cache families:

  • torch compile cache: VLLM_CACHE_ROOT/torch_compile_cache
  • Triton cache: TRITON_CACHE_DIR or ~/.triton/cache
  • DeepGEMM cache: DG_JIT_CACHE_DIR, DEEP_GEMM_CACHE_DIR, or VLLM_CACHE_ROOT/deep_gemm
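The lookup order in the list above can be sketched as a small resolver. This is an illustration rather than the PR's code: `resolve_cache_dirs` is a hypothetical helper, the `~/.cache/vllm` fallback for an unset `VLLM_CACHE_ROOT` is an assumption, and so is the idea that the two DeepGEMM variables are consulted in the order they are listed.

```python
from pathlib import Path


def resolve_cache_dirs(env: dict[str, str]) -> dict[str, Path]:
    """Return one cache directory per artifact family (hypothetical helper)."""
    home = Path(env.get("HOME") or str(Path.home()))
    # Assumption: fall back to ~/.cache/vllm when VLLM_CACHE_ROOT is unset.
    vllm_root = Path(env.get("VLLM_CACHE_ROOT") or home / ".cache" / "vllm")
    triton = Path(env.get("TRITON_CACHE_DIR") or home / ".triton" / "cache")
    # Assumption: the DeepGEMM variables are checked in the order listed above.
    deep_gemm = Path(
        env.get("DG_JIT_CACHE_DIR")
        or env.get("DEEP_GEMM_CACHE_DIR")
        or vllm_root / "deep_gemm"
    )
    return {
        "torch_compile": vllm_root / "torch_compile_cache",
        "triton": triton,
        "deep_gemm": deep_gemm,
    }
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without touching `os.environ`.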

Artifact transfer is opt-in through MX_ARTIFACT_TRANSFER=1 and requires the P2P metadata path. If artifact transfer is enabled while MX_P2P_METADATA=0, the loader logs a warning and skips artifact transfer rather than changing the metadata default.
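A minimal sketch of that gating, for readers who want the decision in code form. `artifact_transfer_enabled` is a hypothetical name, not the loader's function, and treating an unset `MX_P2P_METADATA` as enabled is an assumption; only the explicit `MX_P2P_METADATA=0` case is described in this PR.

```python
import logging
from typing import Mapping

logger = logging.getLogger("artifact_gate_sketch")


def artifact_transfer_enabled(env: Mapping[str, str]) -> bool:
    """Decide whether the loader should attempt artifact transfer."""
    # Opt-in: anything other than "1" leaves artifact transfer off.
    if env.get("MX_ARTIFACT_TRANSFER", "0").strip() != "1":
        return False
    # Assumption: only an explicit "0" disables the P2P metadata path.
    if env.get("MX_P2P_METADATA", "1").strip() == "0":
        logger.warning(
            "MX_ARTIFACT_TRANSFER=1 requires the P2P metadata path; "
            "skipping artifact transfer because MX_P2P_METADATA=0"
        )
        return False
    return True
```

The function only reads the environment and never rewrites `MX_P2P_METADATA`, which mirrors the stated choice to warn and skip rather than change the metadata default.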

What changed

  • Installs compatible vLLM cache artifacts before initialize_model() runs.
  • Schedules artifact publication after model load, gated by vLLM readiness and a short cache-settle check so generated cache files are present before publishing.
  • Refactors the heartbeat path into PublisherThread, which can retry initial publication and then heartbeat READY status.
  • Shares the existing rank-local P2P WorkerGrpcServer for both tensor manifests and artifact manifests/chunk serving, instead of starting one artifact server per artifact source.
  • Publishes each artifact type once per pod/cache root to avoid multiple TP ranks untarring into the same cache directory.
  • Adds TIMING logs for artifact prepare, transfer, install, publish, and vLLM artifact install/publish wrapper timing.
  • Adds CI knobs and tests for artifact source publication and target artifact install.
  • Updates docs for artifact transfer env vars, P2P metadata requirement, and publisher-thread behavior.
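The "cache-settle check" mentioned in the list above could look roughly like the following. This is a sketch under assumptions: `snapshot` and `cache_settled` are hypothetical names, and comparing file count, total size, and newest mtime across a short pause is one plausible stability test, not necessarily the one the PR implements.

```python
import os
import time
from pathlib import Path


def snapshot(root: Path) -> tuple[int, int, int]:
    """Return (file count, total bytes, newest mtime in ns) under root."""
    count = total = newest = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            count += 1
            total += st.st_size
            newest = max(newest, st.st_mtime_ns)
    return count, total, newest


def cache_settled(root: Path, settle_seconds: float = 2.0) -> bool:
    """True when the cache is non-empty and unchanged across the pause."""
    before = snapshot(root)
    if before[0] == 0:
        return False  # nothing generated yet, so nothing to publish
    time.sleep(settle_seconds)
    return snapshot(root) == before
```

A readiness callback would combine a check like this with the vLLM health probe, so publication waits until both pass.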

Benchmark

DeepSeek-V4-Pro standalone vLLM on nscale B200:

  • vLLM 0.23.0
  • TP=8
  • --enable-flashinfer-autotune
  • ModelExpress P2P weights via RDMA
  • Cache artifacts enabled for Triton + DeepGEMM
  • torch.compile cache was not used because vLLM reports DeepSeek-V4-Pro does not support torch.compile

| Scenario | API ready | K8s Ready | Notes |
| --- | --- | --- | --- |
| Cold start replica | 8m12s | 8m17s | Rerun after node OS page-cache drop; disk load path, no P2P source |
| P2P weights only replica | 7m06s | 7m14s | RDMA weights in ~4.0-4.1s/rank at 212-219 Gbps |
| P2P weights + cache artifacts replica | 3m17s | 3m17s | Triton install 0.223s, DeepGEMM install 0.276s |

Additional timing detail:

  • Cold start rerun after OS page-cache drop: loader time ~77.8-80.1s, graph capture 109s, vLLM engine init 351.6s
  • P2P weights only loader time: ~18.2-18.5s
  • P2P weights + cache loader time: ~15.2-15.7s
  • P2P weights only graph capture: 108s
  • P2P weights + cache graph capture: 75s
  • P2P weights only vLLM engine init: 350.6s
  • P2P weights + cache vLLM engine init: 122.91s

Impact in this run:

  • P2P weights only: 8m12s -> 7m06s API ready
  • P2P weights + caches: 8m12s -> 3m17s API ready
  • Cache artifacts reduced the later vLLM warmup/init work, not the RDMA weight-transfer time.
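For reference, the API-ready deltas above work out as follows. The snippet only re-derives numbers already reported in this run; `to_seconds` and `reduction` are helper names introduced here for the arithmetic.

```python
def to_seconds(stamp: str) -> int:
    """Convert a duration such as '8m12s' to whole seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def reduction(baseline: int, candidate: int) -> tuple[int, float]:
    """Return (seconds saved, percent saved relative to baseline)."""
    saved = baseline - candidate
    return saved, round(100 * saved / baseline, 1)


cold = to_seconds("8m12s")                # 492 s
weights_only = to_seconds("7m06s")        # 426 s
weights_and_caches = to_seconds("3m17s")  # 197 s

print(reduction(cold, weights_only))                # (66, 13.4)
print(reduction(cold, weights_and_caches))          # (295, 60.0)
print(reduction(weights_only, weights_and_caches))  # (229, 53.8)
```

Most of the improvement (229s of the 295s) comes from the cache artifacts rather than from P2P weights alone.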

Correctness sanity:

  • Source chat request returned Four
  • Target chat request returned Four

Validation

  • uv run --project modelexpress_client/python --extra dev pytest modelexpress_client/python/tests/test_vllm_artifacts.py modelexpress_client/python/tests/test_artifact_transfer.py modelexpress_client/python/tests/test_heartbeat.py -q
  • python3 -m py_compile ci/k8s/client/test_p2p_k8s.py
  • nscale DeepSeek-V4-Pro TP=8 e2e benchmark above

Summary by CodeRabbit

  • New Features

    • Added vLLM cache artifact transfer for peer-to-peer deployments, enabling automatic installation of torch-compile, Triton, and DeepGEMM caches across workers.
    • Introduced artifact readiness checking and health validation for reliable transfer.
    • New CI test to validate artifact transfer behavior in Kubernetes P2P jobs.
  • Documentation

    • Updated deployment and architecture documentation with artifact-transfer configuration and workflow details.
  • Chores

    • Upgraded vLLM Docker base image to v0.23.0.

@copy-pr-bot copy-pr-bot Bot temporarily deployed to automated-release June 23, 2026 23:06 Inactive
@github-actions github-actions Bot added the feat label Jun 23, 2026
@zhengluo-nv zhengluo-nv marked this pull request as ready for review June 23, 2026 23:11
coderabbitai Bot commented Jun 23, 2026

Walkthrough

Introduces end-to-end vLLM cache artifact transfer in ModelExpress. HeartbeatThread is replaced by a general PublisherThread supporting deferred publish callbacks and readiness gating. WorkerGrpcServer is refactored to serve multiple artifact sources keyed by mx_source_id. publish_artifact_source now accepts a pre-started server. A new artifacts.py module installs and schedules publication of torch-compile, Triton, and DeepGEMM caches, wired into the loader. CI/CD is extended to assert artifact transfer completion in the vLLM test matrix.

Changes

vLLM Cache Artifact Transfer

Layer / File(s) Summary
PublisherThread replaces HeartbeatThread
modelexpress_client/python/modelexpress/metadata/heartbeat.py, modelexpress_client/python/modelexpress/__init__.py
heartbeat.py is rewritten as PublisherThread supporting heartbeat-only, retried-publish, and publish-then-stop modes with optional ready_fn/cleanup_fn; HeartbeatThread becomes an alias. PublisherThread is added to the package __all__.
WorkerGrpcServer multi-source artifact serving
modelexpress_client/python/modelexpress/metadata/worker_server.py
WorkerServiceServicer replaces single-artifact fields with a locked _artifact_sources map and _select_artifact_source helper. All artifact RPCs route through the selected source. WorkerGrpcServer accepts optional mx_source_id and exposes register_artifact_source/unregister_artifact_source/set_mx_source_id.
artifact_transfer.py API refactor and timing logs
modelexpress_client/python/modelexpress/metadata/artifact_transfer.py
publish_artifact_source now takes a pre-started WorkerGrpcServer instead of a port, registers/unregisters artifact sources on it, and PublishedArtifactSource.stop() unregisters instead of stopping the server. worker_rank becomes int|None. [TIMING] log lines with throughput are added to prepare_source, install, and transfer_artifact_from_worker.
publish.py migration to PublisherThread
modelexpress_client/python/modelexpress/metadata/publish.py
publish_metadata_and_ready creates WorkerGrpcServer with mx_source_id=None, defers _publish_metadata_to_server into a publish_fn callback given to PublisherThread, and sets mx_source_id on the server post-publish. A cleanup_fn stops and removes the server for P2P paths.
New vLLM cache artifacts module
modelexpress_client/python/modelexpress/engines/vllm/artifacts.py
New module adds install_vllm_cache_artifacts (one-time marker-gated installs of torch-compile, Triton, DeepGEMM caches) and schedule_vllm_cache_artifact_publish (schedules PublisherThread per cache type, readiness-gated by vLLM HTTP health + cache directory stability). Includes NIXL manager init, artifact identity construction, SHA256 marker keys, and version/GPU-arch helpers.
Loader wiring
modelexpress_client/python/modelexpress/engines/vllm/loader.py
load_model() calls install_vllm_cache_artifacts(ctx) inside maybe_enter_vmm_arena and schedule_vllm_cache_artifact_publish(ctx) after registry updates.
Tests: PublisherThread, artifact transfer, and loader publish flow
modelexpress_client/python/tests/test_heartbeat.py, modelexpress_client/python/tests/test_artifact_transfer.py, modelexpress_client/python/tests/test_vllm_loader.py
TestPublisherPublishAndReady tests first-tick publish+READY, ready_fn gating, and heartbeat_after_publish=False. Artifact transfer tests add a shared-port tensor+artifact gRPC test, caplog [TIMING] assertions, and updated publish_artifact_source tests using _FakeWorkerGrpcServer. Loader tests validate PublisherThread start, publish_fn retry behavior, P2P gRPC server ordering, and set_mx_source_id.
Tests: vLLM cache artifacts module
modelexpress_client/python/tests/test_vllm_artifacts.py
New test module covers default-off install, P2P metadata disabled warning, NIXL init failure skip, artifact identity fields, distinct transfer source types and cache roots, one-time install marker, publisher scheduling with readiness gating, _vllm_artifact_ready_fn health+settle gating, and _vllm_health_url derivation.
CI/CD: artifact transfer assertions and K8s manifest
.github/actions/run-mx-p2p-test/action.yml, .github/workflows/modelexpress-ci-tests.yml, ci/k8s/client/conftest.py, ci/k8s/client/test_p2p_k8s.py, ci/k8s/client/vllm/manifest-azure.yaml
Action adds expected_artifact_sources input, bash CR-counting helpers, a wait loop, and conditional pytest flags. Workflow sets expected_artifact_sources=1 for vLLM. conftest.py adds --require-artifact-transfer / --expected-artifact-sources options and fixtures. test_p2p_k8s.py adds _artifact_source_count and test_artifact_transfer. Azure manifest enables MX_P2P_METADATA, MX_ARTIFACT_TRANSFER, and MX_ARTIFACT_READY_URL.
Docs and Dockerfile
docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, docs/metadata.md, examples/p2p_transfer_k8s/client/vllm/Dockerfile
Architecture and metadata docs replace HeartbeatThread with PublisherThread; deployment docs add artifact-transfer env vars, update MX_WORKER_GRPC_PORT default to 6555, and add a P2P artifact transfer configuration section. Dockerfile bumps vllm-openai base from v0.17.1 to v0.23.0.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hop, hop! A new PublisherThread sprouts,
Cache artifacts bundled without any doubts.
The worker gRPC serves many at once,
No heartbeat left dangling — no silly stunts!
vLLM caches flow peer-to-peer through the night,
This bunny ships artifacts with pure delight! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.07% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: integrate vLLM artifact transfer' directly and clearly summarizes the main change in the pull request. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/p2p_transfer_k8s/client/vllm/Dockerfile`:
- Line 4: Update the verification claim in docs/ARCHITECTURE.md that currently
documents the adopt_hidden_tensors() fix verification only on v0.17.1 and
v0.19.0. Since v0.23.0 is now established as the canonical vLLM version in the
Dockerfile, update ARCHITECTURE.md to reflect that the fix has been verified on
v0.23.0 and include the context of testing with DeepSeek-V4-Pro as mentioned in
this PR scope. Ensure the documentation reflects v0.23.0 as the current baseline
version while you may retain historical version references if needed for
context.

In `@modelexpress_client/python/modelexpress/engines/vllm/artifacts.py`:
- Around line 349-350: The code uses raw int(os.environ.get(...)) for parsing
environment variables at two locations, which will raise a ValueError if the
environment variables contain malformed values and cause load_model() to fail.
Wrap the int() conversion for both the MX_METADATA_PORT and the second
environment variable (around line 539) in a try-except block that catches
ValueError, logs a warning message indicating the invalid value and that the
default is being used, and then gracefully falls back to the default value
instead of allowing the exception to propagate and break artifact loading.
- Around line 512-525: The is_vllm_server_ready() function calls
urllib.request.urlopen() with a URL that may come from the environment variable
_READY_URL_ENV without validating the URL scheme, which could allow dangerous
protocols like file://, gopher://, etc. Add URL scheme validation in the
_vllm_health_url() function to ensure the returned URL only uses http or https
schemes, and raise an appropriate exception or return a safe default if the
scheme is invalid. This validation should occur before the URL is used in the
urlopen() call within is_vllm_server_ready().
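
The two artifacts.py findings above could be addressed along these lines. This is a hedged sketch: `env_int` and `validated_health_url` are hypothetical helpers rather than functions from the PR, and `MX_SKETCH_PORT` in the usage note is a made-up variable name.

```python
import logging
import os
from urllib.parse import urlparse

logger = logging.getLogger("vllm_artifacts_sketch")


def env_int(name: str, default: int) -> int:
    """Parse an integer env var, falling back to default on bad input."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        logger.warning("Invalid %s=%r; using default %d", name, raw, default)
        return default


def validated_health_url(url: str) -> str:
    """Allow only http(s) URLs with a host before they reach urlopen()."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Unsupported readiness URL: {url!r}")
    return url
```

For example, `env_int("MX_SKETCH_PORT", 8001)` returns 8001 when the variable is unset or malformed, and `validated_health_url("file:///etc/passwd")` raises `ValueError` before any request is made.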

In `@modelexpress_client/python/modelexpress/metadata/artifact_transfer.py`:
- Around line 394-396: The current guard at the artifact publish location
(checking if worker_grpc_server.port is None) only catches servers that were
never started, not ones that have been stopped. In the WorkerGrpcServer.stop()
method, the _port is not cleared when the server stops, allowing stopped servers
to still return an old port. Fix this by either clearing the _port (and
_servicer if applicable) in the WorkerGrpcServer.stop() method so the port check
correctly identifies stopped servers, or alternatively add an explicit
is_started property to WorkerGrpcServer that properly tracks the server state
and use that property in the guard instead of checking if port is None.

In `@modelexpress_client/python/modelexpress/metadata/heartbeat.py`:
- Around line 133-139: A race condition exists where stop() can return after the
thread join times out while _tick() is still running, allowing _tick() to
publish a READY status after _mark_stale() has been called. To fix this, add a
check within the _tick() method to prevent publishing or sending READY status if
_stop_event has already been set. This ensures that once stop() has been
invoked, no READY announcements will be sent even if _tick() continues executing
briefly. The same prevention logic should also be applied to any other methods
listed in the comment (around lines 206-241 and 243-257) that may publish status
updates.

In `@modelexpress_client/python/modelexpress/metadata/publish.py`:
- Around line 39-40: The cleanup callback for worker servers is removing entries
from _worker_servers unconditionally by device_id, which can cause a race
condition where a stale publisher cleanup removes a newly registered server for
the same device. Guard the cleanup operation by checking that the current value
in _worker_servers for that device_id still matches the server instance being
cleaned up before removing it. This ensures only the actual stale server is
removed and not a newer registration that arrived after this server was
initially registered.

In `@modelexpress_client/python/modelexpress/metadata/worker_server.py`:
- Around line 84-90: The unregister_artifact_source method removes the entire
source from _artifact_sources dictionary when a single artifact_id is found,
making other manifests under the same mx_source_id unreachable. Instead of
calling pop on _artifact_sources, only remove the specific artifact_id from the
source.manifests dictionary. You may optionally remove the entire source only if
no manifests remain after this deletion to maintain a clean state.
- Around line 349-378: The new methods set_mx_source_id,
register_artifact_source, and unregister_artifact_source use _servicer and _port
as liveness guards to detect if the server is running, but the stop() method is
not clearing these fields when the server shuts down. This allows a stopped
server to still pass these checks and accept operations on dead endpoints.
Locate the stop() method in this class and ensure it explicitly sets both
_servicer and _port to None when the server is stopped, so these fields properly
serve as liveness indicators for the guard checks in the new helper methods.
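
For the first worker_server.py finding, removing a single manifest instead of the whole source could look like this. `ArtifactSource` and `ArtifactRegistry` are simplified stand-ins invented for the sketch; only the `_artifact_sources` and `manifests` names come from the review text.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ArtifactSource:
    manifests: dict[str, object] = field(default_factory=dict)


class ArtifactRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._artifact_sources: dict[str, ArtifactSource] = {}

    def register(self, mx_source_id: str, artifact_id: str, manifest: object) -> None:
        with self._lock:
            source = self._artifact_sources.setdefault(mx_source_id, ArtifactSource())
            source.manifests[artifact_id] = manifest

    def unregister(self, mx_source_id: str, artifact_id: str) -> bool:
        with self._lock:
            source = self._artifact_sources.get(mx_source_id)
            if source is None or artifact_id not in source.manifests:
                return False
            del source.manifests[artifact_id]
            # Drop the source only once its last manifest is gone.
            if not source.manifests:
                del self._artifact_sources[mx_source_id]
            return True

    def artifact_ids(self, mx_source_id: str) -> list[str]:
        with self._lock:
            source = self._artifact_sources.get(mx_source_id)
            return sorted(source.manifests) if source else []
```

With two manifests under one `mx_source_id`, unregistering the first leaves the second reachable, which is the behavior the finding asks for.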

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aef8a993-785d-41bc-bfe1-bb0b51229909

📥 Commits

Reviewing files that changed from the base of the PR and between 485dfaf and 443c985.

📒 Files selected for processing (20)
  • .github/actions/run-mx-p2p-test/action.yml
  • .github/workflows/modelexpress-ci-tests.yml
  • ci/k8s/client/conftest.py
  • ci/k8s/client/test_p2p_k8s.py
  • ci/k8s/client/vllm/manifest-azure.yaml
  • docs/ARCHITECTURE.md
  • docs/DEPLOYMENT.md
  • docs/metadata.md
  • examples/p2p_transfer_k8s/client/vllm/Dockerfile
  • modelexpress_client/python/modelexpress/__init__.py
  • modelexpress_client/python/modelexpress/engines/vllm/artifacts.py
  • modelexpress_client/python/modelexpress/engines/vllm/loader.py
  • modelexpress_client/python/modelexpress/metadata/artifact_transfer.py
  • modelexpress_client/python/modelexpress/metadata/heartbeat.py
  • modelexpress_client/python/modelexpress/metadata/publish.py
  • modelexpress_client/python/modelexpress/metadata/worker_server.py
  • modelexpress_client/python/tests/test_artifact_transfer.py
  • modelexpress_client/python/tests/test_heartbeat.py
  • modelexpress_client/python/tests/test_vllm_artifacts.py
  • modelexpress_client/python/tests/test_vllm_loader.py

Comment thread examples/p2p_transfer_k8s/client/vllm/Dockerfile Outdated
Comment thread modelexpress_client/python/modelexpress/engines/vllm/artifacts.py Outdated
Comment thread modelexpress_client/python/modelexpress/engines/vllm/artifacts.py
Comment thread modelexpress_client/python/modelexpress/metadata/heartbeat.py
Comment thread modelexpress_client/python/modelexpress/metadata/publish.py
Comment thread modelexpress_client/python/modelexpress/metadata/worker_server.py
Comment thread modelexpress_client/python/modelexpress/metadata/worker_server.py
@zhengluo-nv zhengluo-nv self-assigned this Jun 24, 2026
@zhengluo-nv zhengluo-nv force-pushed the zheluo/artifact-transfer-pr-4-vllm-loader branch from 11a10c4 to bb0afaa Compare June 24, 2026 17:20
@copy-pr-bot copy-pr-bot Bot temporarily deployed to automated-release June 24, 2026 17:20 Inactive
@copy-pr-bot copy-pr-bot Bot temporarily deployed to automated-release June 24, 2026 19:04 Inactive
@ganeshku1 ganeshku1 requested a review from nv-hwoo June 24, 2026 20:17
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
@zhengluo-nv zhengluo-nv force-pushed the zheluo/artifact-transfer-pr-4-vllm-loader branch from cefea86 to 695d385 Compare June 24, 2026 22:10
@copy-pr-bot copy-pr-bot Bot temporarily deployed to automated-release June 24, 2026 22:10 Inactive
@copy-pr-bot copy-pr-bot Bot temporarily deployed to automated-release June 25, 2026 21:05 Inactive

A few questions:

  1. Should we update the file name to publisher.py as well?
  2. Should we leave the heartbeat thread as is and add the publisher thread as a subclass? That would keep the simple heartbeat thread for cases that do not involve artifact publishing.
  3. With artifact transfer enabled, does the source become READY only after all the artifacts have finished transferring?

Comment on lines +479 to +501
def _triton_cache_identity(ctx: LoadContext) -> p2p_pb2.SourceIdentity:
    # Triton cache entries are self-keyed by source/options; share by runtime stack.
    identity = p2p_pb2.SourceIdentity(
        mx_source_type=p2p_pb2.MX_SOURCE_TYPE_TRITON_CACHE,
        backend_framework=p2p_pb2.BACKEND_FRAMEWORK_VLLM,
        cuda_version=torch.version.cuda or "",
        triton_version=_triton_version(),
        gpu_arch=_gpu_arch(ctx.device_id),
    )
    _set_extra_if_present(identity, "triton_key", _triton_key())
    return identity


def _deep_gemm_cache_identity(ctx: LoadContext) -> p2p_pb2.SourceIdentity:
    # DeepGEMM cache entries are self-keyed by JIT source/compiler; share by runtime.
    identity = p2p_pb2.SourceIdentity(
        mx_source_type=p2p_pb2.MX_SOURCE_TYPE_DEEP_GEMM_CACHE,
        backend_framework=p2p_pb2.BACKEND_FRAMEWORK_VLLM,
        cuda_version=torch.version.cuda or "",
        gpu_arch=_gpu_arch(ctx.device_id),
    )
    _set_extra_if_present(identity, "deep_gemm_jit_key", _deep_gemm_jit_key())
    return identity

How do these two identities work without specifying model_name, as the torch.compile identity does? It seems the server rejects any identity with an empty model name.

# vLLM JIT cache artifacts are pod-scoped, so one successful install per pod
# is enough for all local workers.
marker_path = _artifact_marker_path(transfer, identity, "install")
with _artifact_lock(marker_path):

On success this is intended (other TP ranks block until the cache is installed once per pod). But when a transfer fails after discovery (the source went STALE, or an RDMA error occurred mid-stream), the install marker is never written, the lock releases, and the next blocked rank re-runs the full transfer from scratch. With N TP ranks this serializes into up to N x 120s of blocked load_model before all ranks give up (e.g. for TP8, 8 x 120s = 16 minutes).

Fix suggestion: Mitigate with a pod-level negative-result marker (short TTL), a hard cap of one transfer attempt per pod, or a shorter timeout on the install path.

self._cleanup()
logger.info(f"[Worker {self._worker_rank}] Publisher thread stopped")

def _on_exit(self) -> None:

Correctness warning: _on_exit (atexit) sets _stop_event then calls _mark_stale() -> _update_status(SOURCE_STATUS_STALE) WITHOUT joining the worker thread -- unlike stop(), which joins first (line 137). The daemon _run/_tick thread can already be past its _stop_event.is_set() checks (lines 251 and 263) and then send _update_status(SOURCE_STATUS_READY) at line 265 after the STALE, leaving the source advertised READY until the server-side reaper TTL. That directly defeats the documented guarantee 'sends UpdateStatus(STALE) for immediate detection without waiting for the reaper timeout' (docstring lines 30-32), which is exactly the P2P failover latency this project cares about. The window is narrow but real and the fix is cheap: join the thread (bounded) in _on_exit, or serialize _mark_stale against _update_status(READY) so STALE always wins.
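
The "serialize so STALE always wins" option can be sketched with a lock and a flag. `StatusGate` is a hypothetical helper, and `send` stands in for whatever call actually issues UpdateStatus; neither is part of the PR.

```python
import threading
from typing import Callable


class StatusGate:
    """Serialize READY and STALE updates so that STALE is final."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._lock = threading.Lock()
        self._stale = False

    def send_ready(self) -> bool:
        with self._lock:
            if self._stale:
                return False  # STALE already announced; drop this READY
            self._send("READY")
            return True

    def mark_stale(self) -> None:
        with self._lock:
            self._stale = True
            self._send("STALE")
```

Because both paths take the same lock, a READY that is in flight completes before STALE is sent, and any READY attempted afterwards is suppressed. Holding the lock across the send keeps the ordering strict at the cost of blocking `mark_stale` for the duration of one RPC.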

marker_path.unlink(missing_ok=True)


def _install_vllm_cache_artifact_once(

Security warning: _install_vllm_cache_artifact_once -> transfer.install(header) unpacks a tar fetched from a peer replica into the live torch_compile / triton / deep_gemm cache dirs, which the engine then loads and executes (TorchInductor generated code + .so, Triton .cubin/.ptx, DeepGEMM JIT kernels). The integrity chain (_crc32c_hex per chunk + artifact_manifest_id sha256 recomputed in _validate_fetched_artifact_manifest) only proves the received bytes match what the source advertised; it does NOT authenticate the source or attest the content. Any replica (or a compromised MX metadata server) able to publish a source with a matching SourceIdentity can therefore ship arbitrary executable code to every target that pulls it -> RCE across the deployment. It is opt-in (MX_ARTIFACT_TRANSFER=1) and identity-matched, but identity-match is not authorization. Document this trust boundary in docs/DEPLOYMENT.md and docs/ARCHITECTURE.md, require the worker gRPC + MX endpoints to be network-isolated to the trusted deployment, and consider artifact signing/attestation.
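
To make the signing suggestion concrete, here is a deliberately small sketch that authenticates a manifest id with a shared-secret HMAC. It is a stand-in for real signing or attestation, not a proposal for the final design: `sign_manifest` and `verify_manifest` are hypothetical, `manifest_id` stands for the sha256 manifest id mentioned above, and key distribution is out of scope.

```python
import hashlib
import hmac


def sign_manifest(manifest_id: str, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over the manifest id."""
    return hmac.new(key, manifest_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_manifest(manifest_id: str, signature: str, key: bytes) -> bool:
    """Constant-time check that the tag was produced with the shared key."""
    expected = sign_manifest(manifest_id, key)
    return hmac.compare_digest(expected, signature)
```

A target would call `verify_manifest` before `transfer.install(header)` and refuse to unpack on failure. A shared key only shows that the publisher belongs to the deployment; it does not replace network isolation, and asymmetric signatures would be needed to separate publishers from consumers.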
