29 changes: 15 additions & 14 deletions .github/workflows/all_test.yml
@@ -1,6 +1,7 @@
name: ci_2_virt_node

on:
workflow_dispatch:
push:
pull_request:
types:
@@ -121,18 +122,18 @@ jobs:
       - name: Normalize ci_2_virt_node debug artifact permissions
         if: ${{ always() }}
         run: |
-          chmod -R a+rX .dever || true
+          chmod -R a+rX ci_2_virt_node_workdir || true
           chmod -R a+rX fluxon_release || true
           chmod -R a+rX setup_and_pack/nix/runs || true
-          find .dever -type f -name '*.yaml' -exec chmod a+r {} + || true
-          find .dever -type f -name '*.log' -exec chmod a+r {} + || true
-          find .dever -type f -name '*.json' -exec chmod a+r {} + || true
-          find .dever -type f -name '*.html' -exec chmod a+r {} + || true
-          find .dever -type f -name '*.txt' -exec chmod a+r {} + || true
-          find .dever -type f -name '*.sha256' -exec chmod a+r {} + || true
-          find .dever -type d -path '*/pack_release_runtime/*' -exec chmod a+rx {} + || true
-          find .dever -type d -path '*/pack_release_runtime/project-data/*' -exec chmod a+rx {} + || true
-          find .dever -path '*/pack_release_runtime/project-data/*' \
+          find ci_2_virt_node_workdir -type f -name '*.yaml' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type f -name '*.log' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type f -name '*.json' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type f -name '*.html' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type f -name '*.txt' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type f -name '*.sha256' -exec chmod a+r {} + || true
+          find ci_2_virt_node_workdir -type d -path '*/pack_release_runtime/*' -exec chmod a+rx {} + || true
+          find ci_2_virt_node_workdir -type d -path '*/pack_release_runtime/project-data/*' -exec chmod a+rx {} + || true
+          find ci_2_virt_node_workdir -path '*/pack_release_runtime/project-data/*' \
             \( -path '*/instances/*/logs' -o -path '*/instances/*/release' -o -path '*/assemblies/*/profile' \) \
             -exec chmod -R a+rX {} + || true

@@ -144,9 +145,9 @@ jobs:
           if-no-files-found: warn
           compression-level: 1
           path: |
-            .dever/**
+            ci_2_virt_node_workdir/**
             fluxon_release/**
             setup_and_pack/nix/runs/**
-            .dever/**/pack_release_runtime/project-data/**/instances/**/logs/**
-            .dever/**/pack_release_runtime/project-data/**/instances/**/release/**
-            .dever/**/pack_release_runtime/project-data/**/assemblies/**/profile/**
+            ci_2_virt_node_workdir/**/pack_release_runtime/project-data/**/instances/**/logs/**
+            ci_2_virt_node_workdir/**/pack_release_runtime/project-data/**/instances/**/release/**
+            ci_2_virt_node_workdir/**/pack_release_runtime/project-data/**/assemblies/**/profile/**
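The normalization step only widens read access so the upload step can collect every file. Plain files get `a+r`. Traversable directories get `a+rx`. Selected subtrees get `-R a+rX`, where capital `X` adds execute only to directories and to files that are already executable. The sketch below reproduces only the recursive `chmod -R a+rX` form in Python, for readers checking what the step does to modes. It is a minimal sketch, and `normalize_artifact_permissions` is a hypothetical helper that is not part of the repository. It skips symlinks and swallows errors in the spirit of `|| true`.

```python
import os
import stat
from pathlib import Path

READ_ALL = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
EXEC_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _chmod_a_plus_r_cap_x(path: Path) -> None:
    """Approximate `chmod a+rX` on one path; errors are swallowed like `|| true`."""
    if path.is_symlink():
        return  # leave symlinks alone instead of changing their targets
    try:
        mode = path.stat().st_mode
        extra = READ_ALL
        # Capital X: add execute only for directories or files already executable by someone.
        if stat.S_ISDIR(mode) or mode & EXEC_ALL:
            extra |= EXEC_ALL
        os.chmod(path, stat.S_IMODE(mode) | extra)
    except OSError:
        pass


def normalize_artifact_permissions(root: Path) -> None:
    """Approximate `chmod -R a+rX <root> || true` for a debug-artifact directory."""
    if not root.exists():
        return  # a missing workdir is not an error, matching `|| true`
    _chmod_a_plus_r_cap_x(root)
    # Top-down walk: each subdirectory is fixed up before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            _chmod_a_plus_r_cap_x(Path(dirpath) / name)
```

Calling `normalize_artifact_permissions(Path("ci_2_virt_node_workdir"))` roughly matches the first line of the new `run:` block. The extension-filtered `find` lines have no counterpart in this sketch.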
93 changes: 93 additions & 0 deletions .github/workflows/remote_testbed_bench.yml
@@ -0,0 +1,93 @@
+name: remote_testbed_bench
+
+on:
+  workflow_dispatch:
+    inputs:
+      bootstrap_mode:
+        description: "Remote testbed bootstrap mode"
+        required: true
+        default: "bare_then_apply"
+        type: choice
+        options:
+          - bare_then_apply
+          - apply_only
+          - bare_only
+
+permissions:
+  contents: read
+
+jobs:
+  remote-testbed-bench:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install host dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+            git \
+            pigz \
+            protobuf-compiler \
+            python3-venv \
+            rsync \
+            sshpass
+
+      - name: Install Python dependencies
+        run: python3 -m pip install PyYAML
+
+      - name: Sync rather_no_git_submodule workspace inputs
+        run: python3 fluxon_rs/scripts/rather_no_git_submodule.py
+
+      - name: Write local remote testbed config
+        env:
+          FLUXON_REMOTE_TESTBED_LOCAL_CONFIG_YAML: ${{ secrets.FLUXON_REMOTE_TESTBED_LOCAL_CONFIG_YAML }}
+        run: |
+          rm -f ci_remote_testbed.local.yaml
+          python3 - <<'PY'
+          import os
+          from pathlib import Path
+          import yaml
+
+          raw = os.environ.get("FLUXON_REMOTE_TESTBED_LOCAL_CONFIG_YAML", "")
+          if not raw.strip():
+              raise SystemExit("missing secret FLUXON_REMOTE_TESTBED_LOCAL_CONFIG_YAML")
+          payload = yaml.safe_load(raw)
+          if not isinstance(payload, dict):
+              raise SystemExit("FLUXON_REMOTE_TESTBED_LOCAL_CONFIG_YAML must decode to a YAML mapping")
+          Path("ci_remote_testbed.local.yaml").write_text(
+              yaml.safe_dump(payload, sort_keys=False, allow_unicode=False),
+              encoding="utf-8",
+          )
+          PY
+
+      - name: Run remote shared-testbed benchmark flow
+        run: |
+          python3 fluxon_test_stack/ci_remote_testbed.py \
+            --bootstrap-mode "${{ inputs.bootstrap_mode }}" \
+            --print-generated
+
+      - name: Normalize remote testbed debug artifact permissions
+        if: ${{ always() }}
+        run: |
+          chmod -R a+rX ci_remote_testbed_workdir || true
+          chmod -R a+rX fluxon_release || true
+
+      - name: Upload remote testbed debug artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: remote-testbed-bench-debug-${{ github.sha }}
+          if-no-files-found: warn
+          compression-level: 1
+          path: |
+            ci_remote_testbed_workdir/**
+            fluxon_release/**
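The `Run remote shared-testbed benchmark flow` step interpolates `${{ inputs.bootstrap_mode }}` directly into a shell script, and the workflow declares the accepted values with `type: choice`. A wrapper that launches the same command outside Actions could check the value against that list and pass the arguments as a list, so no shell re-parses the string. The sketch below is a minimal version of that idea. `build_remote_testbed_argv` and `run_remote_testbed` are hypothetical names. Only the script path, the `--bootstrap-mode` and `--print-generated` flags, and the three mode values come from the workflow above.

```python
import subprocess
import sys
from typing import List

# Mirrors the `options` of the workflow_dispatch `bootstrap_mode` input above.
BOOTSTRAP_MODES = ("bare_then_apply", "apply_only", "bare_only")
REMOTE_TESTBED_SCRIPT = "fluxon_test_stack/ci_remote_testbed.py"


def build_remote_testbed_argv(bootstrap_mode: str, python: str = "python3") -> List[str]:
    """Return the argv used by the benchmark step, rejecting modes the workflow does not offer."""
    if bootstrap_mode not in BOOTSTRAP_MODES:
        raise ValueError(f"unsupported bootstrap_mode: {bootstrap_mode!r}")
    return [python, REMOTE_TESTBED_SCRIPT, "--bootstrap-mode", bootstrap_mode, "--print-generated"]


def run_remote_testbed(bootstrap_mode: str) -> int:
    """Launch the flow without a shell, so the mode string is never re-parsed by bash."""
    argv = build_remote_testbed_argv(bootstrap_mode, python=sys.executable)
    return subprocess.run(argv, check=False).returncode
```

For example, `build_remote_testbed_argv("apply_only")` returns `["python3", "fluxon_test_stack/ci_remote_testbed.py", "--bootstrap-mode", "apply_only", "--print-generated"]`.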
4 changes: 3 additions & 1 deletion .gitignore
@@ -52,11 +52,13 @@ node_modules
 fluxon_test_stack/bench_runner/
 fluxon_test_stack/start_test_bed/
 fluxon_test_stack/.manual_dispatch_release_tmp/
-.dever
 fluxon_release_*
 *.exit
 bench_suite.lock
 .bench-venv
 deployment/local/
 setup_and_pack/pack_fluxonkv_pylib_env.yaml
 fluxon_rs/moka/
+/ci_remote_testbed.local.yaml
+/ci_remote_testbed_workdir/
+/ci_2_virt_node_workdir/
2 changes: 1 addition & 1 deletion deployment/manual_dispatch_release.py
@@ -509,7 +509,7 @@ def _test_rsc_manifest_relpaths(*, src_release_dir: Path, dispatch_release_scope

 def _dispatch_tmp_root(*, deployconf_path: Path) -> Path:
     # English note:
-    # - Do not inherit TMPDIR from outer automation (it may point into a tool-managed .dever namespace).
+    # - Do not inherit TMPDIR from outer automation (it may point into a tool-managed workspace namespace).
     # - Keep temp artifacts next to the deployconf so the path is deterministic and discoverable.
     p = deployconf_path.resolve().parent / ".manual_dispatch_release_tmp"
     p.mkdir(parents=True, exist_ok=True)
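The hunk ends at `p.mkdir(...)`, so the rest of `_dispatch_tmp_root` is not shown. The sketch below illustrates the two rules its comments state: never consult `TMPDIR`, and put temp artifacts at a deterministic path next to the deployconf. It rests on two assumptions that the diff does not confirm. First, the function returns the directory it creates. Second, callers pass that directory explicitly as `dir=` to `tempfile`. `dispatch_tmp_root` and `make_dispatch_scratch_dir` are hypothetical names.

```python
import tempfile
from pathlib import Path


def dispatch_tmp_root(deployconf_path: Path) -> Path:
    # Deterministic location next to the deployconf; TMPDIR is never consulted.
    root = deployconf_path.resolve().parent / ".manual_dispatch_release_tmp"
    root.mkdir(parents=True, exist_ok=True)
    return root  # assumption: the real helper returns the directory it created


def make_dispatch_scratch_dir(deployconf_path: Path, prefix: str = "dispatch_") -> Path:
    # Passing dir= explicitly overrides tempfile's TMPDIR-based default location.
    return Path(tempfile.mkdtemp(prefix=prefix, dir=dispatch_tmp_root(deployconf_path)))
```

This design keeps every scratch directory discoverable under `.manual_dispatch_release_tmp`, which the `.gitignore` hunk already excludes for `fluxon_test_stack/`, even when outer automation exports a different `TMPDIR`.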
@@ -695,6 +695,53 @@ Many logs in the main GitHub Actions window are not printed directly by the local process, but by `

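 Therefore, GitHub Actions now covers the real CI path of "starting from the single `ci_2_virt_node.py` entry point and executing the workload through the top-attention CI scene", instead of keeping an extra layer of legacy scenes side by side in the suite.

+### 9.2 GitHub Actions remote cluster extension: `ci_remote_testbed.py`
+
+**Stable conclusions:**
+
+- `ci_remote_testbed.py` is not a separate runner. It is the remote shared-testbed extension of `ci_2_virt_node.py` and ultimately still reuses `test_runner.py`.
+- It always splits a single GitHub Actions trigger into two phases: `ci` and `benchmark`.
+- The `ci` phase inherits the repository's canonical CI scene catalog directly; the `benchmark` phase keeps only the remote cluster's multi-machine topologies (`supported_topologies > 1`).
+- Local remote configuration goes only through `ci_remote_testbed.local.yaml`, which must be a YAML mapping. Sensitive SSH / bastion / controller exec information goes only into `remote_auth.yaml`, never into the manifest.
+
+| phase | input source | selection rule | output |
+| --- | --- | --- | --- |
+| `ci` | `ci_test_list.yaml` | scene ids reuse the canonical CI catalog; profile ids are taken directly from the suite declaration | `generated/ci.yaml` |
+| `benchmark` | `benchmark_full_matrix.yaml` | only the remote cluster's multi-machine topologies (`supported_topologies > 1`) are kept, and only their scenes / scales enter the execution plan | `generated/benchmark.yaml` |
+
+The fixed execution chain is:
+
+```text
+GitHub Actions workflow_dispatch
+-> write ci_remote_testbed.local.yaml from secret YAML
+-> ci_remote_testbed.py
+-> generate ci.yaml + benchmark.yaml
+-> pack release once
+-> dispatch once
+-> start shared testbed once
+-> test_runner.py once for ci
+-> test_runner.py once for benchmark
+```
+
+- `phase_runs` is the stable interface between these two runner invocations. It records `phase_name`, `suite_path`, `runner_workdir`, `scene_ids`, `profile_ids`, and `allowed_scale_topologies`.
+- The workflow only triggers the run and writes the local YAML; it carries no test semantics. Test semantics are still determined jointly by `ci_remote_testbed.py` and `test_runner.py`.
+
+### 9.3 The actual remote trigger chain
+
+Remote execution in `ci_remote_testbed.py` is not "GitHub runs a whole script directly on the remote host every time and then exits". It is fixed as:
+
+1. Generate `ci_remote_testbed.local.yaml` and the derived bundle locally.
+2. Trigger the remote launcher on `controller_exec_host` with a single SSH call.
+3. The remote launcher starts `remote_runner.py` in the background on `controller_exec_host`.
+4. GitHub Actions keeps polling `.remote_runner_exit_code` and `remote_runner.launch.log` through the same `controller_exec_host`.
+5. `remote_runner.py` calls `test_runner.py` on the remote side one phase at a time: `ci` first, then `benchmark`.
+
+The key boundaries here are:
+
+- the SSH trigger happens only once;
+- subsequent state convergence relies on polling, not on repeated SSH launches;
+- `test_runner.py` always runs on the remote machine, never locally on the GitHub runner.
+
 ## 10. Ownership of GitOps and the UI

 GitOps is hosted under the test_runner UI service. The constraint here is not to split out a second, independent control-plane service; it does not mean the UI must exit together with any single test run.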
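The phase split in section 9.2 can be sketched as plain data. The text supplies only three things: the `phase_runs` field names, the fixed `ci`-then-`benchmark` order, and the `generated/ci.yaml` / `generated/benchmark.yaml` outputs. Everything else in the sketch is an assumption made for illustration. That includes modelling a topology as a name mapped to a machine count, the `(scene_id, topology)` shape of benchmark cases, the per-phase `runner_workdir` names, and the helpers `PhaseRun`, `multi_machine_topologies`, and `plan_phase_runs`. It is not the code in `ci_remote_testbed.py`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PhaseRun:
    """One test_runner.py invocation; field names follow the `phase_runs` description."""
    phase_name: str
    suite_path: Path
    runner_workdir: Path
    scene_ids: List[str]
    profile_ids: List[str]
    # None = no topology restriction (an assumption made for this sketch).
    allowed_scale_topologies: Optional[List[str]] = None


def multi_machine_topologies(cluster_topologies: Dict[str, int]) -> List[str]:
    """Keep topologies that span more than one machine (topology name -> machine count)."""
    return sorted(name for name, machines in cluster_topologies.items() if machines > 1)


def plan_phase_runs(
    workdir: Path,
    ci_scene_ids: Sequence[str],
    ci_profile_ids: Sequence[str],
    bench_cases: Sequence[Tuple[str, str]],  # hypothetical (scene_id, topology) pairs
    bench_profile_ids: Sequence[str],
    cluster_topologies: Dict[str, int],
) -> List[PhaseRun]:
    """Build the fixed two-phase plan: ci first, then benchmark."""
    generated = workdir / "generated"
    allowed = multi_machine_topologies(cluster_topologies)
    # Only scenes whose topology survives the multi-machine filter enter the benchmark plan.
    bench_scene_ids = sorted({scene for scene, topo in bench_cases if topo in allowed})
    return [
        PhaseRun("ci", generated / "ci.yaml", workdir / "runner_ci",
                 list(ci_scene_ids), list(ci_profile_ids)),
        PhaseRun("benchmark", generated / "benchmark.yaml", workdir / "runner_benchmark",
                 bench_scene_ids, list(bench_profile_ids), allowed),
    ]
```

Keeping the plan as an ordered list of records mirrors the description of `phase_runs` as the one stable interface between the two `test_runner.py` invocations. Whatever launches the runner only has to iterate over it.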
@@ -767,5 +814,5 @@ GitOps is hosted under the test_runner UI service. The constraint here is not to split out a second
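 - Prepare / start the testbed first;
 - then `test_runner.py` executes the suite.
 - For the `CI` implementation, the remote `ci_runner.sh` executes the commands, while `test_runner.py` holds case-execution authority.
-- `ci_2_virt_node.py` only packages the "standard CI flow in a local two-logical-node environment"; it does not change the runner's core layering.
+- `ci_2_virt_node.py` only packages the "standard CI flow in a local two-logical-node environment"; `ci_remote_testbed.py` extends the same runner to the remote shared testbed triggered from GitHub Actions. Neither changes the runner's core layering.
 - The UI and GitOps both belong to the `test_runner` service surface; the UI should run as a long-lived service and does not constitute an additional test-execution framework.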
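Step 4 of section 9.3 replaces repeated SSH launches with polling. It can be sketched as a loop over the two files the runner leaves on `controller_exec_host`. This minimal sketch makes several assumptions. File reads go through a callable; in practice that would presumably be an SSH command, which the source does not show. `.remote_runner_exit_code` is assumed to hold a single integer. `wait_for_remote_runner` and `local_reader` are hypothetical names, not functions from `ci_remote_testbed.py`. The clock and sleep are injectable so the loop can be exercised without waiting.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Hypothetical reader: returns file contents, or None if the file does not exist yet.
ReadFile = Callable[[str], Optional[str]]


def local_reader(run_dir: Path) -> ReadFile:
    """Stand-in for reading files on controller_exec_host (e.g. via an SSH `cat`)."""
    def read(name: str) -> Optional[str]:
        path = run_dir / name
        return path.read_text(encoding="utf-8") if path.exists() else None
    return read


def wait_for_remote_runner(
    read: ReadFile,
    timeout_s: float,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until .remote_runner_exit_code appears; never re-launches the runner."""
    deadline = clock() + timeout_s
    seen = 0
    while True:
        log = read("remote_runner.launch.log") or ""
        if len(log) > seen:
            print(log[seen:], end="")  # stream only the new tail of the launch log
            seen = len(log)
        raw = read(".remote_runner_exit_code")
        if raw is not None and raw.strip():
            return int(raw.strip())
        if clock() >= deadline:
            raise TimeoutError("remote_runner.py did not write .remote_runner_exit_code in time")
        sleep(interval_s)
```

The loop only reads state and never starts anything. That matches the boundary stated in section 9.3: the trigger happens once, and convergence comes from observation.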
2 changes: 1 addition & 1 deletion fluxon_test_stack/ci_2_virt_node.py
@@ -26,7 +26,7 @@
 DEFAULT_RATHER_NO_GIT_SUBMODULE_SCRIPT = (
     REPO_ROOT / "fluxon_rs" / "scripts" / "rather_no_git_submodule.py"
 )
-DEFAULT_CI_2_VIRT_NODE_WORKDIR = REPO_ROOT / ".dever" / "ci_2_virt_node"
+DEFAULT_CI_2_VIRT_NODE_WORKDIR = REPO_ROOT / "ci_2_virt_node_workdir"
 DEFAULT_RELEASE_DIR = REPO_ROOT / "fluxon_release"
 PUBLIC_PROFILE_ID = "fluxon_tcp_thread"
 PUBLIC_ARTIFACT_SET_ID = "fluxon_tcp_thread"