24 changes: 14 additions & 10 deletions .github/workflows/axi-sdk-js-ci.yml
@@ -4,12 +4,18 @@ on:
push:
branches: [main]
paths:
- package.json
- packages/axi-sdk-js/**
- pnpm-lock.yaml
- pnpm-workspace.yaml
- .github/workflows/axi-sdk-js-ci.yml
pull_request:
branches: [main]
paths:
- package.json
- packages/axi-sdk-js/**
- pnpm-lock.yaml
- pnpm-workspace.yaml
- .github/workflows/axi-sdk-js-ci.yml

jobs:
@@ -18,17 +24,15 @@ jobs:
steps:
- uses: actions/checkout@v4

- uses: pnpm/action-setup@v4

- uses: actions/setup-node@v6
with:
node-version: 24
cache: npm
cache-dependency-path: |
package-lock.json
packages/axi-sdk-js/package-lock.json
cache: pnpm

- run: npm ci
- run: npm --prefix packages/axi-sdk-js ci
- run: npm run format:check
- run: npm run lint
- run: npm --prefix packages/axi-sdk-js run build
- run: npm --prefix packages/axi-sdk-js test
- run: pnpm install --frozen-lockfile
- run: pnpm run format:check
- run: pnpm run lint
- run: pnpm --dir packages/axi-sdk-js run build
- run: pnpm --dir packages/axi-sdk-js test
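Because the +/- markers are missing from the hunk above, the following is a sketch of how the `steps` block of `axi-sdk-js-ci.yml` would read after the change, assuming every `npm` line and the `cache-dependency-path` entry are the removed side (this matches the stated 14 additions and 10 deletions). The indentation is reconstructed and is an assumption, not taken from the file.

```yaml
    # Reconstructed post-change steps; indentation is assumed.
    steps:
      - uses: actions/checkout@v4

      # Added: installs pnpm before setup-node so `cache: pnpm` can resolve the store
      - uses: pnpm/action-setup@v4

      - uses: actions/setup-node@v6
        with:
          node-version: 24
          cache: pnpm

      # A single workspace install replaces the two separate `npm ci` calls
      - run: pnpm install --frozen-lockfile
      - run: pnpm run format:check
      - run: pnpm run lint
      - run: pnpm --dir packages/axi-sdk-js run build
      - run: pnpm --dir packages/axi-sdk-js test
```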
20 changes: 9 additions & 11 deletions .github/workflows/axi-sdk-js-release-please.yml
@@ -28,29 +28,27 @@ jobs:
- uses: actions/checkout@v4
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}

- uses: pnpm/action-setup@v4
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}

- uses: actions/setup-node@v6
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}
with:
node-version: 24
cache: npm
cache-dependency-path: |
package-lock.json
packages/axi-sdk-js/package-lock.json
cache: pnpm
registry-url: "https://registry.npmjs.org"

- run: npm ci
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}
- run: npm --prefix packages/axi-sdk-js ci
- run: pnpm install --frozen-lockfile
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}
- run: npm run format:check
- run: pnpm run format:check
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}
- run: npm run lint
- run: pnpm run lint
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}

- run: npm --prefix packages/axi-sdk-js run build
- run: pnpm --dir packages/axi-sdk-js run build
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}

- run: npm --prefix packages/axi-sdk-js test
- run: pnpm --dir packages/axi-sdk-js test
if: ${{ steps.release.outputs['packages/axi-sdk-js--release_created'] == 'true' }}

- run: npm publish --access public --provenance
25 changes: 11 additions & 14 deletions AGENTS.md
@@ -18,30 +18,27 @@ The reference AXI implementation (`gh-axi`) lives in a separate repo: [kunchengu
### Benchmark harness (GitHub)

```sh
cd bench-github
npm install
npm run bench -- run --condition axi --task merged_pr_ci_audit --repeat 5 --agent claude
npm run bench -- matrix --repeat 5 --agent claude
npm run bench -- report
npm test # Run bench tests (vitest)
pnpm install
pnpm --dir bench-github run bench -- run --condition axi --task merged_pr_ci_audit --repeat 5 --agent claude
pnpm --dir bench-github run bench -- matrix --repeat 5 --agent claude
pnpm --dir bench-github run bench -- report
pnpm --dir bench-github test # Run bench tests (vitest)
```

### Benchmark harness (Browser)

```sh
cd bench-browser
npm install
npm run bench -- run --condition agent-browser --task read_static_page --repeat 5
npm run bench -- matrix --repeat 5 # full run: all conditions × all tasks × 5 repeats
npm run bench -- report
npm test # Run bench tests (vitest)
pnpm install
pnpm --dir bench-browser run bench -- run --condition agent-browser --task read_static_page --repeat 5
pnpm --dir bench-browser run bench -- matrix --repeat 5 # full run: all conditions × all tasks × 5 repeats
pnpm --dir bench-browser run bench -- report
pnpm --dir bench-browser test # Run bench tests (vitest)
```

### Social video rendering

```sh
cd bench-browser
npm run render:social # Render social/index.html via HyperFrames to docs/social/rendered/race.mp4
pnpm --dir bench-browser run render:social # Render social/index.html via HyperFrames to docs/social/rendered/race.mp4
```

The source composition is `bench-browser/social/` (a HyperFrames project). Edit `social/index.html` for content/animation; see `social/DESIGN.md` for the visual identity. Use the `/hyperframes` skill when modifying the composition.
22 changes: 10 additions & 12 deletions README.md
@@ -96,23 +96,22 @@ This installs the [AXI skill](.agents/skills/axi/SKILL.md) — a detailed guide
The browser benchmark harness lives in `bench-browser/`. It compares browser automation tools across 16 browsing tasks.

```sh
cd bench-browser
npm install
pnpm install

# Run a single condition × task
npm run bench -- run --condition chrome-devtools-axi --task read_static_page
pnpm --dir bench-browser run bench -- run --condition chrome-devtools-axi --task read_static_page

# Run the full matrix
npm run bench -- matrix --repeat 5
pnpm --dir bench-browser run bench -- matrix --repeat 5

# Generate summary report
npm run bench -- report
pnpm --dir bench-browser run bench -- report

# Render the social video
npm run render:social
pnpm --dir bench-browser run render:social
```

The HyperFrames composition for the social asset lives in `bench-browser/social/`. Edit `social/index.html` for the animation and render `docs/social/rendered/race.mp4` with `npm run render:social`.
The HyperFrames composition for the social asset lives in `bench-browser/social/`. Edit `social/index.html` for the animation and render `docs/social/rendered/race.mp4` with `pnpm --dir bench-browser run render:social`.

Published results (490 runs): [`bench-browser/published-results/report.md`](bench-browser/published-results/report.md)

@@ -121,17 +120,16 @@ Published results (490 runs): [`bench-browser/published-results/report.md`](benc
The GitHub benchmark harness lives in `bench-github/`. It runs agent tasks across different interface conditions and grades results with an LLM judge.

```sh
cd bench-github
npm install
pnpm install

# Run a single condition × task
npm run bench -- run --condition axi --task merged_pr_ci_audit --repeat 5 --agent claude
pnpm --dir bench-github run bench -- run --condition axi --task merged_pr_ci_audit --repeat 5 --agent claude

# Run the full matrix
npm run bench -- matrix --repeat 5 --agent claude
pnpm --dir bench-github run bench -- matrix --repeat 5 --agent claude

# Generate summary report
npm run bench -- report
pnpm --dir bench-github run bench -- report
```

Published results (425 runs): [`bench-github/published-results/STUDY.md`](bench-github/published-results/STUDY.md)