Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -216,6 +216,8 @@ __marimo__/

# spec-kit
.specify/
.agents/skills/spec*

# aim
data/
datatest/
2 changes: 2 additions & 0 deletions AGENTS.md
@@ -81,6 +81,8 @@ Why `3.12`:
- Existing local Aim repositories (read-only). Image bytes are read (003-query-images-terminal-render)
- Python 3.12 for development, runtime support `>=3.10,<3.13` + Python standard library, `numpy>=1.24`, `rich>=13.7`, `textual-image>=0.12.0`, existing Aim SDK usage for owned query commands; no new dependency planned (004-run-params-query)
- Existing local Aim repositories on disk (read-only); run params are read from Aim run metadata attributes under `.aim` (004-run-params-query)
- Python 3.12 for development, runtime support `>=3.10,<3.13` + Python standard library, `numpy>=1.24`, `rich>=13.7`, `plotext>=5.3`, existing Aim SDK usage for owned trace commands; no new runtime dependency planned (005-distribution-trace-visual)
- Existing local Aim repositories on disk, read-only; distribution histogram points are read from Aim sequence data under `.aim` (005-distribution-trace-visual)

## Recent Changes
- 001-aim-command-passthrough: Added Python 3.12 for development, runtime support `>=3.10,<3.13` + Python standard library, native Aim CLI (external runtime prerequisite for delegated commands), pytest for test automation
25 changes: 25 additions & 0 deletions README.md
@@ -232,6 +232,31 @@ aimx trace "metric.name == 'loss'" --repo data --every 10
Output modes: default plot, `--table`, `--csv`, `--json`.
Display controls: `--width W`, `--height H`, `--no-color`.

### Trace distributions

`aimx trace distribution` fetches tracked Aim distribution sequences. By
default it prints the matched distribution names, selects the first match, and
renders a non-interactive Rich terminal visual with a web-style blue-gradient
current-step histogram and step-by-bin heatmap. Use `--table`, `--csv`, or
`--json` for tensor inspection and scripting.

![aimx trace distribution output preview](static/distributions.png)

```bash
# Show a web-like terminal visual for the first matched distribution
aimx trace distribution "distribution.name != ''" --repo data

# Inspect a specific training step; nearest tracked step is used if needed
aimx trace distribution "distribution.name != ''" --repo data --step 12300

# Show distribution tensors in a readable table
aimx trace distribution "distribution.name == 'weights'" --repo data --table

# Export distribution histograms for scripting
aimx trace distribution "distribution.name == 'weights'" --repo data --csv
aimx trace distribution "distribution.name == 'weights'" --repo data --json
```
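
The `--step` fallback mentioned in the comments above can be pictured as a nearest-neighbour lookup over the tracked steps. The sketch below is illustrative only: `nearest_tracked_step` is a hypothetical helper that is not part of `aimx`, and the tie-breaking rule (prefer the earlier step) is an assumption, since the README does not say how ties are resolved.

```python
from __future__ import annotations


def nearest_tracked_step(tracked_steps: list[int], requested: int) -> int:
    """Return the tracked step closest to `requested`.

    Ties go to the smaller step. This tie-break is an assumption made for
    the sketch, not documented aimx behavior.
    """
    if not tracked_steps:
        raise ValueError("no tracked steps to choose from")
    # Sort key: distance to the requested step first, then the step itself.
    return min(tracked_steps, key=lambda step: (abs(step - requested), step))
```

For example, with tracked steps `[12000, 12250, 12500]`, a request for step `12300` resolves to `12250` under this rule.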

### Common query options

- Output: `--json`, `--oneline` / `--plain`, or the default rich terminal view.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "aimx"
-version = "0.3.2"
+version = "0.3.3"
description = "A safe CLI-first companion for native Aim"
readme = "README.md"
requires-python = ">=3.10,<3.13"
164 changes: 161 additions & 3 deletions skills/aimx/SKILL.md
@@ -11,6 +11,91 @@ Use `aimx` as a read-only evidence collector for `autoresearch` `log_experiment`
steps. Prefer JSON output so downstream agents can compare runs, explain model
effects, and propose the next experiment from concrete Aim data.

## Fast Recipes

Use these first for common analysis tasks. Keep `--repo` explicit and prefer
`--json` for machine-readable output.

### Discover run scope and available params

```bash
aimx query params "run.hash != ''" --repo <repo> --json
```

### Inspect one run quickly

```bash
aimx query params "run.hash == '<run-hash>'" --repo <repo> --json
aimx query metrics "(run.hash == '<run-hash>') and metric.name != ''" --repo <repo> --json
```

### Rank runs by an objective metric

```bash
aimx query metrics "(<run-scope>) and metric.name == '<metric>'" --repo <repo> --json > metrics.json
python - <<'PY'
from __future__ import annotations
import json
from pathlib import Path

payload = json.loads(Path("metrics.json").read_text())
rows = []
for run in payload.get("runs", []):
    for metric in run.get("metrics", []):
        value = metric.get("min", {}).get("value")
        if value is not None:
            rows.append((value, run.get("hash"), run.get("name"), metric.get("context", {})))
# Sort on the metric value only; tuples that tie would otherwise compare dict contexts.
for value, run_hash, run_name, context in sorted(rows, key=lambda row: row[0])[:5]:
    print(f"{value:.6f}\t{run_hash}\t{run_name}\t{context}")
PY
```

### Compare two runs side by side

```bash
aimx query params "run.hash == '<baseline-hash>' or run.hash == '<candidate-hash>'" --repo <repo> --json
aimx query metrics "((run.hash == '<baseline-hash>') or (run.hash == '<candidate-hash>')) and metric.name == '<metric>'" --repo <repo> --json
```
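
Once both param payloads are parsed, a side-by-side comparison usually comes down to listing only the keys that differ. The following is a minimal sketch, assuming each run's params have already been extracted into a plain (possibly nested) dict. `flatten` and `diff_params` are hypothetical helpers, and the sketch does not assume any particular `aimx` JSON envelope.

```python
from __future__ import annotations

from typing import Any


def flatten(params: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"opt": {"lr": 1}} -> {"opt.lr": 1}."""
    flat: dict[str, Any] = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_params(baseline: dict[str, Any], candidate: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (baseline_value, candidate_value)} for keys whose values differ."""
    left, right = flatten(baseline), flatten(candidate)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    }
```

A key missing on one side is reported with `None` on that side, so a param explicitly set to `None` is indistinguishable from an absent one in this sketch.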

### Check curve health with bounded trace evidence

```bash
aimx trace "(<run-scope>) and metric.name == '<metric>'" --repo <repo> --json --tail 200 > trace.json
```

Then reduce `trace.json` with the `curve_summary` snippet from
`references/aimx-cli.md` instead of pasting raw series.

### Sanity-check distribution traces

```bash
aimx trace distribution "<distribution-expr>" --repo <repo> --json --tail 5
aimx trace distribution "distribution.name != ''" --repo <repo> --step 12300
```

### Capture one snapshot bundle for logs

```bash
uv run python skills/aimx/scripts/collect_experiment_snapshot.py \
--repo data \
--base-expr "run.hash != ''" \
--metric loss \
--trace-metric loss \
--pretty
```

## When to use what

| Need | Use |
| --- | --- |
| Discover runs and key hyperparameters | `aimx query params "<run-scope>" --repo <repo> --json` |
| Rank runs cheaply by objective | `aimx query metrics "<metric-expr>" --repo <repo> --json` and compare `min.value` or `max.value` |
| Inspect curve shape and late stability | `aimx trace "<metric-expr>" --repo <repo> --json --tail N` |
| Focus on a step or epoch window | `--steps a:b` or `--epochs a:b` on query/trace commands |
| Analyze weight or gradient histograms | `aimx trace distribution "<distribution-expr>" --repo <repo> --json` |
| Collect qualitative image evidence | `aimx query images "<image-expr>" --repo <repo> --json --head N` |
| Check native Aim passthrough readiness | `aimx doctor` |

## Requirements

- Require `aimx` in the Python environment that runs `log_experiment`.
@@ -32,6 +117,9 @@ effects, and propose the next experiment from concrete Aim data.

## Workflow

For common tasks, start from **Fast Recipes** and only switch to this full
workflow when the scope is unclear or the question is complex.

1. Locate the Aim repository. Pass `--repo <repo-root-or-.aim>` explicitly; in
this repository, use `--repo data` or `--repo data/.aim` for local checks.
2. Define the run scope as an AimQL expression. Start broad with
@@ -56,13 +144,22 @@ effects, and propose the next experiment from concrete Aim data.
aimx trace "(<run-scope>) and metric.name == 'loss'" --repo <repo> --json --tail 50
```

6. Collect image metadata when qualitative outputs matter:
6. Inspect distribution traces when weight, activation, or gradient histograms
matter. Prefer JSON/CSV for automation; use the default visual output for
human terminal inspection.

```bash
aimx trace distribution "<distribution-expr>" --repo <repo> --json --tail 5
aimx trace distribution "distribution.name != ''" --repo <repo> --step 12300
```

7. Collect image metadata when qualitative outputs matter:

```bash
aimx query images "images" --repo <repo> --json --head 20
```

7. Emit a compact `log_experiment` record containing:
8. Emit a compact `log_experiment` record containing:

```json
{
@@ -71,6 +168,7 @@ effects, and propose the next experiment from concrete Aim data.
"params": {},
"metric_summary": {},
"trace_evidence": {},
"distribution_evidence": {},
"image_evidence": {},
"interpretation": {
"best_runs": [],
@@ -81,13 +179,55 @@ effects, and propose the next experiment from concrete Aim data.
}
```

## Analysis Workflow

Use the same discipline as large experiment trackers: inspect structure first,
query only the fields needed for the question, then reduce evidence into compact
statistics before writing conclusions.

1. Start with params and metric summaries to discover candidate runs, objective
metrics, contexts, and missing fields. Avoid dumping full JSON payloads into
conversation context.
2. Choose the objective direction explicitly. Rank cheaply from summaries first:
`min.value` for loss/error, `max.value` for accuracy/F1/AUC/IoU, and
`last.value` only when the final checkpoint is the real objective.
3. Pull bounded traces only for the baseline, top candidates, and suspicious
runs. Prefer `--tail`, `--steps`, `--epochs`, and `--every` before collecting
full curves.
4. Compute local stats before interpreting: best step, final-window mean/std,
train-vs-val gap, NaN/Inf counts, sustained increases, spikes, and plateaus.
5. Compare runs side by side with selected params plus selected metrics. Do not
iterate every param or every metric unless discovery is the goal.
6. Escalate evidence by modality: use distribution traces for weights,
activations, or gradients; use image metadata for qualitative regressions.
7. Keep the final analysis small: state objective, run scope, top runs, curve
health, anomalies, confidence, and the next experiment suggested by evidence.
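
Step 4 above can be sketched as a small reducer over one metric series. This is an illustrative example, not the `curve_summary` snippet from `references/aimx-cli.md`: `summarize_curve` is a hypothetical name, the inputs are assumed to be parallel lists of steps and float values already pulled out of the trace JSON, and the default window of 20 points is an arbitrary choice.

```python
from __future__ import annotations

import math
import statistics


def summarize_curve(steps: list[int], values: list[float], window: int = 20,
                    minimize: bool = True) -> dict[str, object]:
    """Reduce one metric series to compact statistics."""
    if len(steps) != len(values):
        raise ValueError("steps and values must have the same length")
    # Count and drop NaN/Inf points so they cannot distort the statistics.
    finite = [(s, v) for s, v in zip(steps, values) if math.isfinite(v)]
    non_finite = len(values) - len(finite)
    if not finite:
        return {"points": len(values), "non_finite": non_finite}
    pick = min if minimize else max
    best_step, best_value = pick(finite, key=lambda pair: pair[1])
    # Final-window statistics describe late stability.
    tail = [v for _, v in finite[-window:]]
    return {
        "points": len(values),
        "non_finite": non_finite,
        "best_step": best_step,
        "best_value": best_value,
        "last_value": finite[-1][1],
        "final_window_mean": statistics.fmean(tail),
        "final_window_std": statistics.pstdev(tail),
    }
```

Reporting this dictionary instead of the raw series keeps the conversation context small while still exposing the best step, late stability, and NaN/Inf counts.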

## Critical Rules

- Discover scope first with `aimx query params "<run-scope>" --repo <repo> --json`.
Do not assume metric or param names.
- Treat `aimx` output as data: parse JSON and report aggregates, not raw payloads.
- Slice traces aggressively with `--tail`, `--head`, `--steps`, `--epochs`, or
`--every` before computing local statistics.
- Always pass `--repo` explicitly to avoid reading an unintended repository.
- For automation, use `aimx trace distribution` with `--json`, `--csv`, or
`--table`. Unflagged mode is terminal visualization for human inspection.
- Always finish with a compact conclusion: objective, top runs, curve health,
anomalies, confidence, and next experiment.
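
One way to make the `--repo` and JSON rules hard to forget is to build every command line through a single helper. The sketch below is an assumption-laden example rather than part of the skill: `build_aimx_argv` and `run_aimx_json` are hypothetical names, and only flags that appear in this document (`--repo`, `--json`, `--tail`) are used.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any


def build_aimx_argv(subcommand: list[str], expression: str, repo: str,
                    extra: list[str] | None = None) -> list[str]:
    """Build an aimx command line that always carries --repo and --json."""
    if not repo:
        raise ValueError("repo must be passed explicitly")
    return ["aimx", *subcommand, expression, "--repo", repo, "--json", *(extra or [])]


def run_aimx_json(argv: list[str]) -> Any:
    """Run the command and parse stdout as JSON (requires aimx on PATH)."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

For instance, `build_aimx_argv(["trace"], "metric.name == 'loss'", "data", ["--tail", "200"])` yields the same argument list as the bounded trace recipe shown earlier. Passing a list to `subprocess.run` avoids shell quoting issues around the AimQL expression.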

## Interpretation Rules

- Prefer validation, test, or held-out contexts over training contexts when
ranking runs.
- Treat `aimx query metrics` as summary data: `last`, `min`, `max`, and step
counts. Use `aimx trace --json` when shape, stability, divergence, or late
improvement matters.
- Use `aimx trace distribution --json` or `--csv` for automated histogram
evidence. The unflagged distribution command is a non-interactive terminal
visual that lists matched distributions, selects the first non-empty series,
and renders a current-step histogram plus step-by-bin heatmap. `--step N`
affects only this visual mode and falls back to the nearest tracked step.
- For minimization metrics such as loss or error, compare `min.value` and the
corresponding step. For maximization metrics such as accuracy, F1, AUC, or
IoU, compare `max.value`.
@@ -97,6 +237,21 @@ effects, and propose the next experiment from concrete Aim data.
- Preserve read-only behavior. Do not run commands that initialize, repair,
migrate, delete, or rewrite Aim repositories during `log_experiment`.
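
The minimization and maximization rule above can be sketched as a ranking helper with an explicit direction flag. This example assumes the payload has the same `runs` / `metrics` / `min.value` / `max.value` layout used by the ranking recipe in Fast Recipes and that it was already filtered to a single metric by the query expression. `rank_runs` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def rank_runs(payload: dict[str, Any], maximize: bool, top: int = 5) -> list[tuple[float, str]]:
    """Rank runs by max.value (maximize) or min.value (minimize).

    Assumes the payload holds one metric of interest per query, so every
    metric entry found is ranked.
    """
    field = "max" if maximize else "min"
    rows: list[tuple[float, str]] = []
    for run in payload.get("runs", []):
        for metric in run.get("metrics", []):
            value = (metric.get(field) or {}).get("value")
            if value is not None:
                rows.append((value, run.get("hash")))
    # Best first: descending for maximize, ascending for minimize.
    return sorted(rows, key=lambda row: row[0], reverse=maximize)[:top]
```

Making `maximize` a required argument forces the caller to state the objective direction instead of inheriting a silent default.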

## Gotchas

| Gotcha | Wrong | Right |
| --- | --- | --- |
| Missing `aimx` in environment | Assume `aimx` is available | Verify with `aimx --help` or `python -m aimx --help`, then follow project install workflow |
| Repository targeting | Rely on current directory | Pass `--repo <repo>` explicitly on every collection command |
| Summary vs curve confusion | Treat `query metrics` output as full history | Use `query metrics` for summary (`last/min/max`) and `trace --json` for curve shape |
| Raw payload dumping | Paste full JSON into conversation | Parse and compute compact aggregates before reporting |
| AimQL string quoting | `metric.name == "loss"` | `metric.name == 'loss'` |
| Short hash assumptions | Assume short hash is canonical identity | Let `aimx` expand it, but compare/store full run hash |
| Distribution output mode | Use default distribution mode in scripts | Use `--json`, `--csv`, or `--table` for automation |
| `--step` expectation | Expect `--step` to filter JSON/CSV/table exports | Use `--step` only for visual histogram mode |
| Empty trace handling | Treat non-JSON message as fatal parsing error | Treat it as no trace evidence and continue analysis |
| Full trace collection | Pull all runs and all points first | Rank by summary, then trace only baseline, top candidates, and suspicious runs |
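
The "Empty trace handling" row can be sketched as a tolerant parser that maps non-JSON output to "no evidence" instead of raising. `parse_trace_output` is a hypothetical helper, and the exact wording of the non-JSON message that `aimx` prints is not assumed here.

```python
from __future__ import annotations

import json
from typing import Any


def parse_trace_output(stdout: str) -> Any | None:
    """Return parsed JSON, or None when the output is empty or not JSON."""
    text = stdout.strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # A plain-text message means there is no trace evidence to report.
        return None
```

Callers can then record an empty `trace_evidence` field and continue with summary-level analysis.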

## Helper Script

Use `scripts/collect_experiment_snapshot.py` when an agent needs one structured
@@ -120,4 +275,7 @@ needed. It writes only to stdout.
## Reference

Read `references/aimx-cli.md` for command details, JSON envelope shapes, and
suggested `log_experiment` evidence fields.
suggested `log_experiment` evidence fields. For deeper experiment analysis
patterns, see "Analysis Patterns", "Find best run by objective", "Spike /
divergence / plateau / NaN detection", "Overfitting detection", and "Sweep
ranking".