Skip to content

feat: adding 'trace' command and enhance query metrics#7

Merged
blizhan merged 2 commits into
mainfrom
001-aim-command-passthrough
Apr 22, 2026
Merged

feat: adding 'trace' command and enhance query metrics#7
blizhan merged 2 commits into
mainfrom
001-aim-command-passthrough

Conversation

@blizhan

@blizhan blizhan commented Apr 22, 2026

Copy link
Copy Markdown
Owner

fix: #6

@blizhan

blizhan commented Apr 22, 2026

Copy link
Copy Markdown
Owner Author

@codex 审核代码变更

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the aimx trace command for plotting metric time series and significantly enhances the aimx query command with rich table formatting, JSON/CSV/oneline exports, and step-range filtering. It also adds a hash resolver to allow short run hashes in query expressions. Feedback highlights critical data mapping issues in _extract_values where epochs were incorrectly used as steps, and the use of the wrong attribute for run creation timestamps. Additionally, there are recommendations to improve performance by using Aim's pre-computed summary statistics and to avoid silencing potential SDK errors by unconditionally redirecting stderr.

Comment on lines +64 to +100
def _extract_values(metric: Any) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:
"""Extract values and step indices from an aim Metric object, sorted by step.

aim 3.x stores metrics with an internal key order that does not match the
user-provided step sequence. ``metric.epochs.sparse_numpy()`` returns
``(internal_keys, epoch_numbers)`` where the epoch numbers are the
meaningful step indices. ``metric.values.sparse_numpy()`` returns
``(internal_keys, float_values)`` in the same internal-key order.

We sort both arrays by the epoch (step) column and return them aligned.
The ``epochs`` return value is ``None`` here because the epochs ARE the
steps (the calling code stores them as ``steps``).
"""
try:
# aim 3.x: (internal_keys, epoch_numbers)
epoch_raw = metric.epochs.sparse_numpy()
val_raw = metric.values.sparse_numpy()
epoch_arr = np.array(epoch_raw[1], dtype=float)
values_arr = np.array(val_raw[1], dtype=float)

# Sort both arrays by step (epoch) to get chronological order
sort_idx = np.argsort(epoch_arr)
steps = epoch_arr[sort_idx].astype(int)
values = values_arr[sort_idx]
return values, steps, None
except Exception:
pass

# Fallback: iterate over values in storage order, using sequential indices
try:
v_list: list[float] = []
for item in metric.values:
v_list.append(float(item))
s_list = list(range(1, len(v_list) + 1))
return np.array(v_list, dtype=float), np.array(s_list, dtype=int), None
except Exception:
return np.array([], dtype=float), np.array([], dtype=int), None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The _extract_values function contains several issues:

  1. Data Mapping Bug: It uses metric.epochs.sparse_numpy()[1] (the epoch numbers) as the steps for the series (lines 81, 86). In Aim, steps and epochs are distinct. The actual steps are returned as the first element of the tuple from sparse_numpy(). This will cause metrics to be plotted/summarized against epoch numbers instead of iteration steps.
  2. Data Loss: It returns None for the epochs array (line 88), even though it successfully fetched them at line 79. This makes the EPOCH column in the UI always empty.
  3. Broad Exception Handling: Catching all exceptions with pass (line 89) or returning empty arrays (line 99) hides potential integration issues with the Aim SDK.
  4. Incorrect Fallback: The fallback logic (lines 92-98) invents sequential steps using range(1, len + 1), which misrepresents the data if the actual steps were non-sequential or didn't start at 1. It should use metric.values.items() to retrieve the actual step-value pairs.
def _extract_values(metric: Any) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:
    """Extract values, steps, and epochs from an aim Metric object."""
    try:
        # Aim 3.x sparse_numpy() returns (indices, values) where indices are steps
        steps, values = metric.values.sparse_numpy()
        epochs = None
        try:
            # Attempt to fetch epochs aligned with the same steps
            _, epochs = metric.epochs.sparse_numpy()
        except Exception:
            pass

        return (
            np.array(values, dtype=float),
            np.array(steps, dtype=int),
            np.array(epochs, dtype=float) if epochs is not None else None
        )
    except Exception:
        # Fallback: iterate over values using .items() to get (step, value) pairs
        try:
            v_list, s_list = [], []
            for step, val in metric.values.items():
                v_list.append(float(val))
                s_list.append(int(step))
            return np.array(v_list, dtype=float), np.array(s_list, dtype=int), None
        except Exception:
            return np.array([], dtype=float), np.array([], dtype=int), None

Comment thread src/aimx/aim_bridge/metric_stats.py Outdated
hash=run.hash,
experiment=getattr(run, "experiment", None),
name=getattr(run, "name", None),
created_at=getattr(run, "created_at", None),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the Aim 3.x SDK, the attribute for a run's creation timestamp is creation_time, not created_at. Using getattr(run, "created_at", None) will always return None, resulting in missing metadata in the RunMeta object.

Suggested change
created_at=getattr(run, "created_at", None),
created_at=getattr(run, "creation_time", None),

Comment on lines +126 to +127
with contextlib.redirect_stderr(stderr_buf):
query_result = repo.query_metrics(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silencing stderr without checking the buffer can hide critical Aim SDK warnings or connection errors. Since report_mode=QueryReportMode.DISABLED is already used to silence the progress bar, the redirection is likely unnecessary or should at least be logged if an exception occurs.

def collect_query_rows(invocation: QueryInvocation, repo_path: Path) -> list[dict[str, Any]]:
Repo, QueryReportMode = load_aim_query_support()
repo = Repo(str(repo_path))
series_list = collect_metric_series(invocation.expression, repo_path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The query command fetches the full time-series data (all values and steps) for every matching metric just to compute summary statistics (last, min, max). This is highly inefficient for large repositories. Aim metrics provide pre-computed summary statistics in their metadata (e.g., metric.last_value, metric.min, metric.max) which should be used when no --steps filter is applied. If a step filter is present, consider using Aim's slicing API (e.g., metric.values[start:end]) before calling sparse_numpy() to reduce data transfer.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6e80298b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/aimx/aim_bridge/metric_stats.py Outdated
Comment on lines +85 to +87
sort_idx = np.argsort(epoch_arr)
steps = epoch_arr[sort_idx].astype(int)
values = values_arr[sort_idx]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use tracked step indices instead of epochs

This code sets steps from metric.epochs, but Aim exposes step and epoch as separate axes (indices are the tracking steps, epochs is a different optional timeline), so runs tracked with track(..., step=..., epoch=...) will get incorrect step ordering, --steps filtering, and wrong last/min/max @step annotations. The SDK docs explicitly distinguish these fields (see property indices vs property epochs: https://aimstack.readthedocs.io/en/v3.10.0/refs/sdk.html).

Useful? React with 👍 / 👎.

Comment on lines +193 to +197
output = render_plot(
series_list,
width=invocation.width,
height=invocation.height,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor --no-color in default trace plot mode

run_trace_command computes effective_no_color, but the default plot branch ignores it and always calls render_plot(...), while render_plot hardcodes a themed plot (plt.theme("dark")). This makes aimx trace ... --no-color (and non-TTY output where no-color is auto-forced) still render themed/colorized output, violating the CLI flag contract.

Useful? React with 👍 / 👎.

…rrent directory

- Enhanced README to clarify that `--repo` is optional for `query` and `trace` commands, defaulting to the current directory.
- Updated command implementations to reflect this change, allowing users to omit the `--repo` argument when in the Aim repository root.
- Adjusted related tests to ensure they validate the new default behavior.
@blizhan blizhan merged commit 69670a2 into main Apr 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant