feat: adding 'trace' command and enhance query metrics#7
Conversation
|
@codex 审核代码变更 |
There was a problem hiding this comment.
Code Review
This pull request introduces the aimx trace command for plotting metric time series and significantly enhances the aimx query command with rich table formatting, JSON/CSV/oneline exports, and step-range filtering. It also adds a hash resolver to allow short run hashes in query expressions. Feedback highlights critical data mapping issues in _extract_values where epochs were incorrectly used as steps, and the use of the wrong attribute for run creation timestamps. Additionally, there are recommendations to improve performance by using Aim's pre-computed summary statistics and to avoid silencing potential SDK errors by unconditionally redirecting stderr.
| def _extract_values(metric: Any) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]: | ||
| """Extract values and step indices from an aim Metric object, sorted by step. | ||
|
|
||
| aim 3.x stores metrics with an internal key order that does not match the | ||
| user-provided step sequence. ``metric.epochs.sparse_numpy()`` returns | ||
| ``(internal_keys, epoch_numbers)`` where the epoch numbers are the | ||
| meaningful step indices. ``metric.values.sparse_numpy()`` returns | ||
| ``(internal_keys, float_values)`` in the same internal-key order. | ||
|
|
||
| We sort both arrays by the epoch (step) column and return them aligned. | ||
| The ``epochs`` return value is ``None`` here because the epochs ARE the | ||
| steps (the calling code stores them as ``steps``). | ||
| """ | ||
| try: | ||
| # aim 3.x: (internal_keys, epoch_numbers) | ||
| epoch_raw = metric.epochs.sparse_numpy() | ||
| val_raw = metric.values.sparse_numpy() | ||
| epoch_arr = np.array(epoch_raw[1], dtype=float) | ||
| values_arr = np.array(val_raw[1], dtype=float) | ||
|
|
||
| # Sort both arrays by step (epoch) to get chronological order | ||
| sort_idx = np.argsort(epoch_arr) | ||
| steps = epoch_arr[sort_idx].astype(int) | ||
| values = values_arr[sort_idx] | ||
| return values, steps, None | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Fallback: iterate over values in storage order, using sequential indices | ||
| try: | ||
| v_list: list[float] = [] | ||
| for item in metric.values: | ||
| v_list.append(float(item)) | ||
| s_list = list(range(1, len(v_list) + 1)) | ||
| return np.array(v_list, dtype=float), np.array(s_list, dtype=int), None | ||
| except Exception: | ||
| return np.array([], dtype=float), np.array([], dtype=int), None |
There was a problem hiding this comment.
The _extract_values function contains several issues:
- Data Mapping Bug: It uses
metric.epochs.sparse_numpy()[1](the epoch numbers) as thestepsfor the series (lines 81, 86). In Aim,stepsandepochsare distinct. The actual steps are returned as the first element of the tuple fromsparse_numpy(). This will cause metrics to be plotted/summarized against epoch numbers instead of iteration steps. - Data Loss: It returns
Nonefor theepochsarray (line 88), even though it successfully fetched them at line 79. This makes theEPOCHcolumn in the UI always empty. - Broad Exception Handling: Catching all exceptions with
pass(line 89) or returning empty arrays (line 99) hides potential integration issues with the Aim SDK. - Incorrect Fallback: The fallback logic (lines 92-98) invents sequential steps using
range(1, len + 1), which misrepresents the data if the actual steps were non-sequential or didn't start at 1. It should usemetric.values.items()to retrieve the actual step-value pairs.
def _extract_values(metric: Any) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:
"""Extract values, steps, and epochs from an aim Metric object."""
try:
# Aim 3.x sparse_numpy() returns (indices, values) where indices are steps
steps, values = metric.values.sparse_numpy()
epochs = None
try:
# Attempt to fetch epochs aligned with the same steps
_, epochs = metric.epochs.sparse_numpy()
except Exception:
pass
return (
np.array(values, dtype=float),
np.array(steps, dtype=int),
np.array(epochs, dtype=float) if epochs is not None else None
)
except Exception:
# Fallback: iterate over values using .items() to get (step, value) pairs
try:
v_list, s_list = [], []
for step, val in metric.values.items():
v_list.append(float(val))
s_list.append(int(step))
return np.array(v_list, dtype=float), np.array(s_list, dtype=int), None
except Exception:
return np.array([], dtype=float), np.array([], dtype=int), None| hash=run.hash, | ||
| experiment=getattr(run, "experiment", None), | ||
| name=getattr(run, "name", None), | ||
| created_at=getattr(run, "created_at", None), |
There was a problem hiding this comment.
In the Aim 3.x SDK, the attribute for a run's creation timestamp is creation_time, not created_at. Using getattr(run, "created_at", None) will always return None, resulting in missing metadata in the RunMeta object.
| created_at=getattr(run, "created_at", None), | |
| created_at=getattr(run, "creation_time", None), |
| with contextlib.redirect_stderr(stderr_buf): | ||
| query_result = repo.query_metrics( |
There was a problem hiding this comment.
| def collect_query_rows(invocation: QueryInvocation, repo_path: Path) -> list[dict[str, Any]]: | ||
| Repo, QueryReportMode = load_aim_query_support() | ||
| repo = Repo(str(repo_path)) | ||
| series_list = collect_metric_series(invocation.expression, repo_path) |
There was a problem hiding this comment.
The query command fetches the full time-series data (all values and steps) for every matching metric just to compute summary statistics (last, min, max). This is highly inefficient for large repositories. Aim metrics provide pre-computed summary statistics in their metadata (e.g., metric.last_value, metric.min, metric.max) which should be used when no --steps filter is applied. If a step filter is present, consider using Aim's slicing API (e.g., metric.values[start:end]) before calling sparse_numpy() to reduce data transfer.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6e80298b7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sort_idx = np.argsort(epoch_arr) | ||
| steps = epoch_arr[sort_idx].astype(int) | ||
| values = values_arr[sort_idx] |
There was a problem hiding this comment.
Use tracked step indices instead of epochs
This code sets steps from metric.epochs, but Aim exposes step and epoch as separate axes (indices are the tracking steps, epochs is a different optional timeline), so runs tracked with track(..., step=..., epoch=...) will get incorrect step ordering, --steps filtering, and wrong last/min/max @step annotations. The SDK docs explicitly distinguish these fields (see property indices vs property epochs: https://aimstack.readthedocs.io/en/v3.10.0/refs/sdk.html).
Useful? React with 👍 / 👎.
| output = render_plot( | ||
| series_list, | ||
| width=invocation.width, | ||
| height=invocation.height, | ||
| ) |
There was a problem hiding this comment.
Honor --no-color in default trace plot mode
run_trace_command computes effective_no_color, but the default plot branch ignores it and always calls render_plot(...), while render_plot hardcodes a themed plot (plt.theme("dark")). This makes aimx trace ... --no-color (and non-TTY output where no-color is auto-forced) still render themed/colorized output, violating the CLI flag contract.
Useful? React with 👍 / 👎.
…rrent directory - Enhanced README to clarify that `--repo` is optional for `query` and `trace` commands, defaulting to the current directory. - Updated command implementations to reflect this change, allowing users to omit the `--repo` argument when in the Aim repository root. - Adjusted related tests to ensure they validate the new default behavior.
fix: #6