Add optional MLflow logging to the cross-validation CLI #407

Open
gbeane wants to merge 6 commits into main from feature/cv-cli-mlflow-logging

@gbeane gbeane commented Jun 24, 2026

Collaborator

Summary

Adds opt-in MLflow tracking to the jabs-cli cross-validation command. Each run can record aggregate cross-validation metrics, run parameters, descriptive tags, and the training report as an artifact, so cross-validation runs of a behavior can be compared over time. MLflow is a fully optional dependency — the base install and all existing behavior are unchanged when it isn't used.

Tracks KLAUS-444.

What's included

New module jabs.classifier.mlflow_logging

  • log_cross_validation_to_mlflow(...) — creates one MLflow run, logs metrics/params/tags + the report artifact, returns (run_id, tracking_uri).
  • Helpers: aggregate_cv_metrics, build_params, build_tags, resolve_experiment_name, parse_kv_tags, load_env_file, mlflow_available, and MlflowLoggingError.
  • import mlflow is lazy (inside the logging function only), so the base package never depends on it.
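
The lazy-import pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `log_metrics_lazily` is a hypothetical name, and the body of the real `mlflow_available` is assumed rather than shown in this PR description.

```python
import importlib.util


def mlflow_available() -> bool:
    """Return True if the optional 'mlflow' package can be found."""
    # find_spec locates the package without importing it, so the check is cheap.
    return importlib.util.find_spec("mlflow") is not None


def log_metrics_lazily(metrics: dict[str, float]) -> str:
    """Hypothetical helper: log metrics in one MLflow run and return its run ID."""
    # Deferred import: only executed when logging is actually requested,
    # so importing this module never requires mlflow to be installed.
    import mlflow

    with mlflow.start_run() as run:
        mlflow.log_metrics(metrics)
        return run.info.run_id
```

Because the import sits inside the function body, a base install that never passes `--mlflow` never touches the optional dependency.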

Optional dependency — new mlflow extra: pip install 'jabs-behavior-classifier[mlflow]'.

CLI options on cross-validation

  • --mlflow [ENV_FILE] — enable logging; optional .env file with MLFLOW_* connection settings (ambient env if omitted).
  • --mlflow-experiment NAME — override the experiment (see below).
  • --mlflow-tag KEY=VALUE — repeatable free-form run tags.
  • --mlflow-no-report — skip the report artifact (metrics + params only).
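
As a rough sketch of what the option handling above implies, the two functions below parse repeatable `KEY=VALUE` tags and read `MLFLOW_*` settings from a `.env` file. The names `parse_tag_options` and `read_mlflow_env` are hypothetical, and the details (splitting on the first `=`, skipping non-`MLFLOW_` keys, stripping quotes) are assumptions; the PR's `parse_kv_tags` and `load_env_file` may behave differently.

```python
from pathlib import Path


def parse_tag_options(pairs: list[str]) -> dict[str, str]:
    """Turn repeated KEY=VALUE strings into a tag dictionary."""
    tags: dict[str, str] = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        tags[key] = value.strip()
    return tags


def read_mlflow_env(path: str) -> dict[str, str]:
    """Read MLFLOW_* settings from a simple .env file (assumed format)."""
    settings: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        # Assumption: only MLFLOW_-prefixed keys are relevant here.
        if sep and key.startswith("MLFLOW_"):
            settings[key] = value.strip().strip("'\"")
    return settings
```

A caller could then merge the returned settings into `os.environ` before any MLflow call, which is one way the "ambient env if omitted" behavior might fall out naturally.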

Per-behavior experiments: runs default to experiment jabs-<behavior> so a behavior's runs form their own leaderboard (runs of different behaviors aren't comparable). Precedence: --mlflow-experiment > MLFLOW_EXPERIMENT_NAME > jabs-<behavior>. The experiment is auto-created.

Leaderboard metrics: cv_f1_behavior_mean, cv_accuracy_mean, precision/recall (mean + std), iteration count, and dataset composition are logged as MLflow metrics, so the experiment's runs table is sortable by mean F1. Full per-fold detail rides along as the report artifact.
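
To make the aggregation concrete, here is a minimal sketch that reduces per-fold scores to mean and standard deviation. Only `cv_f1_behavior_mean` and `cv_accuracy_mean` are named in this PR; the other metric keys, the per-fold input layout, the `cv_iterations` key, and the use of population standard deviation are all assumptions, and `aggregate_fold_metrics` is a hypothetical name rather than the PR's `aggregate_cv_metrics`.

```python
from statistics import fmean, pstdev

# Assumed per-fold keys; the real report structure may differ.
FOLD_KEYS = ("accuracy", "f1_behavior", "precision_behavior", "recall_behavior")


def aggregate_fold_metrics(folds: list[dict[str, float]]) -> dict[str, float]:
    """Collapse per-fold scores into flat mean/std metrics for a runs table."""
    if not folds:
        raise ValueError("at least one cross-validation fold is required")
    summary = {"cv_iterations": float(len(folds))}
    for name in FOLD_KEYS:
        values = [fold[name] for fold in folds]
        summary[f"cv_{name}_mean"] = fmean(values)
        # Population standard deviation (an assumption; sample std is also plausible).
        summary[f"cv_{name}_std"] = pstdev(values)
    return summary
```

Flat scalar keys like these are what make a runs table sortable, whereas nested per-fold detail is better suited to an artifact.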

Graceful degradation & exit codes

  • Logging runs after results are printed and the report is saved.
  • mlflow extra not installed → warn, ignore MLflow options, exit 0.
  • Push fails (server/auth/TLS) → results/report preserved, warn, exit 3 (distinct from the generic 1).
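
The exit-code behavior listed above might be wired up along these lines. This is a sketch under stated assumptions: `run_with_exit_code` and its callable parameters are hypothetical, and the `MlflowLoggingError` class here is a local stand-in for the one the PR defines.

```python
import sys
from collections.abc import Callable

EXIT_OK = 0
EXIT_ERROR = 1               # generic failure
EXIT_MLFLOW_PUSH_FAILED = 3  # results are fine, only the MLflow push failed


class MlflowLoggingError(RuntimeError):
    """Local stand-in for the PR's error type."""


def run_with_exit_code(
    run_cv: Callable[[], dict],
    push_to_mlflow: Callable[[dict], None],
    mlflow_enabled: bool,
    mlflow_installed: bool,
) -> int:
    """Run cross-validation, then optionally push to MLflow; return an exit code."""
    if mlflow_enabled and not mlflow_installed:
        # Missing optional extra: warn and carry on without MLflow.
        print("Warning: mlflow is not installed; ignoring MLflow options.", file=sys.stderr)
        mlflow_enabled = False
    try:
        # Assumed to print results and save the report before returning.
        results = run_cv()
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    if mlflow_enabled:
        try:
            push_to_mlflow(results)
        except MlflowLoggingError as exc:
            # Results and report are already on disk; only the push failed.
            print(f"Warning: MLflow logging failed: {exc}", file=sys.stderr)
            return EXIT_MLFLOW_PUSH_FAILED
    return EXIT_OK
```

Keeping the push after the report is saved means a server, auth, or TLS failure can never cost the user their cross-validation output.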

Docs — both copies (online + in-app cli-tools.md) gain a jabs-cli cross-validation command section and a detailed MLflow integration section (install, enabling, connection config, experiment selection, leaderboard, tags, exit codes).

Example

jabs-cli cross-validation /path/to/project --behavior grooming \
    --mlflow settings.env --mlflow-tag purpose=baseline
# -> logs to experiment "jabs-grooming"

Testing

  • New tests/classifier/test_mlflow_logging.py (logging module, with a fake mlflow injected — no server/network) and MLflow CLI option-parsing tests in tests/scripts/test_cross_validation_cli.py.
  • Full tests/classifier/ + tests/scripts/: 302 passed. ruff check/format clean.
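
One way a fake `mlflow` can be injected without a server or network is to place a stub module in `sys.modules` for the duration of a test. The sketch below is an assumption about the general technique, not a copy of the PR's tests; `make_fake_mlflow`, `log_run`, and the test name are hypothetical.

```python
import sys
import types
from contextlib import contextmanager
from unittest import mock


def make_fake_mlflow(recorded: dict) -> types.ModuleType:
    """Build a stub 'mlflow' module that records calls instead of sending them."""
    fake = types.ModuleType("mlflow")

    @contextmanager
    def start_run():
        # Mimic the shape of the object yielded by mlflow.start_run().
        yield types.SimpleNamespace(info=types.SimpleNamespace(run_id="fake-run"))

    fake.start_run = start_run
    fake.log_metrics = lambda metrics: recorded.setdefault("metrics", {}).update(metrics)
    fake.set_tags = lambda tags: recorded.setdefault("tags", {}).update(tags)
    return fake


def log_run(metrics: dict[str, float], tags: dict[str, str]) -> str:
    """Code under test: imports mlflow lazily, so the stub is picked up."""
    import mlflow  # resolved through sys.modules

    with mlflow.start_run() as run:
        mlflow.log_metrics(metrics)
        mlflow.set_tags(tags)
        return run.info.run_id


def test_log_run_uses_fake_mlflow():
    recorded: dict = {}
    # patch.dict restores sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"mlflow": make_fake_mlflow(recorded)}):
        run_id = log_run({"cv_f1_behavior_mean": 0.9}, {"purpose": "baseline"})
    assert run_id == "fake-run"
    assert recorded == {
        "metrics": {"cv_f1_behavior_mean": 0.9},
        "tags": {"purpose": "baseline"},
    }
```

This approach works only because the import is lazy; a module-level `import mlflow` would bind the real package before the test could swap it out.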

Copilot AI left a comment

Contributor

Pull request overview

Adds opt-in MLflow tracking to the jabs-cli cross-validation workflow so cross-validation runs can be logged (metrics/params/tags + optional report artifact) and compared over time, while keeping MLflow as a fully optional dependency.

Changes:

  • Introduces jabs.classifier.mlflow_logging with helpers to aggregate CV metrics, parse tags, load MLFLOW_* env files, resolve experiment names, and push a single MLflow run per invocation.
  • Extends the cross-validation CLI with --mlflow [ENV_FILE], --mlflow-experiment, --mlflow-tag, and --mlflow-no-report, plus a distinct exit code (3) for MLflow push failures.
  • Adds unit tests for the logging module (with a fake injected mlflow) and CLI option parsing; updates docs in both the online and in-app copies.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

File | Description
--- | ---
uv.lock | Adds mlflow as an optional extra in the lock metadata.
pyproject.toml | Declares the mlflow optional dependency extra.
src/jabs/classifier/__init__.py | Re-exports MLflow helpers and error type from the classifier package.
src/jabs/classifier/mlflow_logging.py | New module implementing MLflow availability checks, env loading, aggregation, tagging, and run/artifact logging.
src/jabs/scripts/cli/cli.py | Adds MLflow-related CLI options and wiring into run_cross_validation, including exit code mapping.
src/jabs/scripts/cli/cross_validation.py | Adds MLflow logging after report save, and raises MlflowLoggingError on push failure.
tests/classifier/test_mlflow_logging.py | New tests for metrics aggregation, tag parsing, env file loading, experiment selection, and logging behavior via fake MLflow.
tests/scripts/test_cross_validation_cli.py | New tests for MLflow option parsing/forwarding and exit code mapping.
docs/user-guide/cli-tools.md | Documents the cross-validation command and MLflow integration (online docs).
src/jabs/resources/docs/user_guide/cli-tools.md | Mirrors the same CLI + MLflow documentation for the in-app docs.

Comment thread src/jabs/scripts/cli/cli.py
Comment thread src/jabs/classifier/mlflow_logging.py Outdated
Comment thread src/jabs/scripts/cli/cross_validation.py
Comment thread docs/user-guide/cli-tools.md Outdated
Comment thread docs/user-guide/cli-tools.md Outdated
Comment thread src/jabs/resources/docs/user_guide/cli-tools.md Outdated
Comment thread src/jabs/resources/docs/user_guide/cli-tools.md Outdated
@gbeane gbeane requested a review from Copilot June 24, 2026 01:12
@gbeane gbeane self-assigned this Jun 24, 2026
@gbeane gbeane requested review from bergsalex and keithshep June 24, 2026 01:12

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Comment thread tests/classifier/test_mlflow_logging.py Outdated
Comment on lines +432 to +442
# If MLflow logging was requested but the optional 'mlflow' extra is not
# installed, warn and ignore the MLflow options rather than failing -- the
# cross-validation still runs and the report is still produced.
if mlflow_enabled and not mlflow_available():
    click.echo(
        "Warning: MLflow logging was requested (--mlflow) but the optional 'mlflow' "
        "dependency is not installed; ignoring MLflow options. Install it with "
        "\"pip install 'jabs-behavior-classifier[mlflow]'\" to enable logging.",
        err=True,
    )
    mlflow_enabled = False

Contributor

Wondering whether this should be a fail-fast situation where we just return instead of continuing without MLflow? Not blocking, just a thought.

3 participants