Add optional MLflow logging to the cross-validation CLI by gbeane · Pull Request #407 · KumarLabJax/JABS-behavior-classifier

gbeane · 2026-06-24T00:36:48Z

Summary

Adds opt-in MLflow tracking to the jabs-cli cross-validation command. Each run can record aggregate cross-validation metrics, run parameters, descriptive tags, and the training report as an artifact, so cross-validation runs of a behavior can be compared over time. MLflow is a fully optional dependency — the base install and all existing behavior are unchanged when it isn't used.

Tracks KLAUS-444.

What's included

New module jabs.classifier.mlflow_logging

log_cross_validation_to_mlflow(...) — creates one MLflow run, logs metrics/params/tags + the report artifact, returns (run_id, tracking_uri).
Helpers: aggregate_cv_metrics, build_params, build_tags, resolve_experiment_name, parse_kv_tags, load_env_file, mlflow_available, and MlflowLoggingError.
import mlflow is lazy (inside the logging function only), so the base package never depends on it.
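A minimal sketch of the lazy-import pattern described above, not the module's actual code. The names `mlflow_is_importable` and `log_run_sketch` are hypothetical stand-ins for `mlflow_available` and `log_cross_validation_to_mlflow`, whose real signatures are not shown in this PR description. The MLflow calls themselves are from the public tracking API.

```python
import importlib.util


def mlflow_is_importable() -> bool:
    """Return True if the optional 'mlflow' package can be found."""
    try:
        # find_spec locates the package without importing it.
        return importlib.util.find_spec("mlflow") is not None
    except (ImportError, ValueError):
        return False


def log_run_sketch(experiment, metrics, params, tags, report_path=None):
    """Log one run and return (run_id, tracking_uri). Hypothetical signature."""
    import mlflow  # lazy: only imported when logging is actually requested

    mlflow.set_experiment(experiment)  # creates the experiment if it is missing
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.set_tags(tags)
        if report_path is not None:
            mlflow.log_artifact(str(report_path))
        return run.info.run_id, mlflow.get_tracking_uri()
```

Because the `import mlflow` statement lives inside the function body, importing the surrounding module never requires the extra to be installed.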

Optional dependency — new mlflow extra: pip install 'jabs-behavior-classifier[mlflow]'.

CLI options on cross-validation

--mlflow [ENV_FILE] — enable logging; optional .env file with MLFLOW_* connection settings (ambient env if omitted).
--mlflow-experiment NAME — override the experiment (see below).
--mlflow-tag KEY=VALUE — repeatable free-form run tags.
--mlflow-no-report — skip the report artifact (metrics + params only).
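For illustration, the repeatable `--mlflow-tag KEY=VALUE` values could be turned into a tag dictionary roughly as follows. This is a sketch under assumptions: `parse_tags_sketch` is a hypothetical name, and the real `parse_kv_tags` may validate or report errors differently.

```python
def parse_tags_sketch(raw_tags):
    """Turn ['purpose=baseline', ...] into {'purpose': 'baseline', ...}."""
    tags = {}
    for raw in raw_tags:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = raw.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {raw!r}")
        tags[key] = value.strip()
    return tags
```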

Per-behavior experiments — runs default to experiment jabs-<behavior> so a behavior's runs form their own leaderboard (mixing behaviors isn't comparable). Precedence: --mlflow-experiment → MLFLOW_EXPERIMENT_NAME → jabs-<behavior>. The experiment is auto-created.
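The precedence above can be sketched as a small function. The name `resolve_experiment_sketch` and its parameters are assumptions for illustration; how the real `resolve_experiment_name` normalizes behavior names is not stated here.

```python
import os


def resolve_experiment_sketch(behavior, cli_value=None, env=None):
    """Pick the experiment: CLI option, then env var, then jabs-<behavior>."""
    env = os.environ if env is None else env
    if cli_value:  # --mlflow-experiment wins
        return cli_value
    env_value = env.get("MLFLOW_EXPERIMENT_NAME")
    if env_value:  # ambient or .env-provided setting comes next
        return env_value
    return f"jabs-{behavior}"  # per-behavior default
```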

Leaderboard metrics — cv_f1_behavior_mean, cv_accuracy_mean, precision/recall (mean + std), iteration count, and dataset composition are logged as MLflow metrics, so the experiment's runs table is sortable by mean F1. Full per-fold detail rides along as the report artifact.
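As a rough illustration of the aggregation, per-fold scores could be reduced to mean and standard deviation like this. Only `cv_f1_behavior_mean` and `cv_accuracy_mean` are names taken from the PR; the other metric keys, the fold dictionary layout, and the use of the population standard deviation are assumptions, and `aggregate_cv_metrics` may differ.

```python
from statistics import fmean, pstdev


def aggregate_folds_sketch(folds):
    """folds: list of dicts with 'f1_behavior', 'accuracy', 'precision', 'recall'."""
    # MLflow metrics are numeric, so the iteration count is stored as a float.
    metrics = {"cv_iterations": float(len(folds))}
    for name in ("f1_behavior", "accuracy", "precision", "recall"):
        values = [fold[name] for fold in folds]
        metrics[f"cv_{name}_mean"] = fmean(values)
        metrics[f"cv_{name}_std"] = pstdev(values)  # assumption: population std
    return metrics
```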

Graceful degradation & exit codes

Logging runs after results are printed and the report is saved.
mlflow extra not installed → warn, ignore MLflow options, exit 0.
Push fails (server/auth/TLS) → results/report preserved, warn, exit 3 (distinct from the generic 1).
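The exit-code behavior can be summarized in a sketch like the one below. `LoggingFailedSketch` stands in for `MlflowLoggingError`, and `run_with_exit_code` with its callable parameters is a hypothetical shape, not the actual wiring in `cli.py`.

```python
import sys

EXIT_OK = 0
EXIT_ERROR = 1          # generic failure
EXIT_MLFLOW_FAILED = 3  # results are fine, only the MLflow push failed


class LoggingFailedSketch(Exception):
    """Stand-in for the PR's MlflowLoggingError."""


def run_with_exit_code(run_cv, push_to_mlflow, mlflow_requested, mlflow_installed):
    try:
        run_cv()  # prints results and saves the report first
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    if not mlflow_requested:
        return EXIT_OK
    if not mlflow_installed:
        print("Warning: mlflow extra not installed; skipping logging.", file=sys.stderr)
        return EXIT_OK
    try:
        push_to_mlflow()
    except LoggingFailedSketch as exc:
        print(f"Warning: MLflow logging failed: {exc}", file=sys.stderr)
        return EXIT_MLFLOW_FAILED
    return EXIT_OK
```

Running the push last means a tracking-server outage can never cost the user the printed results or the saved report.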

Docs — both copies (online + in-app cli-tools.md) gain a jabs-cli cross-validation command section and a detailed MLflow integration section (install, enabling, connection config, experiment selection, leaderboard, tags, exit codes).

Example

jabs-cli cross-validation /path/to/project --behavior grooming \
    --mlflow settings.env --mlflow-tag purpose=baseline
# -> logs to experiment "jabs-grooming"
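For context on the `settings.env` argument in the example, a file of `MLFLOW_*` settings might be read along these lines. This is an illustrative sketch only: `read_env_file_sketch` is a hypothetical name, the accepted syntax is an assumption, and whether the real `load_env_file` overrides variables already set in the ambient environment is not stated here. `MLFLOW_TRACKING_URI` is a standard MLflow environment variable; the URL in the usage comment is a placeholder.

```python
def read_env_file_sketch(path):
    """Return the MLFLOW_* KEY=VALUE pairs found in a .env-style file."""
    settings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            if key.startswith("MLFLOW_"):
                # Drop optional surrounding quotes around the value.
                settings[key] = value.strip().strip("'\"")
    return settings


# Example file contents (placeholder values):
#   MLFLOW_TRACKING_URI=https://mlflow.example.org
#   MLFLOW_TRACKING_USERNAME='jabs'
```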

Testing

New tests/classifier/test_mlflow_logging.py (logging module, with a fake mlflow injected — no server/network) and MLflow CLI option-parsing tests in tests/scripts/test_cross_validation_cli.py.
Full tests/classifier/ + tests/scripts/: 302 passed. ruff check/format clean.
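The "fake mlflow injected" approach can be illustrated with a small sketch. It is not the test code from this PR; all function names are hypothetical, and the fake implements only the two calls the example needs. It relies on the fact that a lazy `import mlflow` resolves against `sys.modules` at call time.

```python
import sys
import types


def make_fake_mlflow(calls):
    """Build a stand-in 'mlflow' module that records calls instead of sending them."""
    fake = types.ModuleType("mlflow")
    fake.set_experiment = lambda name: calls.append(("set_experiment", name))
    fake.log_metrics = lambda metrics: calls.append(("log_metrics", dict(metrics)))
    return fake


def log_metrics_lazily(experiment, metrics):
    import mlflow  # picks up whatever is registered in sys.modules

    mlflow.set_experiment(experiment)
    mlflow.log_metrics(metrics)


def demo():
    calls = []
    saved = sys.modules.get("mlflow")
    sys.modules["mlflow"] = make_fake_mlflow(calls)
    try:
        log_metrics_lazily("jabs-grooming", {"cv_f1_behavior_mean": 0.9})
    finally:
        # Restore the previous state so other code sees the real module (or none).
        if saved is None:
            sys.modules.pop("mlflow", None)
        else:
            sys.modules["mlflow"] = saved
    return calls
```

In a pytest suite the same injection is usually done with `monkeypatch.setitem(sys.modules, "mlflow", fake)`, which handles the restore step automatically.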

Copilot

Pull request overview

Adds opt-in MLflow tracking to the jabs-cli cross-validation workflow so cross-validation runs can be logged (metrics/params/tags + optional report artifact) and compared over time, while keeping MLflow as a fully optional dependency.

Changes:

Introduces jabs.classifier.mlflow_logging with helpers to aggregate CV metrics, parse tags, load MLFLOW_* env files, resolve experiment names, and push a single MLflow run per invocation.
Extends the cross-validation CLI with --mlflow [ENV_FILE], --mlflow-experiment, --mlflow-tag, and --mlflow-no-report, plus a distinct exit code (3) for MLflow push failures.
Adds unit tests for the logging module (with a fake injected mlflow) and CLI option parsing; updates docs in both the online and in-app copies.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| uv.lock | Adds `mlflow` as an optional extra in the lock metadata. |
| pyproject.toml | Declares the `mlflow` optional dependency extra. |
| src/jabs/classifier/\_\_init\_\_.py | Re-exports MLflow helpers and error type from the classifier package. |
| src/jabs/classifier/mlflow_logging.py | New module implementing MLflow availability checks, env loading, aggregation, tagging, and run/artifact logging. |
| src/jabs/scripts/cli/cli.py | Adds MLflow-related CLI options and wiring into `run_cross_validation`, including exit code mapping. |
| src/jabs/scripts/cli/cross_validation.py | Adds MLflow logging after report save, and raises `MlflowLoggingError` on push failure. |
| tests/classifier/test_mlflow_logging.py | New tests for metrics aggregation, tag parsing, env file loading, experiment selection, and logging behavior via fake MLflow. |
| tests/scripts/test_cross_validation_cli.py | New tests for MLflow option parsing/forwarding and exit code mapping. |
| docs/user-guide/cli-tools.md | Documents the cross-validation command and MLflow integration (online docs). |
| src/jabs/resources/docs/user_guide/cli-tools.md | Mirrors the same CLI + MLflow documentation for the in-app docs. |

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

keithshep · 2026-06-24T18:33:10Z

+    # If MLflow logging was requested but the optional 'mlflow' extra is not
+    # installed, warn and ignore the MLflow options rather than failing -- the
+    # cross-validation still runs and the report is still produced.
+    if mlflow_enabled and not mlflow_available():
+        click.echo(
+            "Warning: MLflow logging was requested (--mlflow) but the optional 'mlflow' "
+            "dependency is not installed; ignoring MLflow options. Install it with "
+            "\"pip install 'jabs-behavior-classifier[mlflow]'\" to enable logging.",
+            err=True,
+        )
+        mlflow_enabled = False


Wondering if this should be a fail-fast situation where we just return? Not blocking, just a thought.

gbeane added 4 commits June 23, 2026 14:04

Add optional MLflow logging to cross-validation CLI

1c7672a

Warn and skip MLflow logging when the mlflow extra is not installed

2e0ec33

Document cross-validation CLI and MLflow integration in user guide

a5e0832

Log cross-validation runs to a per-behavior MLflow experiment

94b4c82

gbeane requested a review from Copilot June 24, 2026 00:38

Copilot started reviewing on behalf of gbeane June 24, 2026 00:39

Copilot AI reviewed Jun 24, 2026

Address PR review: scope MLflow option parsing, fix docstring/docs/error wrapping

2de35e4

gbeane requested a review from Copilot June 24, 2026 01:12

gbeane self-assigned this Jun 24, 2026

gbeane requested review from bergsalex and keithshep June 24, 2026 01:12

Copilot started reviewing on behalf of gbeane June 24, 2026 01:12

Copilot AI reviewed Jun 24, 2026

Comment thread tests/classifier/test_mlflow_logging.py Outdated

Make load_env_file test robust to ambient environment

75f86a6

keithshep approved these changes Jun 24, 2026

gbeane wants to merge 6 commits into main from feature/cv-cli-mlflow-logging

3 participants
