Add optional MLflow logging to the cross-validation CLI #407
Open
gbeane wants to merge 6 commits into
Pull request overview
Adds opt-in MLflow tracking to the jabs-cli cross-validation workflow so cross-validation runs can be logged (metrics/params/tags + optional report artifact) and compared over time, while keeping MLflow as a fully optional dependency.
Changes:
- Introduces `jabs.classifier.mlflow_logging` with helpers to aggregate CV metrics, parse tags, load `MLFLOW_*` env files, resolve experiment names, and push a single MLflow run per invocation.
- Extends the `cross-validation` CLI with `--mlflow [ENV_FILE]`, `--mlflow-experiment`, `--mlflow-tag`, and `--mlflow-no-report`, plus a distinct exit code (3) for MLflow push failures.
- Adds unit tests for the logging module (with an injected fake `mlflow`) and for CLI option parsing; updates docs in both the online and in-app copies.
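
The input-handling helpers listed above (tag parsing, env-file loading, experiment-name resolution) can be sketched together. This is a minimal sketch under stated assumptions, not the PR's code: every function name carries a `_sketch` suffix because it is hypothetical, the env file is assumed to hold plain `KEY=VALUE` lines with `#` comments, only `MLFLOW_*` keys are kept, and a tag value may itself contain `=`. Only the experiment precedence (`--mlflow-experiment`, then `MLFLOW_EXPERIMENT_NAME`, then `jabs-<behavior>`) comes from the PR description.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from pathlib import Path


def parse_kv_tags_sketch(pairs: Iterable[str]) -> dict[str, str]:
    """Turn repeated KEY=VALUE strings (as given to --mlflow-tag) into a dict."""
    tags: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first '=' only
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        tags[key] = value.strip()  # assumption: a repeated key keeps the last value
    return tags


def parse_env_text_sketch(text: str) -> dict[str, str]:
    """Keep MLFLOW_* entries from KEY=VALUE lines; lines starting with '#' are comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if sep and key.startswith("MLFLOW_"):  # assumption: ignore non-MLflow keys
            settings[key] = value
    return settings


def load_env_file_sketch(path: Path) -> dict[str, str]:
    """Read an env file from disk and return only its MLFLOW_* settings."""
    return parse_env_text_sketch(path.read_text(encoding="utf-8"))


def resolve_experiment_name_sketch(
    behavior: str, cli_experiment: str | None, env: Mapping[str, str]
) -> str:
    """Precedence: --mlflow-experiment, then MLFLOW_EXPERIMENT_NAME, then jabs-<behavior>."""
    if cli_experiment:
        return cli_experiment
    from_env = env.get("MLFLOW_EXPERIMENT_NAME")
    if from_env:
        return from_env
    return f"jabs-{behavior}"
```

For example, with no `--mlflow-experiment` value and no `MLFLOW_EXPERIMENT_NAME` set, `resolve_experiment_name_sketch("grooming", None, {})` falls through to `"jabs-grooming"`, the per-behavior default the PR describes.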
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| uv.lock | Adds mlflow as an optional extra in the lock metadata. |
| pyproject.toml | Declares the mlflow optional dependency extra. |
| `src/jabs/classifier/__init__.py` | Re-exports MLflow helpers and error type from the classifier package. |
| src/jabs/classifier/mlflow_logging.py | New module implementing MLflow availability checks, env loading, aggregation, tagging, and run/artifact logging. |
| src/jabs/scripts/cli/cli.py | Adds MLflow-related CLI options and wiring into run_cross_validation, including exit code mapping. |
| src/jabs/scripts/cli/cross_validation.py | Adds MLflow logging after report save, and raises MlflowLoggingError on push failure. |
| tests/classifier/test_mlflow_logging.py | New tests for metrics aggregation, tag parsing, env file loading, experiment selection, and logging behavior via fake MLflow. |
| tests/scripts/test_cross_validation_cli.py | New tests for MLflow option parsing/forwarding and exit code mapping. |
| docs/user-guide/cli-tools.md | Documents the cross-validation command and MLflow integration (online docs). |
| src/jabs/resources/docs/user_guide/cli-tools.md | Mirrors the same CLI + MLflow documentation for the in-app docs. |
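
The table notes that `cross_validation.py` raises `MlflowLoggingError` on a push failure and that `cli.py` maps outcomes to exit codes. A minimal, framework-free sketch of that mapping, assuming the CLI wrapper catches exceptions around the cross-validation call: `MlflowLoggingError` below is a local stand-in class, `run_and_map_exit_code` is a hypothetical name, and the real wiring goes through click rather than a plain function. The codes themselves (0 on success, 3 for an MLflow push failure, 1 for any other error) follow the PR description.

```python
from __future__ import annotations

from collections.abc import Callable

EXIT_OK = 0
EXIT_ERROR = 1        # generic failure
EXIT_MLFLOW_PUSH = 3  # MLflow push failed


class MlflowLoggingError(RuntimeError):
    """Local stand-in for the PR's error type; raised when pushing to MLflow fails."""


def run_and_map_exit_code(run: Callable[[], object]) -> int:
    """Run the command body and translate its outcome into a process exit code."""
    try:
        run()
    except MlflowLoggingError:
        # In the PR, logging happens after the report is saved, so only the
        # MLflow push is reported as failed here.
        return EXIT_MLFLOW_PUSH
    except Exception:
        return EXIT_ERROR
    return EXIT_OK
```

The `MlflowLoggingError` clause has to precede the generic `except Exception`; in the opposite order a push failure would be reported as exit 1 and the distinct code would be lost.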
keithshep approved these changes on Jun 24, 2026.

Comment on lines +432 to +442:
```python
# If MLflow logging was requested but the optional 'mlflow' extra is not
# installed, warn and ignore the MLflow options rather than failing -- the
# cross-validation still runs and the report is still produced.
if mlflow_enabled and not mlflow_available():
    click.echo(
        "Warning: MLflow logging was requested (--mlflow) but the optional 'mlflow' "
        "dependency is not installed; ignoring MLflow options. Install it with "
        "\"pip install 'jabs-behavior-classifier[mlflow]'\" to enable logging.",
        err=True,
    )
    mlflow_enabled = False
```
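
The guard above relies on `mlflow_available()` reporting whether the optional extra is installed. One plausible way to write such a check, assuming it only needs to detect the package (the PR's own implementation is not shown here), is `importlib.util.find_spec`, which locates a top-level module without executing it. Both function names below are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def mlflow_available_sketch() -> bool:
    """Hypothetical stand-in for the PR's mlflow_available()."""
    return module_available("mlflow")
```

A check that actually runs `import mlflow` would additionally catch a broken installation, at the cost of paying MLflow's import time even on runs that never use it; `find_spec` keeps the check cheap.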
Wondering if this should be a fail-fast situation where we just return? Not blocking, just a thought.
Summary
Adds opt-in MLflow tracking to the `jabs-cli cross-validation` command. Each run can record aggregate cross-validation metrics, run parameters, descriptive tags, and the training report as an artifact, so cross-validation runs of a behavior can be compared over time. MLflow is a fully optional dependency: the base install and all existing behavior are unchanged when it isn't used.

Tracks KLAUS-444.
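
Each cross-validation run is reduced to a handful of aggregate numbers so that runs can be compared side by side. A minimal sketch of that aggregation, assuming per-fold results arrive as dicts of floats and that the spread is the population standard deviation: `aggregate_cv_metrics_sketch`, the input keys, the `_std` suffix, and `cv_iterations` are hypothetical, while the `cv_<metric>_mean` pattern mirrors names the PR lists, such as `cv_f1_behavior_mean` and `cv_accuracy_mean`.

```python
from __future__ import annotations

import statistics
from collections.abc import Sequence


def aggregate_cv_metrics_sketch(folds: Sequence[dict[str, float]]) -> dict[str, float]:
    """Reduce per-fold scores to mean/std leaderboard metrics (hypothetical layout)."""
    if not folds:
        raise ValueError("need at least one cross-validation fold")
    aggregated: dict[str, float] = {"cv_iterations": float(len(folds))}
    for name in folds[0]:
        values = [fold[name] for fold in folds]
        aggregated[f"cv_{name}_mean"] = statistics.fmean(values)
        # Population std: a single fold yields 0.0 instead of raising.
        aggregated[f"cv_{name}_std"] = statistics.pstdev(values)
    return aggregated
```

With per-fold keys such as `f1_behavior` and `accuracy`, this produces `cv_f1_behavior_mean` and `cv_accuracy_mean` alongside their `_std` companions, all plain numbers that a tracking UI can sort on.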
What's included
New module `jabs.classifier.mlflow_logging`
- `log_cross_validation_to_mlflow(...)`: creates one MLflow run, logs metrics/params/tags + the report artifact, and returns `(run_id, tracking_uri)`.
- Supporting helpers and error type: `aggregate_cv_metrics`, `build_params`, `build_tags`, `resolve_experiment_name`, `parse_kv_tags`, `load_env_file`, `mlflow_available`, and `MlflowLoggingError`.
- `import mlflow` is lazy (inside the logging function only), so the base package never depends on it.

Optional dependency: new `mlflow` extra, installed with `pip install 'jabs-behavior-classifier[mlflow]'`.

CLI options on `cross-validation`
- `--mlflow [ENV_FILE]`: enable logging; the optional `.env` file supplies `MLFLOW_*` connection settings (the ambient environment is used if it is omitted).
- `--mlflow-experiment NAME`: override the experiment (see below).
- `--mlflow-tag KEY=VALUE`: free-form run tag; repeatable.
- `--mlflow-no-report`: skip the report artifact (metrics + params only).

Per-behavior experiments: runs default to experiment `jabs-<behavior>`, so each behavior's runs form their own leaderboard (results from different behaviors aren't comparable). Precedence: `--mlflow-experiment` → `MLFLOW_EXPERIMENT_NAME` → `jabs-<behavior>`. The experiment is auto-created.

Leaderboard metrics: `cv_f1_behavior_mean`, `cv_accuracy_mean`, precision/recall (mean + std), iteration count, and dataset composition are logged as MLflow metrics, so the experiment's runs table is sortable by mean F1. Full per-fold detail rides along in the report artifact.

Graceful degradation & exit codes
- `mlflow` extra not installed → warn, ignore MLflow options, exit `0`.
- MLflow push failure (`MlflowLoggingError`) → exit `3` (distinct from the generic `1`).

Docs: both copies (online and in-app `cli-tools.md`) gain a `jabs-cli cross-validation` command section and a detailed MLflow integration section (install, enabling, connection config, experiment selection, leaderboard, tags, exit codes).

Example
```shell
jabs-cli cross-validation /path/to/project --behavior grooming \
  --mlflow settings.env --mlflow-tag purpose=baseline
# -> logs to experiment "jabs-grooming"
```

Testing
- `tests/classifier/test_mlflow_logging.py` (logging module, with a fake `mlflow` injected; no server or network) and MLflow CLI option-parsing tests in `tests/scripts/test_cross_validation_cli.py`.
- `tests/classifier/` + `tests/scripts/`: 302 passed.
- `ruff check` / `ruff format`: clean.
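
The lazy `import mlflow` is also what makes fake-`mlflow` tests possible: an `import` statement returns whatever module object is already registered under that name in `sys.modules`, so a test can install a stand-in before the function runs. A minimal sketch of the pattern, assuming a push function shaped roughly like the PR's; `push_run_sketch` and `make_fake_mlflow` are hypothetical names. `set_experiment`, `start_run`, `log_params`, `log_metrics`, and `set_tags` are real MLflow fluent-API functions (real `set_experiment` also creates a missing experiment), but only the fake is exercised here, so no server or network is involved.

```python
from __future__ import annotations

import sys
import types
from contextlib import contextmanager
from unittest import mock


def push_run_sketch(experiment: str, metrics: dict[str, float],
                    params: dict[str, str], tags: dict[str, str]) -> str:
    """Log one run and return its run ID; mlflow is imported only when called."""
    import mlflow  # lazy: nothing imports mlflow at module load time

    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.set_tags(tags)
        return run.info.run_id


def make_fake_mlflow(calls: list) -> types.ModuleType:
    """Build a stand-in 'mlflow' module that records calls instead of logging."""
    fake = types.ModuleType("mlflow")

    @contextmanager
    def start_run():
        calls.append(("start_run",))
        yield types.SimpleNamespace(info=types.SimpleNamespace(run_id="fake-run"))

    fake.start_run = start_run
    fake.set_experiment = lambda name: calls.append(("set_experiment", name))
    fake.log_params = lambda params: calls.append(("log_params", params))
    fake.log_metrics = lambda metrics: calls.append(("log_metrics", metrics))
    fake.set_tags = lambda tags: calls.append(("set_tags", tags))
    return fake


# Usage: patch sys.modules so the lazy import inside push_run_sketch finds the fake.
calls: list = []
with mock.patch.dict(sys.modules, {"mlflow": make_fake_mlflow(calls)}):
    run_id = push_run_sketch(
        "jabs-grooming",
        {"cv_f1_behavior_mean": 0.7},
        {"behavior": "grooming"},
        {"purpose": "baseline"},
    )
```

`mock.patch.dict` restores `sys.modules` when the `with` block exits, so the fake does not leak into other tests, and whether the real `mlflow` is installed makes no difference to the result.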