diff --git a/models/integrations/accelerate.mdx b/models/integrations/accelerate.mdx index 210b75257b..9d693af63c 100644 --- a/models/integrations/accelerate.mdx +++ b/models/integrations/accelerate.mdx @@ -1,15 +1,16 @@ --- -description: Training and inference at scale made simple, efficient and adaptable +description: Training and inference at scale made simple, efficient, and adaptable title: Hugging Face Accelerate +keywords: ["multi-GPU", "mixed precision", "DeepSpeed", "FSDP"] --- Hugging Face Accelerate is a library that enables the same PyTorch code to run across any distributed configuration, to simplify model training and inference at scale. -Accelerate includes a W&B Tracker which we show how to use below. You can also read more about [Accelerate Trackers in Hugging Face](https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking). +Accelerate includes a W&B Tracker, which this page shows how to use to log metrics, configuration, and artifacts from distributed training runs to W&B. For more information, see [Accelerate Trackers in Hugging Face](https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking). ## Start logging with Accelerate -To get started with Accelerate and W&B you can follow the pseudocode below: +This section shows how to configure Accelerate to log experiment data to W&B during training. To get started with Accelerate and W&B, follow this pseudocode: ```python from accelerate import Accelerator @@ -34,33 +35,33 @@ accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step) accelerator.end_training() ``` -Explaining more, you need to: -1. Pass `log_with="wandb"` when initialising the Accelerator class +In more detail: +1. Pass `log_with="wandb"` when you initialize the `Accelerator` class. 2. 
Call the [`init_trackers`](https://huggingface.co/docs/accelerate/main/en/package_reference/accelerator#accelerate.Accelerator.init_trackers) method and pass it: -- a project name via `project_name` -- any parameters you want to pass to [`wandb.init()`](/models/ref/python/functions/init) via a nested dict to `init_kwargs` -- any other experiment config information you want to log to your wandb run, via `config` -3. Use the `wandb.Run.log()` method to log to Weigths & Biases; the `step` argument is optional -4. Call `.end_training()` when finished training + - A project name via `project_name`. + - Any parameters you want to pass to [`wandb.init()`](/models/ref/python/functions/init) through a nested dict to `init_kwargs`. + - Any other experiment config information you want to log to your wandb run, through `config`. +3. Use the `wandb.Run.log()` method to log to W&B. The `step` argument is optional. +4. Call `.end_training()` when training finishes. ## Access the W&B tracker -To access the W&B tracker, use the `Accelerator.get_tracker()` method. Pass in the string corresponding to a tracker’s `.name` attribute, which returns the tracker on the `main` process. +Once Accelerate logs to W&B, you may want direct access to the underlying W&B run object to log artifacts, custom charts, or other data that the tracker doesn't expose. To access the W&B tracker, use the `Accelerator.get_tracker()` method. Pass in the string corresponding to a tracker's `.name` attribute, which returns the tracker on the `main` process. ```python wandb_tracker = accelerator.get_tracker("wandb") ``` -From there you can interact with wandb’s run object like normal: +From there, you can interact with the `wandb` run object as usual: ```python wandb_tracker.log_artifact(some_artifact_to_log) ``` -Trackers built in Accelerate will automatically execute on the correct process, so if a tracker is only meant to be ran on the main process it will do so automatically. 
+Trackers built in Accelerate automatically execute on the correct process, so if a tracker only needs to run on the main process it does so automatically. -If you want to truly remove Accelerate’s wrapping entirely, you can achieve the same outcome with: +To remove Accelerate's wrapping entirely, you can achieve the same outcome with: ```python wandb_tracker = accelerator.get_tracker("wandb", unwrap=True) @@ -70,13 +71,14 @@ with accelerator.on_main_process: ## Accelerate articles -Below is an Accelerate article you may enjoy + +For a deeper walkthrough of using Accelerate with W&B, see the following article.
 HuggingFace Accelerate Super Charged With W&B
-* In this article, we'll look at what HuggingFace Accelerate has to offer and how simple it is to perform distributed training and evaluation, while logging results to W&B.
+This article looks at what HuggingFace Accelerate offers and how to perform distributed training and evaluation while logging results to W&B.
 Read the [Hugging Face Accelerate Super Charged with W&B report](https://wandb.ai/gladiator/HF%20Accelerate%20+%20W&B/reports/Hugging-Face-Accelerate-Super-Charged-with-Weights-Biases--VmlldzoyNzk3MDUx?utm_source=docs&utm_medium=docs&utm_campaign=accelerate-docs).
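Note on the `init_trackers` step in `accelerate.mdx`: the arguments that the numbered list describes (`project_name`, `config`, and the nested `init_kwargs` dict) could be assembled roughly as follows. This is a minimal sketch, not code from the page. The helper name `build_tracker_args` and the example values (`my_project`, `my-team`) are hypothetical, and the sketch assumes that W&B-specific `wandb.init()` arguments are nested under the `"wandb"` key of `init_kwargs`.

```python
from typing import Any, Dict, Optional


def build_tracker_args(
    project_name: str,
    config: Optional[Dict[str, Any]] = None,
    **wandb_init_kwargs: Any,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Accelerator.init_trackers().

    Hypothetical helper: extra keyword arguments are nested under the
    "wandb" key so that they can be forwarded to wandb.init().
    """
    args: Dict[str, Any] = {"project_name": project_name}
    if config:
        # Copy the config so later changes by the caller don't leak in.
        args["config"] = dict(config)
    if wandb_init_kwargs:
        args["init_kwargs"] = {"wandb": dict(wandb_init_kwargs)}
    return args


# Example values are placeholders, not names from the documentation.
tracker_args = build_tracker_args(
    "my_project",
    config={"learning_rate": 1e-3},
    entity="my-team",
)

# With Accelerate installed, the result would be used like this:
#   accelerator = Accelerator(log_with="wandb")
#   accelerator.init_trackers(**tracker_args)
```

Keeping the dictionary construction separate from the `Accelerator` call makes the nesting easy to inspect without starting a run.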
diff --git a/models/integrations/add-wandb-to-any-library.mdx b/models/integrations/add-wandb-to-any-library.mdx index 54759c11e6..b1ca33ef3a 100644 --- a/models/integrations/add-wandb-to-any-library.mdx +++ b/models/integrations/add-wandb-to-any-library.mdx @@ -1,28 +1,28 @@ --- title: Add W&B to a Python library -description: Best practices for integrating Weights & Biases into your Python library for experiment tracking, system monitoring, and model management. +description: Best practices for integrating W&B into your Python library for experiment tracking, system monitoring, and model management. +keywords: ["library integration", "WandbCallback pattern", "publish integration"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -This guide explains how to integrate Weights & Biases (W&B) into a Python library. - -Follow these recommendations if you are integrating W&B into a complex codebase—such as a training framework, SDK, or reusable library. +This guide explains how to integrate W&B into a Python library so that your users can track experiments, monitor system metrics, and manage models when they use your code. It's intended for library authors and maintainers who want to expose W&B functionality through their own framework, SDK, or reusable training code. +Follow these recommendations if you're integrating W&B into a complex codebase (such as a training framework, SDK, or reusable library) where the codebase is more involved than a single Python training script or Jupyter notebook. -If you are new to W&B, review the core guides (for example, [Experiment Tracking](/models/track/)) before continuing. +If you're new to W&B, review the core guides (for example, [Experiment Tracking](/models/track/)) before continuing. -Below we cover best tips and best practices when the codebase you are working on is more complicated than a single Python training script or Jupyter notebook. 
+The following sections walk through the major integration decisions in order: how users install W&B, how they authenticate, how to start and configure runs, how to log metrics and artifacts, and how to support distributed training and hyperparameter sweeps. ## Decide how users install W&B -Before you start, decide whether W&B should be a required dependency or an optional feature of your library. +Before you start, decide whether W&B should be a required dependency or an optional feature of your library. This choice affects how you import `wandb`, how you document installation, and how you handle environments where `wandb` isn't present. ### Require W&B as a dependency -If W&B is central to your library’s functionality, add the W&B Python SDK (`wandb`) to your dependencies: +If W&B is central to your library's functionality, add the W&B Python SDK (`wandb`) to your dependencies so that it's installed automatically alongside your library: ```txt torch==1.8.0 @@ -32,7 +32,7 @@ wandb==0.13.* ### Make W&B optional on installation -If W&B is an optional feature, allow your library to run without it installed. +If W&B is an optional feature, allow your library to run without it installed so that users who don't need experiment tracking can still use your code. You can either import `wandb` conditionally in Python or declare it as an optional dependency in `pyproject.toml`. @@ -73,11 +73,11 @@ dev = [ ## Authenticate users -W&B uses API keys to authenticate users and machines. +W&B uses API keys to authenticate users and machines. Before users can log runs from your library, they must generate an API key and make it available to the `wandb` client. ### Create an API key -An API key authenticates a client or machine to W&B. You can generate an API key from your user profile. +An API key authenticates a client or machine to W&B. Generate an API key from your user profile so that you can use it for the login steps that follow. 
@@ -86,17 +86,17 @@ An API key authenticates a client or machine to W&B. You can generate an API key ### Install and log in to W&B -To install the `wandb` library locally and log in: +After you have an API key, install the `wandb` library locally and log in so that subsequent runs can authenticate to W&B. Choose the tab that matches your environment. 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key: ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` -2. Install the `wandb` library and log in: +1. Install the `wandb` library and log in: ```bash pip install wandb @@ -111,7 +111,7 @@ To install the `wandb` library locally and log in: pip install wandb ``` -1. Log in to W&B from your Python script or notebook. This will prompt you to enter +1. Log in to W&B from your Python script or notebook. W&B prompts you to enter your API key. ```python import wandb @@ -119,9 +119,9 @@ your API key. ``` -Copy and paste the following code snippet into a cell in your Jupyter notebook and run it. This will prompt you to enter your API key. +Copy and paste the following code snippet into a cell in your Jupyter notebook and run it. W&B prompts you to enter your API key. -```notebook +```python !pip install wandb import wandb @@ -132,16 +132,18 @@ wandb.login() ## Start a run +After you set up authentication, the next step is to start a W&B run so that your library has somewhere to log metrics, configs, and artifacts. + A *run* represents a single unit of computation, such as a training experiment. Most libraries create one run per training job. For more information about runs, see [W&B Runs](/models/runs/). -Initialize a run with [`wandb.init()`](/models/ref/python/functions/init) and specify a name for your project and your team entity (team name). 
If you do not specify a project, W&B stores your run in a default project called "uncategorized".: +Initialize a run with [`wandb.init()`](/models/ref/python/functions/init) and specify a name for your project and your team entity (team name). If you don't specify a project, W&B stores your run in a default project called "uncategorized": ```python -with wandb.init(project="", entity="") as run: +with wandb.init(project="[PROJECT-NAME]", entity="[ENTITY]") as run: ... ``` -W&B recommends that you use a context manager to ensure that your run is properly closed, even if an error occurs. If you do not use a context manager, you must call `run.finish()` to close the run and log all the data to W&B. +W&B recommends that you use a context manager to ensure that your run is properly closed, even if an error occurs. If you don't use a context manager, you must call `run.finish()` to close the run and log all the data to W&B. Closing the run guarantees that all metrics, configs, and artifacts are uploaded before the process exits. **When to call `wandb.init()`** @@ -153,9 +155,13 @@ Wrap your entire training loop in a `wandb.init()` context manager to ensure tha ### Set `wandb` as an optional dependency -If you want to make `wandb` optional when your users use your library, you can either: +If you want to make `wandb` optional at runtime, so that users can run your library without producing W&B runs, use one of the following approaches: + +* Define a `wandb` flag. +* Set `wandb` to be `disabled` in `wandb.init()`. +* Set `wandb` to be offline. This still runs `wandb`, but doesn't communicate back to W&B over the internet. -* Define a `wandb` flag such as: +Define a `wandb` flag such as: @@ -170,7 +176,7 @@ python train.py ... 
--use-wandb -* Or, set `wandb` to be `disabled` in `wandb.init()`: +Set `wandb` to be `disabled` in `wandb.init()`: @@ -191,7 +197,7 @@ wandb disabled -* Or, set `wandb` to be offline - note this will still run `wandb`, it just won't try and communicate back to W&B over the internet: +Set `wandb` to be offline: @@ -214,17 +220,19 @@ wandb offline ## Define a run config +After you initialize a run, you can attach a configuration dictionary that records the hyperparameters and other metadata associated with that run. Logging a config makes runs easier to compare, filter, and reproduce later. + Provide a configuration dictionary when you initialize your run to log hyperparameters and other metadata to W&B. Use the W&B App to compare runs based on their config parameters and filter them in the Runs table. You can also use these parameters to group runs together in the W&B App. -For example, in the following image, the batch size (bathch_size) was defined as a config parameter and is visible(see first column) in the Runs table. This allows users to filter and compare runs based on their batch size: +For example, in the following image, the batch size (`batch_size`) is defined as a config parameter and is visible (see first column) in the Runs table. This lets users filter and compare runs based on their batch size: W&B Runs table -Typical config parameters values include: +Typical config parameter values include: * Model name, version, architecture parameters, and hyperparameters. * Dataset name, version, number of training or validation examples. @@ -240,7 +248,7 @@ with wandb.init(..., config=config) as run: ### Update the run config -If values are not available at initialization time, update the config later with `wandb.Run.config.update`. For example, you might want to add a model’s parameters after the model is instantiated: +Some configuration values, such as model parameter counts, might not be known when you call `wandb.init()`. 
If values aren't available at initialization time, update the config later with `wandb.Run.config.update`. For example, you might want to add a model's parameters after you instantiate the model: ```python with wandb.init(...) as run: @@ -248,13 +256,15 @@ with wandb.init(...) as run: run.config.update({"model_parameters": 3500}) ``` -For details, see [Configure experiments](/models/track/config/). +For more information, see [Configure experiments](/models/track/config/). ## Log metrics and data +After you start and configure a run, you can begin logging metrics and other data so that W&B records them against the run. + ### Log metrics -Create a dictionary where the key value is the name of the metric. Pass this dictionary object to [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log) to log it to W&B: +To log scalar metrics such as loss or accuracy, create a dictionary where each key is the name of a metric. Pass this dictionary object to [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log) to log it to W&B: ```python NUM_EPOCHS = 10 @@ -269,7 +279,7 @@ for epoch in range(NUM_EPOCHS): Use metric name prefixes to group related metrics in the W&B App. Common prefixes include `train/` and `val/` for training and validation metrics, respectively, but you can use any prefix that makes sense for your use case. -This will create separate sections in your project's workspace for your training and validation metrics, or other metric types you'd like to separate: +This creates separate sections in your project's workspace for your training and validation metrics, or other metric types you'd like to separate: ```python with wandb.init(...) as run: @@ -286,11 +296,11 @@ with wandb.init(...) as run: W&B Workspace -See [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log) for more details. +For more information, see [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log). 
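Note on the metric-prefix guidance above: one way a library might apply `train/` and `val/` prefixes consistently is to namespace each metrics dictionary before logging it. The sketch below is illustrative only. The helper name `with_prefix` and the metric values are hypothetical, and the final `run.log(payload)` call is shown as a comment because it assumes an active run.

```python
from typing import Dict


def with_prefix(metrics: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Return a copy of `metrics` with each key namespaced as `<prefix>/<key>`."""
    # Tolerate prefixes passed with a leading or trailing slash, such as "eval/".
    prefix = prefix.strip("/")
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train_metrics = with_prefix({"loss": 0.42, "accuracy": 0.91}, "train")
val_metrics = with_prefix({"loss": 0.55}, "val")

# Merge both groups so that a single log call records them together.
payload = {**train_metrics, **val_metrics}

# With an active run, this would be logged as: run.log(payload)
```

Logging both groups in one call also keeps them on the same step, which is relevant to the x-axis discussion that follows.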
### Control the x-axis -If you perform multiple calls to `wandb.Run.log()` for the same training step, the wandb SDK increments an internal step counter for each call to `wandb.Run.log()`. This counter may not align with the training step in your training loop. +By default, the `wandb` SDK manages its own step counter, which might not match the step semantics of your training loop. If you perform multiple calls to `wandb.Run.log()` for the same training step, the `wandb` SDK increments an internal step counter for each call to `wandb.Run.log()`. This counter might not align with the training step in your training loop. To avoid this situation, define your x-axis step explicitly with `wandb.Run.define_metric()`, one time, immediately after you call `wandb.init()`: @@ -299,7 +309,7 @@ with wandb.init(...) as run: run.define_metric("*", step_metric="global_step") ``` -The glob pattern, `*`, means that every metric will use `global_step` as the x-axis in your charts. If you only want certain metrics to be logged against `global_step`, you can specify them instead: +The glob pattern, `*`, means that every metric uses `global_step` as the x-axis in your charts. If you only want certain metrics logged against `global_step`, you can specify them instead: ```python run.define_metric("train/loss", step_metric="global_step") @@ -314,32 +324,34 @@ for step, (input, ground_truth) in enumerate(data): run.log({"global_step": step, "eval/loss": 0.2}) ``` -If you do not have access to the independent step variable, for example "global_step" is not available during your validation loop, the previously logged value for "global_step" is automatically used by wandb. In this case, ensure you log an initial value for the metric so it has been defined when it’s needed. +If you don't have access to the independent step variable (for example, `global_step` isn't available during your validation loop), `wandb` automatically uses the previously logged value for `global_step`. 
In this case, ensure you log an initial value for the metric so that it's defined when it's needed. ### Log media and structured data -In addition to scalars, you can log images, tables, text, audio, video, and more. +In addition to scalars, you can log images, tables, text, audio, video, and more. Logging media alongside metrics helps users inspect qualitative model behavior over time. Some considerations when logging data include: * How often should the metric be logged? Should it be optional? * What type of data could be helpful in visualizing? - * For images, you can log sample predictions, segmentation masks, etc., to see the evolution over time. + * For images, you can log sample predictions and segmentation masks to see the evolution over time. * For text, you can log tables of sample predictions for later exploration. -See the [Log objects and media](/models/track/log) for examples. +For more information, see [Log objects and media](/models/track/log). ## Support distributed training -For frameworks supporting distributed environments, you can adapt any of the following workflows: +If your library can run training across multiple processes or machines, decide how W&B should behave in that setting so that logs are coherent and not duplicated. For frameworks that support distributed environments, you can adapt any of the following workflows: * Log only from the main process (recommended). * Log from every process and group runs using a shared `group` name. -See [Log Distributed Training Experiments](/models/track/log/distributed-training/) for more details. +For more information, see [Log distributed training experiments](/models/track/log/distributed-training/). ## Track models and datasets with artifacts +In addition to metrics, you can persist the models and datasets your library produces or consumes so that users can reproduce and compare runs. + Use [W&B Artifacts](/models/artifacts/) to track and version models and datasets. 
Artifacts provide storage and versioning for machine learning assets, and they automatically track lineage to show how data and models are related. @@ -348,13 +360,13 @@ Use [W&B Artifacts](/models/artifacts/) to track and version models and datasets Consider the following when integrating artifacts into your library: -* Whether to log model checkpoints or datasets as artifacts (in case you want to make it optional). +* Whether to log model checkpoints or datasets as artifacts (in case you want to make it optional). * Artifact input references (for example, `entity/project/artifact`). -* Logging frequency of model checkpoints or datasets. For example, every epoch, every 500 steps, and so on. +* Logging frequency of model checkpoints or datasets. For example, every epoch or every 500 steps. ### Log model checkpoints -Log model checkpoints to W&B. A common approach is to log checkpoints as artifacts using the unique run ID generated by W&B as part of the artifact name. +Logging model checkpoints as artifacts lets users recover, version, and share trained weights. A common approach is to log checkpoints as artifacts using the unique run ID that W&B generates as part of the artifact name. ```python metadata = {"eval/accuracy": 0.8, "train/steps": 800} @@ -370,12 +382,12 @@ aliases = ["best", "epoch_10"] run.log_artifact(artifact, aliases=aliases) ``` -The previous code snippet demonstrates how to log a model checkpoint as an artifact and add metadata such as evaluation accuracy and training steps. The artifact is given a name that includes the unique run ID, and it is tagged with [custom aliases](/models/artifacts/create-a-custom-alias/) for easy reference. +The preceding snippet logs a model checkpoint as an artifact with metadata such as evaluation accuracy and training steps. The artifact's name includes the unique run ID, and it's tagged with [custom aliases](/models/artifacts/create-a-custom-alias/) for quick reference. 
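Note on the checkpoint guidance above: the two decisions it describes (what to name the artifact and how often to log it) can be isolated in small helpers. This is a sketch under stated assumptions, not the library's API. The function names, the `model-<run ID>` naming scheme, and the placeholder run ID `abc123` are all hypothetical, and in real code the ID would come from the active run, for example `run.id`.

```python
def checkpoint_artifact_name(run_id: str, prefix: str = "model") -> str:
    """Build an artifact name that embeds the run ID (hypothetical scheme)."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    return f"{prefix}-{run_id}"


def should_log_checkpoint(step: int, every_n_steps: int = 500) -> bool:
    """Return True on steps that are positive multiples of `every_n_steps`."""
    if every_n_steps <= 0:
        raise ValueError("every_n_steps must be positive")
    return step > 0 and step % every_n_steps == 0


# "abc123" stands in for the ID of an active run.
name = checkpoint_artifact_name("abc123")

# Steps in a 1500-step loop at which a checkpoint would be logged.
logged_steps = [step for step in range(0, 1501) if should_log_checkpoint(step)]
```

Exposing `every_n_steps` as a parameter lets users of the library trade upload volume against checkpoint granularity.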
### Log input artifacts -Log datasets or pretrained models used as inputs: +To capture lineage between data and models, log the datasets or pretrained models that a run consumes as inputs: ```python dataset = wandb.Artifact(name="flowers", type="dataset") @@ -383,13 +395,13 @@ dataset.add_file("flowers.npy") run.use_artifact(dataset) ``` -The previous code snippet creates an artifact for a dataset called "flowers" and adds a file to it. The artifact is then associated with the current run using `run.use_artifact()`, which allows W&B to track the lineage of the dataset used in the run. +The preceding snippet creates an artifact for a dataset called "flowers" and adds a file to it. The `run.use_artifact()` call associates the artifact with the current run so that W&B can track the lineage of the dataset used in the run. ### Download artifacts -Download previously logged artifacts from W&B to use in your training or inference code. +After you log artifacts, your library (or its users) can download previously logged artifacts from W&B to use in training or inference code. The right approach depends on whether you already have an active run. -If you have a run context, use [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run) to reference an artifact in W&B and then call [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact) to download it to a local directory. +If you have a run context, use [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run) to reference an artifact in W&B and then call [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact) to download it to a local directory. Using `use_artifact()` also records the artifact as an input to the current run, preserving lineage. ```python with wandb.init(...) as run: @@ -397,7 +409,7 @@ with wandb.init(...) 
as run: local_path = artifact.download() ``` -Use the [W&B Public API](/models/ref/python/public-api/) to reference and download an artifact without initializing a run. This is useful in scenarios such as distributed environments or when performing inference, where you may not want to create a new run. +Use the [W&B Public API](/models/ref/python/public-api/) to reference and download an artifact without initializing a run. This is useful in scenarios such as distributed environments or when you perform inference, where you might not want to create a new run. ```python import wandb @@ -405,8 +417,8 @@ artifact = wandb.Api().artifact("user/project/artifact:latest") local_path = artifact.download() ``` -See [Download and use artifacts](/models/artifacts/download-and-use-an-artifact/) for more information. +For more information, see [Download and use artifacts](/models/artifacts/download-and-use-an-artifact/). -## Tune hyper-parameters +## Tune hyperparameters -If your library supports hyperparameter tuning, you can integrate [W&B Sweeps](/models/sweeps/) to manage and visualize experiments. +If your library supports hyperparameter tuning, you can integrate [W&B Sweeps](/models/sweeps/) to manage and visualize experiments. Sweeps coordinate multiple runs across a defined search space and surface the results in the W&B App so users can compare configurations side by side. diff --git a/models/integrations/autotrain.mdx b/models/integrations/autotrain.mdx index b102b99eee..54c847dedd 100644 --- a/models/integrations/autotrain.mdx +++ b/models/integrations/autotrain.mdx @@ -1,10 +1,13 @@ --- title: Hugging Face AutoTrain description: "Use W&B experiment tracking with Hugging Face AutoTrain for no-code model training with a single CLI parameter." 
+keywords: ["no-code NLP", "tabular AutoML", "AutoML training"] --- -[Hugging Face AutoTrain](https://huggingface.co/docs/autotrain/index) is a no-code tool for training state-of-the-art models for Natural Language Processing (NLP) tasks, for Computer Vision (CV) tasks, and for Speech tasks and even for Tabular tasks. +[Hugging Face AutoTrain](https://huggingface.co/docs/autotrain/index) is a no-code tool for training models for Natural Language Processing (NLP), Computer Vision (CV), Speech, and Tabular tasks. -[W&B](https://wandb.com/) is directly integrated into Hugging Face AutoTrain, providing experiment tracking and config management. It's as easy as using a single parameter in the CLI command for your experiments. +[W&B](https://wandb.com/) is directly integrated into Hugging Face AutoTrain, providing experiment tracking and config management. You only need a single parameter in the CLI command for your experiments. + +This page shows you how to enable W&B experiment tracking when you train a model with Hugging Face AutoTrain. You can capture metrics and configuration for every run without writing additional code. This page is for users who are already familiar with AutoTrain and want to add observability to their training workflows. Experiment metrics logging @@ -12,7 +15,7 @@ description: "Use W&B experiment tracking with Hugging Face AutoTrain for no-cod ## Install prerequisites -Install `autotrain-advanced` and `wandb`. +Before you can train a model and log results to W&B, install the AutoTrain CLI and the W&B client library. Install `autotrain-advanced` and `wandb`. @@ -27,19 +30,21 @@ pip install --upgrade autotrain-advanced wandb -To demonstrate these changes, this page fine-tines an LLM on a math dataset to achieve SoTA result in `pass@1` on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math). 
+To demonstrate these changes, this page fine-tunes an LLM on a math dataset and evaluates `pass@1` on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math). ## Prepare the dataset -Hugging Face AutoTrain expects your CSV custom dataset to have a specific format to work properly. +Before training, prepare your dataset so it matches the format AutoTrain expects. Hugging Face AutoTrain expects your CSV custom dataset to have a specific format to work properly. -- Your training file must contain a `text` column, which the training uses. For best results, the `text` column's data must conform to the `### Human: Question?### Assistant: Answer.` format. Review a great example in [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco). +Your training file must contain a `text` column, which the training uses. The data in the `text` column must conform to the `### Human: Question?### Assistant: Answer.` format. Review an example in [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco). - However, the [MetaMathQA dataset](https://huggingface.co/datasets/meta-math/MetaMathQA) includes the columns `query`, `response`, and `type`. First, pre-process this dataset. Remove the `type` column and combine the content of the `query` and `response` columns into a new `text` column in the `### Human: Query?### Assistant: Response.` format. Training uses the resulting dataset, [`rishiraj/guanaco-style-metamath`](https://huggingface.co/datasets/rishiraj/guanaco-style-metamath). +However, the [MetaMathQA dataset](https://huggingface.co/datasets/meta-math/MetaMathQA) includes the columns `query`, `response`, and `type`. First, pre-process this dataset. Remove the `type` column and combine the contents of the `query` and `response` columns into a new `text` column in the `### Human: Query?### Assistant: Response.` format. 
Training uses the resulting dataset, [`rishiraj/guanaco-style-metamath`](https://huggingface.co/datasets/rishiraj/guanaco-style-metamath). ## Train using `autotrain` -You can start training using the `autotrain` advanced from the command line or a notebook. Use the `--log` argument, or use `--log wandb` to log your results to a [W&B Run](/models/runs/). +With your environment and dataset ready, you can now start training. Start training with the `autotrain` advanced from the command line or a notebook. Use the `--log` argument, or use `--log wandb` to log your results to a [run](/models/runs/). The `--log wandb` argument enables the W&B integration for this run. + +Replace `[HUGGINGFACE-TOKEN]` with your Hugging Face access token and `[HUGGINGFACE-REPOSITORY-ADDRESS]` with the target repository address (for example, `your-username/your-repo`). @@ -67,8 +72,8 @@ autotrain llm \ --use-int4 \ --merge-adapter \ --push-to-hub \ - --token \ - --repo-id + --token [HUGGINGFACE-TOKEN] \ + --repo-id [HUGGINGFACE-REPOSITORY-ADDRESS] ``` @@ -117,11 +122,12 @@ logging_steps = 10 - Experiment config saving +After training starts, AutoTrain logs your run's metrics and configuration to W&B, where you can review them alongside any other runs in your project. + ## More resources * [AutoTrain Advanced now supports Experiment Tracking](https://huggingface.co/blog/rishiraj/log-autotrain) by [Rishiraj Acharya](https://huggingface.co/rishiraj). diff --git a/models/integrations/azure-openai-fine-tuning.mdx b/models/integrations/azure-openai-fine-tuning.mdx index 5127a72161..bbab44bd75 100644 --- a/models/integrations/azure-openai-fine-tuning.mdx +++ b/models/integrations/azure-openai-fine-tuning.mdx @@ -1,46 +1,64 @@ --- description: "Fine-tune Azure OpenAI models with W&B experiment tracking to log metrics, hyperparameters, and training progress." 
-title: Azure OpenAI Fine-Tuning +title: Azure OpenAI fine-tuning +keywords: ["GPT-3.5", "GPT-4", "fine-tune job tracking"] --- -## Introduction -Fine-tuning GPT-3.5 or GPT-4 models on Microsoft Azure using W&B tracks, analyzes, and improves model performance by automatically capturing metrics and facilitating systematic evaluation through W&B's experiment tracking and evaluation tools. +This guide shows you how to use W&B with Azure OpenAI to track and evaluate fine-tuning jobs for GPT-3.5 or GPT-4 models. When you integrate W&B, experiment tracking captures metrics, hyperparameters, and training artifacts so you can analyze and improve model performance. You can also use W&B's evaluation tools to make data-driven decisions about model selection. + +This guide is for machine learning practitioners who fine-tune Azure OpenAI models and want a systematic way to track and compare runs. Azure OpenAI fine-tuning metrics ## Prerequisites + +Before you begin, complete the following: + - Set up Azure OpenAI service according to [official Azure documentation](https://wandb.me/aoai-wb-int). - Configure a W&B account with an API key. ## Workflow overview -### 1. Fine-tuning setup +The following stages summarize how a typical Azure OpenAI fine-tuning job flows through W&B, from preparing the job through evaluating the resulting model. + +### Fine-tuning setup + +Fine-tuning setup involves the following steps: + - Prepare training data according to Azure OpenAI requirements. - Configure the fine-tuning job in Azure OpenAI. -- W&B automatically tracks the fine-tuning process, logging metrics and hyperparameters. +- W&B automatically tracks the fine-tuning process and logs metrics and hyperparameters. + +### Experiment tracking -### 2. Experiment tracking During fine-tuning, W&B captures: -- Training and validation metrics -- Model hyperparameters -- Resource utilization -- Training artifacts -### 3. Model evaluation +- Training and validation metrics. +- Model hyperparameters. 
+- Resource usage. +- Training artifacts. + +### Model evaluation + After fine-tuning, use [W&B Weave](https://weave-docs.wandb.ai) to: -- Evaluate model outputs against reference datasets -- Compare performance across different fine-tuning runs -- Analyze model behavior on specific test cases -- Make data-driven decisions for model selection + +- Evaluate model outputs against reference datasets. +- Compare performance across different fine-tuning runs. +- Analyze model behavior on specific test cases. +- Make data-driven decisions for model selection. ## Real-world example -* Explore the [medical note generation demo](https://wandb.me/aoai-ft-colab) to see how this integration facilitates: - - Systematic tracking of fine-tuning experiments - - Model evaluation using domain-specific metrics -* Go through an [interactive demo of fine-tuning a notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/azure/azure_gpt_medical_notes.ipynb) + +To see the integration applied end-to-end, explore the following resources: + +- Explore the [medical note generation demo](https://wandb.me/aoai-ft-colab) to see how this integration facilitates: + - Systematic tracking of fine-tuning experiments. + - Model evaluation using domain-specific metrics. +- Work through an [interactive fine-tuning notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/azure/azure_gpt_medical_notes.ipynb). 
## Additional resources + - [Azure OpenAI W&B Integration Guide](https://wandb.me/aoai-wb-int) - [Azure OpenAI Fine-tuning Documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython&pivots=programming-language-python) \ No newline at end of file diff --git a/models/integrations/catalyst.mdx b/models/integrations/catalyst.mdx index fde72d6a96..3ab087ab3d 100644 --- a/models/integrations/catalyst.mdx +++ b/models/integrations/catalyst.mdx @@ -1,14 +1,15 @@ --- description: How to integrate W&B for Catalyst, a PyTorch framework. title: Catalyst +keywords: ["catalyst Runner", "SupervisedRunner", "experiment runner"] --- [Catalyst](https://github.com/catalyst-team/catalyst) is a PyTorch framework for deep learning R&D that focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new. Catalyst includes a W&B integration for logging parameters, metrics, images, and other artifacts. -Check out their [documentation of the integration](https://catalyst-team.github.io/catalyst/api/loggers.html#catalyst.loggers.wandb.WandbLogger), which includes examples using Python and Hydra. +For more information, including examples that use Python and Hydra, see the [Catalyst integration documentation](https://catalyst-team.github.io/catalyst/api/loggers.html#catalyst.loggers.wandb.WandbLogger). ## Interactive example -Run an [example colab](https://colab.research.google.com/drive/1PD0LnXiADCtt4mu7bzv7VfQkFXVrPxJq?usp=sharing) to see Catalyst and W&B integration in action. \ No newline at end of file +To try the Catalyst and W&B integration, open the [example Colab notebook](https://colab.research.google.com/drive/1PD0LnXiADCtt4mu7bzv7VfQkFXVrPxJq?usp=sharing). 
\ No newline at end of file diff --git a/models/integrations/cohere-fine-tuning.mdx b/models/integrations/cohere-fine-tuning.mdx index ed88213e37..a5f6feed6b 100644 --- a/models/integrations/cohere-fine-tuning.mdx +++ b/models/integrations/cohere-fine-tuning.mdx @@ -1,18 +1,19 @@ --- description: "Fine-tune Cohere models with W&B experiment tracking to log training metrics and monitor model performance." title: Cohere fine-tuning +keywords: ["command model", "rerank fine-tune", "Cohere train API"] --- -With W&B you can log your Cohere model's fine-tuning metrics and configuration to analyze and understand the performance of your models and share the results with your colleagues. +With W&B, you can log your Cohere model's fine-tuning metrics and configuration to analyze your models' performance and share results with your colleagues. This page shows you how to connect a Cohere fine-tuning run to a W&B project so that W&B captures training and validation metrics, hyperparameters, and run metadata automatically in your workspace. It's intended for users who are already fine-tuning Cohere models and want centralized experiment tracking. -This [guide from Cohere](https://docs.cohere.com/page/convfinqa-finetuning-wandb) has a full example of how to kick off a fine-tuning run and you can find the [Cohere API docs here](https://docs.cohere.com/reference/createfinetunedmodel#request.body.settings.wandb) +For a full example of how to start a fine-tuning run, see the [Cohere fine-tuning guide](https://docs.cohere.com/page/convfinqa-finetuning-wandb), and refer to the [Cohere API reference for the `wandb` setting](https://docs.cohere.com/reference/createfinetunedmodel#request.body.settings.wandb). -## Log your Cohere fine-tuning results +## Log Cohere fine-tuning results -To add Cohere fine-tuning logging to your W&B workspace: +To add Cohere fine-tuning logging to your W&B workspace, do the following: -1. 
Create a `WandbConfig` with your W&B API key, W&B `entity` and `project` name. Create an API key at https://wandb.ai/settings +1. Create a `WandbConfig` with your W&B API key, W&B `entity`, and `project` name. The API key authenticates the Cohere job with W&B, and the entity and project determine where W&B logs your runs. Create an API key in your [W&B user settings](https://wandb.ai/settings). Replace `[WANDB-API-KEY]` in the following example with your API key. -2. Pass this config to the `FinetunedModel` object along with your model name, dataset and hyperparameters to kick off your fine-tuning run. +2. Pass this config to the `FinetunedModel` object along with your model name, dataset, and hyperparameters to start your fine-tuning run. The `wandb` setting configures Cohere to stream metrics to your W&B project during the run. ```python @@ -20,7 +21,7 @@ To add Cohere fine-tuning logging to your W&B workspace: # create a config with your W&B details wandb_ft_config = WandbConfig( - api_key="", + api_key="[WANDB-API-KEY]", entity="my-entity", # must be a valid enitity associated with the provided API key project="cohere-ft", ) @@ -44,17 +45,18 @@ To add Cohere fine-tuning logging to your W&B workspace: 3. View your model's fine-tuning training and validation metrics and hyperparameters in the W&B project that you created. - Cohere fine-tuning dashboard + Cohere fine-tuning dashboard +After the run starts, your Cohere fine-tuning job reports metrics to W&B in real time, giving you a single place to compare runs and inspect training progress. ## Organize runs -Your W&B runs are automatically organized and can be filtered/sorted based on any configuration parameter such as job type, base model, learning rate and any other hyper-parameter. +W&B organizes your runs automatically. You can filter and sort them by any configuration parameter such as job type, base model, learning rate, and any other hyperparameter. 
-In addition, you can rename your runs, add notes or create tags to group them. +You can also rename your runs, add notes, or create tags to group them. ## Resources -* [Cohere Fine-tuning Example](https://github.com/cohere-ai/notebooks/blob/kkt_ft_cookbooks/notebooks/finetuning/convfinqa_finetuning_wandb.ipynb) +For a complete example, see the [Cohere fine-tuning example notebook](https://github.com/cohere-ai/notebooks/blob/kkt_ft_cookbooks/notebooks/finetuning/convfinqa_finetuning_wandb.ipynb). diff --git a/models/integrations/composer.mdx b/models/integrations/composer.mdx index b7f1113fea..04ee631129 100644 --- a/models/integrations/composer.mdx +++ b/models/integrations/composer.mdx @@ -1,17 +1,20 @@ --- -description: State of the art algorithms to train your neural networks +description: State-of-the-art algorithms to train your neural networks title: MosaicML Composer +keywords: ["speed method", "Trainer.fit", "MPT training"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -[Composer](https://github.com/mosaicml/composer) is a library for training neural networks better, faster, and cheaper. It contains many state-of-the-art methods for accelerating neural network training and improving generalization, along with an optional [Trainer](https://docs.mosaicml.com/projects/composer/en/stable/trainer/using_the_trainer.html) API that makes _composing_ many different enhancements easy. +[Composer](https://github.com/mosaicml/composer) is a library for training neural networks better, faster, and cheaper. It contains many state-of-the-art methods for accelerating neural network training and improving generalization, along with an optional [Trainer](https://docs.mosaicml.com/projects/composer/en/stable/trainer/using_the_trainer.html) API for _composing_ many different enhancements. -W&B provides a lightweight wrapper for logging your ML experiments. 
But you don't need to combine the two yourself: W&B is incorporated directly into the Composer library via the [WandBLogger](https://docs.mosaicml.com/projects/composer/en/stable/trainer/file_uploading.html#weights-biases-artifacts). +This page shows you how to use W&B with Composer so you can track, visualize, and compare your training runs. W&B provides a wrapper for logging your ML experiments, but you don't need to combine the two yourself: the Composer library incorporates W&B directly through the [WandBLogger](https://docs.mosaicml.com/projects/composer/en/stable/trainer/file_uploading.html#weights-biases-artifacts). This guide is for Composer users who want to log metrics, artifacts, and prediction samples to W&B. ## Start logging to W&B +To start logging your Composer training runs to W&B, pass a `WandBLogger` instance to the `Trainer`: + ```python from composer import Trainer from composer.loggers import WandBLogger @@ -25,7 +28,9 @@ trainer = Trainer(..., logger=WandBLogger()) ## Use Composer's `WandBLogger` -The Composer library uses [WandBLogger](https://docs.mosaicml.com/projects/composer/en/stable/trainer/file_uploading.html#weights-biases-artifacts) class in the `Trainer` to log metrics to W&B. It is as simple as instantiating the logger and passing it to the `Trainer`. +The following sections describe how the `WandBLogger` integrates with Composer's `Trainer`. + +The Composer library uses the [WandBLogger](https://docs.mosaicml.com/projects/composer/en/stable/trainer/file_uploading.html#weights-biases-artifacts) class in the `Trainer` to log metrics to W&B. 
Instantiate the logger and pass it to the `Trainer`: ```python wandb_logger = WandBLogger(project="gpt-5", log_artifacts=True) @@ -34,22 +39,22 @@ trainer = Trainer(logger=wandb_logger) ## Logger arguments -Below the parameters for `WandbLogger`, see the [Composer documentation](https://docs.mosaicml.com/projects/composer/en/stable/api_reference/generated/composer.loggers.WandBLogger.html) for a full list and description. +The following table describes the most common parameters you can use to customize how `WandBLogger` records your runs. For a full list and description, see the [Composer documentation](https://docs.mosaicml.com/projects/composer/en/stable/api_reference/generated/composer.loggers.WandBLogger.html). | Parameter | Description | | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `project` | W&B Project name (str, optional) -| `group` | W&B group name (str, optional) -| `name` | W&B Run name. If not specified, the State.run_name is used (str, optional) -| `entity` | W&B entity name, such as your username or W&B Team name (str, optional) -| `tags` | W&B tags (List[str], optional) -| `log_artifacts` | Whether to log checkpoints to wandb, default: `false` (bool, optional)| -| `rank_zero_only` | Whether to log only on the rank-zero process. When logging artifacts, it is highly recommended to log on all ranks. Artifacts from ranks ≥1 are not stored, which may discard pertinent information. 
For example, when using Deepspeed ZeRO, it would be impossible to restore from checkpoints without artifacts from all ranks, default: `True` (bool, optional) -| `init_kwargs` | Params to pass to `wandb.init()` such as your wandb `config` etc. See the [`wandb.init()` parameters](/models/ref/python/functions/init) for parameters that `wandb.init()` accepts. +| `project` | W&B project name (`str`, optional) +| `group` | W&B group name (`str`, optional) +| `name` | W&B run name. If not specified, uses the `State.run_name` (`str`, optional) +| `entity` | W&B entity name, such as your username or W&B Team name (`str`, optional) +| `tags` | W&B tags (`List[str]`, optional) +| `log_artifacts` | Whether to log checkpoints to W&B. Default: `False` (`bool`, optional)| +| `rank_zero_only` | Whether to log only on the rank-zero process. When logging artifacts, set this to `False` to log on all ranks. With rank-zero-only logging, artifacts from ranks 1 and higher aren't stored, which can discard relevant information. For example, when using DeepSpeed ZeRO, you can't restore from checkpoints without artifacts from all ranks. Default: `True` (`bool`, optional) +| `init_kwargs` | Parameters to pass to `wandb.init()`, such as your W&B `config`. For the parameters that `wandb.init()` accepts, see [`wandb.init()` parameters](/models/ref/python/functions/init).
-A typical usage would be: +The following example shows a typical usage that passes run notes and a config dictionary through `init_kwargs`: -``` +```python init_kwargs = {"notes":"Testing higher learning rate in this experiment", "config":{"arch":"Llama", "use_mixed_precision":True @@ -61,7 +66,7 @@ wandb_logger = WandBLogger(log_artifacts=True, init_kwargs=init_kwargs) ## Log prediction samples -You can use [Composer's Callbacks](https://docs.mosaicml.com/projects/composer/en/stable/trainer/callbacks.html) system to control when you log to W&B via the `WandBLogger`, in this example a sample of the validation images and predictions is logged: +In addition to scalar metrics, you can log rich media such as model predictions to W&B for qualitative review. You can use [Composer's Callbacks](https://docs.mosaicml.com/projects/composer/en/stable/trainer/callbacks.html) system to control when you log to W&B through the `WandBLogger`. The following example logs a sample of the validation images and predictions: ```python import wandb diff --git a/models/integrations/dagster.mdx b/models/integrations/dagster.mdx index 2143f814f0..874afc210d 100644 --- a/models/integrations/dagster.mdx +++ b/models/integrations/dagster.mdx @@ -1,8 +1,9 @@ --- description: "Integrate W&B with Dagster to track ML experiments and manage data pipelines with automatic logging and monitoring." title: Dagster +keywords: ["dagster ops", "dagster asset", "orchestration"] --- -Use Dagster and W&B (W&B) to orchestrate your MLOps pipelines and maintain ML assets. The integration with W&B makes it easy within Dagster to: +Use Dagster and W&B together to orchestrate your MLOps pipelines and maintain ML assets, so you can track experiments, manage data, and run training jobs from within your existing Dagster workflows. The integration with W&B lets you do the following within Dagster: * Create and use a [W&B Artifact](/models/artifacts). 
* Use and create Registered Models in [W&B Registry](/models/registry). @@ -11,31 +12,41 @@ Use Dagster and W&B (W&B) to orchestrate your MLOps pipelines and maintain ML as The W&B Dagster integration provides a W&B-specific Dagster resource and IO Manager: -* `wandb_resource`: a Dagster resource used to authenticate and communicate to the W&B API. +* `wandb_resource`: a Dagster resource used to authenticate and communicate with the W&B API. * `wandb_artifacts_io_manager`: a Dagster IO Manager used to consume W&B Artifacts. -The following guide demonstrates how to satisfy prerequisites to use W&B in Dagster, how to create and use W&B Artifacts in ops and assets, how to use W&B Launch and recommended best practices. +This guide is for ML practitioners and platform engineers who already use Dagster and want to add W&B tracking and artifact management. It walks through the prerequisites for using W&B in Dagster, shows how to create and use W&B Artifacts in ops and assets, explains how to use W&B Launch, and describes recommended best practices. ## Before you get started -You will need the following resources to use Dagster within W&B: -1. **W&B API Key**. -2. **W&B entity (user or team)**: An entity is a username or team name where you send W&B Runs and Artifacts. Make sure to create your account or team entity in the W&B App UI before you log runs. If you do not specify ain entity, the run will be sent to your default entity, which is usually your username. Change your default entity in your settings under **Project Defaults**. -3. **W&B project**: The name of the project where [W&B Runs](/models/runs) are stored. -Find your W&B entity by checking the profile page for that user or team in the W&B App. You can use a pre-existing W&B project or create a new one. New projects can be created on the W&B App homepage or on user/team profile page. If a project does not exist it will be automatically created when you first use it. 
+Before you configure the integration, gather the W&B credentials and identifiers it needs to authenticate and route your runs. + +You need the following resources to use Dagster within W&B: + +- **W&B API Key**. +- **W&B entity (user or team)**: An entity is a username or team name where you send W&B Runs and Artifacts. Create your account or team entity in the W&B App UI before you log runs. If you don't specify an entity, the run goes to your default entity, which is usually your username. Change your default entity in your settings under **Project Defaults**. +- **W&B project**: The name of the project where [W&B Runs](/models/runs) are stored. + +Find your W&B entity by checking the profile page for that user or team in the W&B App. You can use a pre-existing W&B project or create a new one. Create new projects on the W&B App homepage or on the user or team profile page. If a project doesn't exist, W&B creates it automatically when you first use it. ### Set up your API key -1. [Log in to W&B](https://wandb.ai/login). Note: if you are using W&B Server ask your admin for the instance host name. -2. Create an API key at [User Settings](https://wandb.ai/settings). For a production environment we recommend using a [service account](/support/models/articles/what-is-a-service-account-and-why-is-it-) to own that key. -3. Set an environment variable for that API key: `export WANDB_API_KEY=YOUR_KEY`. +The integration authenticates with the W&B API using an API key, which you must make available to Dagster as an environment variable. + +1. [Log in to W&B](https://wandb.ai/login). If you're using W&B Server, ask your admin for the instance host name. +2. Create an API key at [User Settings](https://wandb.ai/settings). For a production environment, use a [service account](/support/models/articles/what-is-a-service-account-and-why-is-it-) to own that key. +3. Set an environment variable for that API key: `export WANDB_API_KEY=[YOUR_KEY]`. 
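The steps above can be checked before Dagster starts. The following is a minimal sketch, not part of the `dagster_wandb` package: the helper name `build_wandb_resource_config` is hypothetical, and it assumes the resource accepts the `api_key` and optional `host` settings described in the Configuration section, with the key referenced by environment variable name through Dagster's `{"env": ...}` config form.

```python
import os


def build_wandb_resource_config(env_var="WANDB_API_KEY", host=None):
    """Return a config dict for `wandb_resource`, failing early if the key is unset.

    Hypothetical helper for illustration; it is not part of dagster_wandb.
    """
    if not os.environ.get(env_var):
        raise RuntimeError(
            f"{env_var} is not set; run `export {env_var}=[YOUR_KEY]` first."
        )
    # Reference the variable by name so the secret never appears in code.
    config = {"api_key": {"env": env_var}}
    if host is not None:
        # Assumed to be needed only for W&B Server deployments.
        config["host"] = host
    return config
```

If this assumption holds for your version of the integration, the returned dictionary could be passed to `wandb_resource.configured(...)`. Failing early gives a clearer error than an authentication failure in the middle of a job.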
-The following examples demonstrate where to specify your API key in your Dagster code. Make sure to specify your entity and project name within the `wandb_config` nested dictionary. You can pass different `wandb_config` values to different ops/assets if you want to use a different W&B Project. For more information about possible keys you can pass, see the Configuration section below. +After completing these steps, Dagster can read your API key from the environment when it loads the `wandb_resource`. + + +The following examples demonstrate where to specify your API key in your Dagster code. Specify your entity and project name within the `wandb_config` nested dictionary. You can pass different `wandb_config` values to different ops or assets if you want to use a different W&B Project. For more information about possible keys you can pass, see the following Configuration section. -Example: configuration for `@job` +Example configuration for `@job`: + ```python # add this to your config.yaml # alternatively you can set the config in Dagit's Launchpad or JobDefinition.execute_in_process @@ -64,7 +75,7 @@ def simple_job_example(): ``` -Example: configuration for `@repository` using assets +Example configuration for `@repository` using assets: ```python from dagster_wandb import wandb_artifacts_io_manager, wandb_resource @@ -105,31 +116,34 @@ def my_repository(): ), ] ``` -Note that we are configuring the IO Manager cache duration in this example contrary to the example for `@job`. +This example configures the IO Manager cache duration, unlike the example for `@job`. ### Configuration -The following configuration options are used as settings on the W&B-specific Dagster resource and IO Manager provided by the integration. -* `wandb_resource`: Dagster [resource](https://docs.dagster.io/guides/build/external-resources) used to communicate with the W&B API. It automatically authenticates using the provided API key. 
Properties: +The integration provides a W&B-specific Dagster resource and IO Manager with the following configuration options. + +* `wandb_resource`: Dagster [resource](https://docs.dagster.io/guides/build/external-resources) used to communicate with the W&B API. It authenticates automatically using the provided API key. Properties: * `api_key`: (str, required): a W&B API key necessary to communicate with the W&B API. - * `host`: (str, optional): the API host server you wish to use. Only required if you are using W&B Server. It defaults to the Public Cloud host, `https://api.wandb.ai`. + * `host`: (str, optional): the API host server you want to use. Only required if you're using W&B Server. Defaults to the Public Cloud host, `https://api.wandb.ai`. * `wandb_artifacts_io_manager`: Dagster [IO Manager](https://docs.dagster.io/guides/build/io-managers) to consume W&B Artifacts. Properties: - * `base_dir`: (int, optional) Base directory used for local storage and caching. W&B Artifacts and W&B Run logs will be written and read from that directory. By default, it’s using the `DAGSTER_HOME` directory. - * `cache_duration_in_minutes`: (int, optional) to define the amount of time W&B Artifacts and W&B Run logs should be kept in the local storage. Only files and directories that were not opened for that amount of time are removed from the cache. Cache purging happens at the end of an IO Manager execution. You can set it to 0, if you want to turn off caching completely. Caching improves speed when an Artifact is reused between jobs running on the same machine. It defaults to 30 days. - * `run_id`: (str, optional): A unique ID for this run, used for resuming. It must be unique in the project, and if you delete a run you can't reuse the ID. Use the name field for a short descriptive name, or config for saving hyperparameters to compare across runs. 
The ID cannot contain the following special characters: `/\#?%:..` You need to set the Run ID when you are doing experiment tracking inside Dagster to allow the IO Manager to resume the run. By default it’s set to the Dagster Run ID e.g `7e4df022-1bf2-44b5-a383-bb852df4077e`. - * `run_name`: (str, optional) A short display name for this run to help you identify this run in the UI. By default, it is a string with the following format: `dagster-run-[8 first characters of the Dagster Run ID]`. For example, `dagster-run-7e4df022`. - * `run_tags`: (list[str], optional): A list of strings, which will populate the list of tags on this run in the UI. Tags are useful for organizing runs together, or applying temporary labels like `baseline` or `production`. It's easy to add and remove tags in the UI, or filter down to just runs with a specific tag. Any W&B Run used by the integration will have the `dagster_wandb` tag. + * `base_dir`: (str, optional) Base directory used for local storage and caching. W&B Artifacts and W&B Run logs are written to and read from that directory. By default, it uses the `DAGSTER_HOME` directory. + * `cache_duration_in_minutes`: (int, optional) Defines how long to keep W&B Artifacts and W&B Run logs in local storage. The cache removes only files and directories that haven't been opened for that amount of time. Cache purging happens at the end of an IO Manager execution. Set it to 0 to turn off caching completely. Caching improves speed when an Artifact is reused between jobs running on the same machine. Defaults to 30 days (43,200 minutes). + * `run_id`: (str, optional): A unique ID for this run, used for resuming. It must be unique in the project, and if you delete a run you can't reuse the ID. Use the name field for a short descriptive name, or config for saving hyperparameters to compare across runs.
The ID can't contain the following special characters: `/\#?%:..` Set the Run ID when you're doing experiment tracking inside Dagster to allow the IO Manager to resume the run. By default, it's set to the Dagster Run ID, for example, `7e4df022-1bf2-44b5-a383-bb852df4077e`. + * `run_name`: (str, optional) A short display name for this run to help you identify it in the UI. By default, it is a string with the following format: `dagster-run-[8 first characters of the Dagster Run ID]`. For example, `dagster-run-7e4df022`. + * `run_tags`: (list[str], optional): A list of strings that populates the list of tags on this run in the UI. Tags are useful for organizing runs together or applying temporary labels like `baseline` or `production`. You can add and remove tags in the UI, or filter down to runs with a specific tag. Any W&B Run used by the integration has the `dagster_wandb` tag. ## Use W&B Artifacts +This section explains how the integration uses a Dagster IO Manager to bridge W&B Artifacts and Dagster ops and assets. + The integration with W&B Artifact relies on a Dagster IO Manager. -[IO Managers](https://docs.dagster.io/guides/build/io-managers) are user-provided objects that are responsible for storing the output of an asset or op and loading it as input to downstream assets or ops. For example, an IO Manager might store and load objects from files on a filesystem. +[IO Managers](https://docs.dagster.io/guides/build/io-managers) are user-provided objects that store the output of an asset or op and load it as input to downstream assets or ops. For example, an IO Manager might store and load objects from files on a filesystem. -The integration provides an IO Manager for W&B Artifacts. This allows any Dagster `@op` or `@asset` to create and consume W&B Artifacts natively. Here’s a simple example of an `@asset` producing a W&B Artifact of type dataset containing a Python list. +The integration provides an IO Manager for W&B Artifacts. 
This lets any Dagster `@op` or `@asset` create and consume W&B Artifacts directly. The following example shows an `@asset` that produces a W&B Artifact of type dataset containing a Python list. ```python @asset( @@ -145,14 +159,18 @@ def create_dataset(): return [1, 2, 3] # this will be stored in an Artifact ``` -You can annotate your `@op`, `@asset` and `@multi_asset` with a metadata configuration in order to write Artifacts. Similarly you can also consume W&B Artifacts even if they were created outside Dagster. +You can annotate your `@op`, `@asset`, and `@multi_asset` with a metadata configuration to write Artifacts. Similarly, you can also consume W&B Artifacts even if they were created outside Dagster. ## Write W&B Artifacts -Before continuing, we recommend you to have a good understanding of how to use W&B Artifacts. Consider reading the [Guide on Artifacts](/models/artifacts). -Return an object from a Python function to write a W&B Artifact. The following objects are supported by W&B: -* Python objects (int, dict, list…) -* W&B objects (Table, Image, Graph…) +The following sections describe how to produce W&B Artifacts from Dagster ops and assets, including the supported return types and how to configure the resulting Artifact. + +Before continuing, make sure you understand how to use W&B Artifacts. See the [Guide on Artifacts](/models/artifacts). + +Return an object from a Python function to write a W&B Artifact. W&B supports the following objects: + +* Python objects (`int`, `dict`, `list`, and so on) +* W&B objects (Table, Image, Graph, and so on) * W&B Artifact objects The following examples demonstrate how to write W&B Artifacts with Dagster assets (`@asset`): @@ -160,7 +178,7 @@ The following examples demonstrate how to write W&B Artifacts with Dagster asset -Anything that can be serialized with the [pickle](https://docs.python.org/3/library/pickle.html) module is pickled and added to an Artifact created by the integration. 
The content is unpickled when you read that Artifact inside Dagster (see [Read artifacts](#read-wb-artifacts) for more details). +Anything that can be serialized with the [pickle](https://docs.python.org/3/library/pickle.html) module is pickled and added to an Artifact created by the integration. The content is unpickled when you read that Artifact inside Dagster (see [Read artifacts](#read-wb-artifacts) for more details). ```python @asset( @@ -177,7 +195,7 @@ def create_dataset(): ``` -W&B supports multiple Pickle-based serialization modules ([pickle](https://docs.python.org/3/library/pickle.html), [dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language). Please refer to the [Serialization](#serialization-configuration) section for more information. +W&B supports multiple Pickle-based serialization modules ([pickle](https://docs.python.org/3/library/pickle.html), [dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language). Refer to the [Serialization](#serialization-configuration) section for more information. Any W&B object, such as a [Table](/models/ref/python/data-types/table) or [Image](/models/ref/python/data-types/image), is added to an Artifact created by the integration. This example adds a Table to an Artifact: @@ -199,7 +217,7 @@ def create_dataset_in_table(): ``` -For complex use cases, it might be necessary to build your own Artifact object. The integration still provides useful additional features like augmenting the metadata on both sides of the integration. 
+For complex use cases, you might need to build your own Artifact object. The integration still provides additional features like augmenting the metadata on both sides of the integration. ```python import wandb @@ -221,18 +239,20 @@ def create_artifact(): ### Configuration -A configuration dictionary called wandb_artifact_configuration can be set on an `@op`, `@asset` and `@multi_asset`. This dictionary must be passed in the decorator arguments as metadata. This configuration is required to control the IO Manager reads and writes of W&B Artifacts. -For `@op`, it’s located in the output metadata through the [Out](https://docs.dagster.io/_apidocs/ops#dagster.Out) metadata argument. -For `@asset`, it’s located in the metadata argument on the asset. -For `@multi_asset`, it’s located in each output metadata through the [AssetOut](https://docs.dagster.io/_apidocs/assets#dagster.AssetOut) metadata arguments. +You can set a configuration dictionary called `wandb_artifact_configuration` on an `@op`, `@asset`, and `@multi_asset`. Pass this dictionary in the decorator arguments as metadata. This configuration is required to control the IO Manager reads and writes of W&B Artifacts. + +For `@op`, it's located in the output metadata through the [Out](https://docs.dagster.io/_apidocs/ops#dagster.Out) metadata argument. +For `@asset`, it's located in the metadata argument on the asset. +For `@multi_asset`, it's located in each output metadata through the [AssetOut](https://docs.dagster.io/_apidocs/assets#dagster.AssetOut) metadata arguments. 
-The following code examples demonstrate how to configure a dictionary on an `@op`, `@asset` and `@multi_asset` computations: +The following code examples demonstrate how to configure a dictionary on an `@op`, `@asset`, and `@multi_asset` computations: Example for `@op`: -```python + +```python @op( out=Out( metadata={ @@ -249,6 +269,7 @@ def create_dataset(): Example for `@asset`: + ```python @asset( name="my_artifact", @@ -263,7 +284,7 @@ def create_dataset(): return [1, 2, 3] ``` -You do not need to pass a name through the configuration because the @asset already has a name. The integration sets the Artifact name as the asset name. +You don't need to pass a name through the configuration because the `@asset` already has a name. The integration sets the Artifact name as the asset name. Example for `@multi_asset`: @@ -302,16 +323,17 @@ def create_datasets(): -Supported properties: -* `name`: (str) human-readable name for this artifact, which is how you can identify this artifact in the UI or reference it in use_artifact calls. Names can contain letters, numbers, underscores, hyphens, and dots. The name must be unique across a project. Required for `@op`. -* `type`: (str) The type of the artifact, which is used to organize and differentiate artifacts. Common types include dataset or model, but you can use any string containing letters, numbers, underscores, hyphens, and dots. Required when the output is not already an Artifact. -* `description`: (str) Free text that offers a description of the artifact. The description is markdown rendered in the UI, so this is a good place to place tables, links, etc. -* `aliases`: (list[str]) An array containing one or more aliases you want to apply on the Artifact. The integration will also add the “latest” tag to that list whether it’s set or not. This is an effective way for you to manage versioning of models and datasets. 
+The following properties are supported: + +* `name`: (str) A human-readable name for this artifact that you can use to identify it in the UI or reference it in `use_artifact` calls. Names can contain letters, numbers, underscores, hyphens, and dots. The name must be unique across a project. Required for `@op`. +* `type`: (str) The type of the artifact, used to organize and differentiate artifacts. Common types include dataset or model, but you can use any string containing letters, numbers, underscores, hyphens, and dots. Required when the output isn't already an Artifact. +* `description`: (str) Free text that describes the artifact. The description is rendered as markdown in the UI, so it's a good place for tables, links, and so on. +* `aliases`: (list[str]) An array containing one or more aliases you want to apply on the Artifact. The integration also adds the "latest" tag to that list whether it's set or not. Use aliases to manage versioning of models and datasets. * [`add_dirs`](/models/ref/python/experiments/artifact#add_dir): (list[dict[str, Any]]): An array containing configuration for each local directory to include in the Artifact. * [`add_files`](/models/ref/python/experiments/artifact#add_file): (list[dict[str, Any]]): An array containing configuration for each local file to include in the Artifact. * [`add_references`](/models/ref/python/experiments/artifact#add_reference): (list[dict[str, Any]]): An array containing configuration for each external reference to include in the Artifact. -* `serialization_module`: (dict) Configuration of the serialization module to be used. Refer to the Serialization section for more information. - * `name`: (str) Name of the serialization module. Accepted values: `pickle`, `dill`, `cloudpickle`, `joblib`. The module needs to be available locally. +* `serialization_module`: (dict) Configuration of the serialization module to use. Refer to the Serialization section for more information. 
+ * `name`: (str) Name of the serialization module. Accepted values: `pickle`, `dill`, `cloudpickle`, `joblib`. The module must be available locally. * `parameters`: (dict[str, Any]) Optional arguments passed to the serialization function. It accepts the same parameters as the dump method for that module. For example, `{"compress": 3, "protocol": 4}`. Advanced example: @@ -360,8 +382,9 @@ def create_advanced_artifact(): -The asset is materialized with useful metadata on both sides of the integration: -* W&B side: the source integration name and version, the python version used, the pickle protocol version and more. +The asset is materialized with metadata on both sides of the integration: + +* W&B side: the source integration name and version, the Python version used, the pickle protocol version, and more. * Dagster side: * Dagster Run ID * W&B Run: ID, name, path, URL @@ -369,13 +392,13 @@ The asset is materialized with useful metadata on both sides of the integration: * W&B Entity * W&B Project -The following image demonstrates the metadata from W&B that was added to the Dagster asset. This information is propagated to Dagster by the integration. +The following image demonstrates the metadata from W&B that the integration adds to the Dagster asset. The integration propagates this information to Dagster. Dagster's UI with an asset details view with attached W&B metadata, including references to a W&B project and run -The following image demonstrates how the provided configuration was enriched with useful metadata on the W&B Artifact. This information should help for reproducibility and maintenance. It would not be available without the integration. +The following image demonstrates how the provided configuration is enriched with metadata on the W&B Artifact. This information helps with reproducibility and maintenance. It isn't available without the integration. 
W&B Artifact page with enriched configuration metadata from Dagster @@ -389,18 +412,19 @@ The following image demonstrates how the provided configuration was enriched wi -If you use a static type checker like mypy, import the configuration type definition object using: +If you use a static type checker like mypy, import the configuration type definition object using the following: ```python from dagster_wandb import WandbArtifactConfiguration ``` -### Using partitions +### Use partitions -The integration natively supports [Dagster partitions](https://docs.dagster.io/guides/build/partitions-and-backfills). +The integration directly supports [Dagster partitions](https://docs.dagster.io/guides/build/partitions-and-backfills). + +The following is an example of an asset partitioned using `DailyPartitionsDefinition`: -The following is an example with a partitioned using `DailyPartitionsDefinition`. ```python @asset( partitions_def=DailyPartitionsDefinition(start_date="2023-01-01", end_date="2023-02-01"), @@ -417,10 +441,10 @@ def create_my_daily_partitioned_asset(context): context.log.info(f"Creating partitioned asset for {partition_key}") return random.randint(0, 100) ``` -This code will produce one W&B Artifact for each partition. View artifacts in the Artifact panel (UI) under the asset name, which has the partition key appended. For example, `my_daily_partitioned_asset.2023-01-01`, `my_daily_partitioned_asset.2023-01-02`, or`my_daily_partitioned_asset.2023-01-03`. Assets that are partitioned across multiple dimensions shows each dimension in dot-delimited format. For example, `my_asset.car.blue`. +This code produces one W&B Artifact for each partition. View artifacts in the Artifact panel (UI) under the asset name, which has the partition key appended. For example, `my_daily_partitioned_asset.2023-01-01`, `my_daily_partitioned_asset.2023-01-02`, or `my_daily_partitioned_asset.2023-01-03`.
Assets that are partitioned across multiple dimensions show each dimension in dot-delimited format. For example, `my_asset.car.blue`. -The integration does not allow for the materialization of multiple partitions within one run. You will need to carry out multiple runs to materialize your assets. This can be executed in Dagit when you're materializing your assets. +The integration doesn't allow for the materialization of multiple partitions within one run. You need to perform multiple runs to materialize your assets. You can execute this in Dagit when you're materializing your assets. Dagster UI with multiple runs for partitioned assets, each partition as a separate run @@ -428,6 +452,9 @@ The integration does not allow for the materialization of multiple partitions wi #### Advanced usage + +For advanced usage, see the following examples: + - [Partitioned job](https://github.com/dagster-io/dagster/blob/master/examples/with_wandb/with_wandb/ops/partitioned_job.py) - [Simple partitioned asset](https://github.com/wandb/dagster/blob/master/examples/with_wandb/with_wandb/assets/simple_partitions_example.py) - [Multi-partitioned asset](https://github.com/wandb/dagster/blob/master/examples/with_wandb/with_wandb/assets/multi_partitions_example.py) @@ -435,20 +462,23 @@ The integration does not allow for the materialization of multiple partitions wi ## Read W&B Artifacts -Reading W&B Artifacts is similar to writing them. A configuration dictionary called `wandb_artifact_configuration` can be set on an `@op` or `@asset`. The only difference is that we must set the configuration on the input instead of the output. -For `@op`, it’s located in the input metadata through the [In](https://docs.dagster.io/_apidocs/ops#dagster.In) metadata argument. You need to -explicitly pass the name of the Artifact. +Now that you can write Artifacts from Dagster, the following sections describe how to consume them as inputs to downstream ops and assets. 
-For `@asset`, it’s located in the input metadata through the [Asset](https://docs.dagster.io/_apidocs/assets#dagster.AssetIn) In metadata argument. You should not pass an Artifact name because the name of the parent asset should match it. +Reading W&B Artifacts is similar to writing them. You can set a configuration dictionary called `wandb_artifact_configuration` on an `@op` or `@asset`. The only difference is that you set the configuration on the input instead of the output. -If you want to have a dependency on an Artifact created outside the integration you will need to use [SourceAsset](https://docs.dagster.io/_apidocs/assets#dagster.SourceAsset). It will always read the latest version of that asset. +For `@op`, it's located in the input metadata through the [In](https://docs.dagster.io/_apidocs/ops#dagster.In) metadata argument. You need to explicitly pass the name of the Artifact. + +For `@asset`, it's located in the input metadata through the [AssetIn](https://docs.dagster.io/_apidocs/assets#dagster.AssetIn) metadata argument. Don't pass an Artifact name because the name of the parent asset should match it. + +If you want a dependency on an Artifact created outside the integration, use [SourceAsset](https://docs.dagster.io/_apidocs/assets#dagster.SourceAsset). It always reads the latest version of that asset. The following examples demonstrate how to read an Artifact from various ops. -Reading an artifact from an `@op` +Reading an artifact from an `@op`: + ```python @op( ins={ @@ -467,7 +497,8 @@ def read_artifact(context, artifact): ``` -Reading an artifact created by another `@asset` +Reading an artifact created by another `@asset`: + ```python @asset( name="my_asset", @@ -503,9 +534,10 @@ def read_artifact(context, my_artifact): ### Configuration -The following configuration is used to indicate what the IO Manager should collect and provide as inputs to the decorated functions. The following read patterns are supported. -1.
To get an named object contained within an Artifact use get: +The following configuration indicates what the IO Manager should collect and provide as inputs to the decorated functions. The following read patterns are supported: + +- To get a named object contained within an Artifact, use `get`: ```python @asset( @@ -526,7 +558,7 @@ def get_table(context, table): ``` -2. To get the local path of a downloaded file contained within an Artifact use get_path: +- To get the local path of a downloaded file contained within an Artifact, use `get_path`: ```python @asset( @@ -546,7 +578,8 @@ def get_path(context, path): context.log.info(path) ``` -3. To get the entire Artifact object (with the content downloaded locally): +- To get the entire Artifact object (with the content downloaded locally): + ```python @asset( ins={ @@ -561,28 +594,30 @@ def get_artifact(context, artifact): ``` -Supported properties +The following properties are supported: + * `get`: (str) Gets the W&B object located at the artifact relative name. * `get_path`: (str) Gets the path to the file located at the artifact relative name. ### Serialization configuration -By default, the integration will use the standard [pickle](https://docs.python.org/3/library/pickle.html) module, but some objects are not compatible with it. For example, functions with yield will raise an error if you try to pickle them. -We support more Pickle-based serialization modules ([dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) by returning a serialized string or creating an Artifact directly. The right choice will depend on your use case, please refer to the available literature on this subject. 
+By default, the integration uses the standard [pickle](https://docs.python.org/3/library/pickle.html) module, but some objects aren't compatible with it. For example, functions with yield raise an error if you try to pickle them. + +W&B supports more Pickle-based serialization modules ([dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) by returning a serialized string or creating an Artifact directly. The right choice depends on your use case. Refer to the available literature on this subject. ### Pickle-based serialization modules -Pickling is known to be insecure. If security is a concern please only use W&B objects. We recommend signing your data and storing the hash keys in your own systems. For more complex use cases don’t hesitate to contact us, we will be happy to help. +Pickling is known to be insecure. If security is a concern, use only W&B objects. Sign your data and store the hash keys in your own systems. For more complex use cases, contact W&B Support. -You can configure the serialization used through the `serialization_module` dictionary in the `wandb_artifact_configuration`. Please make sure the module is available on the machine running Dagster. +You can configure the serialization used through the `serialization_module` dictionary in the `wandb_artifact_configuration`. Make sure the module is available on the machine running Dagster. -The integration will automatically know which serialization module to use when you read that Artifact. +The integration automatically detects which serialization module to use when you read that Artifact. -The currently supported modules are `pickle`, `dill`, `cloudpickle`, and `joblib`. +The supported modules are `pickle`, `dill`, `cloudpickle`, and `joblib`. 
-Here’s a simplified example where we create a “model” serialized with joblib and then use it for inference. +Here's a simplified example that creates a "model" serialized with joblib and then uses it for inference. ```python @asset( @@ -626,13 +661,15 @@ def use_model_serialized_with_joblib( ``` ### Advanced serialization formats (ONNX, PMML) -It’s common to use interchange file formats like ONNX and PMML. The integration supports those formats but it requires a bit more work than for Pickle-based serialization. -There are two different methods to use those formats. -1. Convert your model to the selected format, then return the string representation of that format as if it were a normal Python objects. The integration will pickle that string. You can then rebuild your model using that string. -2. Create a new local file with your serialized model, then build a custom Artifact with that file using the add_file configuration. +Interchange file formats like ONNX and PMML are common. The integration supports those formats, but it requires a bit more work than Pickle-based serialization. + +You can use these formats with one of the following methods: + +- Convert your model to the selected format, then return the string representation of that format as if it were a normal Python object. The integration pickles that string. You can then rebuild your model using that string. +- Create a new local file with your serialized model, then build a custom Artifact with that file using the `add_file` configuration. -Here’s an example of a Scikit-learn model being serialized using ONNX. +Here's an example of a Scikit-learn model serialized using ONNX. ```python import numpy @@ -711,9 +748,9 @@ def use_onnx_model(context, my_onnx_model, my_test_set): return pred_onx ``` -### Using partitions +### Use partitions -The integration natively supports [Dagster partitions](https://docs.dagster.io/guides/build/partitions-and-backfills). 
+The integration directly supports [Dagster partitions](https://docs.dagster.io/guides/build/partitions-and-backfills). You can selectively read one, multiple, or all partitions of an asset. @@ -722,7 +759,8 @@ All partitions are provided in a dictionary, with the key and value representing -It reads all partitions of the upstream `@asset`, which are given as a dictionary. In this dictionary, the key and value correlate to the partition key and the Artifact content, respectively. +This reads all partitions of the upstream `@asset`, which are given as a dictionary. In this dictionary, the key and value correlate to the partition key and the Artifact content, respectively. + ```python @asset( compute_kind="wandb", @@ -735,7 +773,8 @@ def read_all_partitions(context, my_daily_partitioned_asset): ``` -The `AssetIn`'s `partition_mapping` configuration allows you to choose specific partitions. In this case, we are employing the `TimeWindowPartitionMapping`. +The `AssetIn`'s `partition_mapping` configuration lets you choose specific partitions. In this case, the example uses the `TimeWindowPartitionMapping`. + ```python @asset( partitions_def=DailyPartitionsDefinition(start_date="2023-01-01", end_date="2023-02-01"), @@ -760,20 +799,19 @@ The object `metadata` contains a key named `wandb_artifact_configuration` which The `partitions` object maps the name of each partition to its configuration. The configuration for each partition can specify how to retrieve data from it. These configurations can contain different keys, namely `get`, `version`, and `alias`, depending on the requirements of each partition. -**Configuration keys** +#### Configuration keys + +The following configuration keys are supported: -1. `get`: -The `get` key specifies the name of the W&B Object (Table, Image...) where to fetch the data. -2. `version`: -The `version` key is used when you want to fetch a specific version for the Artifact. -3. 
`alias`: -The `alias` key allows you to get the Artifact by its alias. +- `get`: The `get` key specifies the name of the W&B Object (Table, Image, and so on) to fetch the data from. +- `version`: Use the `version` key when you want to fetch a specific version of the Artifact. +- `alias`: The `alias` key lets you get the Artifact by its alias. -**Wildcard configuration** +#### Wildcard configuration -The wildcard `"*"` stands for all non-configured partitions. This provides a default configuration for partitions that are not explicitly mentioned in the `partitions` object. +The wildcard `"*"` stands for all non-configured partitions. This provides a default configuration for partitions that aren't explicitly mentioned in the `partitions` object. -For example, +For example: ```python "*": { @@ -782,11 +820,11 @@ For example, ``` This configuration means that for all partitions not explicitly configured, data is fetched from the table named `default_table_name`. -**Specific partition configuration** +#### Specific partition configuration You can override the wildcard configuration for specific partitions by providing their specific configurations using their keys. -For example, +For example: ```python "yellow": { @@ -794,13 +832,13 @@ For example, }, ``` -This configuration means that for the partition named `yellow`, data will be fetched from the table named `custom_table_name`, overriding the wildcard configuration. +This configuration means that for the partition named `yellow`, data is fetched from the table named `custom_table_name`, overriding the wildcard configuration. -**Versioning and aliasing** +#### Versioning and aliasing For versioning and aliasing purposes, you can provide specific `version` and `alias` keys in your configuration. -For versions, +For versions: ```python "orange": { @@ -808,9 +846,9 @@ For versions, }, ``` -This configuration will fetch data from the version `v0` of the `orange` Artifact partition. 
+This configuration fetches data from version `v0` of the `orange` Artifact partition. -For aliases, +For aliases: ```python "blue": { @@ -818,44 +856,51 @@ For aliases, }, ``` -This configuration will fetch data from the table `default_table_name` of the Artifact partition with the alias `special_alias` (referred to as `blue` in the configuration). +This configuration fetches data from the table `default_table_name` of the Artifact partition with the alias `special_alias` (referred to as `blue` in the configuration). ### Advanced usage -To view advanced usage of the integration please refer to the following full code examples: + +To view advanced usage of the integration, refer to the following full code examples: + * [Advanced usage example for assets](https://github.com/dagster-io/dagster/blob/master/examples/with_wandb/with_wandb/assets/advanced_example.py) * [Partitioned job example](https://github.com/dagster-io/dagster/blob/master/examples/with_wandb/with_wandb/ops/partitioned_job.py) * [Linking a model to the Model Registry](https://github.com/dagster-io/dagster/blob/master/examples/with_wandb/with_wandb/assets/model_registry_example.py) -## Using W&B Launch +## Use W&B Launch + +The following sections describe how to combine the Dagster integration with W&B Launch to run training jobs on dedicated compute, either locally or remotely. -Beta product in active development +Beta product in active development. Interested in Launch? Reach out to your account team to talk about joining the customer pilot program for W&B Launch. -Pilot customers need to use AWS EKS or SageMaker to qualify for the beta program. We ultimately plan to support additional platforms. +Pilot customers need to use AWS EKS or SageMaker to qualify for the beta program. Additional platforms are planned. -Before continuing, we recommend you to have a good understanding of how to use W&B Launch. Consider reading the [Guide on Launch](/platform/launch). 
+Before continuing, make sure you understand how to use W&B Launch. See the [Guide on Launch](/platform/launch). The Dagster integration helps with: + * Running one or multiple Launch agents in your Dagster instance. * Executing local Launch jobs within your Dagster instance. * Remote Launch jobs on-prem or in a cloud. ### Launch agents -The integration provides an importable `@op` called `run_launch_agent`. It starts a Launch Agent and runs it as a long running process until stopped manually. + +The integration provides an importable `@op` called `run_launch_agent`. It starts a Launch Agent and runs it as a long-running process until stopped manually. Agents are processes that poll launch queues and execute the jobs (or dispatch them to external services to be executed) in order. Refer to the [Launch page](/platform/launch). -You can also view useful descriptions for all properties in Launchpad. +You can also view descriptions for all properties in Launchpad. W&B Launchpad interface with agent configuration options and descriptions for Dagster integration -Simple example +Example: + ```python # add this to your config.yaml # alternatively you can set the config in Dagit's Launchpad or JobDefinition.execute_in_process @@ -893,20 +938,22 @@ def run_launch_agent_example(): ``` ### Launch jobs + The integration provides an importable `@op` called `run_launch_job`. It executes your Launch job. -A Launch job is assigned to a queue in order to be executed. You can create a queue or use the default one. Make sure you have an active agent listening to that queue. You can run an agent inside your Dagster instance but can also consider using a deployable agent in Kubernetes. +A Launch job is assigned to a queue to be executed. You can create a queue or use the default one. Make sure you have an active agent listening to that queue. You can run an agent inside your Dagster instance, or consider using a deployable agent in Kubernetes. 
Refer to the [Launch page](/platform/launch). -You can also view useful descriptions for all properties in Launchpad. +You can also view descriptions for all properties in Launchpad. W&B Launchpad interface with job configuration options and descriptions for Dagster integration -Simple example +Example: + ```python # add this to your config.yaml # alternatively you can set the config in Dagit's Launchpad or JobDefinition.execute_in_process @@ -943,35 +990,45 @@ from dagster import job, make_values_resource }, ) def run_launch_job_example(): - run_launch_job.alias("my_launched_job")() # we rename the job with an alias + run_launch_job.alias("my_launched_job")() # rename the job with an alias ``` ## Best practices -1. Use the IO Manager to read and write Artifacts. -Avoid using [`Artifact.download()`](/models/ref/python/experiments/artifact#download) or [`Run.log_artifact()`](/models/ref/python/experiments/run#log_artifact) directly. Those methods are handled by integration. Instead, return the data you want to store in the Artifact and let the integration do the rest. This approach provides better lineage for the Artifact. +The following recommendations help you get the most out of the integration once you have it working end to end. + +### Use the IO Manager to read and write Artifacts + +Avoid using [`Artifact.download()`](/models/ref/python/experiments/artifact#download) or [`Run.log_artifact()`](/models/ref/python/experiments/run#log_artifact) directly. The integration handles those methods. Instead, return the data you want to store in the Artifact and let the integration do the rest. This approach provides better lineage for the Artifact. + +### Only build an Artifact object yourself for complex use cases + +Return Python objects and W&B objects from your ops and assets. The integration handles bundling the Artifact. +For complex use cases, you can build an Artifact directly in a Dagster job. 
Pass an Artifact object to the integration for metadata enrichment, such as the source integration name and version, the Python version used, the pickle protocol version, and more. + +### Add files, directories, and external references to your Artifacts through the metadata + +Use the integration `wandb_artifact_configuration` object to add any file, directory, or external references (Amazon S3, GCS, HTTP, and so on). See the advanced example in the [Artifact configuration section](#configuration-1) for more information. -2. Only build an Artifact object yourself for complex use cases. -Python objects and W&B objects should be returned from your ops/assets. The integration handles bundling the Artifact. -For complex use cases, you can build an Artifact directly in a Dagster job. We recommend you pass an Artifact object to the integration for metadata enrichment such as the source integration name and version, the python version used, the pickle protocol version and more. +### Use an @asset instead of an @op when an Artifact is produced -3. Add files, directories and external references to your Artifacts through the metadata. -Use the integration `wandb_artifact_configuration` object to add any file, directory or external references (Amazon S3, GCS, HTTP…). See the advanced example in the [Artifact configuration section](#configuration-1) for more information. +Artifacts are assets. Use an asset when Dagster maintains that asset. This provides better observability in the Dagit Asset Catalog. -4. Use an @asset instead of an @op when an Artifact is produced. -Artifacts are assets. It is recommended to use an asset when Dagster maintains that asset. This will provide better observability in the Dagit Asset Catalog. +### Use a SourceAsset to consume an Artifact created outside Dagster -5. Use a SourceAsset to consume an Artifact created outside Dagster. -This allows you to take advantage of the integration to read externally created Artifacts. 
Otherwise, you can only use Artifacts created by the integration. +This lets you take advantage of the integration to read externally created Artifacts. Otherwise, you can only use Artifacts created by the integration. -6. Use W&B Launch to orchestrate training on dedicated compute for large models. -You can train small models inside your Dagster cluster and you can run Dagster in a Kubernetes cluster with GPU nodes. We recommend using W&B Launch for large model training. This will prevent overloading your instance and provide access to more adequate compute. +### Use W&B Launch to orchestrate training on dedicated compute for large models -7. When experiment tracking within Dagster, set your W&B Run ID to the value of your Dagster Run ID. -We recommend that you both: make the [Run resumable](/models/runs/resuming) and set the W&B Run ID to the Dagster Run ID or to a string of your choice. Following this recommendation ensures your W&B metrics and W&B Artifacts are stored in the same W&B Run when you train models inside of Dagster. +You can train small models inside your Dagster cluster, and you can run Dagster in a Kubernetes cluster with GPU nodes. Use W&B Launch for large model training. This prevents overloading your instance and provides access to more appropriate compute. +### Set your W&B Run ID to the value of your Dagster Run ID when experiment tracking within Dagster + +Make the [Run resumable](/models/runs/resuming) and set the W&B Run ID to the Dagster Run ID or to a string of your choice. Following this recommendation ensures your W&B metrics and W&B Artifacts are stored in the same W&B Run when you train models inside Dagster. + + +Either set the W&B Run ID to the Dagster Run ID: -Either set the W&B Run ID to the Dagster Run ID. ```python wandb.init( id=context.run_id, @@ -981,7 +1038,8 @@ wandb.init( ``` -Or choose your own W&B Run ID and pass it to the IO Manager configuration. 
+Or choose your own W&B Run ID and pass it to the IO Manager configuration: + ```python wandb.init( id="my_resumable_run_id", @@ -998,10 +1056,12 @@ wandb.init( ) ``` -8. Only collect data you need with get or get_path for large W&B Artifacts. -By default, the integration will download an entire Artifact. If you are using very large artifacts you might want to only collect the specific files or objects you need. This will improve speed and resource utilization. +### Only collect data you need with get or get_path for large W&B Artifacts + +By default, the integration downloads an entire Artifact. If you're using large artifacts, you might want to collect only the specific files or objects you need. This improves speed and resource utilization. + +### For Python objects, adapt the pickling module to your use case -9. For Python objects adapt the pickling module to your use case. -By default, the W&B integration will use the standard [pickle](https://docs.python.org/3/library/pickle.html) module. But some objects are not compatible with it. For example, functions with yield will raise an error if you try to pickle them. W&B supports other Pickle-based serialization modules ([dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). +By default, the W&B integration uses the standard [pickle](https://docs.python.org/3/library/pickle.html) module. But some objects aren't compatible with it. For example, functions with yield raise an error if you try to pickle them. W&B supports other Pickle-based serialization modules ([dill](https://github.com/uqfoundation/dill), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [joblib](https://github.com/joblib/joblib)). -You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) by returning a serialized string or creating an Artifact directly. 
The right choice will depend on your use case, refer to the available literature on this subject. +You can also use more advanced serialization like [ONNX](https://onnx.ai/) or [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) by returning a serialized string or creating an Artifact directly. The right choice depends on your use case. Refer to the available literature on this subject. diff --git a/models/integrations/databricks.mdx b/models/integrations/databricks.mdx index 63f2950e9c..a889829763 100644 --- a/models/integrations/databricks.mdx +++ b/models/integrations/databricks.mdx @@ -1,19 +1,22 @@ --- description: "Integrate W&B with Databricks for experiment tracking, metric logging, and model management on Spark clusters." title: Databricks +keywords: ["dbfs", "cluster init script", "notebook tracking"] --- -W&B integrates with [Databricks](https://www.databricks.com/) by customizing the W&B Jupyter notebook experience in the Databricks environment. +W&B integrates with [Databricks](https://www.databricks.com/) by customizing the W&B Jupyter notebook experience in the Databricks environment. This page shows you how to install and authenticate W&B on a Databricks cluster so that you can track experiments and log metrics from notebooks running on Spark. ## Configure Databricks -1. Install wandb in the cluster +To use W&B from a Databricks notebook, you must install the `wandb` package on the cluster and configure authentication so your notebooks can log to W&B. - Navigate to your cluster configuration, choose your cluster, click **Libraries**. Click **Install New**, choose **PyPI**, and add the package `wandb`. +1. Install `wandb` in the cluster + + In your cluster configuration, choose your cluster, then click **Libraries** > **Install New** > **PyPI**, and add the package `wandb`. 2. Set up authentication - To authenticate your W&B account you can add a Databricks secret which your notebooks can query. 
+ To authenticate your W&B account, add a Databricks secret that your notebooks can query at runtime. This avoids hard-coding your API key in notebooks. ```bash # install databricks cli @@ -34,7 +37,9 @@ W&B integrates with [Databricks](https://www.databricks.com/) by customizing the ## Examples -### Simple example +The following examples show how to use the preceding secret to log in and begin logging from a Databricks notebook. + +### Basic example ```python import os @@ -49,12 +54,11 @@ with wandb.init() as run: ### Sweeps -Setup required (temporary) for notebooks attempting to use wandb.sweep() or wandb.agent(): +Notebooks that use `wandb.sweep()` or `wandb.agent()` must set the entity and project as environment variables: ```python import os -# These will not be necessary in the future os.environ["WANDB_ENTITY"] = "my-entity" os.environ["WANDB_PROJECT"] = "my-project-that-exists" ``` diff --git a/models/integrations/deepchecks.mdx b/models/integrations/deepchecks.mdx index 70f42f568c..991da6c6ac 100644 --- a/models/integrations/deepchecks.mdx +++ b/models/integrations/deepchecks.mdx @@ -1,18 +1,19 @@ --- description: "Integrate W&B with Deepchecks to validate ML models and datasets with automated testing and experiment tracking." title: DeepChecks +keywords: ["data integrity", "train test validation", "deepchecks suite"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -DeepChecks helps you validate your machine learning models and data, such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model and comparing between different models, all with minimal effort. +DeepChecks helps you validate your machine learning models and data with minimal effort. You can verify your data's integrity, inspect its distributions, validate data splits, evaluate your model, and compare different models. 
This page shows how to use the DeepChecks integration with W&B so you can log validation results and test suites alongside your experiments. -[Read more about DeepChecks and the wandb integration ->](https://docs.deepchecks.com/stable/general/usage/exporting_results/auto_examples/plot_exports_output_to_wandb.html) +For more information, see the [DeepChecks W&B integration guide](https://docs.deepchecks.com/stable/general/usage/exporting_results/auto_examples/plot_exports_output_to_wandb.html). -## Getting started +## Get started -To use DeepChecks with W&B you will first need to sign up for a [W&B account](https://wandb.ai/site). With the W&B integration in DeepChecks you can quickly get started like so: +To use DeepChecks with W&B, first sign up for a [W&B account](https://wandb.ai/site). With the W&B integration in DeepChecks, you can get started by running a single check and pushing the result to W&B: ```python import wandb @@ -29,7 +30,7 @@ result = ModelErrorAnalysis() result.to_wandb() ``` -You can also log an entire DeepChecks test suite to W&B. +In addition to logging individual checks, you can log an entire DeepChecks test suite to W&B: ```python import wandb @@ -49,10 +50,10 @@ suite_result.to_wandb(project="my-suite-project", config={"suite-name": "full-su ## Example -[This Report](https://wandb.ai/cayush/deepchecks/reports/Validate-your-Data-and-Models-with-Deepchecks-and-W-B--VmlldzoxNjY0ODc5) shows off the power of using DeepChecks and W&B. +To see what the integration looks like in practice, explore the [Validate your data and models with Deepchecks and W&B report](https://wandb.ai/cayush/deepchecks/reports/Validate-your-Data-and-Models-with-Deepchecks-and-W-B--VmlldzoxNjY0ODc5), which demonstrates how to use DeepChecks and W&B together. Deepchecks data validation results -Any questions or issues about this W&B integration? 
Open an issue in the [DeepChecks github repository](https://github.com/deepchecks/deepchecks) and we'll catch it and get you an answer. \ No newline at end of file +If you have questions or issues about this W&B integration, open an issue in the [DeepChecks GitHub repository](https://github.com/deepchecks/deepchecks) and we'll get you an answer. \ No newline at end of file diff --git a/models/integrations/deepchem.mdx b/models/integrations/deepchem.mdx index a53e6021c4..e8c712ee54 100644 --- a/models/integrations/deepchem.mdx +++ b/models/integrations/deepchem.mdx @@ -1,11 +1,14 @@ --- description: "Integrate W&B with the DeepChem library for experiment tracking and visualization of molecular ML models." title: DeepChem +keywords: ["cheminformatics", "molecular property prediction"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -The [DeepChem library](https://github.com/deepchem/deepchem) provides open source tools that democratize the use of deep-learning in drug discovery, materials science, chemistry, and biology. This W&B integration adds simple and easy-to-use experiment tracking and model checkpointing while training models using DeepChem. +The [DeepChem library](https://github.com/deepchem/deepchem) provides open source tools that democratize the use of deep learning in drug discovery, materials science, chemistry, and biology. This W&B integration adds experiment tracking and model checkpointing while training models with DeepChem. + +Use this page to add W&B logging to your DeepChem training workflow so that you can track training loss, evaluation metrics, and model checkpoints across experiments. This guide is for users who already train models with DeepChem and want to add experiment tracking with minimal code changes. 
## DeepChem logging in 3 lines of code @@ -15,18 +18,23 @@ model = TorchModel(…, wandb_logger=logger) model.fit(…) ``` +Passing a `WandbLogger` instance into a DeepChem model attaches W&B logging to the training run, so metrics produced during `fit` automatically stream to your W&B project. + DeepChem molecular analysis ## Report and Google Colab -Explore the Using [W&B with DeepChem: Molecular Graph Convolutional Networks](https://wandb.ai/kshen/deepchem_graphconv/reports/Using-W-B-with-DeepChem-Molecular-Graph-Convolutional-Networks--Vmlldzo4MzU5MDc?galleryTag=) article for an example charts generated using the W&B DeepChem integration. +The following resources show the integration in practice before you wire it into your own code: -To dive straight into working code, check out this [Google Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/deepchem/W%26B_x_DeepChem.ipynb). +- Explore the [W&B with DeepChem: Molecular Graph Convolutional Networks](https://wandb.ai/kshen/deepchem_graphconv/reports/Using-W-B-with-DeepChem-Molecular-Graph-Convolutional-Networks--Vmlldzo4MzU5MDc?galleryTag=) article for example charts generated using the W&B DeepChem integration. +- To dive straight into working code, check out this [Google Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/deepchem/W%26B_x_DeepChem.ipynb). ## Track experiments +The remainder of this page walks through how to set up an API key, install the `wandb` library, and enable logging for either a `TorchModel` or `KerasModel`. + Set up W&B for DeepChem models of type [KerasModel](https://deepchem.readthedocs.io/en/latest/api_reference/models.html#keras-models) or [TorchModel](https://deepchem.readthedocs.io/en/latest/api_reference/models.html#pytorch-models). ### Sign up and create an API key @@ -35,6 +43,8 @@ An API key authenticates your machine to W&B. You can generate an API key from y +To find your API key in the W&B app: + 1. 
Click your user profile icon in the upper right corner. 1. Select **User Settings**, then scroll to the **API Keys** section. @@ -47,7 +57,7 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR_API_KEY] ``` 1. Install the `wandb` library and log in. @@ -81,7 +91,9 @@ wandb.login() ### Log your training and evaluation data to W&B -Training loss and evaluation metrics can be automatically logged to W&B. Optional evaluation can be enabled using the DeepChem [ValidationCallback](https://github.com/deepchem/deepchem/blob/master/deepchem/models/callbacks.py), the `WandbLogger` will detect ValidationCallback callback and log the metrics generated. +With `wandb` installed and authenticated, you can now attach a `WandbLogger` to your DeepChem model so that training and evaluation data flow to W&B. + +Training loss and evaluation metrics log to W&B automatically. To enable optional evaluation, use the DeepChem [`ValidationCallback`](https://github.com/deepchem/deepchem/blob/master/deepchem/models/callbacks.py). The `WandbLogger` detects the `ValidationCallback` and logs the metrics it produces. @@ -105,3 +117,5 @@ logger.finish() ``` + +After `model.fit` runs, training loss and any evaluation metrics emitted by the `ValidationCallback` appear in your W&B project under the run created by `WandbLogger`. diff --git a/models/integrations/diffusers.mdx b/models/integrations/diffusers.mdx index 124e7e8188..0c408cf591 100644 --- a/models/integrations/diffusers.mdx +++ b/models/integrations/diffusers.mdx @@ -1,16 +1,17 @@ --- title: Hugging Face Diffusers description: "Use W&B autolog with Hugging Face Diffusers to track prompts, generated media, configs, and pipeline architecture." 
+keywords: ["stable diffusion", "text-to-image", "ControlNet", "LoRA training"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -[Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) is the go-to library for state-of-the-art pre-trained diffusion models for generating images, audio, and even 3D structures of molecules. The W&B integration adds rich, flexible experiment tracking, media visualization, pipeline architecture, and configuration management to interactive centralized dashboards without compromising that ease of use. +[Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) is a library of pre-trained diffusion models for generating images, audio, and 3D structures of molecules. The W&B integration adds experiment tracking, media visualization, pipeline architecture tracking, and configuration management to interactive centralized dashboards. -## Next-level logging in just two lines +## Log experiments in two lines -Log all the prompts, negative prompts, generated media, and configs associated with your experiment by simply including 2 lines of code. Here are the 2 lines of code to begin logging: +To log all the prompts, negative prompts, generated media, and configs associated with your experiment, add the following two lines of code: ```python # import the autolog function @@ -30,7 +31,7 @@ autolog(init=dict(project="diffusers_logging")) - Command line: - ```shell + ```bash pip install --upgrade diffusers transformers accelerate wandb ``` @@ -41,26 +42,24 @@ autolog(init=dict(project="diffusers_logging")) ``` -2. Use `autolog` to initialize a W&B Run and automatically track the inputs and the outputs from [all supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). - - You can call the `autolog()` function with the `init` parameter, which accepts a dictionary of parameters required by [`wandb.init()`](/models/ref/python/functions/init). 
- - When you call `autolog()`, it initializes a W&B Run and automatically tracks the inputs and the outputs from [all supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). +2. Call `autolog()` with the `init` parameter, which accepts a dictionary of parameters required by [`wandb.init()`](/models/ref/python/functions/init). `autolog()` initializes a W&B run and automatically tracks the inputs and outputs from [all supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72): - - Each pipeline call is tracked into its own [table](/models/tables/) in the workspace, and the configs associated with the pipeline call is appended to the list of workflows in the configs for that run. - - The prompts, negative prompts, and the generated media are logged in a [`wandb.Table`](/models/tables/). - - All other configs associated with the experiment including seed and the pipeline architecture are stored in the config section for the run. + - Each pipeline call is tracked into its own [table](/models/tables/) in the workspace, and the configs associated with the pipeline call are appended to the list of workflows in the configs for that run. + - The prompts, negative prompts, and generated media are logged in a [`wandb.Table`](/models/tables/). + - All other configs associated with the experiment, including seed and pipeline architecture, are stored in the config section for the run. - The generated media for each pipeline call are also logged in [media panels](/models/track/log/media/) in the run. - You can find a [list of supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). In case, you want to request a new feature of this integration or report a bug associated with it, open an issue on the [W&B GitHub issues page](https://github.com/wandb/wandb/issues). 
+ Find a [list of supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). To request a new feature of this integration or report a bug, open an issue on the [W&B GitHub issues page](https://github.com/wandb/wandb/issues). ## Examples -### Autologging +The following examples show `autolog` in typical diffusion workflows so you can adapt them to your own pipelines. -Here is a brief end-to-end example of the autolog in action: +### Autolog example + +The following is an end-to-end example of `autolog`: @@ -132,6 +131,7 @@ run.finish() +The following images show what gets logged to W&B: - The results of a single experiment: @@ -152,15 +152,15 @@ run.finish() -You need to explicitly call [`wandb.Run.finish()`](/models/ref/python/functions/finish) when executing the code in IPython notebook environments after calling the pipeline. This is not necessary when executing python scripts. +You must explicitly call [`wandb.Run.finish()`](/models/ref/python/functions/finish) when you run the code in IPython notebook environments after calling the pipeline. This is not necessary when you run Python scripts. -### Tracking multi-pipeline workflows +### Track multi-pipeline workflows -This section demonstrates the autolog with a typical [Stable Diffusion XL + Refiner](https://huggingface.co/docs/diffusers/using-diffusers/sdxl#base-to-refiner-model) workflow, in which the latents generated by the [`StableDiffusionXLPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl) is refined by the corresponding refiner. +The following example demonstrates `autolog` with a typical [Stable Diffusion XL + Refiner](https://huggingface.co/docs/diffusers/using-diffusers/sdxl#base-to-refiner-model) workflow, in which the refiner refines the latents generated by the [`StableDiffusionXLPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl). 
- + ```python import torch from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline @@ -285,10 +285,11 @@ run.finish() -- Example of a Stable Diffisuion XL + Refiner experiment: - - Stable Diffusion XL experiment tracking - +The following image shows an example of a Stable Diffusion XL + Refiner experiment: + + +Stable Diffusion XL experiment tracking + ## More resources diff --git a/models/integrations/docker.mdx b/models/integrations/docker.mdx index e382429670..e81b013e9d 100644 --- a/models/integrations/docker.mdx +++ b/models/integrations/docker.mdx @@ -1,24 +1,29 @@ --- description: "Run W&B inside Docker containers by configuring API keys, environment variables, and local file storage." title: Docker +keywords: ["WANDB_API_KEY", "container image", "volume mount"] --- ## Docker integration -W&B can store a pointer to the Docker image that your code ran in, giving you the ability to restore a previous experiment to the exact environment it was run in. The wandb library looks for the **WANDB_DOCKER** environment variable to persist this state. We provide a few helpers that automatically set this state. +W&B can store a pointer to the Docker image that your code ran in, letting you restore a previous experiment to the exact environment it ran in. The wandb library looks for the `WANDB_DOCKER` environment variable to persist this state. W&B provides a few helpers that automatically set this state. + +The following sections describe how to set the `WANDB_DOCKER` environment variable in different environments, from local development through Kubernetes-based training. ### Local development -`wandb docker` is a command that starts a docker container, passes in wandb environment variables, mounts your code, and ensures wandb is installed. By default the command uses a docker image with TensorFlow, PyTorch, Keras, and Jupyter installed. You can use the same command to start your own docker image: `wandb docker my/image:latest`. 
The command mounts the current directory into the "/app" directory of the container, you can change this with the "--dir" flag. +`wandb docker` is a command that starts a Docker container, passes in wandb environment variables, mounts your code, and ensures wandb is installed. By default, the command uses a Docker image with TensorFlow, PyTorch, Keras, and Jupyter installed. You can use the same command to start your own Docker image: `wandb docker my/image:latest`. The command mounts the current directory into the `/app` directory of the container. You can change this with the `--dir` flag. ### Production -The `wandb docker-run` command is provided for production workloads. It's meant to be a drop in replacement for `nvidia-docker`. It's a simple wrapper to the `docker run` command that adds your credentials and the **WANDB_DOCKER** environment variable to the call. If you do not pass the "--runtime" flag and `nvidia-docker` is available on the machine, this also ensures the runtime is set to nvidia. +The `wandb docker-run` command is provided for production workloads. It's a drop-in replacement for `nvidia-docker` that wraps the `docker run` command and adds your credentials and the `WANDB_DOCKER` environment variable to the call. If you don't pass the `--runtime` flag and `nvidia-docker` is available on the machine, this also ensures the runtime is set to nvidia. ### Kubernetes -If you run your training workloads in Kubernetes and the k8s API is exposed to your pod \(which is the case by default\). wandb will query the API for the digest of the docker image and automatically set the **WANDB_DOCKER** environment variable. +If you run your training workloads in Kubernetes and the Kubernetes API is exposed to your pod \(which is the case by default\), wandb queries the API for the digest of the Docker image and automatically sets the `WANDB_DOCKER` environment variable. 
+ +## Restore the training environment -## Restoring +Once the `WANDB_DOCKER` environment variable is set during a run, you can use it to reproduce the original training environment later. -If a run was instrumented with the **WANDB_DOCKER** environment variable, calling `wandb restore username/project:run_id` will checkout a new branch restoring your code then launch the exact docker image used for training pre-populated with the original command. \ No newline at end of file +If a run was instrumented with the `WANDB_DOCKER` environment variable, calling `wandb restore username/project:run_id` checks out a new branch restoring your code, then launches the exact Docker image used for training pre-populated with the original command. \ No newline at end of file diff --git a/models/integrations/dspy.mdx b/models/integrations/dspy.mdx index 4b9d166603..c2f1279dcc 100644 --- a/models/integrations/dspy.mdx +++ b/models/integrations/dspy.mdx @@ -1,16 +1,18 @@ --- description: "Track and optimize DSPy programs with W&B to log prompts, evaluations, and compiled module performance." title: DSPy +keywords: ["BootstrapFewShot", "MIPRO optimizer", "signature tuning"] --- +This guide shows how to use W&B with DSPy to track and optimize your language model programs, so you can monitor evaluation metrics, inspect how program signatures evolve during optimization, and version the resulting programs as reproducible artifacts. It's intended for DSPy users who want experiment tracking and observability for their compiled modules. -Use W&B with DSPy to track and optimize your language model programs. 
W&B complements the [Weave DSPy integration](/weave/guides/integrations/dspy) by providing: +W&B complements the [Weave DSPy integration](/weave/guides/integrations/dspy) by providing: - Evaluation metrics tracking over time - W&B Tables for program signature evolution - Integration with DSPy optimizers like MIPROv2 -For comprehensive observability when optimizing DSPy modules, enable the integration in both W&B and Weave. +For full observability when optimizing DSPy modules, enable the integration in both W&B and Weave. **Note** @@ -28,17 +30,17 @@ No explicit `weave.init(...)` call is required. Install the required libraries and authenticate with W&B: - + 1. Install the required libraries: ```shell pip install wandb weave dspy ``` -1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) and log in: +1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) and log in. Replace `[YOUR-API-KEY]` with your W&B API key: ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] wandb login ``` @@ -66,13 +68,13 @@ wandb.login() -New to W&B? See our [quickstart guide](/models/quickstart/). +New to W&B? See the [Quickstart](/models/quickstart/). +With the libraries installed and authentication in place, you're ready to instrument a DSPy optimization run. ## Track program optimization (experimental) - -For DSPy optimizers that use `dspy.Evaluate` (such as MIPROv2), use the `WandbDSPyCallback` to log evaluation metrics over time and track program signature evolution in W&B Tables. +For DSPy optimizers that use `dspy.Evaluate` (such as MIPROv2), use the `WandbDSPyCallback` to log evaluation metrics over time and track program signature evolution in W&B Tables. Attaching the callback lets you observe how the optimizer's score changes and how the program's prompts and signatures evolve across iterations. 
```python import dspy @@ -118,17 +120,17 @@ with wandb.init(project=project_name) as run: After running this code, you receive both a W&B Run URL and a Weave URL. W&B displays evaluation metrics over time, along with Tables that show the evolution of program signatures. The run's **Overview** tab includes links to Weave traces for detailed inspection. -If a `run` object is not passed to `WandbDSPyCallback`, the global `run` object is used. +If you don't pass a `run` object to `WandbDSPyCallback`, the callback uses the global `run` object. - DSPy optimization run in W&B + DSPy optimization run in W&B -For comprehensive details about Weave tracing, evaluation, and optimization with DSPy, see the [Weave DSPy integration guide](/weave/guides/integrations/dspy). +For details about Weave tracing, evaluation, and optimization with DSPy, see the [Weave DSPy integration guide](/weave/guides/integrations/dspy). ## Log predictions to W&B Tables -Enable detailed prediction logging to inspect individual examples during optimization. The callback creates a W&B Tables for each evaluation step, which can help you to analyze specific successes and failures. +In addition to aggregate metrics, you can enable detailed prediction logging to inspect individual examples during optimization. The callback creates a W&B Table for each evaluation step, which helps you analyze specific successes and failures. ```python from wandb.integration.dspy import WandbDSPyCallback @@ -149,7 +151,7 @@ optimized_program = optimizer.compile(program, trainset=train_data) After optimization, find your prediction data in W&B: 1. Navigate to your run's **Overview** page. -2. Look for Table panels named with a pattern like `predictions_0`, `predictions_1`, and so forth. +2. Look for Table panels named with a pattern like `predictions_0` or `predictions_1`. 3. Filter by `is_correct` to analyze failures. 4. Compare tables across runs in the project workspace. 
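As a rough illustration of the "Filter by `is_correct`" step, the sketch below filters prediction rows in plain Python. It assumes the table has already been exported to a list of dictionaries. Apart from `is_correct`, the column names (`example`, `prediction`), the sample rows, and the helper `failed_predictions` are hypothetical placeholders, not the callback's actual schema or API.

```python
# Hypothetical rows standing in for an exported predictions table.
# Only the `is_correct` column name comes from the steps above.
rows = [
    {"example": "2 + 2", "prediction": "4", "is_correct": True},
    {"example": "capital of France", "prediction": "Lyon", "is_correct": False},
    {"example": "3 * 3", "prediction": "9", "is_correct": True},
]


def failed_predictions(table_rows):
    """Return only the rows whose prediction was marked incorrect."""
    return [row for row in table_rows if not row["is_correct"]]


failures = failed_predictions(rows)
# Fraction of rows marked correct in this illustrative sample.
accuracy = 1 - len(failures) / len(rows)
```

The same filter can be applied to the rows from each evaluation step to see whether the failing examples change as optimization proceeds.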
@@ -162,7 +164,7 @@ Learn more in the [W&B Tables guide](/models/tables/visualize-tables/). ## Save and version DSPy programs -To reproduce and version your best DSpy programs, save them as W&B Artifacts. Choose between saving the complete program or only the state. +Once you've identified a high-performing optimized program, save it as a W&B Artifact so you can reproduce results and track versions over time. Choose between saving the complete program or only the state, depending on whether you need the full architecture or a lighter-weight checkpoint. ```python from wandb.integration.dspy import WandbDSPyCallback diff --git a/models/integrations/farama-gymnasium.mdx b/models/integrations/farama-gymnasium.mdx index c7bc990856..e953054111 100644 --- a/models/integrations/farama-gymnasium.mdx +++ b/models/integrations/farama-gymnasium.mdx @@ -1,14 +1,15 @@ --- description: "Integrate W&B with Farama Gymnasium to track reinforcement learning experiments and record episode videos." title: Farama Gymnasium +keywords: ["RL environment", "monitor_gym", "Atari wrapper"] --- -If you're using [Farama Gymnasium](https://gymnasium.farama.org/#) we will automatically log videos of your environment generated by `gymnasium.wrappers.Monitor`. Just set the `monitor_gym` keyword argument to [`wandb.init()`](/models/ref/python/functions/init) to `True`. +If you're using [Farama Gymnasium](https://gymnasium.farama.org/#), W&B automatically logs videos of your environment generated by `gymnasium.wrappers.Monitor`. To enable video logging, set the `monitor_gym` keyword argument to [`wandb.init()`](/models/ref/python/functions/init) to `True`. -Our gymnasium integration is very light. We simply [look at the name of the video file](https://github.com/wandb/wandb/blob/c5fe3d56b155655980611d32ef09df35cd336872/wandb/integration/gym/__init__.py#LL69C67-L69C67) being logged from `gymnasium` and name it after that or fall back to `"videos"` if we don't find a match. 
If you want more control, you can always just manually [log a video](/models/track/log/media/). +The Gymnasium integration is lightweight. W&B [reads the name of the video file](https://github.com/wandb/wandb/blob/c5fe3d56b155655980611d32ef09df35cd336872/wandb/integration/gym/__init__.py#LL69C67-L69C67) logged from `gymnasium` and uses that name. If no match is found, the integration falls back to `"videos"`. For more control, you can manually [log a video](/models/track/log/media/). -Check out this [report](https://wandb.ai/raph-test/cleanrltest/reports/Mario-Bros-but-with-AI-Gymnasium-and-CleanRL---Vmlldzo0NTcxNTcw) to learn more on how to use Gymnasium with the CleanRL library. +For more information about using Gymnasium with the CleanRL library, see the [Mario Bros, but with AI: Gymnasium and CleanRL](https://wandb.ai/raph-test/cleanrltest/reports/Mario-Bros-but-with-AI-Gymnasium-and-CleanRL---Vmlldzo0NTcxNTcw) report. - Gymnasium RL environment + Gymnasium RL environment \ No newline at end of file diff --git a/models/integrations/fastai.mdx b/models/integrations/fastai.mdx index 4653b85822..b8ab7489eb 100644 --- a/models/integrations/fastai.mdx +++ b/models/integrations/fastai.mdx @@ -1,11 +1,12 @@ --- title: fastai description: "Integrate W&B with fastai using the WandbCallback to track experiments, log metrics, and visualize model performance." +keywords: ["Learner callback", "vision learner", "tabular learner"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -You can integrate **fastai** with W&B using the `WandbCallback` class. Check out these [interactive docs with examples](https://app.wandb.ai/borisd13/demo_config/reports/Visualize-track-compare-Fastai-models--Vmlldzo4MzAyNA) for more details. +You can integrate **fastai** with W&B using the `WandbCallback` class to track experiments, log metrics, and visualize model performance during training. 
This page shows how to set up authentication, add the callback to your training loop, and configure logging for both single-process and distributed training. Check out these [interactive docs with examples](https://app.wandb.ai/borisd13/demo_config/reports/Visualize-track-compare-Fastai-models--Vmlldzo4MzAyNA) for more details. ## Sign up and create an API key @@ -25,7 +26,7 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. @@ -48,7 +49,7 @@ wandb.login() ``` -```notebook +```python !pip install wandb import wandb @@ -59,6 +60,8 @@ wandb.login() ## Add the `WandbCallback` to the `learner` or `fit` method +To start logging your fastai training runs to W&B, attach the `WandbCallback` to either a single `fit` call or the `learner` itself. + ```python import wandb from fastai.callback.wandb import * @@ -74,36 +77,36 @@ learn = learner(..., cbs=WandbCallback()) ``` -If you use version 1 of Fastai, refer to the [Fastai v1 docs](/models/integrations/fastai/v1/). +If you use version 1 of fastai, refer to the [fastai v1 docs](/models/integrations/fastai/v1/). -## WandbCallback Arguments +## WandbCallback arguments -`WandbCallback` accepts the following arguments: +Use the following arguments to control what `WandbCallback` logs during training: | Args | Description | | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| log | Whether to log the model's: `gradients` , `parameters`, `all` or `None` (default). Losses & metrics are always logged. | -| log_preds | whether we want to log prediction samples (default to `True`). 
| -| log_preds_every_epoch | whether to log predictions every epoch or at the end (default to `False`) | -| log_model | whether we want to log our model (default to False). This also requires `SaveModelCallback` | -| model_name | The name of the `file` to save, overrides `SaveModelCallback` | -| log_dataset |
  • False (default)
  • True will log folder referenced by learn.dls.path.
  • a path can be defined explicitly to reference which folder to log.

Note: subfolder "models" is always ignored.

| -| dataset_name | name of logged dataset (default to `folder name`). | -| valid_dl | `DataLoaders` containing items used for prediction samples (default to random items from `learn.dls.valid`. | -| n_preds | number of logged predictions (default to 36). | -| seed | used for defining random samples. | +| `log` | Whether to log the model's: `gradients`, `parameters`, `all`, or `None` (default). Losses and metrics are always logged. | +| `log_preds` | Whether to log prediction samples (default to `True`). | +| `log_preds_every_epoch` | Whether to log predictions every epoch or at the end (default to `False`). | +| `log_model` | Whether to log the model (default to `False`). This also requires `SaveModelCallback`. | +| `model_name` | The name of the `file` to save, overrides `SaveModelCallback`. | +| `log_dataset` |
  • False (default).
  • True logs the folder referenced by `learn.dls.path`.
  • A path can be defined explicitly to reference which folder to log.

Note: subfolder "models" is always ignored.

| +| `dataset_name` | Name of the logged dataset (default to `folder name`). | +| `valid_dl` | `DataLoaders` containing items used for prediction samples (default to random items from `learn.dls.valid`). | +| `n_preds` | Number of logged predictions (default to 36). | +| `seed` | Used for defining random samples. | For custom workflows, you can manually log your datasets and models: * `log_dataset(path, name=None, metadata={})` * `log_model(path, name=None, metadata={})` -_Note: any subfolder "models" will be ignored._ +_Note: any subfolder "models" is ignored._ ## Distributed training -`fastai` supports distributed training by using the context manager `distrib_ctx`. W&B supports this automatically and enables you to track your Multi-GPU experiments out of the box. +`fastai` supports distributed training by using the context manager `distrib_ctx`. W&B supports this automatically and enables you to track your multi-GPU experiments without additional configuration. The following sections describe how to integrate W&B with distributed training and how to limit logging to the main process. Review this minimal example: @@ -136,13 +139,13 @@ if __name__ == "__main__": train() ``` -Then, in your terminal you will execute: +Then, in your terminal, execute: ```shell -$ torchrun --nproc_per_node 2 train.py +torchrun --nproc_per_node 2 train.py ``` -in this case, the machine has 2 GPUs. +In this case, the machine has 2 GPUs.
You can now run distributed training directly inside a notebook. @@ -179,7 +182,7 @@ notebook_launcher(train, num_processes=2) ### Log only on the main process -In the examples above, `wandb` launches one run per process. At the end of the training, you will end up with two runs. This can sometimes be confusing, and you may want to log only on the main process. To do so, you will have to detect in which process you are manually and avoid creating runs (calling `wandb.init()` in all other processes) +In the preceding examples, `wandb` launches one run per process. At the end of the training, you have two runs. This can sometimes be confusing, and you may want to log only on the main process. To do so, you must manually detect which process you are in and avoid creating runs (calling `wandb.init()`) in all other processes. @@ -211,10 +214,10 @@ def train(): if __name__ == "__main__": train() ``` -in your terminal call: +In your terminal, call: -``` -$ torchrun --nproc_per_node 2 train.py +```shell +torchrun --nproc_per_node 2 train.py ``` @@ -252,5 +255,7 @@ notebook_launcher(train, num_processes=2) ## Examples -* [Visualize, track, and compare Fastai models](https://app.wandb.ai/borisd13/demo_config/reports/Visualize-track-compare-Fastai-models--Vmlldzo4MzAyNA): A thoroughly documented walkthrough. -* [Image Segmentation on CamVid](https://colab.research.google.com/drive/1IWrhwcJoncCKHm6VXsNwOr9Yukhz3B49?usp=sharing): A sample use case of the integration. +For end-to-end demonstrations of the fastai integration, see the following references: + +* [Visualize, track, and compare fastai models](https://app.wandb.ai/borisd13/demo_config/reports/Visualize-track-compare-Fastai-models--Vmlldzo4MzAyNA): A documented walkthrough. +* [Image segmentation on CamVid](https://colab.research.google.com/drive/1IWrhwcJoncCKHm6VXsNwOr9Yukhz3B49?usp=sharing): A sample use case of the integration. 
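The "Log only on the main process" section in the fastai page says to detect the current process manually and skip run creation everywhere else. The page's own example code is elided in this diff, so the following is only a minimal sketch of that idea, not the page's method. It assumes a `torchrun` launch, which exports a `RANK` environment variable to each worker; notebook launches may expose the rank differently. The helper names `is_main_process` and `maybe_init_run` are hypothetical and exist only for illustration.

```python
import os


def is_main_process(env=None):
    """Return True for rank 0, or when no RANK variable is set at all."""
    env = os.environ if env is None else env
    # Assumption: the launcher (for example, torchrun) exports RANK per worker.
    # A plain `python train.py` run has no RANK, so it counts as the main process.
    return int(env.get("RANK", "0")) == 0


def maybe_init_run(init_fn, env=None):
    """Call init_fn only on the main process; return None on every other rank.

    init_fn is a zero-argument callable supplied by the caller, such as a
    small wrapper around wandb.init().
    """
    return init_fn() if is_main_process(env) else None
```

In a training script, this could be used as `run = maybe_init_run(lambda: wandb.init(project="my-project"))`, and the callback attached only when `run` is not `None`. Passing the callable in, rather than importing `wandb` inside the helper, keeps the rank check independent of any tracking library.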
diff --git a/models/integrations/fastai/v1.mdx b/models/integrations/fastai/v1.mdx index 51b3182745..b80b783e1f 100644 --- a/models/integrations/fastai/v1.mdx +++ b/models/integrations/fastai/v1.mdx @@ -1,14 +1,15 @@ --- title: fastai v1 description: "Use the W&B callback with fastai v1 to log model topology, losses, metrics, weights, gradients, and predictions." +keywords: ["fastai 1.x", "basic_train", "cnn_learner"] --- This documentation is for fastai v1. -If you use the current version of fastai, you should refer to [fastai page](../). +If you use the current version of fastai, see the [fastai page](../). -For scripts using fastai v1, we have a callback that can automatically log model topology, losses, metrics, weights, gradients, sample predictions and best trained model. +For fastai v1 scripts, W&B provides a callback that automatically logs model topology, losses, metrics, weights, gradients, sample predictions, and the best trained model. ```python import wandb @@ -20,7 +21,7 @@ learn = cnn_learner(data, model, callback_fns=WandbCallback) learn.fit(epochs) ``` -Requested logged data is configurable through the callback constructor. +You can configure what data to log through the callback constructor. ```python from functools import partial @@ -30,13 +31,13 @@ learn = cnn_learner( ) ``` -It is also possible to use WandbCallback only when starting training. In this case it must be instantiated. +You can also use `WandbCallback` only when starting training. In this case, you must instantiate it. ```python learn.fit(epochs, callbacks=WandbCallback(learn)) ``` -Custom parameters can also be given at that stage. +You can also pass custom parameters at that stage. 
```python learn.fit(epochs, callbacks=WandbCallback(learn, input_type="images")) @@ -44,25 +45,23 @@ learn.fit(epochs, callbacks=WandbCallback(learn, input_type="images")) ## Example code -We've created a few examples for you to see how the integration works: +The following examples show how the integration works: -**Fastai v1** - -* [Classify Simpsons characters](https://github.com/borisdayma/simpsons-fastai)[: ](https://app.wandb.ai/jxmorris12/huggingface-demo/reports/A-Step-by-Step-Guide-to-Tracking-Hugging-Face-Model-Performance--VmlldzoxMDE2MTU)A simple demo to track and compare Fastai models -* [Semantic Segmentation with Fastai](https://github.com/borisdayma/semantic-segmentation): Optimize neural networks on self-driving cars +* [Classify Simpsons characters](https://github.com/borisdayma/simpsons-fastai): A demo to track and compare fastai models. See the [step-by-step guide](https://app.wandb.ai/jxmorris12/huggingface-demo/reports/A-Step-by-Step-Guide-to-Tracking-Hugging-Face-Model-Performance--VmlldzoxMDE2MTU). +* [Semantic segmentation with fastai](https://github.com/borisdayma/semantic-segmentation): Optimize neural networks on self-driving cars. ## Options -`WandbCallback()` class supports a number of options: - -| Keyword argument | Default | Description | -| ---------------- | --------- | -------------------------------------------------------------------------------------------------------- | -| learn | N/A | the fast.ai learner to hook. | -| save_model | True | save the model if it's improved at each step. It will also load best model at the end of training. | -| mode | auto | `min`, `max`, or `auto`: How to compare the training metric specified in `monitor` between steps. | -| monitor | None | training metric used to measure performance for saving the best model. None defaults to validation loss. | -| log | gradients | `gradients`, `parameters`, `all`, or None. Losses & metrics are always logged. | -| input_type | None | `images` or `None`. 
Used to display sample predictions. | -| validation_data | None | data used for sample predictions if `input_type` is set. | -| predictions | 36 | number of predictions to make if `input_type` is set and `validation_data` is `None`. | -| seed | 12345 | initialize random generator for sample predictions if `input_type` is set and `validation_data` is `None`. | \ No newline at end of file +The `WandbCallback()` class supports several options: + +| Keyword argument | Default | Description | +| ---------------- | ----------- | -------------------------------------------------------------------------------------------------------- | +| `learn` | N/A | The fast.ai learner to hook. | +| `save_model` | `True` | Save the model if it's improved at each step. It also loads the best model at the end of training. | +| `mode` | `auto` | `min`, `max`, or `auto`. How to compare the training metric specified in `monitor` between steps. | +| `monitor` | `None` | Training metric used to measure performance for saving the best model. `None` defaults to validation loss. | +| `log` | `gradients` | `gradients`, `parameters`, `all`, or `None`. Losses and metrics are always logged. | +| `input_type` | `None` | `images` or `None`. Used to display sample predictions. | +| `validation_data` | `None` | Data used for sample predictions if `input_type` is set. | +| `predictions` | `36` | Number of predictions to make if `input_type` is set and `validation_data` is `None`. | +| `seed` | `12345` | Initialize random generator for sample predictions if `input_type` is set and `validation_data` is `None`. 
| \ No newline at end of file diff --git a/models/integrations/huggingface.mdx b/models/integrations/huggingface.mdx index cf697c2cc3..4f8db6f0c6 100644 --- a/models/integrations/huggingface.mdx +++ b/models/integrations/huggingface.mdx @@ -1,43 +1,43 @@ --- title: Hugging Face description: "Visualize and track Hugging Face model performance with W&B, logging hyperparameters, metrics, and GPU utilization." +keywords: ["model hub", "HF datasets", "transformers logging"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -Visualize your [Hugging Face](https://github.com/huggingface/transformers) model's performance quickly with a seamless [W&B](https://wandb.ai/site) integration. +This tutorial shows you how to use the W&B integration with [Hugging Face Transformers](https://github.com/huggingface/transformers) to automatically track training and evaluation metrics, hyperparameters, and system stats while fine-tuning a model. By following this tutorial, you learn how to visualize your model's performance through the [W&B](https://wandb.ai/site) dashboard so you can compare experiments and iterate on your models with confidence. -Compare hyperparameters, output metrics, and system stats like GPU utilization across your models. +You can compare hyperparameters, output metrics, and system stats like GPU utilization across your models. -## Why should I use W&B? +## Why use W&B Benefits of using W&B -- **Unified dashboard**: Central repository for all your model metrics and predictions -- **Lightweight**: No code changes required to integrate with Hugging Face -- **Accessible**: Free for individuals and academic teams -- **Secure**: All projects are private by default -- **Trusted**: Used by machine learning teams at OpenAI, Toyota, Lyft and more +- **Unified dashboard**: Central repository for all your model metrics and predictions. +- **Lightweight**: No code changes required to integrate with Hugging Face. 
+- **Accessible**: Free for individuals and academic teams. +- **Secure**: All projects are private by default. +- **Trusted**: Used by machine learning teams at OpenAI, Toyota, Lyft, and more. -Think of W&B like GitHub for machine learning models— save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you're running your scripts. +W&B works like GitHub for machine learning models. Save machine learning experiments to your private, hosted dashboard. Experiment with the confidence that all versions of your models are saved for you, no matter where you run your scripts. -W&B lightweight integrations works with any Python script, and all you need to do is sign up for a free W&B account to start tracking and visualizing your models. +W&B lightweight integrations work with any Python script. Sign up for a free W&B account to start tracking and visualizing your models. -In the Hugging Face Transformers repo, we've instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step. +In the Hugging Face Transformers repository, W&B has instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step. -Here's an in depth look at how the integration works: [Hugging Face + W&B Report](https://app.wandb.ai/jxmorris12/huggingface-demo/reports/Train-a-model-with-Hugging-Face-and-Weights-%26-Biases--VmlldzoxMDE2MTU). +Here's an in-depth look at how the integration works: [Hugging Face + W&B Report](https://app.wandb.ai/jxmorris12/huggingface-demo/reports/Train-a-model-with-Hugging-Face-and-Weights-%26-Biases--VmlldzoxMDE2MTU). ## Install, import, and log in +This section sets up the environment you need to run the tutorial. 
Install the Hugging Face and W&B libraries, and download the GLUE dataset and training script for this tutorial: - -Install the Hugging Face and W&B libraries, and the GLUE dataset and training script for this tutorial. -- [Hugging Face Transformers](https://github.com/huggingface/transformers): Natural language models and datasets -- [W&B](/): Experiment tracking and visualization -- [GLUE dataset](https://gluebenchmark.com/): A language understanding benchmark dataset -- [GLUE script](https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/examples/pytorch/text-classification/run_glue.py): Model training script for sequence classification +- [Hugging Face Transformers](https://github.com/huggingface/transformers): Natural language models and datasets. +- [W&B](/): Experiment tracking and visualization. +- [GLUE dataset](https://gluebenchmark.com/): A language understanding benchmark dataset. +- [GLUE script](https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/examples/pytorch/text-classification/run_glue.py): Model training script for sequence classification. ```notebook @@ -51,11 +51,11 @@ Install the Hugging Face and W&B libraries, and the GLUE dataset and training sc !pip install -q git+https://github.com/huggingface/transformers ``` -Before continuing, [sign up for a free account](https://app.wandb.ai/login?signup=true). +Before continuing, you must [sign up for a free account](https://app.wandb.ai/login?signup=true). An account is required to send your run data to a W&B dashboard. -## Put in your API key +## Add your API key -Once you've signed up, run the next cell and click on the link to get your API key and authenticate this notebook. +Authenticating with your API key links this notebook to your W&B account so that runs are logged to your projects. After you sign up, run the next cell and click the link to get your API key and authenticate this notebook. 
```python @@ -63,7 +63,7 @@ import wandb wandb.login() ``` -Optionally, we can set environment variables to customize W&B logging. See the [Hugging Face integration guide](/models/integrations/huggingface/). +Optionally, you can set environment variables to customize what W&B logs during training. For example, you can log both gradients and parameters by setting `WANDB_WATCH=all`. See the [Hugging Face integration guide](/models/integrations/huggingface/) for the full list of options. ```python @@ -72,7 +72,8 @@ Optionally, we can set environment variables to customize W&B logging. See the [ ``` ## Train the model -Next, call the downloaded training script [run_glue.py](https://huggingface.co/transformers/examples.html#glue) and see training automatically get tracked to the W&B dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus— pairs of sentences with human annotations indicating whether they are semantically equivalent. + +With the environment configured and authentication complete, you're ready to start a training run. Call the downloaded training script [`run_glue.py`](https://huggingface.co/transformers/examples.html#glue) and see training automatically get tracked to the W&B dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus (pairs of sentences with human annotations indicating whether they're semantically equivalent). ```python @@ -93,29 +94,35 @@ Next, call the downloaded training script [run_glue.py](https://huggingface.co/t --logging_steps 50 ``` -## Visualize results in dashboard -Click the link printed out above, or go to [wandb.ai](https://app.wandb.ai) to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: "**wandb**: View run at [URL to your unique run]" +## Visualize results in the dashboard + +After training starts, you can monitor metrics in real time. 
Click the link printed out by the preceding cell, or go to [wandb.ai](https://app.wandb.ai) to see your results stream in live. The link to see your run in the browser appears after all the dependencies are loaded. Look for the following output: "**wandb**: View run at [URL to your unique run]" + +### Visualize model performance -**Visualize Model Performance** -It's easy to look across dozens of experiments, zoom in on interesting findings, and visualize highly dimensional data. +Look across experiments, zoom in on findings, and visualize high-dimensional data. Model metrics dashboard -**Compare Architectures** -Here's an example comparing [BERT vs DistilBERT](https://app.wandb.ai/jack-morris/david-vs-goliath/reports/Does-model-size-matter%3F-Comparing-BERT-and-DistilBERT-using-Sweeps--VmlldzoxMDUxNzU). It's easy to see how different architectures effect the evaluation accuracy throughout training with automatic line plot visualizations. +### Compare architectures + +Here's an example comparing [BERT versus DistilBERT](https://app.wandb.ai/jack-morris/david-vs-goliath/reports/Does-model-size-matter%3F-Comparing-BERT-and-DistilBERT-using-Sweeps--VmlldzoxMDUxNzU). The automatic line plot visualizations show how different architectures affect the evaluation accuracy throughout training. - BERT vs DistilBERT comparison + BERT versus DistilBERT comparison -## Track key information effortlessly by default -W&B saves a new run for each experiment. Here's the information that gets saved by default: -- **Hyperparameters**: Settings for your model are saved in Config -- **Model Metrics**: Time series data of metrics streaming in are saved in Log -- **Terminal Logs**: Command line outputs are saved and available in a tab -- **System Metrics**: GPU and CPU utilization, memory, temperature etc. +## Track key information by default + +This section describes what W&B captures automatically so you know what data is available in your dashboard without additional configuration. 
W&B saves a new run for each experiment. Here's the information saved by default: + +- **Hyperparameters**: Settings for your model are saved in Config. +- **Model metrics**: Time series data of metrics streaming in are saved in Log. +- **Terminal logs**: Command line outputs are saved and available in a tab. +- **System metrics**: GPU and CPU utilization, memory, and temperature. ## Learn more + - [Video walkthroughs on YouTube](http://wandb.me/youtube) diff --git a/models/integrations/huggingface_transformers.mdx b/models/integrations/huggingface_transformers.mdx index 33c8ab5181..8bd25e6c20 100644 --- a/models/integrations/huggingface_transformers.mdx +++ b/models/integrations/huggingface_transformers.mdx @@ -1,6 +1,7 @@ --- title: Hugging Face Transformers description: "Use W&B with Hugging Face Transformers Trainer for experiment tracking, model checkpointing, and dataset versioning." +keywords: ["TrainingArguments report_to", "Trainer callback", "HF Trainer wandb"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; @@ -8,12 +9,14 @@ import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamli -The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library makes state-of-the-art NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The [W&B integration](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) adds rich, flexible experiment tracking and model versioning to interactive centralized dashboards without compromising that ease of use. +The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library makes NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. 
The [W&B integration](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) adds experiment tracking and model versioning to centralized dashboards. -## Next-level logging in few lines +This guide shows you how to connect the Hugging Face `Trainer` to W&B. Your training runs then automatically log metrics, model checkpoints, and evaluation outputs to a centralized dashboard. By the end, you'll be able to compare runs, save and reload model checkpoints from W&B Artifacts, and customize logging for your own workflows. This guide assumes you're already familiar with training models using the Hugging Face Transformers `Trainer`. + +## Quick start ```python -os.environ["WANDB_PROJECT"] = "" # name your W&B project +os.environ["WANDB_PROJECT"] = "[MY-PROJECT-NAME]" # name your W&B project os.environ["WANDB_LOG_MODEL"] = "checkpoint" # log all model checkpoints from transformers import TrainingArguments, Trainer @@ -22,7 +25,7 @@ args = TrainingArguments(..., report_to="wandb") # turn on W&B logging trainer = Trainer(..., args=args) ``` - HuggingFace dashboard + Hugging Face dashboard @@ -31,6 +34,8 @@ If you'd rather dive straight into working code, check out this [Google Colab](h ## Get started: track experiments +This section walks you through authenticating to W&B, installing the client library, naming your project, and turning on logging in your `Trainer` so that your first training run shows up in the W&B Dashboard. + ### Sign up and create an API key An API key authenticates your machine to W&B. You can generate an API key from your user profile. @@ -49,14 +54,14 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. 
- ```shell + ```bash pip install wandb wandb login @@ -81,14 +86,14 @@ wandb.login() -If you are using W&B for the first time you might want to check out our [quickstart](/models/quickstart/) +If you're using W&B for the first time, check out the [quickstart](/models/quickstart/). ### Name the project -A W&B Project is where all of the charts, data, and models logged from related runs are stored. Naming your project helps you organize your work and keep all the information about a single project in one place. +A W&B Project stores all of the charts, data, and models logged from related runs. Naming your project helps you organize your work and keep all the information about a single project in one place. -To add a run to a project simply set the `WANDB_PROJECT` environment variable to the name of your project. The `WandbCallback` will pick up this project name environment variable and use it when setting up your run. +To add a run to a project, set the `WANDB_PROJECT` environment variable to the name of your project. The `WandbCallback` picks up this project name environment variable and uses it when setting up your run. @@ -113,15 +118,15 @@ os.environ["WANDB_PROJECT"]="amazon_sentiment_analysis" Make sure you set the project name _before_ you initialize the `Trainer`. -If a project name is not specified the project name defaults to `huggingface`. +If you don't specify a project name, the project name defaults to `huggingface`. ### Log your training runs to W&B -This is **the most important step** when defining your `Trainer` training arguments, either inside your code or from the command line, is to set `report_to` to `"wandb"` in order enable logging with W&B. +When you define your `Trainer` training arguments, either inside your code or from the command line, set `report_to` to `"wandb"` to enable logging with W&B. Without this setting, the `Trainer` doesn't send any data to W&B. 
-The `logging_steps` argument in `TrainingArguments` will control how often training metrics are pushed to W&B during training. You can also give a name to the training run in W&B using the `run_name` argument. +The `logging_steps` argument in `TrainingArguments` controls how often training metrics are pushed to W&B during training. You can also give a name to the training run in W&B using the `run_name` argument. -That's it. Now your models will log losses, evaluation metrics, model topology, and gradients to W&B while they train. +That's it. Your models now log losses, evaluation metrics, model topology, and gradients to W&B while they train. @@ -154,19 +159,20 @@ trainer.train() # start training and logging to W&B -Using TensorFlow? Just swap the PyTorch `Trainer` for the TensorFlow `TFTrainer`. +Using TensorFlow? Swap the PyTorch `Trainer` for the TensorFlow `TFTrainer`. ### Turn on model checkpointing +In addition to logging metrics, you can save the trained model weights themselves to W&B so they can be versioned, downloaded, and shared across your team. -Using [Artifacts](/models/artifacts/), you can store up to 100GB of models and datasets for free and then use the W&B [Registry](/models/registry/). Using Registry, you can register models to explore and evaluate them, prepare them for staging, or deploy them in your production environment. +With [Artifacts](/models/artifacts/), you can store up to 100 GB of models and datasets for free and then use the W&B [Registry](/models/registry/). With Registry, you can register models to explore and evaluate them, prepare them for staging, or deploy them in your production environment. To log your Hugging Face model checkpoints to Artifacts, set the `WANDB_LOG_MODEL` environment variable to _one_ of: - **`checkpoint`**: Upload a checkpoint every `args.save_steps` from the [`TrainingArguments`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments). 
- **`end`**: Upload the model at the end of training, if `load_best_model_at_end` is also set. -- **`false`**: Do not upload the model. +- **`false`**: Don't upload the model. @@ -189,37 +195,38 @@ os.environ["WANDB_LOG_MODEL"] = "checkpoint" -Any Transformers `Trainer` you initialize from now on will upload models to your W&B project. The model checkpoints you log will be viewable through the [Artifacts](/models/artifacts/) UI, and include the full model lineage (see an example model checkpoint in the UI [here](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage?_gl=1*yyql5q*_ga*MTQxOTYyNzExOS4xNjg0NDYyNzk1*_ga_JH1SJHJQXJ*MTY5MjMwNzI2Mi4yNjkuMS4xNjkyMzA5NjM2LjM3LjAuMA..)). +Any Transformers `Trainer` you initialize from now on uploads models to your W&B project. The model checkpoints you log are viewable through the [Artifacts](/models/artifacts/) UI, and include the full model lineage. See an [example model checkpoint in the Artifacts UI](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage?_gl=1*yyql5q*_ga*MTQxOTYyNzExOS4xNjg0NDYyNzk1*_ga_JH1SJHJQXJ*MTY5MjMwNzI2Mi4yNjkuMS4xNjkyMzA5NjM2LjM3LjAuMA..). -By default, your model will be saved to W&B Artifacts as `model-{run_id}` when `WANDB_LOG_MODEL` is set to `end` or `checkpoint-{run_id}` when `WANDB_LOG_MODEL` is set to `checkpoint`. -However, If you pass a [`run_name`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.run_name) in your `TrainingArguments`, the model will be saved as `model-{run_name}` or `checkpoint-{run_name}`. +By default, your model saves to W&B Artifacts as `model-{run_id}` when `WANDB_LOG_MODEL` is set to `end` or `checkpoint-{run_id}` when `WANDB_LOG_MODEL` is set to `checkpoint`. 
+However, if you pass a [`run_name`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.run_name) in your `TrainingArguments`, the model saves as `model-{run_name}` or `checkpoint-{run_name}`. #### W&B Registry -Once you have logged your checkpoints to Artifacts, you can then register your best model checkpoints and centralize them across your team with [Registry](/models/registry/). Using Registry, you can organize your best models by task, manage the lifecycles of models, track and audit the entire ML lifecyle, and [automate](/models/automations/) downstream actions. + +After you log your checkpoints to Artifacts, you can register your best model checkpoints and centralize them across your team with [Registry](/models/registry/). With Registry, you can organize your best models by task, manage the lifecycles of models, track and audit the entire ML lifecycle, and [automate](/models/automations/) downstream actions. To link a model Artifact, refer to [Registry](/models/registry/). -### Visualise evaluation outputs during training +### Visualize evaluation outputs during training -Visualing your model outputs during training or evaluation is often essential to really understand how your model is training. +Visualizing your model outputs during training or evaluation is often essential to understand how your model trains. Inspecting concrete predictions alongside loss curves helps you spot quality issues that aggregate metrics can hide. -By using the callbacks system in the Transformers Trainer, you can log additional helpful data to W&B such as your models' text generation outputs or other predictions to W&B Tables. +Using the callbacks system in the Transformers Trainer, you can log more helpful data to W&B Tables. This includes your models' text generation outputs or other predictions. 
-See the [Custom logging section](#custom-logging-log-and-view-evaluation-samples-during-training) below for a full guide on how to log evaluation outputs while training to log to a W&B Table like this: +For a full guide on how to log evaluation outputs while training to a W&B Table like the following, see [Log and view evaluation samples during training](#log-and-view-evaluation-samples-during-training). Shows a W&B Table with evaluation outputs -### Finish your W&B Run (Notebook only) +### Finish your W&B run (notebook only) -If your training is encapsulated in a Python script, the W&B run will end when your script finishes. +If your training is encapsulated in a Python script, the W&B run ends when your script finishes. -If you are using a Jupyter or Google Colab notebook, you'll need to tell us when you're done with training by calling `run.finish()`. +If you're using a Jupyter or Google Colab notebook, call `run.finish()` to signal that training is complete. ```python run = wandb.init() @@ -232,11 +239,16 @@ run.finish() ### Visualize your results -Once you have logged your training results you can explore your results dynamically in the [W&B Dashboard](/models/track/workspaces/). It's easy to compare across dozens of runs at once, zoom in on interesting findings, and coax insights out of complex data with flexible, interactive visualizations. +After you log your training results, you can explore them in the [W&B Dashboard](/models/track/workspaces/). You can compare runs, zoom in on findings, and explore your data with interactive visualizations. + +At this point you have a working integration: your `Trainer` logs metrics to a named project, optionally saves checkpoints to Artifacts, and surfaces evaluation outputs in the W&B Dashboard. ## Advanced features and FAQs -### How do I save the best model? 
+The following sections cover common follow-up tasks, such as saving the best model, resuming training from a checkpoint, customizing logging callbacks, and configuring W&B behavior through environment variables. + +### Save the best model + If you pass `TrainingArguments` with `load_best_model_at_end=True` to your `Trainer`, W&B saves the best performing model checkpoint to Artifacts. If you save your model checkpoints as Artifacts, you can promote them to the [Registry](/models/registry/). In Registry, you can: @@ -245,9 +257,9 @@ If you save your model checkpoints as Artifacts, you can promote them to the [Re - Stage models for production or bookmark them for further evaluation. - Trigger downstream CI/CD processes. -### How do I load a saved model? +### Load a saved model -If you saved your model to W&B Artifacts with `WANDB_LOG_MODEL`, you can download your model weights for additional training or to run inference. You just load them back into the same Hugging Face architecture that you used before. +If you saved your model to W&B Artifacts with `WANDB_LOG_MODEL`, you can download your model weights for more training or to run inference. Load them back into the same Hugging Face architecture that you used before. ```python # Create a new run @@ -268,8 +280,9 @@ with wandb.init(project="amazon_sentiment_analysis") as run: # Do additional training, or run inference ``` -### How do I resume training from a checkpoint? -If you had set `WANDB_LOG_MODEL='checkpoint'` you can also resume training by you can using the `model_dir` as the `model_name_or_path` argument in your `TrainingArguments` and pass `resume_from_checkpoint=True` to `Trainer`. +### Resume training from a checkpoint + +If you set `WANDB_LOG_MODEL='checkpoint'`, you can resume training by using the `model_dir` as the `model_name_or_path` argument in your `TrainingArguments` and passing the downloaded checkpoint directory as `resume_from_checkpoint` to `trainer.train()`.
```python last_run_id = "xxxxxxxx" # fetch the run_id from your wandb workspace @@ -289,9 +302,9 @@ with wandb.init( # reinitialize your model and trainer model = AutoModelForSequenceClassification.from_pretrained( - "", num_labels=num_labels + "[MODEL-NAME]", num_labels=num_labels ) - # your awesome training arguments here. + # your training arguments here. training_args = TrainingArguments() trainer = Trainer(model=model, args=training_args) @@ -300,11 +313,11 @@ with wandb.init( trainer.train(resume_from_checkpoint=checkpoint_dir) ``` -### How do I log and view evaluation samples during training +### Log and view evaluation samples during training -Logging to W&B via the Transformers `Trainer` is taken care of by the [`WandbCallback`](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) in the Transformers library. If you need to customize your Hugging Face logging you can modify this callback by subclassing `WandbCallback` and adding additional functionality that leverages additional methods from the Trainer class. +The [`WandbCallback`](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) in the Transformers library handles logging to W&B through the Transformers `Trainer`. You can customize this callback to log model predictions, confusion matrices, or other custom data. To do so, subclass `WandbCallback` and add functionality that uses additional methods from the Trainer class. 
-Below is the general pattern to add this new callback to the HF Trainer, and further down is a code-complete example to log evaluation outputs to a W&B Table: +The following is the general pattern to add this new callback to the Hugging Face Trainer, followed by a code-complete example to log evaluation outputs to a W&B Table: ```python @@ -323,17 +336,17 @@ trainer.train() #### View evaluation samples during training -The following section shows how to customize the `WandbCallback` to run model predictions and log evaluation samples to a W&B Table during training. We will every `eval_steps` using the `on_evaluate` method of the Trainer callback. +The following section shows how to customize the `WandbCallback` to run model predictions and log evaluation samples to a W&B Table during training. This runs every `eval_steps` using the `on_evaluate` method of the Trainer callback. -Here, we wrote a `decode_predictions` function to decode the predictions and labels from the model output using the tokenizer. +The `decode_predictions` function decodes the predictions and labels from the model output using the tokenizer. -Then, we create a pandas DataFrame from the predictions and labels and add an `epoch` column to the DataFrame. +Then, the code creates a pandas DataFrame from the predictions and labels and adds an `epoch` column to the DataFrame. -Finally, we create a `wandb.Table` from the DataFrame and log it to wandb. -Additionally, we can control the frequency of logging by logging the predictions every `freq` epochs. +Finally, the code creates a `wandb.Table` from the DataFrame and logs it to W&B. You can control the frequency of logging by logging the predictions every `freq` epochs. -**Note**: Unlike the regular `WandbCallback` this custom callback needs to be added to the trainer **after** the `Trainer` is instantiated and not during initialization of the `Trainer`. -This is because the `Trainer` instance is passed to the callback during initialization. 
+ +Unlike the regular `WandbCallback`, this custom callback needs to be added to the trainer **after** the `Trainer` is instantiated, not during initialization of the `Trainer`. This is because the `Trainer` instance is passed to the callback during initialization. + ```python from transformers.integrations import WandbCallback @@ -421,21 +434,21 @@ progress_callback = WandbPredictionProgressCallback( trainer.add_callback(progress_callback) ``` -For a more detailed example please refer to this [colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Custom_Progress_Callback.ipynb) +For a more detailed example, see this [Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Custom_Progress_Callback.ipynb). -### What additional W&B settings are available? +### Additional W&B settings -Further configuration of what is logged with `Trainer` is possible by setting environment variables. A full list of W&B environment variables [can be found here](/platform/hosting/env-vars). +You can further configure what is logged with `Trainer` by setting environment variables. For a full list of W&B environment variables, see the [environment variables reference](/platform/hosting/env-vars). | Environment Variable | Usage | | -------------------- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `WANDB_PROJECT` | Give your project a name (`huggingface` by default) | -| `WANDB_LOG_MODEL` |

Log the model checkpoint as a W&B Artifact (`false` by default)

  • false (default): No model checkpointing
  • checkpoint: A checkpoint will be uploaded every args.save_steps (set in the Trainer's TrainingArguments).
  • end: The final model checkpoint will be uploaded at the end of training.
| -| `WANDB_WATCH` |

Set whether you'd like to log your models gradients, parameters or neither

  • false (default): No gradient or parameter logging
  • gradients: Log histograms of the gradients
  • all: Log histograms of gradients and parameters
| +| `WANDB_LOG_MODEL` |

Log the model checkpoint as a W&B Artifact (`false` by default)

  • false (default): No model checkpointing
  • checkpoint: Upload a checkpoint every `args.save_steps` (set in the Trainer's `TrainingArguments`).
  • end: Upload the final model checkpoint at the end of training.
| +| `WANDB_WATCH` |

Set whether to log your model's gradients, parameters, or neither.

  • false (default): No gradient or parameter logging
  • gradients: Log histograms of the gradients
  • all: Log histograms of gradients and parameters
| | `WANDB_DISABLED` | Set to `true` to turn off logging entirely (`false` by default) | -| `WANDB_QUIET`. | Set to `true` to limit statements logged to standard output to critical statements only (`false` by default) | -| `WANDB_SILENT` | Set to `true` to silence the output printed by wandb (`false` by default) | +| `WANDB_QUIET` | Set to `true` to limit statements logged to standard output to critical statements only (`false` by default) | +| `WANDB_SILENT` | Set to `true` to silence the output printed by `wandb` (`false` by default) | @@ -453,11 +466,11 @@ WANDB_SILENT=true -### How do I customize `wandb.init()`? +### Customize `wandb.init()` -The `WandbCallback` that `Trainer` uses will call `wandb.init()` under the hood when `Trainer` is initialized. You can alternatively set up your runs manually by calling `wandb.init()` before the`Trainer` is initialized. This gives you full control over your W&B run configuration. +The `WandbCallback` that `Trainer` uses calls `wandb.init()` under the hood when `Trainer` is initialized. Alternatively, you can set up your runs manually by calling `wandb.init()` before the `Trainer` is initialized. This gives you full control over your W&B run configuration. -An example of what you might want to pass to `init` is below. For `wandb.init()` details, see the [`wandb.init()` reference](/models/ref/python/functions/init). +The following is an example of what you might pass to `init`. For `wandb.init()` details, see the [`wandb.init()` reference](/models/ref/python/functions/init). ```python wandb.init( @@ -471,15 +484,15 @@ wandb.init( ## Additional resources -Below are 6 Transformers and W&B related articles you might enjoy +The following are six Transformers and W&B related articles for further reading.
Hyperparameter Optimization for Hugging Face Transformers -* Three strategies for hyperparameter optimization for Hugging Face Transformers are compared: Grid Search, Bayesian Optimization, and Population Based Training. -* We use a standard uncased BERT model from Hugging Face transformers, and we want to fine-tune on the RTE dataset from the SuperGLUE benchmark -* Results show that Population Based Training is the most effective approach to hyperparameter optimization of our Hugging Face transformer model. +* Compares three strategies for hyperparameter optimization for Hugging Face Transformers: Grid Search, Bayesian Optimization, and Population Based Training. +* Uses a standard uncased BERT model from Hugging Face transformers, fine-tuned on the RTE dataset from the SuperGLUE benchmark. +* Results show that Population Based Training is the most effective approach to hyperparameter optimization of the Hugging Face transformer model. Read the [Hyperparameter Optimization for Hugging Face Transformers report](https://wandb.ai/amogkam/transformers/reports/Hyperparameter-Optimization-for-Hugging-Face-Transformers--VmlldzoyMTc2ODI).
@@ -488,53 +501,53 @@ Read the [Hyperparameter Optimization for Hugging Face Transformers report](http Hugging Tweets: Train a Model to Generate Tweets -* In the article, the author demonstrates how to fine-tune a pre-trained GPT2 HuggingFace Transformer model on anyone's Tweets in five minutes. -* The model uses the following pipeline: Downloading Tweets, Optimizing the Dataset, Initial Experiments, Comparing Losses Between Users, Fine-Tuning the Model. +* In the article, the author demonstrates how to fine-tune a pre-trained GPT2 Hugging Face Transformer model on anyone's Tweets in five minutes. +* The model uses the following pipeline: downloading Tweets, optimizing the dataset, initial experiments, comparing losses between users, and fine-tuning the model. -Read the full report [here](https://wandb.ai/wandb/huggingtweets/reports/HuggingTweets-Train-a-Model-to-Generate-Tweets--VmlldzoxMTY5MjI). +Read the [HuggingTweets report](https://wandb.ai/wandb/huggingtweets/reports/HuggingTweets-Train-a-Model-to-Generate-Tweets--VmlldzoxMTY5MjI).
Sentence Classification With Hugging Face BERT and WB -* In this article, we'll build a sentence classifier leveraging the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP. -* We'll be using The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, which is a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018. -* We'll use Google's BERT to create high performance models with minimal effort on a range of NLP tasks. +* This article builds a sentence classifier using the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP. +* Uses The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018. +* Uses Google's BERT to create high-performance models with minimal effort on a range of NLP tasks. -Read the full report [here](https://wandb.ai/cayush/bert-finetuning/reports/Sentence-Classification-With-Huggingface-BERT-and-W-B--Vmlldzo4MDMwNA). +Read the [Sentence Classification With Hugging Face BERT and W&B report](https://wandb.ai/cayush/bert-finetuning/reports/Sentence-Classification-With-Huggingface-BERT-and-W-B--Vmlldzo4MDMwNA).
A Step by Step Guide to Tracking Hugging Face Model Performance -* We use W&B and Hugging Face transformers to train DistilBERT, a Transformer that's 40% smaller than BERT but retains 97% of BERT's accuracy, on the GLUE benchmark -* The GLUE benchmark is a collection of nine datasets and tasks for training NLP models +* Uses W&B and Hugging Face transformers to train DistilBERT, a Transformer that's 40% smaller than BERT but retains 97% of BERT's accuracy, on the GLUE benchmark. +* The GLUE benchmark is a collection of nine datasets and tasks for training NLP models. -Read the full report [here](https://wandb.ai/jxmorris12/huggingface-demo/reports/A-Step-by-Step-Guide-to-Tracking-HuggingFace-Model-Performance--VmlldzoxMDE2MTU). +Read the [Tracking Hugging Face Model Performance report](https://wandb.ai/jxmorris12/huggingface-demo/reports/A-Step-by-Step-Guide-to-Tracking-HuggingFace-Model-Performance--VmlldzoxMDE2MTU).
Examples of Early Stopping in HuggingFace -* Fine-tuning a Hugging Face Transformer using Early Stopping regularization can be done natively in PyTorch or TensorFlow. -* Using the EarlyStopping callback in TensorFlow is straightforward with the `tf.keras.callbacks.EarlyStopping`callback. -* In PyTorch, there is not an off-the-shelf early stopping method, but there is a working early stopping hook available on GitHub Gist. +* You can fine-tune a Hugging Face Transformer using Early Stopping regularization natively in PyTorch or TensorFlow. +* The `tf.keras.callbacks.EarlyStopping` callback makes using EarlyStopping in TensorFlow straightforward. +* PyTorch doesn't provide an off-the-shelf early stopping method, but a working early stopping hook is available on GitHub Gist. -Read the full report [here](https://wandb.ai/ayush-thakur/huggingface/reports/Early-Stopping-in-HuggingFace-Examples--Vmlldzo0MzE2MTM). +Read the [Early Stopping in Hugging Face report](https://wandb.ai/ayush-thakur/huggingface/reports/Early-Stopping-in-HuggingFace-Examples--Vmlldzo0MzE2MTM).
How to Fine-Tune Hugging Face Transformers on a Custom Dataset -We fine tune a DistilBERT transformer for sentiment analysis (binary classification) on a custom IMDB dataset. +Fine-tunes a DistilBERT transformer for sentiment analysis (binary classification) on a custom IMDB dataset. -Read the full report [here](https://wandb.ai/ayush-thakur/huggingface/reports/How-to-Fine-Tune-HuggingFace-Transformers-on-a-Custom-Dataset--Vmlldzo0MzQ2MDc). +Read the [Fine-Tune Hugging Face Transformers on a Custom Dataset report](https://wandb.ai/ayush-thakur/huggingface/reports/How-to-Fine-Tune-HuggingFace-Transformers-on-a-Custom-Dataset--Vmlldzo0MzQ2MDc).
## Get help or request features -For any issues, questions, or feature requests for the Hugging Face W&B integration, feel free to post in [this thread on the Hugging Face forums](https://discuss.huggingface.co/t/logging-experiment-tracking-with-w-b/498) or open an issue on the Hugging Face [Transformers GitHub repo](https://github.com/huggingface/transformers). +For any issues, questions, or feature requests for the Hugging Face W&B integration, post in [this thread on the Hugging Face forums](https://discuss.huggingface.co/t/logging-experiment-tracking-with-w-b/498) or open an issue on the Hugging Face [Transformers GitHub repo](https://github.com/huggingface/transformers). diff --git a/models/integrations/hydra.mdx b/models/integrations/hydra.mdx index 45b3061829..93743c56f0 100644 --- a/models/integrations/hydra.mdx +++ b/models/integrations/hydra.mdx @@ -1,15 +1,16 @@ --- description: "Integrate W&B with Hydra to manage complex configurations for ML experiments and log hyperparameters automatically." title: Hydra +keywords: ["omegaconf", "@hydra.main", "config override"] --- -> [Hydra](https://hydra.cc) is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. +> [Hydra](https://hydra.cc) is an open source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. -You can continue to use Hydra for configuration management while taking advantage of the power of W&B. 
+This page shows how to combine Hydra-based configuration management with W&B experiment tracking, so you can keep Hydra's composable configs while gaining W&B's visualization, hyperparameter optimization, and run comparison capabilities. The following sections cover tracking metrics, logging hyperparameters from Hydra configs, troubleshooting multiprocessing, and optimizing hyperparameters with W&B Sweeps. ## Track metrics -Track your metrics as normal with `wandb.init()` and `wandb.Run.log()` . Here, `wandb.entity` and `wandb.project` are defined within a hydra configuration file. +To send metrics from a Hydra-configured run to W&B, use `wandb.init()` and `wandb.Run.log()` as you normally would. In the following example, `wandb.entity` and `wandb.project` are defined within a Hydra configuration file so that the same config drives both Hydra and W&B. ```python import wandb @@ -24,7 +25,9 @@ def run_experiment(cfg): ## Track hyperparameters -Hydra uses [omegaconf](https://omegaconf.readthedocs.io/en/2.1_branch/) as the default way to interface with configuration dictionaries. `OmegaConf`'s dictionary are not a subclass of primitive dictionaries so directly passing Hydra's `Config` to `wandb.Run.config` leads to unexpected results on the dashboard. It's necessary to convert `omegaconf.DictConfig` to the primitive `dict` type before passing to `wandb.Run.config`. +Logging Hydra's configuration to W&B lets you see every hyperparameter alongside the run's metrics, making experiments easier to compare and reproduce. + +Hydra uses [omegaconf](https://omegaconf.readthedocs.io/en/2.1_branch/) as the default way to interface with configuration dictionaries. `OmegaConf`'s dictionary isn't a subclass of primitive dictionaries, so directly passing Hydra's `Config` to `wandb.Run.config` leads to unexpected results on the dashboard. Convert `omegaconf.DictConfig` to the primitive `dict` type before passing it to `wandb.Run.config`. 
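As a rough illustration of what that conversion involves, the following sketch uses only the Python standard library. `FrozenConfig` and `to_plain` are hypothetical names invented for this example and are not part of Hydra, OmegaConf, or W&B. `FrozenConfig` stands in for any mapping-like config object that isn't a `dict` subclass.

```python
from collections.abc import Mapping, Sequence


class FrozenConfig(Mapping):
    """Hypothetical stand-in for a config object that acts like a mapping but is not a dict."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def to_plain(value):
    """Recursively convert mapping-like and list-like values into built-in dicts and lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(item) for key, item in value.items()}
    # Strings are sequences too, so exclude them from the list conversion.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return [to_plain(item) for item in value]
    return value


# A nested config that is mapping-like at every level, but never a plain dict.
cfg = FrozenConfig({"model": FrozenConfig({"lr": 0.01, "layers": (64, 32)}), "seed": 7})

# After conversion, every level is a built-in container.
plain = to_plain(cfg)
```

In a real Hydra job, `OmegaConf.to_container(cfg, resolve=True)` performs this conversion for a `DictConfig`, so a hand-written helper like `to_plain` shouldn't be needed.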
```python @hydra.main(config_path="configs/", config_name="defaults") @@ -40,7 +43,7 @@ def run_experiment(cfg): ## Troubleshoot multiprocessing -If your process hangs when started, this may be caused by [this known issue](/models/track/log/distributed-training). To solve this, try to changing wandb's multiprocessing protocol either by adding an extra settings parameter to `wandb.init()` as: +If your process stops responding when started, the [known multiprocessing issue in distributed training](/models/track/log/distributed-training) might be the cause. To resolve it, change W&B's multiprocessing protocol by either adding an extra settings parameter to `wandb.init()`: ```python wandb.init(settings=wandb.Settings(start_method="thread")) @@ -49,14 +52,14 @@ wandb.init(settings=wandb.Settings(start_method="thread")) or by setting a global environment variable from your shell: ```bash -$ export WANDB_START_METHOD=thread +export WANDB_START_METHOD=thread ``` ## Optimize hyperparameters -[W&B Sweeps](/models/sweeps) is a highly scalable hyperparameter search platform, which provides interesting insights and visualization about W&B experiments with minimal requirements code real-estate. Sweeps integrates seamlessly with Hydra projects with no-coding requirements. The only thing needed is a configuration file describing the various parameters to sweep over as normal. +[W&B Sweeps](/models/sweeps) is a hyperparameter search platform that provides insights and visualizations for W&B experiments with minimal code overhead. Sweeps integrates with Hydra projects without requiring code changes. You only need a configuration file that describes the parameters to sweep over. 
-A simple example `sweep.yaml` file would be: +The following `sweep.yaml` file is an example: ```yaml program: main.py @@ -81,16 +84,16 @@ Invoke the sweep: wandb sweep sweep.yaml ``` -W&B automatically creates a sweep inside your project and returns a `wandb agent` command for you to run on each machine you want to run your sweep. +W&B automatically creates a sweep inside your project and returns a `wandb agent` command. Run that command on each machine on which you want to execute the sweep. ### Pass parameters not present in Hydra defaults -Hydra supports passing extra parameters through the command line which aren't present in the default configuration file, by using a `+` before command. For example, you can pass an extra parameter with some value by simply calling: +Hydra supports passing extra parameters through the command line that aren't present in the default configuration file, by using a `+` before the command. For example, pass an extra parameter with some value by calling: ```bash -$ python program.py +experiment=some_experiment +python program.py +experiment=some_experiment ``` -You cannot sweep over such `+` configurations similar to what one does while configuring [Hydra Experiments](https://hydra.cc/docs/patterns/configuring_experiments/). To work around this, you can initialize the experiment parameter with a default empty file and use W&B Sweep to override those empty configs on each call. For more information, read [this W&B Report](https://wandb.ai/adrishd/hydra-example/reports/Configuring-W-B-Projects-with-Hydra--VmlldzoxNTA2MzQw?galleryTag=posts&utm_source=fully_connected&utm_medium=blog&utm_campaign=hydra). +You can't sweep over such `+` configurations the same way you would when configuring [Hydra Experiments](https://hydra.cc/docs/patterns/configuring_experiments/). To work around this, initialize the experiment parameter with a default empty file and use a W&B Sweep to override those empty configs on each call. 
For more information, read the W&B report [Configuring W&B Projects with Hydra](https://wandb.ai/adrishd/hydra-example/reports/Configuring-W-B-Projects-with-Hydra--VmlldzoxNTA2MzQw?galleryTag=posts&utm_source=fully_connected&utm_medium=blog&utm_campaign=hydra). diff --git a/models/integrations/ignite.mdx b/models/integrations/ignite.mdx index 6f42ee0b87..1a9fe239c8 100644 --- a/models/integrations/ignite.mdx +++ b/models/integrations/ignite.mdx @@ -1,15 +1,22 @@ --- description: "Integrate W&B with PyTorch Ignite to automatically log training metrics, model parameters, and experiment configs." title: PyTorch Ignite +keywords: ["Engine handler", "create_supervised_trainer", "ignite events"] --- -* See the resulting visualizations in this [example W&B report →](https://app.wandb.ai/example-team/pytorch-ignite-example/reports/PyTorch-Ignite-with-W%26B--Vmlldzo0NzkwMg) -* Try running the code yourself in this [example hosted notebook →](https://colab.research.google.com/drive/15e-yGOvboTzXU4pe91Jg-Yr7sae3zBOJ#scrollTo=ztVifsYAmnRr) +This page shows how to use the W&B handler with PyTorch Ignite to automatically log training and validation metrics, model and optimizer parameters, gradients, and model checkpoints during your experiments. -Ignite supports W&B handler to log metrics, model/optimizer parameters, gradients during training and validation. It can also be used to log model checkpoints to the W&B cloud. This class is also a wrapper for the wandb module. This means that you can call any wandb function using this wrapper. See examples on how to save model parameters and gradients. +Ignite supports a W&B handler to log metrics, model and optimizer parameters, and gradients during training and validation. You can also use it to log model checkpoints to the W&B cloud. This class wraps the `wandb` module, so you can call any `wandb` function using this wrapper. See examples on how to save model parameters and gradients. 
+ +For additional context, see the following resources: + +* See the resulting visualizations in this [example W&B report](https://app.wandb.ai/example-team/pytorch-ignite-example/reports/PyTorch-Ignite-with-W%26B--Vmlldzo0NzkwMg). +* Try running the code yourself in this [example hosted notebook](https://colab.research.google.com/drive/15e-yGOvboTzXU4pe91Jg-Yr7sae3zBOJ#scrollTo=ztVifsYAmnRr). ## Basic setup +The following example defines a simple convolutional model and data loaders for MNIST. The logging examples that follow use these pieces. + ```python from argparse import ArgumentParser import wandb @@ -57,11 +64,11 @@ def get_data_loaders(train_batch_size, val_batch_size): return train_loader, val_loader ``` -Using `WandBLogger` in ignite is a modular process. First, you create a `WandBLogger` object. Next, you attach it to a trainer or evaluator to automatically log the metrics. This example shows: +Using `WandBLogger` in Ignite is a modular process. First, create a `WandBLogger` object. Next, attach it to a trainer or evaluator to automatically log the metrics. This example shows: * Logs training loss, attached to the trainer object. * Logs validation loss, attached to the evaluator. -* Logs optional Parameters, such as learning rate. +* Logs optional parameters, such as learning rate. * Watches the model. ```python @@ -119,7 +126,9 @@ def run(train_batch_size, val_batch_size, epochs, lr, momentum, log_interval): wandb_logger.watch(model) ``` -You can optionally utilize ignite `EVENTS` to log the metrics directly to the terminal +With the logger attached, Ignite streams training and validation metrics, optimizer parameters, and model gradients to your W&B project automatically. + +You can optionally use Ignite `EVENTS` to log the metrics directly to the terminal. 
```python @trainer.on(Events.ITERATION_COMPLETED(every=log_interval)) @@ -174,7 +183,7 @@ if __name__ == "__main__": run(args.batch_size, args.val_batch_size, args.epochs, args.lr, args.momentum, args.log_interval) ``` -This code generates these visualizations:: +This code generates these visualizations: PyTorch Ignite training dashboard @@ -192,4 +201,4 @@ This code generates these visualizations:: PyTorch Ignite model comparison dashboard -Refer to the [Ignite Docs](https://pytorch.org/ignite/contrib/handlers.html#module-ignite.contrib.handlers.wandb_logger) for more details. \ No newline at end of file +Refer to the [Ignite Docs](https://pytorch.org/ignite/contrib/handlers.html#module-ignite.contrib.handlers.wandb_logger) for more details. \ No newline at end of file diff --git a/models/integrations/keras.mdx b/models/integrations/keras.mdx index 2e30e4c202..038da00c01 100644 --- a/models/integrations/keras.mdx +++ b/models/integrations/keras.mdx @@ -1,17 +1,18 @@ --- title: Keras description: "Use W&B Keras callbacks to track experiments, checkpoint models, and visualize predictions during training." +keywords: ["tf.keras callback", "WandbMetricsLogger", "model graph logging"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; {/* */} -Use Keras callbacks to track experiments, log model checkpoints, and visualize model predictions. Keras callbacks are available in the `wandb.integration.keras` module with Pyhon SDK versions `0.13.4` and above. +Use W&B Keras callbacks to track experiments, log model checkpoints, and visualize model predictions during training. This integration is for Keras users who want to add experiment tracking and model versioning to their training workflows without rewriting their training loop. -W&B Keras integration provides the following callbacks: +Keras callbacks are available in the `wandb.integration.keras` module with Python SDK versions `0.13.4` and above. 
The W&B Keras integration provides the following callbacks: -- **`WandbMetricsLogger`** : Use this callback for [Experiment Tracking](/models/track/). It logs your training and validation metrics along with system metrics to W&B. -- **`WandbModelCheckpoint`** : Use this callback to log your model checkpoints to W&B [Artifacts](/models/artifacts/). +- **`WandbMetricsLogger`**: Use this callback for [experiment tracking](/models/track/). It logs your training and validation metrics along with system metrics to W&B. +- **`WandbModelCheckpoint`**: Use this callback to log your model checkpoints to W&B [Artifacts](/models/artifacts/). - **`WandbEvalCallback`**: This base callback logs model predictions to W&B [Tables](/models/tables/) for interactive visualization. ## Install and import Keras integration @@ -22,8 +23,7 @@ Install the latest version of W&B. pip install -U wandb ``` -To use the Keras integration, import required classes from `wandb.integration.keras`: - +To use the Keras integration, import required classes from `wandb.integration.keras`. ```python import wandb @@ -36,9 +36,9 @@ The following sections describe each callback in detail with code examples. -`wandb.integration.keras.WandbMetricsLogger()` automatically logs Keras' `logs` dictionary that callback methods such as `on_epoch_end`, `on_batch_end` etc, take as an argument. +`wandb.integration.keras.WandbMetricsLogger()` logs Keras' `logs` dictionary that callback methods such as `on_epoch_end` and `on_batch_end` take as an argument. -The partial example below shows how to use `WandbMetricsLogger()` in a Keras workflow. First, compile the model with desired optimizer, loss function, and metrics. Then, initialize a W&B run using `wandb.init()`. Finally, pass the `WandbMetricsLogger()` callback to `model.fit()`. +The following partial example shows how to use `WandbMetricsLogger()` in a Keras workflow. First, compile the model with the desired optimizer, loss function, and metrics. 
Then, initialize a W&B run using `wandb.init()`. Finally, pass the `WandbMetricsLogger()` callback to `model.fit()`. ```python import wandb @@ -60,7 +60,7 @@ with wandb.init(config={"batch_size": 64}) as run: ) ``` -The previous example logs training and validation metrics such as `loss`, `accuracy`, and `top@5_accuracy` to W&B at the end of each epoch. It also logs: +The previous example logs training and validation metrics such as `loss`, `accuracy`, and `top@5_accuracy` to W&B at the end of each epoch. ### `WandbMetricsLogger` reference @@ -74,17 +74,17 @@ The previous example logs training and validation metrics such as `loss`, `accur -Use `WandbModelCheckpoint` callback to save the Keras model (`SavedModel` format) or model weights periodically and uploads them to W&B as a `wandb.Artifact` for model versioning. +Use the `WandbModelCheckpoint` callback to periodically save the Keras model (`SavedModel` format) or model weights and upload them to W&B as a `wandb.Artifact` for model versioning. -This callback is subclassed from [`tf.keras.callbacks.ModelCheckpoint()`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint) ,thus the checkpointing logic is taken care of by the parent callback. +This callback subclasses [`tf.keras.callbacks.ModelCheckpoint()`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint), so the parent callback handles the checkpointing logic. This callback saves: -* The model that has achieved best performance based on the monitor. -* The model at the end of every epoch regardless of the performance. -* The model at the end of the epoch or after a fixed number of training batches. -* Only model weights or the whole model. -* The model either in `SavedModel` format or in `.h5` format. +- The model that has achieved best performance based on the monitor. +- The model at the end of every epoch regardless of the performance. 
+- The model at the end of the epoch or after a fixed number of training batches. +- Only model weights or the whole model. +- The model either in `SavedModel` format or in `.h5` format. Use this callback in conjunction with `WandbMetricsLogger()`. @@ -117,11 +117,11 @@ with wandb.init(config={"bs": 12}) as run: | `save_best_only` | (Boolean): if `save_best_only=True`, it only saves the latest model or the model it considers the best, according to the defined by the `monitor` and `mode` attributes. | | `save_weights_only` | (Boolean): if True, saves only the model's weights. | | `mode` | (`auto`, `min`, or `max`): For `val_acc`, set it to `max`, for `val_loss`, set it to `min`, and so on | | -| `save_freq` | ("epoch" or int): When using ‘epoch’, the callback saves the model after each epoch. When using an integer, the callback saves the model at end of this many batches. Note that when monitoring validation metrics such as `val_acc` or `val_loss`, `save_freq` must be set to "epoch" as those metrics are only available at the end of an epoch. | +| `save_freq` | ("epoch" or int): When using "epoch", the callback saves the model after each epoch. When using an integer, the callback saves the model at end of this many batches. When monitoring validation metrics such as `val_acc` or `val_loss`, `save_freq` must be set to "epoch" as those metrics are only available at the end of an epoch. | | `options` | (str): Optional `tf.train.CheckpointOptions` object if `save_weights_only` is true or optional `tf.saved_model.SaveOptions` object if `save_weights_only` is false. | | `initial_value_threshold` | (float): Floating point initial "best" value of the metric to be monitored. | -### Log checkpoints after N epochs +### Log checkpoints after `N` epochs By default (`save_freq="epoch"`), the callback creates a checkpoint and uploads it as an artifact after each epoch. To create a checkpoint after a specific number of batches, set `save_freq` to an integer. 
To checkpoint after `N` epochs, compute the cardinality of the `train` dataloader and pass it to `save_freq`: @@ -132,9 +132,9 @@ WandbModelCheckpoint( ) ``` -### Efficiently log checkpoints on a TPU architecture +### Log checkpoints efficiently on a TPU architecture -While checkpointing on TPUs you might encounter `UnimplementedError: File system scheme '[local]' not implemented` error message. This happens because the model directory (`filepath`) must use a cloud storage bucket path (`gs://bucket-name/...`), and this bucket must be accessible from the TPU server. Instead, W&B uses the local path for checkpointing which in turn is uploaded as an artifact. +While checkpointing on TPUs, you might encounter the `UnimplementedError: File system scheme '[local]' not implemented` error message. This happens because the model directory (`filepath`) must use a cloud storage bucket path (`gs://bucket-name/...`), and this bucket must be accessible from the TPU server. Instead, W&B uses the local path for checkpointing, which W&B then uploads as an artifact. ```python checkpoint_options = tf.saved_model.SaveOptions(experimental_io_device="/job:localhost") @@ -149,16 +149,16 @@ WandbModelCheckpoint( -The `WandbEvalCallback()` is an abstract base class to build Keras callbacks primarily for model prediction and, secondarily, dataset visualization. +`WandbEvalCallback()` is an abstract base class for building Keras callbacks, primarily for model prediction and, secondarily, dataset visualization. -This abstract callback is agnostic with respect to the dataset and the task. To use this, inherit from this base `WandbEvalCallback()` callback class and implement the `add_ground_truth` and `add_model_prediction` methods. +This abstract callback is independent of the dataset and the task. To use it, inherit from this base `WandbEvalCallback()` callback class and implement the `add_ground_truth` and `add_model_prediction` methods. 
-The `WandbEvalCallback()` is a utility class that provides methods to: +`WandbEvalCallback()` is a utility class that provides methods to: -* Create data and prediction `wandb.Table()` instances. -* Log data and prediction Tables as `wandb.Artifact()`. -* Log the data table `on_train_begin`. -* log the prediction table `on_epoch_end`. +- Create data and prediction `wandb.Table()` instances. +- Log data and prediction Tables as `wandb.Artifact()`. +- Log the data table `on_train_begin`. +- Log the prediction table `on_epoch_end`. The following example uses `WandbClfEvalCallback` for an image classification task. This example callback logs the validation data (`data_table`) to W&B, performs inference, and logs the prediction (`pred_table`) to W&B at the end of every epoch. @@ -228,19 +228,19 @@ with wandb.init(config={"hyper": "parameter"}) as run: ### Memory footprint details -We log the `data_table` to W&B when the `on_train_begin` method is invoked. Once it's uploaded as a W&B Artifact, we get a reference to this table which can be accessed using `data_table_ref` class variable. The `data_table_ref` is a 2D list that can be indexed like `self.data_table_ref[idx][n]`, where `idx` is the row number while `n` is the column number. Let's see the usage in the example below. +W&B logs the `data_table` when invoking the `on_train_begin` method. After W&B uploads it as a W&B Artifact, you get a reference to this table, which you can access using the `data_table_ref` class variable. The `data_table_ref` is a 2D list that you can index like `self.data_table_ref[idx][n]`, where `idx` is the row number and `n` is the column number. See the usage in the following example. ### Customize the callback -You can override the `on_train_begin` or `on_epoch_end` methods to have more fine-grained control. If you want to log the samples after `N` batches, you can implement `on_train_batch_end` method. 
+For more control over when data and predictions are logged, you can override the default callback methods. Override the `on_train_begin` or `on_epoch_end` methods to have more fine-grained control. If you want to log the samples after `N` batches, you can implement the `on_train_batch_end` method. -If you are implementing a callback for model prediction visualization by inheriting `WandbEvalCallback` and something needs to be clarified or fixed, let us know by opening an [issue](https://github.com/wandb/wandb/issues). +If you're implementing a callback for model prediction visualization by inheriting `WandbEvalCallback` and something needs to be clarified or fixed, open an [issue](https://github.com/wandb/wandb/issues). -## `WandbCallback` [legacy] +## Legacy `WandbCallback` -Use the W&B library `WandbCallback()` Class to automatically save all the metrics and the loss values tracked in `model.fit()`. +`WandbCallback` is the legacy all-in-one callback. For new projects, use the dedicated callbacks described in the previous sections (`WandbMetricsLogger`, `WandbModelCheckpoint`, and `WandbEvalCallback`). Use the W&B library `WandbCallback()` class to save all metrics and loss values tracked in `model.fit()`. ```python import wandb @@ -260,20 +260,18 @@ You can watch the short video [Get Started with Keras and W&B in Less Than a Min For a more detailed video, watch [Integrate W&B with Keras](https://www.youtube.com/watch?v=Bsudo7jbMow\&ab_channel=Weights%26Biases). You can review the [Colab Jupyter Notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/keras/Keras_pipeline_with_Weights_and_Biases.ipynb). - -See our [example repo](https://github.com/wandb/examples) for scripts, including a [Fashion MNIST example](https://github.com/wandb/examples/blob/master/examples/keras/keras-cnn-fashion/train.py) and the [W&B Dashboard](https://wandb.ai/wandb/keras-fashion-mnist/runs/5z1d85qs) it generates. 
- +For additional sample scripts, see the [W&B example repo](https://github.com/wandb/examples), including a [Fashion MNIST example](https://github.com/wandb/examples/blob/master/examples/keras/keras-cnn-fashion/train.py) and the [W&B Dashboard](https://wandb.ai/wandb/keras-fashion-mnist/runs/5z1d85qs) it generates. -The `WandbCallback` class supports a wide variety of logging configuration options: specifying a metric to monitor, tracking of weights and gradients, logging of predictions on training_data and validation_data, and more. +The `WandbCallback` class supports logging configuration options: specifying a metric to monitor, tracking of weights and gradients, logging of predictions on `training_data` and `validation_data`, and more. -Check out the reference documentation for the `keras.WandbCallback` for full details. +See the reference documentation for `keras.WandbCallback` for full details. -The `WandbCallback` +`WandbCallback`: -* Automatically logs history data from any metrics collected by Keras: loss and anything passed into `keras_model.compile()`. -* Sets summary metrics for the run associated with the "best" training step, as defined by the `monitor` and `mode` attributes. This defaults to the epoch with the minimum `val_loss`. `WandbCallback` by default saves the model associated with the best `epoch`. -* Optionally logs gradient and parameter histogram. -* Optionally saves training and validation data for wandb to visualize. +- Logs history data from any metrics collected by Keras: loss and anything passed into `keras_model.compile()`. +- Sets summary metrics for the run associated with the "best" training step, as defined by the `monitor` and `mode` attributes. This defaults to the epoch with the minimum `val_loss`. By default, `WandbCallback` saves the model associated with the best `epoch`. +- Optionally logs gradient and parameter histograms. +- Optionally saves training and validation data for wandb to visualize. 
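The way `monitor` and `mode` pick the "best" step can be illustrated with a minimal sketch. This isn't the `WandbCallback` implementation: `best_epoch` and the `history` values are hypothetical, and only the `min` and `max` modes are handled.

```python
def best_epoch(history, monitor="val_loss", mode="min"):
    """Return (epoch_index, value) for the best logged value of `monitor`.

    `history` is a hypothetical dict that maps a metric name to one value per epoch.
    """
    if mode not in ("min", "max"):
        raise ValueError("this sketch only handles mode='min' or mode='max'")
    values = history[monitor]
    pick = min if mode == "min" else max
    # Compare epochs by their metric value and keep the index of the winner.
    index = pick(range(len(values)), key=lambda i: values[i])
    return index, values[index]


# Made-up per-epoch metrics, for illustration only.
history = {
    "val_loss": [0.92, 0.61, 0.58, 0.64],
    "val_accuracy": [0.70, 0.81, 0.83, 0.82],
}

print(best_epoch(history))
print(best_epoch(history, monitor="val_accuracy", mode="max"))
```

With the default `monitor="val_loss"` and `mode="min"`, the sketch selects the epoch with the lowest validation loss, which matches the default behavior described in the list.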
### `WandbCallback` reference @@ -287,13 +285,13 @@ The `WandbCallback` | `log_weights` | (boolean) if True save histograms of the model's layer's weights. | | `log_gradients` | (boolean) if True log histograms of the training gradients | | `training_data` | (tuple) Same format `(X,y)` as passed to `model.fit`. This is needed for calculating gradients - this is mandatory if `log_gradients` is `True`. | -| `validation_data` | (tuple) Same format `(X,y)` as passed to `model.fit`. A set of data for wandb to visualize. If you set this field, every epoch, wandb makes a small number of predictions and saves the results for later visualization. | +| `validation_data` | (tuple) Same format `(X,y)` as passed to `model.fit`. A set of data for wandb to visualize. If you set this field, wandb makes a small number of predictions every epoch and saves the results for later visualization. | | `generator` | (generator) a generator that returns validation data for wandb to visualize. This generator should return tuples `(X,y)`. Either `validate_data` or generator should be set for wandb to visualize specific data examples. | | `validation_steps` | (int) if `validation_data` is a generator, how many steps to run the generator for the full validation set. | | `labels` | (list) If you are visualizing your data with wandb this list of labels converts numeric output to understandable string if you are building a classifier with multiple classes. For a binary classifier, you can pass in a list of two labels \[`label for false`, `label for true`]. If `validate_data` and `generator` are both false, this does nothing. | | `predictions` | (int) the number of predictions to make for visualization each epoch, max is 100. | | `input_type` | (string) type of the model input to help visualization. can be one of: (`image`, `images`, `segmentation_mask`). | -| `output_type` | (string) type of the model output to help visualziation. can be one of: (`image`, `images`, `segmentation_mask`). 
| +| `output_type` | (string) type of the model output to help visualization. can be one of: (`image`, `images`, `segmentation_mask`). | | `log_evaluation` | (boolean) if True, save a Table containing validation data and the model's predictions at each epoch. See `validation_indexes`, `validation_row_processor`, and `output_row_processor` for additional details. | | `class_colors` | (\[float, float, float]) if the input or output is a segmentation mask, an array containing an rgb tuple (range 0-1) for each class. | | `log_batch_frequency` | (integer) if None, callback logs every epoch. If set to integer, callback logs training metrics every `log_batch_frequency` batches. | @@ -306,9 +304,9 @@ The `WandbCallback` ## Frequently asked questions -### How do I use `Keras` multiprocessing with `wandb`? +### Use Keras multiprocessing with wandb -When setting `use_multiprocessing=True`, this error may occur: +When you set `use_multiprocessing=True`, this error might occur: ```python Error("You must call wandb.init() before wandb.config.batch_size") @@ -317,4 +315,4 @@ Error("You must call wandb.init() before wandb.config.batch_size") To work around it: 1. In the `Sequence` class construction, add: `wandb.init(group='...')`. -2. In `main`, make sure you're using `if __name__ == "__main__":` and put the rest of your script logic inside it. \ No newline at end of file +2. In `main`, make sure you use `if __name__ == "__main__":` and put the rest of your script logic inside it. \ No newline at end of file diff --git a/models/integrations/kubeflow-pipelines-kfp.mdx b/models/integrations/kubeflow-pipelines-kfp.mdx index 6570a21cdc..485cdb6451 100644 --- a/models/integrations/kubeflow-pipelines-kfp.mdx +++ b/models/integrations/kubeflow-pipelines-kfp.mdx @@ -1,15 +1,16 @@ --- description: "Integrate W&B with Kubeflow Pipelines to track experiments and visualize metrics across ML pipeline components." 
title: Kubeflow Pipelines (kfp) +keywords: ["KFP component", "kubeflow run", "pipeline step tracking"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -[Kubeflow Pipelines (kfp) ](https://www.kubeflow.org/docs/components/pipelines/overview/)is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. +[Kubeflow Pipelines (kfp)](https://www.kubeflow.org/docs/components/pipelines/overview/) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. -This integration lets users apply decorators to kfp python functional components to automatically log parameters and artifacts to W&B. +This guide shows you how to integrate W&B with Kubeflow Pipelines so that parameters and artifacts from your pipeline components are automatically tracked in W&B. By the end, you can apply a decorator to kfp Python functional components to log inputs, outputs, and artifacts to W&B without modifying the body of each component. -This feature was enabled in `wandb==0.12.11` and requires `kfp<2.0.0` +This feature was enabled in `wandb==0.12.11` and requires `kfp<2.0.0`. ## Sign up and create an API key @@ -29,7 +30,7 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. @@ -64,7 +65,7 @@ wandb.login() ## Decorate your components -Add the `@wandb_log` decorator and create your components as usual. This will automatically log the input/outputs parameters and artifacts to W&B each time you run your pipeline. +With the `wandb` library installed, you can now enable W&B tracking on individual pipeline components. Add the `@wandb_log` decorator and create your components as usual. 
This automatically logs the input and output parameters and artifacts to W&B each time you run your pipeline. ```python from kfp import components @@ -81,7 +82,7 @@ add = components.create_component_from_func(add) ## Pass environment variables to containers -You may need to explicitly pass [environment variables](/models/track/environment-variables/) to your containers. For two-way linking, you should also set the environment variables `WANDB_KUBEFLOW_URL` to the base URL of your Kubeflow Pipelines instance. For example, `https://kubeflow.mysite.com`. +Each pipeline component runs in its own container, so the W&B credentials available on your local machine aren't automatically propagated to the component. You may need to explicitly pass [environment variables](/models/track/environment-variables/) to your containers so that each component can authenticate to W&B. For two-way linking, you should also set the environment variables `WANDB_KUBEFLOW_URL` to the base URL of your Kubeflow Pipelines instance. For example, `https://kubeflow.mysite.com`. ```python import os @@ -107,20 +108,22 @@ def example_pipeline(param1: str, param2: int): ## Access your data programmatically -### Via the Kubeflow Pipelines UI +Once your pipeline runs are logging to W&B, you can review and retrieve the tracked data in several ways. The following sections describe how to access your runs from the Kubeflow Pipelines UI, the W&B web app, and the W&B Public API. -Click on any Run in the Kubeflow Pipelines UI that has been logged with W&B. +### Kubeflow Pipelines UI -* Find details about inputs and outputs in the `Input/Output` and `ML Metadata` tabs. -* View the W&B web app from the `Visualizations` tab. +Click any run in the Kubeflow Pipelines UI that has been logged with W&B, then: + +* Find details about inputs and outputs in the **Input/Output** and **ML Metadata** tabs. +* View the W&B web app from the **Visualizations** tab. 
W&B in Kubeflow UI -### Via the web app UI +### W&B web app UI -The web app UI has the same content as the `Visualizations` tab in Kubeflow Pipelines, but with more space. Learn [more about the web app UI here](/models/app). +The web app UI has the same content as the **Visualizations** tab in Kubeflow Pipelines, but with more space. For details, see the [W&B web app documentation](/models/app). Run details @@ -130,13 +133,13 @@ The web app UI has the same content as the `Visualizations` tab in Kubeflow Pipe Pipeline DAG -### Via the Public API (for programmatic access) +### Public API -* For programmatic access, [see our Public API](/models/ref/python/public-api/). +For programmatic access, [see the Public API](/models/ref/python/public-api/). -### Concept mapping from Kubeflow Pipelines to W&B +## Concept mapping from Kubeflow Pipelines to W&B -Here's a mapping of Kubeflow Pipelines concepts to W&B +The following table maps Kubeflow Pipelines concepts to W&B. | Kubeflow Pipelines | W&B | Location in W&B | | ------------------ | --- | --------------- | @@ -147,11 +150,11 @@ Here's a mapping of Kubeflow Pipelines concepts to W&B ## Fine-grain logging -If you want finer control of logging, you can sprinkle in `wandb.log()` and `wandb.log_artifact()` calls in the component. +The `@wandb_log` decorator handles inputs and outputs automatically, but it doesn't capture intermediate values such as training metrics across epochs. If you want finer control of logging, you can add `wandb.log()` and `wandb.log_artifact()` calls in the component. ### With explicit `wandb.log_artifact()` calls -In this example below, we are training a model. The `@wandb_log` decorator will automatically track the relevant inputs and outputs. If you want to log the training process, you can explicitly add that logging like so: +In the following example, you train a model. The `@wandb_log` decorator automatically tracks the relevant inputs and outputs. 
If you want to log the training process, you can explicitly add that logging like so: ```python @wandb_log @@ -173,9 +176,9 @@ def train_model( run.log_artifact(model_artifact) ``` -### With implicit wandb integrations +### With implicit `wandb` integrations -If you're using a [framework integration we support](/models/integrations), you can also pass in the callback directly: +If you're using a [supported framework integration](/models/integrations), you can also pass in the callback directly: ```python @wandb_log diff --git a/models/integrations/lightgbm.mdx b/models/integrations/lightgbm.mdx index 2515f221b5..bfdef151be 100644 --- a/models/integrations/lightgbm.mdx +++ b/models/integrations/lightgbm.mdx @@ -1,12 +1,15 @@ --- description: "Integrate W&B with LightGBM to log gradient boosting metrics, feature importance, and model performance automatically." title: LightGBM +keywords: ["LGBMClassifier", "lightgbm plot", "boosting rounds"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -The `wandb` library includes a special callback for [LightGBM](https://lightgbm.readthedocs.io/en/latest/). It's also easy to use the generic logging features of W&B to track large experiments, like hyperparameter sweeps. +The `wandb` library includes a special callback for [LightGBM](https://lightgbm.readthedocs.io/en/latest/) that automatically logs training metrics, feature importance, and model checkpoints to W&B. You can also use the generic logging features of W&B to track large experiments, such as hyperparameter sweeps. + +Use this integration to monitor gradient boosting model performance, compare runs, and analyze feature contributions without writing custom logging code. ```python from wandb.integration.lightgbm import wandb_callback, log_summary @@ -20,14 +23,14 @@ log_summary(gbm, save_model_checkpoint=True) ``` -Looking for working code examples? 
Check out [our repository of examples on GitHub](https://github.com/wandb/examples/tree/master/examples/boosting-algorithms). +For working code examples, see the [repository of examples on GitHub](https://github.com/wandb/examples/tree/master/examples/boosting-algorithms). -## Tuning your hyperparameters with Sweeps +## Tune your hyperparameters with Sweeps -Attaining the maximum performance out of models requires tuning hyperparameters, like tree depth and learning rate. W&B [Sweeps](/models/sweeps/) is a powerful toolkit for configuring, orchestrating, and analyzing large hyperparameter testing experiments. +To get the best performance from models, tune hyperparameters such as tree depth and learning rate. W&B [Sweeps](/models/sweeps/) is a toolkit that configures, orchestrates, and analyzes large hyperparameter testing experiments. -To learn more about these tools and see an example of how to use Sweeps with XGBoost, check out this interactive Colab notebook. +To learn more about these tools and see an example of how to use Sweeps with XGBoost, open the following interactive Colab notebook. diff --git a/models/integrations/lightning.mdx b/models/integrations/lightning.mdx index 52a8bc7027..9b4d67c730 100644 --- a/models/integrations/lightning.mdx +++ b/models/integrations/lightning.mdx @@ -1,6 +1,7 @@ --- title: PyTorch Lightning description: "Use W&B with PyTorch Lightning through the built-in WandbLogger for experiment tracking and model checkpointing." +keywords: ["pl.LightningModule", "log_hyperparameters", "checkpoint callback"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; @@ -10,10 +11,14 @@ import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamli {/* Colab link broken. Removing for now */} {/* */} -PyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and easily adding advanced features such as distributed training and 16-bit precision. 
W&B provides a lightweight wrapper for logging your ML experiments. But you don't need to combine the two yourself: W&B is incorporated directly into the PyTorch Lightning library via the [`WandbLogger`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb). +PyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and adding advanced features such as distributed training and 16-bit precision. W&B provides a lightweight wrapper for logging your ML experiments. You don't need to combine the two yourself: W&B is incorporated directly into the PyTorch Lightning library through the [`WandbLogger`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb). + +This page shows you how to use `WandbLogger` to track metrics, log hyperparameters, save model checkpoints as artifacts, log media, and run multi-GPU training with PyTorch Lightning and W&B. ## Integrate with Lightning +The following sections show how to authenticate with W&B, install the `wandb` library, and attach a `WandbLogger` to your Lightning `Trainer` or `Fabric` instance. + ```python @@ -25,7 +30,7 @@ trainer = Trainer(logger=wandb_logger) ``` -**Using wandb.log():** The `WandbLogger` logs to W&B using the Trainer's `global_step`. If you make additional calls to `wandb.log()` directly in your code, **do not** use the `step` argument in `wandb.log()`. +**Using `wandb.log()`:** The `WandbLogger` logs to W&B using the Trainer's `global_step`. If you make additional calls to `wandb.log()` directly in your code, don't use the `step` argument in `wandb.log()`. Instead, log the Trainer's `global_step` like your other metrics: @@ -48,7 +53,7 @@ fabric.log_dict({"important_metric": important_metric}) - Interactive dashboards + Interactive dashboards ### Sign up and create an API key @@ -57,8 +62,10 @@ An API key authenticates your machine to W&B. 
You can generate an API key from y +To generate an API key from your user profile: + 1. Click your user profile icon in the upper right corner. -1. Select **User Settings**, then scroll to the **API Keys** section. +2. Select **User Settings**, then scroll to the **API Keys** section. ### Install the `wandb` library and log in @@ -69,10 +76,10 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` -1. Install the `wandb` library and log in. +2. Install the `wandb` library and log in. @@ -104,7 +111,7 @@ wandb.login() ## Use PyTorch Lightning's `WandbLogger` -PyTorch Lightning has multiple `WandbLogger` classes to log metrics and model weights, media, and more. +PyTorch Lightning has multiple `WandbLogger` classes to log metrics, model weights, and media. Choose the class that matches your training setup: - [`PyTorch`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) - [`Fabric`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) @@ -131,20 +138,22 @@ fabric.log_dict({ ### Common logger arguments -Below are some of the most used parameters in `WandbLogger`. Review the PyTorch Lightning documentation for details about all logger arguments. +The following table lists common parameters for `WandbLogger`. Review the PyTorch Lightning documentation for details about all logger arguments. 
- [`PyTorch`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) - [`Fabric`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) | Parameter | Description | | ----------- | ----------------------------------------------------------------------------- | -| `project` | Define what wandb Project to log to | -| `name` | Give a name to your wandb run | -| `log_model` | Log all models if `log_model="all"` or at end of training if `log_model=True` | +| `project` | Defines which W&B project to log to | +| `name` | Names your W&B run | +| `log_model` | Logs all models if `log_model="all"` or at end of training if `log_model=True` | | `save_dir` | Path where data is saved | ## Log your hyperparameters +Logging hyperparameters with W&B lets you compare runs and reproduce results. Use the method that matches your logger: + ```python @@ -167,6 +176,8 @@ wandb_logger.log_hyperparams( ## Log additional config parameters +To capture extra configuration values alongside your hyperparameters, update the run config directly: + ```python # add one parameter wandb_logger.experiment.config["key"] = value @@ -181,15 +192,15 @@ wandb.config.update() ## Log gradients, parameter histogram and model topology -You can pass your model object to `wandblogger.watch()` to monitor your models's gradients and parameters as you train. See the PyTorch Lightning `WandbLogger` documentation +Pass your model object to `wandblogger.watch()` to monitor your model's gradients and parameters as you train. See the PyTorch Lightning `WandbLogger` documentation. 
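As a rough sketch of the call described above, the following wraps it in a helper. `watch_model` is a hypothetical name, not part of Lightning or W&B, and the sketch assumes `wandb_logger` is a Lightning `WandbLogger` whose `watch()` method accepts `log` and `log_freq` arguments.

```python
def watch_model(wandb_logger, model, log="all", log_freq=100):
    """Ask the logger to track gradients and parameters of `model`.

    Assumes `wandb_logger` is a Lightning `WandbLogger`. `watch_model` is a
    hypothetical helper used only for illustration.
    """
    # log="all" requests both gradient and parameter histograms;
    # log_freq is the number of batches between logged histograms.
    wandb_logger.watch(model, log=log, log_freq=log_freq)
    return wandb_logger
```

In a training script, you might call `watch_model(wandb_logger, model)` once before `trainer.fit(...)`. Check the Lightning `WandbLogger` reference for the arguments your installed version supports.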
## Log metrics -You can log your metrics to W&B when using the `WandbLogger` by calling `self.log('my_metric_name', metric_vale)` within your `LightningModule`, such as in your `training_step` or `validation_step methods.` +To log your metrics to W&B when using the `WandbLogger`, call `self.log('my_metric_name', metric_vale)` within your `LightningModule`, such as in your `training_step` or `validation_step` methods. -The code snippet below shows how to define your `LightningModule` to log your metrics and your `LightningModule` hyperparameters. This example uses the [`torchmetrics`](https://github.com/Lightning-AI/torchmetrics) library to calculate your metrics +The following code snippet shows how to define your `LightningModule` to log your metrics and your `LightningModule` hyperparameters. This example uses the [`torchmetrics`](https://github.com/Lightning-AI/torchmetrics) library to calculate your metrics. ```python import torch @@ -222,7 +233,7 @@ class My_LitModule(LightningModule): batch_size, channels, width, height = x.size() x = x.view(batch_size, -1) - # let's do 3 x (linear + relu) + # apply 3 x (linear + relu) x = F.relu(self.layer_1(x)) x = F.relu(self.layer_2(x)) x = self.layer_3(x) @@ -293,9 +304,9 @@ for epoch in range(num_epochs): ## Log the min/max of a metric -Using wandb's [`define_metric`](/models/ref/python/experiments/run#define_metric) function you can define whether you'd like your W&B summary metric to display the min, max, mean or best value for that metric. If `define`_`metric` _ isn't used, then the last value logged with appear in your summary metrics. See the `define_metric` [reference docs here](/models/ref/python/experiments/run#define_metric) and the [guide here](/models/track/log/customize-logging-axes/) for more. +Using W&B's [`define_metric`](/models/ref/python/experiments/run#define_metric) function, you can define whether your W&B summary metric displays the min, max, mean, or best value for that metric. 
If `define_metric` isn't used, the last value logged appears in your summary metrics. For more information, see the [customize logging axes guide](/models/track/log/customize-logging-axes/). -To tell W&B to keep track of the max validation accuracy in the W&B summary metric, call `wandb.define_metric()` only once, at the beginning of training: +To track the max validation accuracy in the W&B summary metric, call `wandb.define_metric()` only once, at the beginning of training: @@ -327,6 +338,8 @@ fabric.log_dict({"val_accuracy": val_accuracy}) ## Checkpoint a model +Saving checkpoints as W&B artifacts gives you versioned model files you can retrieve later by run, alias, or version. + To save model checkpoints as W&B [Artifacts](/models/artifacts/), use the Lightning [`ModelCheckpoint`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html) callback and set the `log_model` argument in the `WandbLogger`. @@ -343,22 +356,22 @@ fabric = L.Fabric(loggers=[wandb_logger], callbacks=[checkpoint_callback]) -The _latest_ and _best_ aliases are automatically set to easily retrieve a model checkpoint from a W&B [Artifact](/models/artifacts/): +The `latest` and `best` aliases are set automatically to make it easier to retrieve a model checkpoint from a W&B Artifact: ```python # reference can be retrieved in artifacts panel -# "VERSION" can be a version (ex: "v2") or an alias ("latest or "best") -checkpoint_reference = "USER/PROJECT/MODEL-RUN_ID:VERSION" +# "VERSION" can be a version (for example, "v2") or an alias ("latest" or "best") +checkpoint_reference = "[USER]/[PROJECT]/[MODEL-RUN_ID]:[VERSION]" ``` - + ```python # download checkpoint locally (if not already cached) wandb_logger.download_artifact(checkpoint_reference, artifact_type="model") ``` - + ```python # download checkpoint locally (if not already cached) run = wandb.init(project="MNIST") @@ -386,17 +399,17 @@ optimizer.load_state_dict(full_checkpoint["optimizer"]) -The model 
checkpoints you log are viewable through the [W&B Artifacts](/models/artifacts/) UI, and include the full model lineage (see an example model checkpoint in the UI [here](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage?_gl=1*yyql5q*_ga*MTQxOTYyNzExOS4xNjg0NDYyNzk1*_ga_JH1SJHJQXJ*MTY5MjMwNzI2Mi4yNjkuMS4xNjkyMzA5NjM2LjM3LjAuMA..)). +The model checkpoints you log are viewable through the [W&B Artifacts](/models/artifacts/) UI, and include the full model lineage (see an [example model checkpoint in the UI](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage?_gl=1*yyql5q*_ga*MTQxOTYyNzExOS4xNjg0NDYyNzk1*_ga_JH1SJHJQXJ*MTY5MjMwNzI2Mi4yNjkuMS4xNjkyMzA5NjM2LjM3LjAuMA..)). -To bookmark your best model checkpoints and centralize them across your team, you can link them to the [W&B Model Registry](/models). +To bookmark your best model checkpoints and centralize them across your team, link them to the [W&B Model Registry](/models). -Here you can organize your best models by task, manage model lifecycle, facilitate easy tracking and auditing throughout the ML lifecyle, and [automate](/models/automations/) downstream actions with webhooks or jobs. +In the Registry, you can organize your best models by task, manage model lifecycle, track and audit throughout the ML lifecycle, and [automate](/models/automations/) downstream actions with webhooks or jobs. ## Log images, text, and more -The `WandbLogger` has `log_image`, `log_text` and `log_table` methods for logging media. +The `WandbLogger` has `log_image`, `log_text`, and `log_table` methods for logging media. -You can also directly call `wandb.log()` or `trainer.logger.experiment.log()` to log other media types such as Audio, Molecules, Point Clouds, 3D Objects and more. +You can also call `wandb.log()` or `trainer.logger.experiment.log()` directly to log other media types such as Audio, Molecules, Point Clouds, and 3D Objects. 
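As a minimal sketch of logging a media type that has no dedicated `WandbLogger` helper, the following writes a short test tone to a WAV file and logs it with `wandb.Audio`. The helper names `write_sine_wav` and `log_audio_sample`, the `audio_sample` key, and the caption are illustrative assumptions, not part of the integration. The sketch also assumes a W&B run is already active, for example one started by `WandbLogger`.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, sample_rate=16000):
    """Write a mono 16-bit sine tone to `path` and return the number of frames."""
    n_frames = int(seconds * sample_rate)
    # Pack each sample as a little-endian signed 16-bit integer at half amplitude.
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return n_frames


def log_audio_sample(path, key="audio_sample"):
    """Log a WAV file to the active W&B run (hypothetical helper)."""
    # Imported lazily so that write_sine_wav has no W&B dependency.
    import wandb

    # Assumes a run is already active; inside a Lightning callback you could
    # call trainer.logger.experiment.log(...) with the same dictionary instead.
    wandb.log({key: wandb.Audio(path, caption="440 Hz test tone")})
```

Passing a file path to `wandb.Audio` avoids having to supply a sample rate separately, because the rate is read from the file.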
@@ -447,7 +460,7 @@ wandb_logger.log_table(key="my_samples", columns=columns, data=data) -You can use Lightning's Callbacks system to control when you log to W&B via the `WandbLogger`, in this example we log a sample of our validation images and predictions: +Use Lightning's Callbacks system to control when you log to W&B through the `WandbLogger`. The following example logs a sample of validation images and predictions: ```python @@ -469,7 +482,7 @@ class LogPredictionSamplesCallback(Callback): # `outputs` comes from `LightningModule.validation_step` # which corresponds to our model predictions in this case - # Let's log 20 sample image predictions from the first batch + # Log 20 sample image predictions from the first batch if batch_idx == 0: n = 20 x, y = batch @@ -497,11 +510,13 @@ trainer = pl.Trainer(callbacks=[LogPredictionSamplesCallback()]) ## Use multiple GPUs with Lightning and W&B -PyTorch Lightning has Multi-GPU support through their DDP Interface. However, PyTorch Lightning's design requires you to be careful about how you instantiate our GPUs. +When you run distributed training, the way you reference `wandb.run` across ranks can affect whether training proceeds or deadlocks. This section explains the requirements and shows a recommended pattern. + +PyTorch Lightning supports multi-GPU through its DDP Interface. However, PyTorch Lightning's design requires you to be careful about how you instantiate your GPUs. -Lightning assumes that each GPU (or Rank) in your training loop must be instantiated in exactly the same way - with the same initial conditions. However, only rank 0 process gets access to the `wandb.run` object, and for non-zero rank processes: `wandb.run = None`. This could cause your non-zero processes to fail. Such a situation can put you in a **deadlock** because rank 0 process will wait for the non-zero rank processes to join, which have already crashed. 
+Lightning requires each GPU (or rank) in your training loop to be instantiated in exactly the same way, with the same initial conditions. However, only the rank 0 process gets access to the `wandb.run` object. For non-zero rank processes, `wandb.run = None`. This can cause your non-zero processes to fail. Such a situation can put you in a deadlock because the rank 0 process waits for the non-zero rank processes to join, which have already crashed. -For this reason, be careful about how we set up your training code. The recommended way to set it up would be to have your code be independent of the `wandb.run` object. +For this reason, be careful about how you set up your training code. The recommended approach is to make your code independent of the `wandb.run` object. ```python class MNISTClassifier(pl.LightningModule): @@ -552,7 +567,7 @@ def main(): val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=4) model = MNISTClassifier() - wandb_logger = WandbLogger(project="") + wandb_logger = WandbLogger(project="[PROJECT-NAME]") callbacks = [ ModelCheckpoint( dirpath="checkpoints", @@ -569,17 +584,17 @@ def main(): ## Examples -You can follow along in a [video tutorial with a Colab notebook](https://wandb.me/lit-colab). +For an end-to-end walkthrough, you can follow along in a [video tutorial with a Colab notebook](https://wandb.me/lit-colab). ## Frequently asked questions ### How does W&B integrate with Lightning? -The core integration is based on the [Lightning `loggers` API](https://lightning.ai/docs/pytorch/stable/extensions/logging.html), which lets you write much of your logging code in a framework-agnostic way. `Logger`s are passed to the [Lightning `Trainer`](https://lightning.ai/docs/pytorch/stable/common/trainer.html) and are triggered based on that API's rich [hook-and-callback system](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html). This keeps your research code well-separated from engineering and logging code. 
+The core integration is based on the [Lightning `loggers` API](https://lightning.ai/docs/pytorch/stable/extensions/logging.html), which lets you write much of your logging code in a framework-independent way. `Logger` instances are passed to the [Lightning `Trainer`](https://lightning.ai/docs/pytorch/stable/common/trainer.html) and are triggered based on that API's rich [hook-and-callback system](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html). This keeps your research code well separated from engineering and logging code. ### What does the integration log without any additional code? -We'll save your model checkpoints to W&B, where you can view them or download them for use in future runs. We'll also capture [system metrics](/models/ref/python/experiments/system-metrics), like GPU usage and network I/O, environment information, like hardware and OS information, [code state](/models/app/features/panels/code/) (including git commit and diff patch, notebook contents and session history), and anything printed to the standard out. +W&B saves your model checkpoints, where you can view them or download them for use in future runs. W&B also captures [system metrics](/models/ref/python/experiments/system-metrics), like GPU usage and network I/O. It captures environment information, like hardware and OS information. It captures [code state](/models/app/features/panels/code/), including Git commit and diff patch, notebook contents, and session history. It also captures anything printed to standard out. ### What if I need to use `wandb.run` in my training setup? diff --git a/models/integrations/metaflow.mdx b/models/integrations/metaflow.mdx index abffda3c2b..941d7b29bd 100644 --- a/models/integrations/metaflow.mdx +++ b/models/integrations/metaflow.mdx @@ -1,6 +1,7 @@ --- description: "Integrate W&B with Metaflow to track experiments and manage ML workflows with automatic metric and artifact logging." 
title: Metaflow +keywords: ["@step decorator", "metaflow card", "flow tracking"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; @@ -9,19 +10,23 @@ import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamli [Metaflow](https://docs.metaflow.org) is a framework created by Netflix for creating and running ML workflows. -This integration lets users apply decorators to Metaflow [steps and flows](https://docs.metaflow.org/metaflow/basics) to automatically log parameters and artifacts to W&B. +This integration lets you apply decorators to Metaflow [steps and flows](https://docs.metaflow.org/metaflow/basics) to automatically log parameters and artifacts to W&B, so you can track experiments and inspect lineage across the workflows you build with Metaflow without writing custom logging code: -* Decorating a step will turn logging off or on for certain types within that step. -* Decorating the flow will turn logging off or on for every step in the flow. +* Decorating a step turns logging off or on for certain types within that step. +* Decorating the flow turns logging off or on for every step in the flow. ## Quickstart +The following sections walk you through authenticating with W&B, installing the required libraries, and adding the `wandb_log` decorator to your Metaflow steps and flows. + ### Sign up and create an API key An API key authenticates your machine to W&B. You can generate an API key from your user profile. +To find your API key in the W&B app: + 1. Click your user profile icon in the upper right corner. 1. Select **User Settings**, then scroll to the **API Keys** section. @@ -35,18 +40,18 @@ For `wandb` version 0.19.8 or below, install `fastcore` version 1.8.0 or below ( - + 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. 
- ```shell + ```bash pip install -Uqqq metaflow "plum-dispatch<3.0.0" wandb wandb login @@ -62,7 +67,7 @@ wandb.login() ``` -```notebook +```python !pip install -Uqqq metaflow "plum-dispatch<3.0.0" wandb import wandb @@ -77,7 +82,7 @@ wandb.login() Decorating a step turns logging off or on for certain types within that step. -In this example, all datasets and models in `start` will be logged +In this example, the integration logs all datasets and models in `start`: ```python from wandb.integration.metaflow import wandb_log @@ -94,7 +99,7 @@ class WandbExampleFlow(FlowSpec): Decorating a flow is equivalent to decorating all the constituent steps with a default. -In this case, all steps in `WandbExampleFlow` default to logging datasets and models by default, just like decorating each step with `@wandb_log(datasets=True, models=True)` +In this case, all steps in `WandbExampleFlow` default to logging datasets and models by default, equivalent to decorating each step with `@wandb_log(datasets=True, models=True)`: ```python from wandb.integration.metaflow import wandb_log @@ -108,7 +113,7 @@ class WandbExampleFlow(FlowSpec): self.next(self.transform) ``` - + Decorating the flow is equivalent to decorating all steps with a default. That means if you later decorate a Step with another `@wandb_log`, it overrides the flow-level decoration. In this example: @@ -147,7 +152,7 @@ class WandbExampleFlow(FlowSpec): ## Access your data programmatically -You can access the information we've captured in three ways: inside the original Python process being logged using the [`wandb` client library](/models/ref/python/), with the [web app UI](/models/track/workspaces/), or programmatically using [our Public API](/models/ref/python/public-api/). `Parameter`s are saved to W&B's [`config`](/models/) and can be found in the [Overview tab](/models/runs/#overview-tab). 
`datasets`, `models`, and `others` are saved to [W&B Artifacts](/models/artifacts/) and can be found in the [Artifacts tab](/models/runs/#artifacts-tab). Base python types are saved to W&B's [`summary`](/models/) dict and can be found in the Overview tab. See our [guide to the Public API](/models/track/public-api-guide/) for details on using the API to get this information programmatically from outside . +Once your flows and steps are decorated, runs send parameters and artifacts to W&B each time the flow executes. You can access the captured information in three ways: inside the original Python process being logged using the [`wandb` client library](/models/ref/python/), with the [web app UI](/models/track/workspaces/), or programmatically using the [Public API](/models/ref/python/public-api/). `Parameter`s are saved to the W&B [`config`](/models/) and can be found in the [Overview tab](/models/runs/#overview-tab). `datasets`, `models`, and `others` are saved to [W&B Artifacts](/models/artifacts/) and can be found in the [Artifacts tab](/models/runs/#artifacts-tab). Base python types are saved to the W&B [`summary`](/models/) dict and can be found in the Overview tab. See the [guide to the Public API](/models/track/public-api-guide/) for details on using the API to get this information programmatically from outside. ### Quick reference @@ -164,28 +169,30 @@ You can access the information we've captured in three ways: inside the original | `datasets` |
  • True: Log instance variables that are a dataset
  • False
| | `models` |
  • True: Log instance variables that are a model
  • False
| | `others` |
  • True: Log anything else that is serializable as a pickle
  • False
| -| `settings` |
  • wandb.Settings(...): Specify your own wandb settings for this step or flow
  • None: Equivalent to passing wandb.Settings()

By default, if:

  • settings.run_group is None, it will be set to \{flow_name\}/\{run_id\}
  • settings.run_job_type is None, it will be set to \{run_job_type\}/\{step_name\}
| +| `settings` |
  • wandb.Settings(...): Specify your own wandb settings for this step or flow
  • None: Equivalent to passing wandb.Settings()

By default, if:

  • settings.run_group is None, it's set to \{flow_name\}/\{run_id\}
  • settings.run_job_type is None, it's set to \{run_job_type\}/\{step_name\}
| ## Frequently asked questions -### What exactly do you log? Do you log all instance and local variables? +The following sections answer common questions about logging behavior, supported data types, and artifact lineage. + +### What exactly do you log -`wandb_log` only logs instance variables. Local variables are NEVER logged. This is useful to avoid logging unnecessary data. +`wandb_log` only logs instance variables. Local variables are never logged. This is useful to avoid logging unnecessary data. -### Which data types get logged? +### Which data types get logged -We currently support these types: +W&B supports these types: -| Logging Setting | Type | +| Logging setting | Type | | ------------------- | --------------------------------------------------------------------------------------------------------------------------- | | default (always on) |
  • dict, list, set, str, int, float, bool
| | `datasets` |
  • pd.DataFrame
  • pathlib.Path
| | `models` |
  • nn.Module
  • sklearn.base.BaseEstimator
| | `others` | | -### How can I configure logging behavior? +### Configure logging behavior -| Kind of Variable | behavior | Example | Data Type | +| Kind of variable | Behavior | Example | Data type | | ---------------- | ------------------------------ | --------------- | -------------- | | Instance | Auto-logged | `self.accuracy` | `float` | | Instance | Logged if `datasets=True` | `self.df` | `pd.DataFrame` | @@ -193,8 +200,8 @@ We currently support these types: | Local | Never logged | `accuracy` | `float` | | Local | Never logged | `df` | `pd.DataFrame` | -### Is artifact lineage tracked? +### Artifact lineage tracking -Yes. If you have an artifact that is an output of step A and an input to step B, we automatically construct the lineage DAG for you. +If you have an artifact that is an output of step A and an input to step B, W&B automatically constructs the lineage directed acyclic graph (DAG) for you. -For an example of this behavior, please see this [notebook](https://colab.research.google.com/drive/1wZG-jYzPelk8Rs2gIM3a71uEoG46u_nG#scrollTo=DQQVaKS0TmDU) and its corresponding [W&B Artifacts page](https://wandb.ai/megatruong/metaflow_integration/artifacts/dataset/raw_df/7d14e6578d3f1cfc72fe/graph) +For an example of this behavior, see this [Metaflow integration example notebook](https://colab.research.google.com/drive/1wZG-jYzPelk8Rs2gIM3a71uEoG46u_nG#scrollTo=DQQVaKS0TmDU) and its corresponding [Artifacts page](https://wandb.ai/megatruong/metaflow_integration/artifacts/dataset/raw_df/7d14e6578d3f1cfc72fe/graph). diff --git a/models/integrations/mmengine.mdx b/models/integrations/mmengine.mdx index e441a1eb05..b5d3e01cfd 100644 --- a/models/integrations/mmengine.mdx +++ b/models/integrations/mmengine.mdx @@ -1,13 +1,17 @@ --- title: MMEngine description: "Use W&B with OpenMMLab's MMEngine through the WandbVisBackend to log training metrics, configs, and visual records." 
+keywords: ["mmdetection", "mmsegmentation", "Runner visualizer"] --- -MMEngine by [OpenMMLab](https://github.com/open-mmlab) is a foundational library for training deep learning models based on PyTorch. MMEngine implements a next-generation training architecture for the OpenMMLab algorithm library, providing a unified execution foundation for over 30 algorithm libraries within OpenMMLab. Its core components include the training engine, evaluation engine, and module management. +This page shows you how to use W&B with OpenMMLab's MMEngine to track and visualize training runs. It's for users who train deep learning models with MMEngine or OpenMMLab computer vision libraries and want to log metrics, configs, and visualizations to W&B. -[W&B](https://wandb.ai/site) is directly integrated into MMEngine through a dedicated [`WandbVisBackend`](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.visualization.WandbVisBackend.html#mmengine.visualization.WandbVisBackend) that can be used to -- log training and evaluation metrics. -- log and manage experiment configs. -- log additional records such as graph, images, scalars, etc. +MMEngine by [OpenMMLab](https://github.com/open-mmlab) is a foundational library for training deep learning models based on PyTorch. MMEngine implements a training architecture for the OpenMMLab algorithm library, providing a unified execution foundation for over 30 algorithm libraries within OpenMMLab. Its core components include the training engine, evaluation engine, and module management. + +[W&B](https://wandb.ai/site) is directly integrated into MMEngine through a dedicated [`WandbVisBackend`](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.visualization.WandbVisBackend.html#mmengine.visualization.WandbVisBackend) that you can use to: + +- Log training and evaluation metrics. +- Log and manage experiment configs. +- Log additional records such as graphs, images, and scalars. 
## Get started @@ -15,12 +19,12 @@ Install `openmim` and `wandb`. -``` bash +```bash pip install -q -U openmim wandb ``` -``` bash +```bash !pip install -q -U openmim wandb ``` @@ -30,22 +34,22 @@ Next, install `mmengine` and `mmcv` using `mim`. -``` bash +```bash mim install -q mmengine mmcv ``` -``` bash +```bash !mim install -q mmengine mmcv ``` -## Use the `WandbVisBackend` with MMEngine Runner +## Use the `WandbVisBackend` with MMEngine runner -This section demonstrates a typical workflow using `WandbVisBackend` using [`mmengine.runner.Runner`](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.runner.Runner.html#mmengine.runner.Runner). +This section demonstrates a typical workflow using `WandbVisBackend` with [`mmengine.runner.Runner`](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.runner.Runner.html#mmengine.runner.Runner). The visualizer wraps the W&B backend so the MMEngine runner can route logs to W&B during training. -1. Define a `visualizer` from a visualization config. +1. Define a `visualizer` from a visualization config. The visualizer is what the runner uses to dispatch logs to the configured backend. ```python from mmengine.visualization import Visualizer @@ -66,11 +70,9 @@ This section demonstrates a typical workflow using `WandbVisBackend` using [`mme visualizer = Visualizer.get_instance(**visualization_cfg) ``` - You pass a dictionary of arguments for [W&B run initialization](/models/ref/python/functions/init) input parameters to `init_kwargs`. - -2. Initialize a `runner` with the `visualizer`, and call `runner.train()`. +2. Initialize a `runner` with the `visualizer`, and call `runner.train()` to start training. The runner uses the visualizer to stream metrics and configs to W&B. 
```python from mmengine.runner import Runner @@ -91,7 +93,7 @@ This section demonstrates a typical workflow using `WandbVisBackend` using [`mme ## Use the `WandbVisBackend` with OpenMMLab computer vision libraries -The `WandbVisBackend` can also be used easily to track experiments with OpenMMLab computer vision libraries such as [MMDetection](https://mmdetection.readthedocs.io/). +You can also use the `WandbVisBackend` to track experiments with OpenMMLab computer vision libraries such as [MMDetection](https://mmdetection.readthedocs.io/). The following example overrides the `vis_backends` entry from a base config so that the existing visualizer logs to W&B. ```python # inherit base configs from the default runtime configs diff --git a/models/integrations/mmf.mdx b/models/integrations/mmf.mdx index d3af30b534..3a16ddd1ed 100644 --- a/models/integrations/mmf.mdx +++ b/models/integrations/mmf.mdx @@ -1,31 +1,32 @@ --- description: "Integrate W&B with Meta AI's MMF framework to track multimodal model training experiments and log metrics." title: MMF +keywords: ["WandbLogger MMF", "multimodal fusion", "hateful memes"] --- -The `WandbLogger` class in [Meta AI's MMF](https://github.com/facebookresearch/mmf) library will enable W&B to log the training/validation metrics, system (GPU and CPU) metrics, model checkpoints and configuration parameters. +This page describes how to use the `WandbLogger` class in [Meta AI's MMF](https://github.com/facebookresearch/mmf) library to track your multimodal model training with W&B. Enabling `WandbLogger` lets you log training and validation metrics, system (GPU and CPU) metrics, model checkpoints, and configuration parameters, so you can monitor experiments and compare runs without adding custom logging code. 
-## Current features +## Features -The following features are currently supported by the `WandbLogger` in MMF: +The `WandbLogger` in MMF supports the following features: -* Training & Validation metrics -* Learning Rate over time -* Model Checkpoint saving to W&B Artifacts +* Training and validation metrics +* Learning rate over time +* Model checkpoint saving to W&B Artifacts * GPU and CPU system metrics * Training configuration parameters -## Config parameters +## Configuration parameters -The following options are available in MMF config to enable and customize the wandb logging: +To turn on W&B logging and customize how runs are tracked, set the following options in your MMF configuration: -``` +```yaml training: wandb: enabled: true # An entity is a username or team name where you're sending runs. - # By default it will log the run to your user account. + # By default, it logs the run to your user account. entity: null # Project name to be used while logging the experiment with wandb @@ -44,7 +45,7 @@ training: # tags: ['tag1', 'tag2'] env: - # To change the path to the directory where wandb metadata would be + # To change the path to the directory where wandb metadata is # stored (Default: env.log_dir): wandb_logdir: ${env:MMF_WANDB_LOGDIR,} ``` \ No newline at end of file diff --git a/models/integrations/nim.mdx b/models/integrations/nim.mdx index 825bedd380..c239da4595 100644 --- a/models/integrations/nim.mdx +++ b/models/integrations/nim.mdx @@ -1,22 +1,24 @@ --- title: NVIDIA NeMo Inference Microservice Deploy Job description: "Deploy a W&B model artifact to NVIDIA NeMo Inference Microservice using W&B Launch for scalable model serving." +keywords: ["NIM deploy job", "Triton server", "NeMo model format"] --- -Deploy a model artifact from W&B to a NVIDIA NeMo Inference Microservice. To do this, use W&B Launch. W&B Launch converts model artifacts to NVIDIA NeMo Model and deploys to a running NIM/Triton server. 
+This guide shows you how to deploy a model artifact from W&B to an NVIDIA NeMo Inference Microservice (NIM) so you can serve the model for scalable inference. To do this, use W&B Launch. W&B Launch converts model artifacts to NVIDIA NeMo Model format and deploys them to a running NIM/Triton server. This lets you take a tracked W&B model directly to a production-ready endpoint without manual conversion. -W&B Launch currently accepts the following compatible model types: +W&B Launch accepts the following compatible model types: -1. [Llama2](https://llama.meta.com/llama2/) -2. [StarCoder](https://github.com/bigcode-project/starcoder) -3. NV-GPT (coming soon) +- [Llama2](https://llama.meta.com/llama2/) +- [StarCoder](https://github.com/bigcode-project/starcoder) -Deployment time varies by model and machine type. The base Llama2-7b config takes about 1 minute on Google Cloud's `a2-ultragpu-1g`. +Deployment time varies by model and machine type. The base `Llama2-7b` config takes about 1 minute on Google Cloud's `a2-ultragpu-1g`. ## Quickstart -1. [Create a launch queue](/platform/launch/add-job-to-queue/) if you don't have one already. See an example queue config below. +Follow these steps to create a launch queue, register the deployment job, run an agent, and submit the deployment. + +1. [Create a launch queue](/platform/launch/add-job-to-queue/) if you don't have one already. The queue defines how the job runs on your GPU machine. See the following example queue configuration. ```yaml net: host @@ -27,10 +29,10 @@ Deployment time varies by model and machine type. The base Llama2-7b config take ``` - image + Launch queue configuration in the W&B UI -2. Create this job in your project: +2. Create this job in your project. This registers the deployment job code with your W&B project so Launch can run it. ```bash wandb job create -n "deploy-to-nvidia-nemo-inference-microservice" \ @@ -41,27 +43,28 @@ Deployment time varies by model and machine type. 
The base Llama2-7b config take git https://github.com/wandb/launch-jobs ``` -3. Launch an agent on your GPU machine: +3. Launch an agent on your GPU machine. The agent polls the queue and executes the deployment job when you submit it. ```bash wandb launch-agent -e $ENTITY -p $PROJECT -q $QUEUE ``` -4. Submit the deployment launch job with your desired configs from the [Launch UI](https://wandb.ai/launch) - 1. You can also submit via the CLI: - ```bash - wandb launch -d gcr.io/playground-111/deploy-to-nemo:latest \ - -e $ENTITY \ - -p $PROJECT \ - -q $QUEUE \ - -c $CONFIG_JSON_FNAME - ``` - - image - +4. Submit the deployment launch job with your desired configurations from the [Launch UI](https://wandb.ai/launch). You can also submit through the CLI. + + ```bash + wandb launch -d gcr.io/playground-111/deploy-to-nemo:latest \ + -e $ENTITY \ + -p $PROJECT \ + -q $QUEUE \ + -c $CONFIG_JSON_FNAME + ``` + + + Submitting a launch job from the W&B Launch UI + 5. You can track the deployment process in the Launch UI. - image + Deployment progress tracked in the Launch UI -6. Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`. +6. After the deployment completes, the NIM/Triton endpoint serves the model and is ready for inference requests. To test the model, `curl` the endpoint. The model name is always `ensemble`. ```bash #!/bin/bash curl -X POST "http://0.0.0.0:9999/v1/completions" \ diff --git a/models/integrations/openai-api.mdx b/models/integrations/openai-api.mdx index b96846cd43..993a93db55 100644 --- a/models/integrations/openai-api.mdx +++ b/models/integrations/openai-api.mdx @@ -1,41 +1,44 @@ --- description: "Use W&B with the OpenAI API to log and monitor chat completions, fine-tuning jobs, and token usage metrics." 
title: OpenAI API +keywords: ["completions API", "embeddings API", "usage tracking"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -Use the W&B OpenAI API integration to log requests, responses, token counts and model metadata for all OpenAI models, including fine-tuned models. +Use the W&B OpenAI API integration to log requests, responses, token counts, and model metadata for all OpenAI models, including fine-tuned models. This guide is for developers who call the OpenAI API and want visibility into their prompts, completions, and usage without adding manual logging code. +By logging your API inputs and outputs, you can quickly evaluate the performance of different prompts, compare different model settings (such as temperature), and track other usage metrics such as token usage. See the [OpenAI fine-tuning integration](./openai-fine-tuning) to learn how to use W&B to track your fine-tuning experiments, models, and datasets and share your results with your colleagues. -Log your API inputs and outputs you can quickly evaluate the performance of difference prompts, compare different model settings (such as temperature), and track other usage metrics such as token usage. - - OpenAI API automatic logging + W&B trace view showing OpenAI API requests, responses, and token usage logged automatically ## Install OpenAI Python API library -The W&B autolog integration works with OpenAI version 0.28.1 and below. +The W&B autolog integration works with OpenAI version 0.28.1 and earlier, so you must install a compatible version before enabling autologging. -To install OpenAI Python API version 0.28.1, run: -```python +To install OpenAI Python API version 0.28.1: +```bash pip install openai==0.28.1 ``` ## Use the OpenAI Python API -### 1. Import autolog and initialise it -First, import `autolog` from `wandb.integration.openai` and initialise it. 
+The following steps walk you through enabling autologging, calling the OpenAI API, and viewing the resulting traces in W&B. + +### Import and initialize autolog + +First, import `autolog` from `wandb.integration.openai` and initialize it. This sets up the W&B run that captures every subsequent OpenAI API call. ```python import os @@ -45,10 +48,11 @@ from wandb.integration.openai import autolog autolog({"project": "gpt5"}) ``` -You can optionally pass a dictionary with argument that `wandb.init()` accepts to `autolog`. This includes a project name, team name, entity, and more. For more information about [`wandb.init()`](/models/ref/python/functions/init), see the API Reference Guide. +You can optionally pass a dictionary with arguments that `wandb.init()` accepts to `autolog`. This includes a project name, team name, entity, and more. For more information, see the [`wandb.init()` API reference](/models/ref/python/functions/init). + +### Call the OpenAI API -### 2. Call the OpenAI API -Each call you make to the OpenAI API is now logged to W&B automatically. +With autolog enabled, W&B logs each call you make to the OpenAI API automatically. You don't need to add any logging code to your existing API calls. ```python os.environ["OPENAI_API_KEY"] = "XXX" @@ -65,17 +69,18 @@ chat_request_kwargs = dict( response = openai.ChatCompletion.create(**chat_request_kwargs) ``` -### 3. View your OpenAI API inputs and responses +### View your OpenAI API inputs and responses -Click on the W&B [run](/models/runs/) link generated by `autolog` in **step 1**. This redirects you to your project workspace in the W&B App. +After making one or more API calls, you can inspect the captured data in the W&B App. -Select a run you created to view the trace table, trace timeline and the model architecture of the OpenAI LLM used. +Click the W&B [run](/models/runs/) link generated by `autolog`. This redirects you to your project workspace in the W&B App. 
+ +Select a run you created to view the trace table, trace timeline, and the model architecture of the OpenAI LLM used. ## Turn off autolog -W&B recommends that you call `disable()` to close all W&B processes when you are finished using the OpenAI API. + +Call `disable()` to close all W&B processes when you're finished using the OpenAI API. This ensures that W&B flushes any pending data and doesn't capture further API calls unintentionally. ```python autolog.disable() -``` - -Now your inputs and completions will be logged to W&B, ready for analysis or to be shared with colleagues. \ No newline at end of file +``` \ No newline at end of file diff --git a/models/integrations/openai-fine-tuning.mdx b/models/integrations/openai-fine-tuning.mdx index 653bdecc3f..1dd0b65567 100644 --- a/models/integrations/openai-fine-tuning.mdx +++ b/models/integrations/openai-fine-tuning.mdx @@ -1,15 +1,16 @@ --- description: "Fine-tune OpenAI models with W&B to log training metrics, monitor jobs, and compare model performance over time." -title: OpenAI Fine-Tuning +title: OpenAI fine-tuning +keywords: ["ft:gpt", "fine-tuning job events", "OpenAI evals"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -Log your OpenAI GPT-3.5 or GPT-4 model's fine-tuning metrics and configuration to W&B. Utilize the W&B ecosystem to track your fine-tuning experiments, models, and datasets and share your results with your colleagues. +This page shows you how to log your OpenAI GPT-3.5 or GPT-4 model's fine-tuning metrics and configuration to W&B. By integrating W&B with OpenAI's fine-tuning API, you can track your fine-tuning experiments, models, and datasets in one place and share results with your colleagues. This guide is for ML practitioners who fine-tune OpenAI models and want centralized experiment tracking and version control over the training data and resulting models. 
-See the [OpenAI documentation](https://platform.openai.com/docs/guides/fine-tuning/which-models-can-be-fine-tuned) for a list of models that you can fine tune. +See the [OpenAI documentation](https://platform.openai.com/docs/guides/fine-tuning/which-models-can-be-fine-tuned) for a list of models that you can fine-tune. See the [W&B Integration](https://developers.openai.com/cookbook/examples/third_party/gpt_finetuning_with_wandb) section in the OpenAI documentation for supplemental information on how to integrate W&B with OpenAI for fine-tuning. @@ -17,23 +18,23 @@ See the [W&B Integration](https://developers.openai.com/cookbook/examples/third_ ## Install or update OpenAI Python API -The W&B OpenAI fine-tuning integration works with OpenAI version 1.0 and above. See the PyPI documentation for the latest version of the [OpenAI Python API](https://pypi.org/project/openai/) library. +Before you sync fine-tuning results, make sure you have a compatible version of the OpenAI Python client installed. The W&B OpenAI fine-tuning integration works with OpenAI version 1.0 or later. See the PyPI documentation for the latest version of the [OpenAI Python API](https://pypi.org/project/openai/) library. -To install OpenAI Python API, run: -```python +To install the OpenAI Python API, run: +```bash pip install openai ``` -If you already have OpenAI Python API installed, you can update it with: -```python +If you already have the OpenAI Python API installed, update it with: +```bash pip install -U openai ``` ## Sync your OpenAI fine-tuning results -Integrate W&B with OpenAI's fine-tuning API to log your fine-tuning metrics and configuration to W&B. To do this, use the `WandbLogger` class from the `wandb.integration.openai.fine_tuning` module. +This section shows you how to send the metrics and configuration from an OpenAI fine-tuning job to W&B so you can review them alongside your other experiments. 
To do this, use the `WandbLogger` class from the `wandb.integration.openai.fine_tuning` module. ```python @@ -50,7 +51,7 @@ WandbLogger.sync(fine_tune_job_id=FINETUNE_JOB_ID) ### Sync your fine-tunes -Sync your results from your script +Sync your results from your script. The following example shows both the minimal one-line call and the full set of optional parameters you can pass to control how the sync behaves. ```python @@ -74,24 +75,28 @@ WandbLogger.sync( ### Reference +The following table describes each argument accepted by `WandbLogger.sync`. + | Argument | Description | | ------------------------ | ------------------------------------------------------------------------------------------------------------------------- | -| fine_tune_job_id | This is the OpenAI Fine-Tune ID which you get when you create your fine-tune job using `client.fine_tuning.jobs.create`. If this argument is None (default), all the OpenAI fine-tune jobs that haven't already been synced will be synced to W&B. | -| openai_client | Pass an initialized OpenAI client to `sync`. If no client is provided, one is initialized by the logger itself. By default it is None. | -| num_fine_tunes | If no ID is provided, then all the unsynced fine-tunes will be logged to W&B. This argument allows you to select the number of recent fine-tunes to sync. If num_fine_tunes is 5, it selects the 5 most recent fine-tunes. | -| project | W&B project name where your fine-tune metrics, models, data, etc. will be logged. By default, the project name is "OpenAI-Fine-Tune." | -| entity | W&B Username or team name where you're sending runs. By default, your default entity is used, which is usually your username. | -| overwrite | Forces logging and overwrite existing wandb run of the same fine-tune job. By default this is False. | -| wait_for_job_success | Once an OpenAI fine-tuning job is started it usually takes a bit of time. 
To ensure that your metrics are logged to W&B as soon as the fine-tune job is finished, this setting will check every 60 seconds for the status of the fine-tune job to change to `succeeded`. Once the fine-tune job is detected as being successful, the metrics will be synced automatically to W&B. Set to True by default. | -| model_artifact_name | The name of the model artifact that is logged. Defaults to `"model-metadata"`. | -| model_artifact_type | The type of the model artifact that is logged. Defaults to `"model"`. | -| \*\*kwargs_wandb_init | Aany additional argument passed directly to [`wandb.init()`](/models/ref/python/functions/init) | +| `fine_tune_job_id` | The OpenAI fine-tune ID you get when you create your fine-tune job with `client.fine_tuning.jobs.create`. If this argument is `None` (default), W&B syncs all OpenAI fine-tune jobs that haven't already been synced. | +| `openai_client` | Pass an initialized OpenAI client to `sync`. If you don't provide a client, the logger initializes one. The default is `None`. | +| `num_fine_tunes` | If you don't provide an ID, W&B logs all unsynced fine-tunes. This argument lets you select the number of recent fine-tunes to sync. If `num_fine_tunes` is 5, W&B selects the 5 most recent fine-tunes. | +| `project` | W&B project name where W&B logs your fine-tune metrics, models, data, and so on. By default, the project name is `"OpenAI-Fine-Tune"`. | +| `entity` | W&B username or team name where you send runs. By default, W&B uses your default entity, which is usually your username. | +| `overwrite` | Forces logging and overwrites the existing `wandb` run for the same fine-tune job. The default is `False`. | +| `wait_for_job_success` | An OpenAI fine-tuning job takes some time after it starts. To ensure that W&B logs your metrics as soon as the fine-tune job finishes, this setting checks every 60 seconds for the fine-tune job status to change to `succeeded`. 
Once the fine-tune job succeeds, W&B syncs the metrics automatically. The default is `True`. | +| `model_artifact_name` | The name of the logged model artifact. Defaults to `"model-metadata"`. | +| `model_artifact_type` | The type of the logged model artifact. Defaults to `"model"`. | +| `**kwargs_wandb_init` | Any additional argument passed directly to [`wandb.init()`](/models/ref/python/functions/init). | ## Dataset versioning and visualization +When you sync a fine-tuning job, W&B also captures the training and validation data so you can version it and explore it interactively. The following subsections describe what W&B tracks and how to view it. + ### Versioning -The training and validation data that you upload to OpenAI for fine-tuning are automatically logged as W&B Artifacts for easier version control. Below is an view of the training file in Artifacts. Here you can see the W&B run that logged this file, when it was logged, what version of the dataset this is, the metadata, and DAG lineage from the training data to the trained model. +The training and validation data that you upload to OpenAI for fine-tuning are automatically logged as W&B Artifacts for easier version control. The following image shows a view of the training file in Artifacts. You can see the W&B run that logged this file, when it was logged, the version of the dataset, the metadata, and DAG lineage from the training data to the trained model. W&B Artifacts with training datasets @@ -99,7 +104,7 @@ The training and validation data that you upload to OpenAI for fine-tuning are a ### Visualization -The datasets are visualized as W&B Tables, which allows you to explore, search, and interact with the dataset. Check out the training samples visualized using W&B Tables below. +W&B visualizes the datasets as W&B Tables, which lets you explore, search, and interact with the dataset. The following image shows training samples visualized in W&B Tables. 
OpenAI data @@ -108,9 +113,9 @@ The datasets are visualized as W&B Tables, which allows you to explore, search, ## The fine-tuned model and model versioning -OpenAI gives you an id of the fine-tuned model. Since we don't have access to the model weights, the `WandbLogger` creates a `model_metadata.json` file with all the details (hyperparameters, data file ids, etc.) of the model along with the `fine_tuned_model`` id and is logged as a W&B Artifact. +OpenAI doesn't expose the underlying weights of a fine-tuned model, so W&B tracks the model by capturing its metadata instead. OpenAI gives you an ID of the fine-tuned model. Since you don't have access to the model weights, the `WandbLogger` creates a `model_metadata.json` file with all the details (hyperparameters, data file IDs, and so on) of the model along with the `fine_tuned_model` ID, and logs it as a W&B Artifact. -This model (metadata) artifact can further be linked to a model in the [W&B Registry](/models/registry/). +You can link this model (metadata) artifact to a model in the [W&B Registry](/models/registry/). OpenAI model metadata @@ -119,58 +124,60 @@ This model (metadata) artifact can further be linked to a model in the [W&B Regi ## Frequently asked questions -### How do I share my fine-tune results with my team in W&B? +The following sections answer common questions about sharing, organizing, and recovering fine-tuning runs synced from OpenAI. + +### Share fine-tune results with your team Log your fine-tune jobs to your team account with: ```python -WandbLogger.sync(entity="YOUR_TEAM_NAME") +WandbLogger.sync(entity="[YOUR-TEAM-NAME]") ``` -### How can I organize my runs? +### Organize your runs -Your W&B runs are automatically organized and can be filtered/sorted based on any configuration parameter such as job type, base model, learning rate, training filename and any other hyper-parameter. +W&B automatically organizes your runs. 
You can filter and sort them based on any configuration parameter such as job type, base model, learning rate, training filename, and any other hyperparameter. -In addition, you can rename your runs, add notes or create tags to group them. +You can also rename your runs, add notes, or create tags to group them. -Once you’re satisfied, you can save your workspace and use it to create report, importing data from your runs and saved artifacts (training/validation files). +Once you're satisfied, save your workspace and use it to create a report, importing data from your runs and saved artifacts (training and validation files). -### How can I access my fine-tuned model? +### Access your fine-tuned model -Fine-tuned model ID is logged to W&B as artifacts (`model_metadata.json`) as well config. +W&B logs the fine-tuned model ID as artifacts (`model_metadata.json`) and as config. ```python import wandb -with wandb.init(project="OpenAI-Fine-Tune", entity="YOUR_TEAM_NAME") as run: - ft_artifact = run.use_artifact("ENTITY/PROJECT/model_metadata:VERSION") +with wandb.init(project="OpenAI-Fine-Tune", entity="[YOUR-TEAM-NAME]") as run: + ft_artifact = run.use_artifact("[ENTITY]/[PROJECT]/model_metadata:[VERSION]") artifact_dir = ft_artifact.download() ``` -where `VERSION` is either: +The `[VERSION]` placeholder is one of the following: -* a version number such as `v2` -* the fine-tune id such as `ft-xxxxxxxxx` -* an alias added automatically such as `latest` or manually +* A version number such as `v2`. +* The fine-tune ID such as `ft-xxxxxxxxx`. +* An alias added automatically such as `latest`, or added manually. -You can then access `fine_tuned_model` id by reading the downloaded `model_metadata.json` file. +You can then access the `fine_tuned_model` ID by reading the downloaded `model_metadata.json` file. -### What if a fine-tune was not synced successfully? 
+### Recover a fine-tune that didn't sync -If a fine-tune was not logged to W&B successfully, you can use the `overwrite=True` and pass the fine-tune job id: +If a fine-tune wasn't logged to W&B successfully, use `overwrite=True` and pass the fine-tune job ID: ```python WandbLogger.sync( - fine_tune_job_id="FINE_TUNE_JOB_ID", + fine_tune_job_id="[FINE-TUNE-JOB-ID]", overwrite=True, ) ``` -### Can I track my datasets and models with W&B? +### Track datasets and models with W&B -The training and validation data are logged automatically to W&B as artifacts. The metadata including the ID for the fine-tuned model is also logged as artifacts. +W&B logs the training and validation data automatically as artifacts. W&B also logs the metadata, including the ID for the fine-tuned model, as artifacts. -You can always control the pipeline using low level wandb APIs like `wandb.Artifact`, `wandb.Run.log`, etc. This will allow complete traceability of your data and models. +You can also control the pipeline with low-level `wandb` APIs like `wandb.Artifact`, `wandb.Run.log`, and so on. This gives you full traceability of your data and models. OpenAI tracking FAQ @@ -178,6 +185,8 @@ You can always control the pipeline using low level wandb APIs like `wandb.Artif ## Resources -* [OpenAI Fine-tuning Documentation](https://platform.openai.com/docs/guides/fine-tuning/) is very thorough and contains many useful tips -* [Demo Colab](https://wandb.me/openai-colab) -* [How to Fine-Tune Your OpenAI GPT-3.5 and GPT-4 Models with W&B](https://wandb.me/openai-report) report \ No newline at end of file +For deeper background and end-to-end examples, see the following resources. + +* [OpenAI Fine-tuning Documentation](https://platform.openai.com/docs/guides/fine-tuning/) for thorough guidance and tips. +* [Demo Colab](https://wandb.me/openai-colab). +* [How to Fine-Tune Your OpenAI GPT-3.5 and GPT-4 Models with W&B](https://wandb.me/openai-report) report. 
\ No newline at end of file diff --git a/models/integrations/openai-gym.mdx b/models/integrations/openai-gym.mdx index baccd864a8..1cef5af40d 100644 --- a/models/integrations/openai-gym.mdx +++ b/models/integrations/openai-gym.mdx @@ -1,19 +1,22 @@ --- description: "Integrate W&B with OpenAI Gym to track reinforcement learning experiments and record episode performance videos." title: OpenAI Gym +keywords: ["classic gym", "gym Monitor", "RL episode video"] --- "The team that has been maintaining Gym since 2021 has moved all future development to [Gymnasium](https://github.com/Farama-Foundation/Gymnasium), a drop in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates." ([Source](https://github.com/openai/gym#the-team-that-has-been-maintaining-gym-since-2021-has-moved-all-future-development-to-gymnasium-a-drop-in-replacement-for-gym-import-gymnasium-as-gym-and-gym-will-not-be-receiving-any-future-updates-please-switch-over-to-gymnasium-as-soon-as-youre-able-to-do-so-if-youd-like-to-read-more-about-the-story-behind-this-switch-please-check-out-this-blog-post)) -Since Gym is no longer an actively maintained project, try out our integration with Gymnasium. +Since Gym is no longer a maintained project, try out the integration with Gymnasium. -If you're using [OpenAI Gym](https://github.com/openai/gym), W&B automatically logs videos of your environment generated by `gym.wrappers.Monitor`. Just set the `monitor_gym` keyword argument to [`wandb.init()`](/models/ref/python/functions/init) to `True` or call `wandb.gym.monitor()`. +This page describes how to use W&B with [OpenAI Gym](https://github.com/openai/gym) to automatically capture videos of your reinforcement learning environments. You can then review agent behavior alongside your experiment metrics in W&B. -Our gym integration is very light. 
We simply [look at the name of the video file](https://github.com/wandb/wandb/blob/master/wandb/integration/gym/__init__.py#L15) being logged from `gym` and name it after that or fall back to `"videos"` if we don't find a match. If you want more control, you can always just manually [log a video](/models/track/log/media/). +If you're using OpenAI Gym, W&B automatically logs videos of your environment generated by `gym.wrappers.Monitor`. To enable this, set the `monitor_gym` keyword argument of [`wandb.init()`](/models/ref/python/functions/init) to `True`, or call `wandb.gym.monitor()`. -The [OpenRL Benchmark](https://wandb.me/openrl-benchmark-report) by[ CleanRL](https://github.com/vwxyzjn/cleanrl) uses this integration for its OpenAI Gym examples. You can find source code (including [the specific code used for specific runs](https://wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/code?workspace=user-costa-huang)) that demonstrates how to use gym with +The gym integration is lightweight. It [looks at the name of the video file](https://github.com/wandb/wandb/blob/master/wandb/integration/gym/__init__.py#L15) being logged from `gym` and names it after that, or falls back to `"videos"` if it doesn't find a match. If you want more control, you can manually [log a video](/models/track/log/media/). +The [OpenRL Benchmark](https://wandb.me/openrl-benchmark-report) by [CleanRL](https://github.com/vwxyzjn/cleanrl) uses this integration for its OpenAI Gym examples. 
You can find source code (including [the code used for specific runs](https://wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/code?workspace=user-costa-huang)) that demonstrates how to use gym with OpenAI Gym dashboard diff --git a/models/integrations/paddledetection.mdx b/models/integrations/paddledetection.mdx index 5770315782..b9bcc8c169 100644 --- a/models/integrations/paddledetection.mdx +++ b/models/integrations/paddledetection.mdx @@ -1,6 +1,7 @@ --- description: "Integrate W&B with PaddleDetection to track object detection model training, log metrics, and visualize results." title: PaddleDetection +keywords: ["PP-YOLO", "PaddlePaddle", "detection model zoo"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; @@ -8,13 +9,13 @@ import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamli -[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) is an end-to-end object-detection development kit based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). It detects various mainstream objects, segments instances, and tracks and detects keypoints using configurable modules such as network components, data augmentations, and losses. +[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) is an end-to-end object-detection development kit based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). It detects mainstream objects, segments instances, and tracks and detects keypoints using configurable modules such as network components, data augmentations, and losses. -PaddleDetection now includes a built-in W&B integration which logs all your training and validation metrics, as well as your model checkpoints and their corresponding metadata. +PaddleDetection includes a built-in W&B integration that logs all your training and validation metrics, as well as your model checkpoints and their corresponding metadata. 
By following this guide, you enable the PaddleDetection `WandbLogger` so that W&B automatically tracks your object detection experiments, making it easier to compare runs, monitor progress, and reproduce results. The PaddleDetection `WandbLogger` logs your training and evaluation metrics to W&B as well as your model checkpoints while training. -[Read a W&B blog post](https://wandb.ai/manan-goel/PaddleDetectionYOLOX/reports/Object-Detection-with-PaddleDetection-and-W-B--VmlldzoyMDU4MjY0) which illustrates how to integrate a YOLOX model with PaddleDetection on a subset of the `COCO2017` dataset. +For a worked example, [read a W&B blog post](https://wandb.ai/manan-goel/PaddleDetectionYOLOX/reports/Object-Detection-with-PaddleDetection-and-W-B--VmlldzoyMDU4MjY0) that illustrates how to integrate a YOLOX model with PaddleDetection on a subset of the `COCO2017` dataset. ## Sign up and create an API key @@ -34,14 +35,12 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. - - - ```shell + ```bash pip install wandb wandb login @@ -68,35 +67,37 @@ wandb.login() ## Activate the `WandbLogger` in your training script +With the `wandb` library installed and your machine authenticated, turn on the `WandbLogger` for your PaddleDetection training job. You can do this either through command-line arguments or by editing your config file. + -To use wandb via arguments to `train.py` in [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/): +To use `wandb` through arguments to `train.py` in [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/): -* Add the `--use_wandb` flag -* The first wandb arguments must be preceded by `-o` (you only need to pass this once) -* Each individual argument must contain the prefix `"wandb-"` . 
For example any argument to be passed to [`wandb.init()`](/models/ref/python/functions/init) would get the `wandb-` prefix +* Add the `--use_wandb` flag. +* Precede the first `wandb` argument with `-o` (you only need to pass this once). +* Each individual argument must start with the prefix `wandb-`. For example, any argument to pass to [`wandb.init()`](/models/ref/python/functions/init) gets the `wandb-` prefix. -```shell -python tools/train.py - -c config.yml \ +```bash +python tools/train.py \ + -c config.yml \ --use_wandb \ - -o \ + -o \ wandb-project=MyDetector \ wandb-entity=MyTeam \ wandb-save_dir=./logs ``` -Add the wandb arguments to the config.yml file under the `wandb` key: +Alternatively, you can configure the integration declaratively. Add the `wandb` arguments to the `config.yml` file under the `wandb` key: -``` +```yaml wandb: project: MyProject entity: MyTeam save_dir: ./logs ``` -When you run your `train.py` file, it generates a link to your W&B Dashboard. +When you run your `train.py` file, it generates a link to your W&B Dashboard, where you can view your training and validation metrics, model checkpoints, and run metadata in real time. A W&B Dashboard diff --git a/models/integrations/paddleocr.mdx b/models/integrations/paddleocr.mdx index 2bc8851964..a936582e4b 100644 --- a/models/integrations/paddleocr.mdx +++ b/models/integrations/paddleocr.mdx @@ -1,15 +1,18 @@ --- description: "Integrate W&B with PaddleOCR to track OCR model training, log recognition metrics, and visualize predictions." title: PaddleOCR +keywords: ["PP-OCR", "text detection", "PaddlePaddle"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice implemented in PaddlePaddle. 
PaddleOCR support a variety of cutting-edge algorithms related to OCR, and developed industrial solution. PaddleOCR now comes with a W&B integration for logging training and evaluation metrics along with model checkpoints with corresponding metadata. +[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) provides multilingual, practical OCR tools that help users train models and apply them in production, implemented in PaddlePaddle. PaddleOCR supports a range of OCR algorithms and includes industrial solutions. PaddleOCR includes a W&B integration for logging training and evaluation metrics along with model checkpoints and corresponding metadata. -## Example blog & Colab +This page shows you how to enable the W&B integration in PaddleOCR so that your OCR training runs automatically stream metrics, validation results, and checkpoint metadata to a W&B dashboard. Use this integration to compare experiments, monitor training in real time, and keep a versioned history of your OCR models. -[Read here](https://wandb.ai/manan-goel/text_detection/reports/Train-and-Debug-Your-OCR-Models-with-PaddleOCR-and-W-B--VmlldzoyMDUwMDIw) to see how to train a model with PaddleOCR on the ICDAR2015 dataset. This also comes with a [Google Colab](https://colab.research.google.com/drive/1id2VTIQ5-M1TElAkzjzobUCdGeJeW-nV?usp=sharing) and the corresponding live W&B dashboard is available [here](https://wandb.ai/manan-goel/text_detection). There is also a Chinese version of this blog here: [W&B对您的OCR模型进行训练和调试](https://wandb.ai/wandb_fc/chinese/reports/W-B-OCR---VmlldzoyMDk1NzE4) +## Example blog and Colab + +See the [PaddleOCR and W&B training tutorial](https://wandb.ai/manan-goel/text_detection/reports/Train-and-Debug-Your-OCR-Models-with-PaddleOCR-and-W-B--VmlldzoyMDUwMDIw) for how to train a model with PaddleOCR on the ICDAR2015 dataset. 
This also comes with a [Google Colab](https://colab.research.google.com/drive/1id2VTIQ5-M1TElAkzjzobUCdGeJeW-nV?usp=sharing) and the corresponding live [W&B dashboard](https://wandb.ai/manan-goel/text_detection). A Chinese version of this blog is also available: [W&B对您的OCR模型进行训练和调试](https://wandb.ai/wandb_fc/chinese/reports/W-B-OCR---VmlldzoyMDk1NzE4). ## Sign up and create an API key @@ -25,18 +28,18 @@ An API key authenticates your machine to W&B. You can generate an API key from y To install the `wandb` library locally and log in: - + 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. - ```shell + ```bash pip install wandb wandb login @@ -52,7 +55,7 @@ wandb.login() ``` -```notebook +```python !pip install wandb import wandb @@ -61,33 +64,33 @@ wandb.login() -## Add wandb to your `config.yml` file +## Add `wandb` to your `config.yml` file -PaddleOCR requires configuration variables to be provided using a yaml file. Adding the following snippet at the end of the configuration yaml file will automatically log all training and validation metrics to a W&B dashboard along with model checkpoints: +PaddleOCR requires you to provide configuration variables using a YAML file. To enable W&B logging, add the following snippet at the end of the configuration YAML file. 
This setting configures PaddleOCR to automatically log all training and validation metrics to a W&B dashboard along with model checkpoints: -```python +```yaml Global: use_wandb: True ``` -Any additional, optional arguments that you might like to pass to [`wandb.init()`](/models/ref/python/functions/init) can also be added under the `wandb` header in the yaml file: +You can also add any additional, optional arguments that you want to pass to [`wandb.init()`](/models/ref/python/functions/init) under the `wandb` header in the YAML file: -``` -wandb: - project: CoolOCR # (optional) this is the wandb project name +```yaml +wandb: + project: CoolOCR # (optional) this is the wandb project name entity: my_team # (optional) if you're using a wandb team, you can pass the team name here name: MyOCRModel # (optional) this is the name of the wandb run ``` ## Pass the `config.yml` file to `train.py` -The yaml file is then provided as an argument to the [training script](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/tools/train.py) available in the PaddleOCR repository. +With W&B logging configured, start training by passing the YAML file as an argument to the [training script](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/tools/train.py) available in the PaddleOCR repository. 
```bash python tools/train.py -c config.yml ``` -Once you run your `train.py` file with W&B turned on, a link will be generated to bring you to your W&B dashboard: +Once you run your `train.py` file with W&B enabled, PaddleOCR generates a link to your W&B dashboard, where you can monitor training and validation metrics in real time: PaddleOCR training dashboard @@ -98,7 +101,7 @@ Once you run your `train.py` file with W&B turned on, a link will be generated t - Text Detection Model dashboard + Text detection model dashboard ## Feedback or issues diff --git a/models/integrations/prodigy.mdx b/models/integrations/prodigy.mdx index bf74d1811e..6239e7c284 100644 --- a/models/integrations/prodigy.mdx +++ b/models/integrations/prodigy.mdx @@ -1,13 +1,14 @@ --- description: "Integrate W&B with Prodigy to track annotation workflows, log training metrics, and manage labeled datasets." title: Prodigy +keywords: ["annotation tool", "prodigy train", "W&B Tables"] --- -[Prodigy](https://prodi.gy/) is an annotation tool for creating training and evaluation data for machine learning models, error analysis, data inspection & cleaning. [W&B Tables](/models/tables/tables-walkthrough/) allow you to log, visualize, analyze, and share datasets (and more!) inside W&B. +[Prodigy](https://prodi.gy/) is an annotation tool for creating training and evaluation data for machine learning models, error analysis, and data inspection and cleaning. [W&B Tables](/models/tables/tables-walkthrough/) let you log, visualize, analyze, and share datasets (and more) inside W&B. -The [W&B integration with Prodigy](https://github.com/wandb/wandb/blob/master/wandb/integration/prodigy/prodigy.py) adds simple and easy-to-use functionality to upload your Prodigy-annotated dataset directly to W&B for use with Tables. 
+This guide shows you how to use the [W&B integration with Prodigy](https://github.com/wandb/wandb/blob/master/wandb/integration/prodigy/prodigy.py) to upload your Prodigy-annotated dataset directly to W&B so you can explore and share it as an interactive Table. Use this when you want to inspect annotation quality, compare versions of a labeled dataset, or share results with collaborators. -Run a few lines of code, like these: +With a few lines of code, like these: ```python import wandb @@ -17,7 +18,7 @@ with wandb.init(project="prodigy"): upload_dataset("news_headlines_ner") ``` -and get visual, interactive, shareable tables like this one: +you can produce visual, interactive, shareable tables like this one: Prodigy annotation table @@ -25,10 +26,10 @@ and get visual, interactive, shareable tables like this one: ## Quickstart -Use `wandb.integration.prodigy.upload_dataset` to upload your annotated prodigy dataset directly from the local Prodigy database to W&B in our [Table](/models/ref/python/data-types/table) format. For more information on Prodigy, including installation & setup, please refer to the [Prodigy documentation](https://prodi.gy/docs/). +Use `wandb.integration.prodigy.upload_dataset` to upload your annotated Prodigy dataset directly from the local Prodigy database to W&B in the [Table](/models/ref/python/data-types/table) format. For more information about Prodigy, including installation and setup, see the [Prodigy documentation](https://prodi.gy/docs/). -W&B will automatically try to convert images and named entity fields to [`wandb.Image`](/models/ref/python/data-types/image) and [`wandb.Html`](/models/ref/python/data-types/html)respectively. Extra columns may be added to the resulting table to include these visualizations. 
+When you upload a dataset, W&B automatically converts images and named entity fields to [`wandb.Image`](/models/ref/python/data-types/image) and [`wandb.Html`](/models/ref/python/data-types/html) respectively, so they render as interactive visualizations in your Table. W&B may add extra columns to the resulting table to include these visualizations. ## Read through a detailed example -Explore the [Visualizing Prodigy Datasets Using W&B Tables](https://wandb.ai/kshen/prodigy/reports/Visualizing-Prodigy-Datasets-Using-W-B-Tables--Vmlldzo5NDE2MTc) for example visualizations generated with W&B Prodigy integration. \ No newline at end of file +To see what's possible with the integration, explore the [Visualizing Prodigy Datasets Using W&B Tables](https://wandb.ai/kshen/prodigy/reports/Visualizing-Prodigy-Datasets-Using-W-B-Tables--Vmlldzo5NDE2MTc) report for example visualizations generated with the W&B Prodigy integration. \ No newline at end of file diff --git a/models/integrations/pytorch-geometric.mdx b/models/integrations/pytorch-geometric.mdx index 250a5c636a..01f02f755e 100644 --- a/models/integrations/pytorch-geometric.mdx +++ b/models/integrations/pytorch-geometric.mdx @@ -1,13 +1,16 @@ --- title: PyTorch Geometric description: "Integrate W&B with PyTorch Geometric for graph visualization and experiment tracking in geometric deep learning." +keywords: ["GNN", "PyG", "node embedding"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -[PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric) or PyG is one of the most popular libraries for geometric deep learning and W&B works extremely well with it for visualizing graphs and tracking experiments. +[PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric) (PyG) is a library for geometric deep learning, and W&B works with it for visualizing graphs and tracking experiments. -After you have installed PyTorch Geometric, follow these steps to get started. 
+This guide shows you how to authenticate to W&B, install the `wandb` library, visualize PyG graphs with PyVis or Plotly, and log training metrics from your PyG workflows. It's intended for PyG users who want to track experiments and share graph visualizations in W&B. + +After you install PyTorch Geometric, follow these steps to get started. ## Sign up and create an API key @@ -27,14 +30,14 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. - ```shell + ```bash pip install wandb wandb login @@ -50,7 +53,7 @@ wandb.login() ``` -```notebook +```python !pip install wandb import wandb @@ -61,17 +64,19 @@ wandb.login() ## Visualize the graphs -You can save details about the input graphs including number of edges, number of nodes and more. W&B supports logging plotly charts and HTML panels so any visualizations you create for your graph can then also be logged to W&B. +After you log in, you can begin sending graph visualizations and run data to W&B. + +You can save details about the input graphs, including number of edges, number of nodes, and more. W&B supports logging Plotly charts and HTML panels, so you can also log any visualizations you create for your graph to W&B. The following sections show two common approaches: PyVis for interactive HTML visualizations and Plotly for chart-based visualizations. ### Use PyVis -The following snippet shows how you could do that with PyVis and HTML. +The following snippet shows how to do that with PyVis and HTML. 
```python from pyvis.network import Network import wandb -with wandb.init(project=’graph_vis’) as run: +with wandb.init(project="graph_vis") as run: net = Network(height="750px", width="100%", bgcolor="#222222", font_color="white") # Add the edges from the PyG graph to the PyVis network @@ -95,7 +100,7 @@ with wandb.init(project=’graph_vis’) as run: ### Use Plotly -To use plotly to create a graph visualization, first you need to convert the PyG graph to a networkx object. Following this you will need to create Plotly scatter plots for both nodes and edges. The snippet below can be used for this task. +To use Plotly to create a graph visualization, first convert the PyG graph to a NetworkX graph. Then create Plotly scatter plots for both nodes and edges. Use the following snippet for this task. ```python def create_vis(graph): @@ -140,8 +145,8 @@ def create_vis(graph): return fig -with wandb.init(project=’visualize_graph’) as run: - run.log({‘graph’: wandb.Plotly(create_vis(graph))}) +with wandb.init(project="visualize_graph") as run: + run.log({"graph": wandb.Plotly(create_vis(graph))}) ``` @@ -150,7 +155,7 @@ with wandb.init(project=’visualize_graph’) as run: ## Log metrics -You can use W&B to track your experiments and related metrics, such as loss functions, accuracy, and more. Add the following line to your training loop: +In addition to graph visualizations, you can use W&B to track your experiments and related metrics, such as loss, accuracy, and more. Add the following lines to your training loop: ```python with wandb.init(project="my_project", entity="my_entity") as run: @@ -166,8 +171,12 @@ with wandb.init(project="my_project", entity="my_entity") as run: hits@K metrics over epochs +With graph visualizations and training metrics logged to W&B, you can compare runs and share results from your PyG experiments in your W&B workspace.
+ ## More resources +The following W&B reports show PyG in action: + - [Recommending Amazon Products using Graph Neural Networks in PyTorch Geometric](https://wandb.ai/manan-goel/gnn-recommender/reports/Recommending-Amazon-Products-using-Graph-Neural-Networks-in-PyTorch-Geometric--VmlldzozMTA3MzYw#what-does-the-data-look-like?) - [Point Cloud Classification using PyTorch Geometric](https://wandb.ai/geekyrakshit/pyg-point-cloud/reports/Point-Cloud-Classification-using-PyTorch-Geometric--VmlldzozMTExMTE3) - [Point Cloud Segmentation using PyTorch Geometric](https://wandb.ai/wandb/point-cloud-segmentation/reports/Point-Cloud-Segmentation-using-Dynamic-Graph-CNN--VmlldzozMTk5MDcy) diff --git a/models/integrations/pytorch.mdx b/models/integrations/pytorch.mdx index bd5718c7e5..3f6116b605 100644 --- a/models/integrations/pytorch.mdx +++ b/models/integrations/pytorch.mdx @@ -1,6 +1,7 @@ --- title: PyTorch description: "Integrate W&B with PyTorch for experiment tracking, dataset versioning, and logging of metrics, gradients, and models." +keywords: ["torch training loop", "wandb.watch", "MNIST tutorial"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; @@ -14,7 +15,7 @@ Use [W&B](https://wandb.ai) for machine learning experiment tracking, dataset ve ## What this notebook covers -We show you how to integrate W&B with your PyTorch code to add experiment tracking to your pipeline. +This tutorial walks you through integrating W&B with your PyTorch training code so you can track experiments, log metrics and gradients, and version models. It's intended for PyTorch users who want to add experiment tracking to an existing pipeline. PyTorch and W&B integration diagram @@ -52,10 +53,11 @@ with wandb.init(project="new-sota-model", config=config) as run: Follow along with a [video tutorial](https://wandb.me/pytorch-video). -**Note**: Sections starting with _Step_ are all you need to integrate W&B in an existing pipeline. The rest just loads data and defines a model. 
+Sections starting with _Step_ are all you need to integrate W&B in an existing pipeline. The rest loads data and defines a model. ## Install, import, and log in +Before defining the experiment, set up the environment and authenticate with W&B. ```python import os @@ -85,24 +87,21 @@ torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets. ### Step 0: Install W&B -To get started, we'll need to get the library. -`wandb` is easily installed using `pip`. +To get started, you must install the `wandb` library with `pip`. ```python !pip install wandb onnx -Uq ``` -### Step 1: Import W&B and Login +### Step 1: Import W&B and log in -In order to log data to our web service, -you'll need to log in. +To log data to the W&B service, you must log in. -If this is your first time using W&B, -you'll need to sign up for a free account at the link that appears. +If this is your first time using W&B, sign up for a free account at the link that appears. -``` +```python import wandb wandb.login() @@ -110,23 +109,17 @@ wandb.login() ## Define the experiment and pipeline +With W&B installed and your session authenticated, define the experiment configuration and the training pipeline that will use it. + ### Track metadata and hyperparameters with `wandb.init()` -Programmatically, the first thing we do is define our experiment: -what are the hyperparameters? what metadata is associated with this run? +Programmatically, define your experiment first. What are the hyperparameters? What metadata is associated with this run? -It's a pretty common workflow to store this information in a `config` dictionary -(or similar object) -and then access it as needed. +A common workflow is to store this information in a `config` dictionary (or similar object) and then access it as needed. -For this example, we're only letting a few hyperparameters vary -and hand-coding the rest. -But any part of your model can be part of the `config`. 
+This example varies only a few hyperparameters and hand-codes the rest. Any part of your model can be part of the `config`. -We also include some metadata: we're using the MNIST dataset and a convolutional -architecture. If we later work with, say, -fully connected architectures on CIFAR in the same project, -this will help us separate our runs. +The example also includes metadata for the MNIST dataset and a convolutional architecture. If you later work with, say, fully connected architectures on CIFAR in the same project, this metadata helps you separate your runs. ```python @@ -140,14 +133,13 @@ config = dict( architecture="CNN") ``` -Now, let's define the overall pipeline, -which is pretty typical for model-training: +Next, define the overall pipeline, which is typical for model-training: -1. we first `make` a model, plus associated data and optimizer, then -2. we `train` the model accordingly and finally +1. `make` a model, plus associated data and optimizer. +2. `train` the model accordingly. 3. `test` it to see how training went. -We'll implement these functions below. +The following code implements these functions. ```python @@ -171,26 +163,15 @@ def model_pipeline(hyperparameters): return model ``` -The only difference here from a standard pipeline -is that it all occurs inside the context of `wandb.init()`. -Calling this function sets up a line of communication -between your code and our servers. +The only difference here from a standard pipeline is that it all occurs inside the context of `wandb.init()`. Calling this function sets up a line of communication between your code and W&B servers. -Passing the `config` dictionary to `wandb.init()` -immediately logs all that information to us, -so you'll always know what hyperparameter values -you set your experiment to use. +Passing the `config` dictionary to `wandb.init()` immediately logs all that information to W&B, so you always know what hyperparameter values you set your experiment to use. 
-To ensure the values you chose and logged are always the ones that get used -in your model, we recommend using the `run.config` copy of your object. -Check the definition of `make` below to see some examples. +To ensure the values you chose and logged are always the ones used in your model, W&B recommends using the `run.config` copy of your object. Check the following definition of `make` to see some examples. -> *Side Note*: We take care to run our code in separate processes, -so that any issues on our end -(such as if a giant sea monster attacks our data centers) -don't crash your code. -Once the issue is resolved, such as when the Kraken returns to the deep, -you can log the data with `wandb sync`. +With the pipeline defined, the next sections implement each of its steps in turn: data and model setup, training, and testing. + +> *Side Note*: W&B runs its code in separate processes so that any issues on the W&B side don't crash your code. Once the issue is resolved, you can log the data with `wandb sync`. ```python @@ -213,11 +194,9 @@ def make(config): ### Define the data loading and model -Now, we need to specify how the data is loaded and what the model looks like. +Next, specify how the data is loaded and what the model looks like. -This part is very important, but it's -no different from what it would be without `wandb`, -so we won't dwell on it. +This part is important, but it's no different from what it would be without `wandb`. ```python @@ -241,13 +220,7 @@ def make_loader(dataset, batch_size): return loader ``` -Defining the model is normally the fun part. - -But nothing changes with `wandb`, -so we're gonna stick with a standard ConvNet architecture. - -Don't be afraid to mess around with this and try some experiments -- -all your results will be logged on [wandb.ai](https://wandb.ai). +Defining the model doesn't change with `wandb`, so this example uses a standard ConvNet architecture. Experiment freely with this code. 
W&B logs all your results on [wandb.ai](https://wandb.ai). @@ -279,21 +252,17 @@ class ConvNet(nn.Module): ### Define training logic -Moving on in our `model_pipeline`, it's time to specify how we `train`. +Moving on in the `model_pipeline`, it's time to specify how to `train`. This is where the W&B integration tracks gradients, parameters, and metrics as training proceeds. Two `wandb` functions come into play here: `watch` and `log`. -## Track gradients with `run.watch()` and everything else with `run.log()` +### Track gradients with `run.watch()` and everything else with `run.log()` -`run.watch()` will log the gradients and the parameters of your model, -every `log_freq` steps of training. +`run.watch()` logs the gradients and the parameters of your model every `log_freq` steps of training. All you need to do is call it before you start training. -The rest of the training code remains the same: -we iterate over epochs and batches, -running forward and backward passes -and applying our `optimizer`. +The rest of the training code remains the same: iterate over epochs and batches, run forward and backward passes, and apply your `optimizer`. ```python @@ -335,17 +304,11 @@ def train_batch(images, labels, model, optimizer, criterion): return loss ``` -The only difference is in the logging code: -where previously you might have reported metrics by printing to the terminal, -now you pass the same information to `run.log()`. +The only difference is in the logging code: where previously you might have reported metrics by printing to the terminal, now you pass the same information to `run.log()`. -`run.log()` expects a dictionary with strings as keys. -These strings identify the objects being logged, which make up the values. -You can also optionally log which `step` of training you're on. +`run.log()` expects a dictionary with strings as keys. These strings identify the objects being logged, which make up the values. 
You can also optionally log which `step` of training you're on. -> *Side Note*: I like to use the number of examples the model has seen, -since this makes for easier comparison across batch sizes, -but you can use raw steps or batch count. For longer training runs, it can also make sense to log by `epoch`. +> *Side Note*: Using the number of examples the model has seen makes for easier comparison across batch sizes, but you can use raw steps or batch count. For longer training runs, it can also make sense to log by `epoch`. ```python @@ -359,25 +322,17 @@ def train_log(loss, example_ct, epoch): ### Define testing logic -Once the model is done training, we want to test it: -run it against some fresh data from production, perhaps, -or apply it to some hand-curated examples. +Once the model is done training, test it: run it against some fresh data from production, perhaps, or apply it to some hand-curated examples. Testing also gives you a natural point at which to save the trained model. -## (Optional) Call `run.save()` +### Optional: Call `run.save()` -This is also a great time to save the model's architecture -and final parameters to disk. -For maximum compatibility, we'll `export` our model in the -[Open Neural Network eXchange (ONNX) format](https://onnx.ai/). +This is also a good time to save the model's architecture and final parameters to disk. For broad compatibility, `export` the model in the [Open Neural Network eXchange (ONNX) format](https://onnx.ai/). -Passing that filename to `run.save()` ensures that the model parameters -are saved to W&B's servers: no more losing track of which `.h5` or `.pb` -corresponds to which training runs. +Passing that filename to `run.save()` ensures that the model parameters are saved to W&B servers: no more losing track of which `.h5` or `.pb` corresponds to which training runs. 
-For more advanced `wandb` features for storing, versioning, and distributing -models, check out our [Artifacts tools](https://www.wandb.com/artifacts). +For more advanced `wandb` features for storing, versioning, and distributing models, check out [Artifacts tools](https://www.wandb.com/artifacts). ```python @@ -407,24 +362,18 @@ def test(model, test_loader): ### Run training and watch your metrics live on wandb.ai -Now that we've defined the whole pipeline and slipped in -those few lines of W&B code, -we're ready to run our fully tracked experiment. +Now that you've defined the whole pipeline and added those few lines of W&B code, you're ready to run your fully tracked experiment. -We'll report a few links to you: -our documentation, -the Project page, which organizes all the runs in a project, and -the Run page, where this run's results will be stored. +W&B reports a few links to you: the documentation, the Project page (which organizes all the runs in a project), and the Run page (where this run's results are stored). Navigate to the Run page and check out these tabs: -1. **Charts**, where the model gradients, parameter values, and loss are logged throughout training -2. **System**, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar), and more -3. **Logs**, which has a copy of anything pushed to standard out during training -4. **Files**, where, once training is complete, you can click on the `model.onnx` to view our network with the [Netron model viewer](https://github.com/lutzroeder/netron). +1. **Charts**, where the model gradients, parameter values, and loss are logged throughout training. +2. **System**, which contains system metrics including Disk I/O utilization and CPU and GPU metrics. +3. **Logs**, which has a copy of anything pushed to standard out during training. +4. 
**Files**, where, once training is complete, you can click the `model.onnx` to view your network with the [Netron model viewer](https://github.com/lutzroeder/netron). -Once the run in finished, when the `with wandb.init()` block exits, -we'll also print a summary of the results in the cell output. +Once the run is finished, when the `with wandb.init()` block exits, W&B also prints a summary of the results in the cell output. ```python @@ -432,25 +381,19 @@ we'll also print a summary of the results in the cell output. model = model_pipeline(config) ``` -### Test Hyperparameters with Sweeps +### Test hyperparameters with sweeps -We only looked at a single set of hyperparameters in this example. -But an important part of most ML workflows is iterating over -a number of hyperparameters. +This example only looked at a single set of hyperparameters. An important part of most ML workflows is iterating over several hyperparameters. -You can use W&B Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies. +You can use W&B Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies. This lets you scale beyond the preceding single-configuration run. Check out a [Colab notebook demonstrating hyperparameter optimization using W&B Sweeps](https://wandb.me/sweeps-colab). -Running a hyperparameter sweep with W&B is very easy. There are just 3 simple steps: - -1. **Define the sweep:** We do this by creating a dictionary or a [YAML file](/models/sweeps/define-sweep-configuration/) that specifies the parameters to search through, the search strategy, the optimization metric et all. +Running a hyperparameter sweep with W&B takes three steps: -2. **Initialize the sweep:** -`sweep_id = wandb.sweep(sweep_config)` - -3. **Run the sweep agent:** -`wandb.agent(sweep_id, function=train)` +1. 
**Define the sweep:** Create a dictionary or a [YAML file](/models/sweeps/define-sweep-configuration/) that specifies the parameters to search through, the search strategy, the optimization metric, and more. +2. **Initialize the sweep:** `sweep_id = wandb.sweep(sweep_config)`. +3. **Run the sweep agent:** `wandb.agent(sweep_id, function=train)`. That's all there is to running a hyperparameter sweep. @@ -461,10 +404,12 @@ That's all there is to running a hyperparameter sweep. ## Example gallery -Explore examples of projects tracked and visualized with W&B in our [Gallery →](https://app.wandb.ai/gallery). +Explore examples of projects tracked and visualized with W&B in the [Gallery](https://app.wandb.ai/gallery). ## Advanced setup -1. [Environment variables](/platform/hosting/env-vars/): Set API keys in environment variables so you can run training on a managed cluster. -2. [Offline mode](/support/models/articles/can-i-run-wandb-offline): Use `dryrun` mode to train offline and sync results later. -3. [On-prem](/platform/hosting/hosting-options/self-managed): Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams. -4. [Sweeps](/models/sweeps/): Set up hyperparameter search quickly with our lightweight tool for tuning. + +The following options can extend the preceding basic workflow for production, offline, or managed environments: +- [Environment variables](/platform/hosting/env-vars/): Set API keys in environment variables so you can run training on a managed cluster. +- [Offline mode](/support/models/articles/can-i-run-wandb-offline): Use `offline` mode (`WANDB_MODE=offline`) to train offline and sync results later. +- [On-premises](/platform/hosting/hosting-options/self-managed): Install W&B in a private cloud or on air-gapped servers in your own infrastructure. +- [Sweeps](/models/sweeps/): Set up hyperparameter search quickly with a lightweight tool for tuning.
diff --git a/models/integrations/ray-tune.mdx b/models/integrations/ray-tune.mdx index c0e742e5ba..b1fe0d25f5 100644 --- a/models/integrations/ray-tune.mdx +++ b/models/integrations/ray-tune.mdx @@ -1,36 +1,36 @@ --- description: "Integrate W&B with Ray Tune to track hyperparameter tuning trials, log metrics, and compare experiment results." title: Ray Tune +keywords: ["TuneCallback", "ASHA scheduler", "ray train"] --- -W&B integrates with [Ray](https://github.com/ray-project/ray) by offering two lightweight integrations. +This page describes how to use W&B with [Ray](https://github.com/ray-project/ray) Tune so you can track hyperparameter tuning trials, log metrics, and compare experiment results across runs. W&B offers two lightweight integrations with Ray, and you can choose the one that best fits your training workflow: -- The`WandbLoggerCallback` function automatically logs metrics reported to Tune to the Wandb API. -- The `setup_wandb()` function, which can be used with the function API, automatically initializes the Wandb API with Tune's training information. You can use the Wandb API as usual. such as by using `run.log()` to log your training process. +- The `WandbLoggerCallback` class automatically logs metrics reported to Tune to the W&B API. +- The `setup_wandb()` function, which you can use with the function API, automatically initializes the W&B API with Tune's training information. You can use the W&B API as usual, such as by calling `run.log()` to log your training process. ## Configure the integration +This section describes how to configure the `WandbLoggerCallback`, which is the most direct way to send Tune trial metrics to W&B. + ```python from ray.air.integrations.wandb import WandbLoggerCallback ``` -Wandb configuration is done by passing a wandb key to the config parameter of `tune.run()` (see example below). +To configure W&B, pass a `wandb` key to the `config` parameter of `tune.run()`. See the [example](#example) for usage.
-The content of the wandb config entry is passed to `wandb.init()` as keyword arguments. The exception are the following settings, which are used to configure the `WandbLoggerCallback` itself: +The integration passes the content of the wandb config entry to `wandb.init()` as keyword arguments. The exceptions are the settings that configure the `WandbLoggerCallback` itself. ### Parameters -`project (str)`: Name of the Wandb project. Mandatory. - -`api_key_file (str)`: Path to file containing the Wandb API KEY. - -`api_key (str)`: Wandb API Key. Alternative to setting `api_key_file`. +The `WandbLoggerCallback` accepts the following parameters: -`excludes (list)`: List of metrics to exclude from the log. - -`log_config (bool)`: Whether to log the config parameter of the results dictionary. Defaults to False. - -`upload_checkpoints (bool)`: If True, model checkpoints are uploaded as artifacts. Defaults to False. +- `project (str)`: Name of the W&B project. Required. +- `api_key_file (str)`: Path to file containing the W&B API key. +- `api_key (str)`: W&B API key. Alternative to setting `api_key_file`. +- `excludes (list)`: List of metrics to exclude from the log. +- `log_config (bool)`: Whether to log the config parameter of the results dictionary. Defaults to `False`. +- `upload_checkpoints (bool)`: If `True`, uploads model checkpoints as artifacts. Defaults to `False`. ### Example @@ -53,7 +53,7 @@ tuner = tune.Tuner( run_config=train.RunConfig( callbacks=[ WandbLoggerCallback( - project="", api_key="", log_config=True + project="[YOUR-PROJECT]", api_key="[YOUR-API-KEY]", log_config=True ) ] ), @@ -64,11 +64,13 @@ results = tuner.fit() ## setup_wandb +Use `setup_wandb()` when you want direct control over W&B logging from inside your training function, for example, to call `run.log()` with custom metrics alongside Tune's reporting. + ```python from ray.air.integrations.wandb import setup_wandb ``` -This utility function helps initialize Wandb for use with Ray Tune. 
For basic usage, call `setup_wandb()` in your training function: +This utility function helps initialize W&B for use with Ray Tune. For basic usage, call `setup_wandb()` in your training function: ```python from ray.air.integrations.wandb import setup_wandb @@ -104,7 +106,7 @@ results = tuner.fit() ## Example code -We've created a few examples for you to see how the integration works: +For end-to-end references, see the following examples that show how the integration works: -* [Colab](https://wandb.me/raytune-colab): A simple demo to try the integration. -* [Dashboard](https://wandb.ai/anmolmann/ray_tune): View dashboard generated from the example. \ No newline at end of file +* [Try the integration in Colab](https://wandb.me/raytune-colab): A demo to try the integration. +* [View the example dashboard](https://wandb.ai/anmolmann/ray_tune): View the dashboard generated from the example. \ No newline at end of file diff --git a/models/integrations/sagemaker.mdx b/models/integrations/sagemaker.mdx index 6be62e51ba..59b3704715 100644 --- a/models/integrations/sagemaker.mdx +++ b/models/integrations/sagemaker.mdx @@ -1,36 +1,37 @@ --- description: "Integrate W&B with Amazon SageMaker for experiment tracking, metric logging, and model management on AWS infrastructure." title: SageMaker +keywords: ["AWS training job", "SageMaker estimator", "hyperparameter tuning job"] --- -W&B integrates with [Amazon SageMaker](https://aws.amazon.com/sagemaker/), automatically reading hyperparameters, grouping distributed runs, and resuming runs from checkpoints. +W&B integrates with [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to automatically read hyperparameters, group distributed runs, and resume runs from checkpoints. ## Authentication -W&B looks for a file named `secrets.env` relative to the training script and loads them into the environment when `wandb.init()` is called. 
You can generate a `secrets.env` file by calling `wandb.sagemaker_auth(path="source_dir")` in the script you use to launch your experiments. Be sure to add this file to your `.gitignore`! +W&B looks for a file named `secrets.env` relative to the training script and loads its contents into the environment when you call `wandb.init()`. To generate a `secrets.env` file, call `wandb.sagemaker_auth(path="source_dir")` in the script you use to launch your experiments. Add this file to your `.gitignore`. ## Existing estimators -If you're using one of SageMakers preconfigured estimators you need to add a `requirements.txt` to your source directory that includes wandb +If you're using one of SageMaker's preconfigured estimators, add a `requirements.txt` file to your source directory that includes `wandb`: ```text wandb ``` -If you're using an estimator that's running Python 2, you'll need to install `psutil` directly from this [wheel](https://pythonwheels.com) before installing wandb: +If you're using an estimator that runs Python 2, install `psutil` from this [wheel](https://pythonwheels.com) before you install `wandb`: ```text https://wheels.galaxyproject.org/packages/psutil-5.4.8-cp27-cp27mu-manylinux1_x86_64.whl wandb ``` -Review a complete example on [GitHub](https://github.com/wandb/examples/tree/master/examples/pytorch/pytorch-cifar10-sagemaker), and read more on our [blog](https://wandb.ai/site/articles/running-sweeps-with-sagemaker). +For a complete example, see the [SageMaker example on GitHub](https://github.com/wandb/examples/tree/master/examples/pytorch/pytorch-cifar10-sagemaker). For more about running sweeps with SageMaker, see the [W&B blog post on SageMaker sweeps](https://wandb.ai/site/articles/running-sweeps-with-sagemaker). 
-You can also read the [Deploy Sentiment Analyzer Using SageMaker and W&B tutorial](https://wandb.ai/authors/sagemaker/reports/Deploy-Sentiment-Analyzer-Using-SageMaker-and-W-B--VmlldzoxODA1ODE) on deploying a sentiment analyzer using SageMaker and W&B. +For a tutorial on deploying a sentiment analyzer with SageMaker and W&B, see [Deploy Sentiment Analyzer Using SageMaker and W&B](https://wandb.ai/authors/sagemaker/reports/Deploy-Sentiment-Analyzer-Using-SageMaker-and-W-B--VmlldzoxODA1ODE). -The W&B sweep agent behaves as expected in a SageMaker job only if your SageMaker integration is turned off. Turn off the SageMaker integration by modifying your invocation of `wandb.init()`: +The W&B sweep agent works correctly inside a SageMaker job only when the SageMaker integration is turned off. To turn off the SageMaker integration, update your call to `wandb.init()`: ```python wandb.init(..., settings=wandb.Settings(sagemaker_disable=True)) diff --git a/models/integrations/scikit.mdx b/models/integrations/scikit.mdx index d0cc38ff74..c9d6b8e0f1 100644 --- a/models/integrations/scikit.mdx +++ b/models/integrations/scikit.mdx @@ -1,11 +1,12 @@ --- title: Scikit-Learn description: "Use W&B to visualize and compare scikit-learn model performance with experiment tracking and automated plot logging." +keywords: ["sklearn plots", "plot_classifier", "plot_regressor"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -You can use wandb to visualize and compare your scikit-learn models' performance with just a few lines of code. [Try an example →](https://wandb.me/scikit-colab) +This page shows scikit-learn users how to use W&B to track experiments and automatically log charts that visualize and compare model performance with a few lines of code. [Try an example](https://wandb.me/scikit-colab).
## Get started @@ -27,7 +28,7 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. @@ -61,6 +62,8 @@ wandb.login() ### Log metrics +After installing and logging in, log metrics from your scikit-learn training code so you can compare runs in W&B. + ```python import wandb @@ -78,7 +81,9 @@ wandb.init(project="visualize-sklearn") as run: ### Make plots -#### Step 1: Import wandb and initialize a new run +In addition to logging metrics, you can generate diagnostic plots for your scikit-learn models and log them as part of a run. The following steps initialize a run and then visualize either individual plots or a full set of plots for a given model type. + +#### Import wandb and initialize a new run ```python import wandb @@ -86,20 +91,22 @@ import wandb run = wandb.init(project="visualize-sklearn") ``` -#### Step 2: Visualize plots +#### Visualize plots + +The following sections describe how to visualize individual plots or all plots for a given model type. -#### Individual plots +##### Individual plots -After training a model and making predictions you can then generate plots in wandb to analyze your predictions. See the **Supported Plots** section below for a full list of supported charts. +After training a model and making predictions, you can generate plots in wandb to analyze your predictions. For more information about supported charts, see the **Supported plots** section. 
```python # Visualize single plot wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels) ``` -#### All plots +##### All plots -W&B has functions such as `plot_classifier` that will plot several relevant plots: +W&B has functions such as `plot_classifier` that plot several relevant plots: ```python # Visualize all classifier plots @@ -129,13 +136,13 @@ run.finish() #### Existing Matplotlib plots -Plots created on Matplotlib can also be logged on W&B Dashboard. To do that, it is first required to install `plotly`. +If you already create plots with Matplotlib, you can log them on the W&B dashboard alongside your scikit-learn plots. To do that, you must first install `plotly`. ```bash pip install plotly ``` -Finally, the plots can be logged on W&B's dashboard as follows: +Finally, log the plots on the W&B dashboard as follows: ```python import matplotlib.pyplot as plt @@ -152,13 +159,15 @@ with wandb.init(project="visualize-sklearn") as run: ## Supported plots +The following sections describe each plot type that `wandb.sklearn` can produce, along with the function signature and arguments. Use these as a reference when calling individual plot functions or interpreting the output of `plot_classifier`, `plot_regressor`, and `plot_clusterer`. + ### Learning curve Scikit-learn learning curve -Trains model on datasets of varying lengths and generates a plot of cross validated scores vs dataset size, for both training and test sets. +Trains model on datasets of varying lengths and generates a plot of cross-validated scores versus dataset size, for both training and test sets. `wandb.sklearn.plot_learning_curve(model, X, y)` @@ -172,7 +181,7 @@ Trains model on datasets of varying lengths and generates a plot of cross valida Scikit-learn ROC curve -ROC curves plot true positive rate (y-axis) vs false positive rate (x-axis). The ideal score is a TPR = 1 and FPR = 0, which is the point on the top left. 
Typically we calculate the area under the ROC curve (AUC-ROC), and the greater the AUC-ROC the better. +ROC curves plot true positive rate (y-axis) versus false positive rate (x-axis). The ideal score is TPR = 1 and FPR = 0, which is the point at the top left. A common summary metric is the area under the ROC curve (AUC-ROC): the greater the AUC-ROC, the better. `wandb.sklearn.plot_roc(y_true, y_probas, labels)` @@ -202,7 +211,7 @@ Plots the distribution of target classes in training and test sets. Useful for d Computes the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate. -High scores for both show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall). PR curve is useful when the classes are very imbalanced. +High scores for both show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall). The precision-recall curve is useful when the classes are imbalanced. `wandb.sklearn.plot_precision_recall(y_true, y_probas, labels)` @@ -216,7 +225,7 @@ High scores for both show that the classifier is returning accurate results (hig Scikit-learn feature importance chart -Evaluates and plots the importance of each feature for the classification task. Only works with classifiers that have a `feature_importances_` attribute, like trees. +Evaluates and plots the importance of each feature for the classification task. Only works with classifiers that have a `feature_importances_` attribute, such as trees. `wandb.sklearn.plot_feature_importances(model, ['width', 'height, 'length'])` @@ -231,7 +240,7 @@ Evaluates and plots the importance of each feature for the classification task.
Plots how well calibrated the predicted probabilities of a classifier are and how to calibrate an uncalibrated classifier. Compares estimated predicted probabilities by a baseline logistic regression model, the model passed as an argument, and by both its isotonic calibration and sigmoid calibrations. -The closer the calibration curves are to a diagonal the better. A transposed sigmoid like curve represents an overfitted classifier, while a sigmoid like curve represents an underfitted classifier. By training isotonic and sigmoid calibrations of the model and comparing their curves we can figure out whether the model is over or underfitting and if so which calibration (sigmoid or isotonic) might help fix this. +The closer the calibration curves are to a diagonal the better. A transposed sigmoid-like curve represents an overfitted classifier, while a sigmoid-like curve represents an underfitted classifier. By training isotonic and sigmoid calibrations of the model and comparing their curves, you can figure out whether the model is over or underfitting and, if so, which calibration (sigmoid or isotonic) might help fix this. For more details, check out [sklearn's docs](https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html). @@ -240,7 +249,7 @@ For more details, check out [sklearn's docs](https://scikit-learn.org/stable/aut * model (clf): Takes in a fitted classifier. * X (arr): Training set features. * y (arr): Training set labels. -* model_name (str): Model name. Defaults to 'Classifier' +* model_name (str): Model name. Defaults to `"Classifier"`. ### Confusion matrix @@ -248,7 +257,7 @@ For more details, check out [sklearn's docs](https://scikit-learn.org/stable/aut Scikit-learn confusion matrix -Computes the confusion matrix to evaluate the accuracy of a classification. It's useful for assessing the quality of model predictions and finding patterns in the predictions the model gets wrong. 
The diagonal represents the predictions the model got right, such as where the actual label is equal to the predicted label. +Computes the confusion matrix to evaluate the accuracy of a classification. It's useful for assessing the quality of model predictions and finding patterns in incorrect predictions. The diagonal represents the predictions where the actual label is equal to the predicted label. `wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels)` @@ -270,7 +279,7 @@ Computes the confusion matrix to evaluate the accuracy of a classification. It's * model (clf or reg): Takes in a fitted regressor or classifier. * X (arr): Training set features. * y (arr): Training set labels. - * X_test (arr): Test set features. +* X_test (arr): Test set features. * y_test (arr): Test set labels. ### Elbow plot @@ -292,17 +301,17 @@ Measures and plots the percentage of variance explained as a function of the num Scikit-learn silhouette plot -Measures & plots how close each point in one cluster is to points in the neighboring clusters. The thickness of the clusters corresponds to the cluster size. The vertical line represents the average silhouette score of all the points. +Measures and plots how close each point in one cluster is to points in the neighboring clusters. The thickness of the clusters corresponds to the cluster size. The vertical line represents the average silhouette score of all the points. -Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters and negative values indicate that those samples might have been assigned to the wrong cluster. +Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. 
A value of 0 indicates that the sample is on or close to the decision boundary between two neighboring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster. -In general we want all silhouette cluster scores to be above average (past the red line) and as close to 1 as possible. We also prefer cluster sizes that reflect the underlying patterns in the data. +Ideally, all silhouette cluster scores are above average (past the red line) and as close to 1 as possible. Cluster sizes should also reflect the underlying patterns in the data. `wandb.sklearn.plot_silhouette(model, X_train, ['spam', 'not spam'])` * model (clusterer): Takes in a fitted clusterer. * X (arr): Training set features. - * cluster_labels (list): Names for cluster labels. Makes plots easier to read by replacing cluster indexes with corresponding names. +* cluster_labels (list): Names for cluster labels. Makes plots easier to read by replacing cluster indexes with corresponding names. ### Outlier candidates plot @@ -310,7 +319,7 @@ In general we want all silhouette cluster scores to be above average (past the r Scikit-learn outlier plot -Measures a datapoint's influence on regression model via cook's distance. Instances with heavily skewed influences could potentially be outliers. Useful for outlier detection. +Measures a data point's influence on a regression model through Cook's distance. Instances with heavily skewed influences could be outliers. Useful for outlier detection. `wandb.sklearn.plot_outlier_candidates(model, X, y)` @@ -324,18 +333,18 @@ Measures a datapoint's influence on regression model via cook's distance. Instan Scikit-learn residuals plot -Measures and plots the predicted target values (y-axis) vs the difference between actual and predicted target values (x-axis), as well as the distribution of the residual error.
+Measures and plots the predicted target values (y-axis) versus the difference between actual and predicted target values (x-axis), as well as the distribution of the residual error. -Generally, the residuals of a well-fit model should be randomly distributed because good models will account for most phenomena in a data set, except for random error. +The residuals of a well-fit model should be randomly distributed because good models account for most phenomena in a data set, except for random error. `wandb.sklearn.plot_residuals(model, X, y)` * model (regressor): Takes in a fitted classifier. * X (arr): Training set features. -* y (arr): Training set labels. +* y (arr): Training set labels. - If you have any questions, we'd love to answer them in our [slack community](https://wandb.me/slack). +If you have any questions, ask them in the [Slack community](https://wandb.me/slack). ## Example -* [Run in colab](https://wandb.me/scikit-colab): A simple notebook to get you started. +[Run in colab](https://wandb.me/scikit-colab): A simple notebook to get you started. diff --git a/models/integrations/simpletransformers.mdx b/models/integrations/simpletransformers.mdx index e4e7a762df..28b7943f31 100644 --- a/models/integrations/simpletransformers.mdx +++ b/models/integrations/simpletransformers.mdx @@ -1,35 +1,38 @@ --- description: How to integrate W&B with the Transformers library by Hugging Face. title: Hugging Face Simple Transformers +keywords: ["classification model", "NER model", "question answering"] --- -This library is based on the Transformers library by Hugging Face. Simple Transformers lets you quickly train and evaluate Transformer models. Only 3 lines of code are needed to initialize a model, train the model, and evaluate a model. 
It supports Sequence Classification, Token Classification \(NER\),Question Answering,Language Model Fine-Tuning, Language Model Training, Language Generation, T5 Model, Seq2Seq Tasks , Multi-Modal Classification and Conversational AI. +This page shows how to integrate Weights & Biases (W&B) with Simple Transformers so you can visualize and track Transformer model training. By the end, you'll know how to enable W&B logging from a Simple Transformers model and where to find examples for common NLP tasks. -To use W&B for visualizing model training. To use this, set a project name for W&B in the `wandb_project` attribute of the `args` dictionary. This logs all hyperparameter values, training losses, and evaluation metrics to the given project. +Simple Transformers is based on the Transformers library by Hugging Face and lets you train and evaluate Transformer models. You need only three lines of code to initialize a model, train the model, and evaluate a model. It supports sequence classification, token classification (NER), question answering, language model fine-tuning, language model training, language generation, T5 model, Seq2Seq tasks, multi-modal classification, and conversational AI. + +To use W&B for visualizing model training, set a project name for W&B in the `wandb_project` attribute of the `args` dictionary. This logs all hyperparameter values, training losses, and evaluation metrics to the given project. ```python model = ClassificationModel('roberta', 'roberta-base', args={'wandb_project': 'project-name'}) ``` -Any additional arguments that go into `wandb.init()` can be passed as `wandb_kwargs`. +You can pass any additional arguments that go into `wandb.init()` as `wandb_kwargs`. ## Structure -The library is designed to have a separate class for every NLP task. The classes that provide similar functionality are grouped together. 
+The following section outlines how Simple Transformers organizes its classes, so you know which module to import for a given task. The library is designed to have a separate class for every NLP task. The classes that provide similar functionality are grouped together. -* `simpletransformers.classification` - Includes all Classification models. +* `simpletransformers.classification` - Includes all classification models. * `ClassificationModel` * `MultiLabelClassificationModel` -* `simpletransformers.ner` - Includes all Named Entity Recognition models. +* `simpletransformers.ner` - Includes all named entity recognition models. * `NERModel` -* `simpletransformers.question_answering` - Includes all Question Answering models. +* `simpletransformers.question_answering` - Includes all question answering models. * `QuestionAnsweringModel` -Here are some minimal examples +The following sections describe minimal examples for two common tasks, demonstrating how to enable W&B logging through the `wandb_project` argument. -## MultiLabel Classification +## Multi-label classification -```text +```python model = MultiLabelClassificationModel("distilbert","distilbert-base-uncased",num_labels=6, args={"reprocess_input_data": True, "overwrite_output_dir": True, "num_train_epochs":epochs,'learning_rate':learning_rate, 'wandb_project': "simpletransformers"}, @@ -43,7 +46,7 @@ Here are some minimal examples ## Question answering -```text +```python train_args = { 'learning_rate': wandb.config.learning_rate, 'num_train_epochs': 2, @@ -60,10 +63,11 @@ model = QuestionAnsweringModel('distilbert', 'distilbert-base-cased', args=train model.train_model(train_data) ``` +## Global arguments -SimpleTransformers provides classes as well as training scripts for all common natural language tasks. Here is the complete list of global arguments that are supported by the library, with their default arguments. 
+SimpleTransformers provides classes as well as training scripts for all common natural language tasks. The following is the complete list of global arguments that the library supports, with their default arguments. Refer to this list when you want to customize training behavior beyond the W&B-specific options shown earlier. -```text +```python global_args = { "adam_epsilon": 1e-8, "best_model_dir": "outputs/best_model", @@ -118,6 +122,8 @@ global_args = { } ``` -Refer to [simpletransformers on github](https://github.com/ThilinaRajapakse/simpletransformers) for more detailed documentation. +## Additional resources + +Refer to [simpletransformers on GitHub](https://github.com/ThilinaRajapakse/simpletransformers) for more detailed documentation. -Checkout [this W&B report](https://app.wandb.ai/cayush/simpletransformers/reports/Using-simpleTransformer-on-common-NLP-applications---Vmlldzo4Njk2NA) that covers training transformers on some the most popular GLUE benchmark datasets. [Try it out yourself on colab](https://colab.research.google.com/drive/1oXROllqMqVvBFcPgTKJRboTq96uWuqSz?usp=sharing). \ No newline at end of file +See [Using simpleTransformer on common NLP applications](https://app.wandb.ai/cayush/simpletransformers/reports/Using-simpleTransformer-on-common-NLP-applications---Vmlldzo4Njk2NA), a W&B report that covers training transformers on some of the most popular GLUE benchmark datasets. [Try it yourself on Colab](https://colab.research.google.com/drive/1oXROllqMqVvBFcPgTKJRboTq96uWuqSz?usp=sharing). \ No newline at end of file diff --git a/models/integrations/skorch.mdx b/models/integrations/skorch.mdx index a974308aae..2a72c1fbde 100644 --- a/models/integrations/skorch.mdx +++ b/models/integrations/skorch.mdx @@ -1,35 +1,38 @@ --- description: "Integrate W&B with Skorch to log scikit-learn compatible PyTorch model training metrics and hyperparameters." 
title: Skorch +keywords: ["NeuralNetClassifier", "skorch net", "epoch callback"] --- -You can use W&B with Skorch to automatically log the model with the best performance, along with all model performance metrics, the model topology and compute resources after each epoch. Every file saved in `wandb_run.dir` is automatically logged to W&B. +This page shows you how to use W&B with [Skorch](https://skorch.readthedocs.io/) so you can track Skorch model training without writing custom logging code. When you integrate the two, W&B automatically logs the model with the best performance, along with all model performance metrics, the model topology, and compute resources after each epoch. W&B automatically logs every file you save in `wandb_run.dir`. -See [example run](https://app.wandb.ai/borisd13/skorch/runs/s20or4ct?workspace=user-borisd13). +For more information, see this [example run](https://app.wandb.ai/borisd13/skorch/runs/s20or4ct?workspace=user-borisd13). ## Parameters +The following table lists the parameters that the `WandbLogger` callback accepts. + | Parameter | Type | Description | | :--- | :--- | :--- | -| `wandb_run` | `wandb.wandb_run`. Run | wandb run used to log data. | -|`save_model` | bool (default=True)| Whether to save a checkpoint of the best model and upload it to your Run on W&B.| -|`keys_ignored`| str or list of str (default=None) | Key or list of keys that should not be logged to tensorboard. Note that in addition to the keys provided by the user, keys such as those starting with `event_` or ending on `_best` are ignored by default.| +| `wandb_run` | `wandb.wandb_run.Run` | The W&B run used to log data. | +|`save_model` | `bool` (default=`True`)| Whether to save a checkpoint of the best model and upload it to your run on W&B.| +|`keys_ignored`| `str` or list of `str` (default=`None`) | Key or list of keys not to log to W&B.
In addition to the keys you provide, W&B ignores keys such as those starting with `event_` or ending with `_best` by default.| ## Example code -We've created a few examples for you to see how the integration works: +The following examples show end-to-end usage of `WandbLogger` with Skorch: -* [Colab](https://colab.research.google.com/drive/1Bo8SqN1wNPMKv5Bn9NjwGecBxzFlaNZn?usp=sharing): A simple demo to try the integration -* [A step by step guide](https://app.wandb.ai/cayush/uncategorized/reports/Automate-Kaggle-model-training-with-Skorch-and-W%26B--Vmlldzo4NTQ1NQ): to tracking your Skorch model performance +* [Colab](https://colab.research.google.com/drive/1Bo8SqN1wNPMKv5Bn9NjwGecBxzFlaNZn?usp=sharing): A simple demo to try the integration. +* [Step-by-step guide](https://app.wandb.ai/cayush/uncategorized/reports/Automate-Kaggle-model-training-with-Skorch-and-W%26B--Vmlldzo4NTQ1NQ): A walkthrough for tracking your Skorch model performance. ```python # Install wandb -... pip install wandb +# pip install wandb import wandb from skorch.callbacks import WandbLogger -# Create a wandb Run +# Create a wandb run wandb_run = wandb.init() # Log hyper-parameters (optional) @@ -41,13 +44,15 @@ net.fit(X, y) ## Method reference +The following table lists the callback methods that `WandbLogger` provides and when Skorch invokes each one. + | Method | Description | | :--- | :--- | | `initialize`\(\) | \(Re-\)Set the initial state of the callback. | | `on_batch_begin`\(net\[, X, y, training\]\) | Called at the beginning of each batch. | | `on_batch_end`\(net\[, X, y, training\]\) | Called at the end of each batch. | -| `on_epoch_begin`\(net\[, dataset_train, …\]\) | Called at the beginning of each epoch. | -| `on_epoch_end`\(net, \*\*kwargs\) | Log values from the last history step and save best model | -| `on_grad_computed`\(net, named_parameters\[, X, …\]\) | Called once per batch after gradients have been computed but before an update step was performed.
| -| `on_train_begin`\(net, \*\*kwargs\) | Log model topology and add a hook for gradients | +| `on_epoch_begin`\(net\[, dataset_train, ...\]\) | Called at the beginning of each epoch. | +| `on_epoch_end`\(net, \*\*kwargs\) | Log values from the last history step and save the best model. | +| `on_grad_computed`\(net, named_parameters\[, X, ...\]\) | Called once per batch after gradients are computed but before an update step is performed. | +| `on_train_begin`\(net, \*\*kwargs\) | Log model topology and add a hook for gradients. | | `on_train_end`\(net\[, X, y\]\) | Called at the end of training. | diff --git a/models/integrations/spacy.mdx b/models/integrations/spacy.mdx index b315fdbad6..00f591a56c 100644 --- a/models/integrations/spacy.mdx +++ b/models/integrations/spacy.mdx @@ -1,11 +1,14 @@ --- title: spaCy description: "Integrate W&B with spaCy v3 to track training metrics and version models and datasets through the WandbLogger config." +keywords: ["spacy train config", "nlp pipeline", "entity recognition"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -[spaCy](https://spacy.io) is a popular "industrial-strength" NLP library: fast, accurate models with a minimum of fuss. As of spaCy v3, W&B can now be used with [`spacy train`](https://spacy.io/api/cli#train) to track your spaCy model's training metrics as well as to save and version your models and datasets. And all it takes is a few added lines in your configuration. +[spaCy](https://spacy.io) is an NLP library that provides fast, accurate models. As of spaCy v3, you can use W&B with [`spacy train`](https://spacy.io/api/cli#train) to track your spaCy model's training metrics and to save and version your models and datasets. All it takes is a few added lines in your configuration. + +This page is for spaCy users who want to use W&B to monitor training runs, compare experiments, and version the models and datasets produced by `spacy train`. 
## Sign up and create an API key @@ -25,7 +28,7 @@ To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. @@ -59,10 +62,10 @@ wandb.login() ## Add the `WandbLogger` to your spaCy config file -spaCy config files are used to specify all aspects of training, not just logging -- GPU allocation, optimizer choice, dataset paths, and more. Minimally, under `[training.logger]` you need to provide the key `@loggers` with the value `"spacy.WandbLogger.v3"`, plus a `project_name`. +spaCy config files specify all aspects of training, not only logging (GPU allocation, optimizer choice, dataset paths, and more). Minimally, under `[training.logger]` you need to provide the key `@loggers` with the value `"spacy.WandbLogger.v3"`, plus a `project_name`. -For more on how spaCy training config files work and on other options you can pass in to customize training, check out [spaCy's documentation](https://spacy.io/usage/training). +For more on how spaCy training config files work and on other options you can pass in to customize training, see [spaCy's documentation](https://spacy.io/usage/training). ```python @@ -74,22 +77,24 @@ log_dataset_dir = "./corpus" model_log_interval = 1000 ``` +The following table describes the `WandbLogger` configuration options: + | Name | Description | | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `project_name` | `str`. The name of the W&B Project. The project will be created automatically if it doesn’t exist yet. | -| `remove_config_values` | `List[str]` . 
A list of values to exclude from the config before it is uploaded to W&B. `[]` by default. | -| `model_log_interval` | `Optional int`. `None` by default. If set, enables [model versioning](/models/registry/) with [Artifacts](/models/artifacts/). Pass in the number of steps to wait between logging model checkpoints. `None` by default. | -| `log_dataset_dir` | `Optional str`. If passed a path, the dataset will be uploaded as an Artifact at the beginning of training. `None` by default. | -| `entity` | `Optional str` . If passed, the run will be created in the specified entity | -| `run_name` | `Optional str` . If specified, the run will be created with the specified name. | +| `project_name` | `str`. The name of the W&B project. W&B creates the project automatically if it doesn't exist yet. | +| `remove_config_values` | `List[str]` . A list of values to exclude from the config before W&B uploads it. `[]` by default. | +| `model_log_interval` | `Optional int`. `None` by default. If set, enables [model versioning](/models/registry/) with [artifacts](/models/artifacts/). Pass in the number of steps to wait between logging model checkpoints. | +| `log_dataset_dir` | `Optional str`. If you pass a path, W&B uploads the dataset as an artifact at the beginning of training. `None` by default. | +| `entity` | `Optional str` . If passed, W&B creates the run in the specified entity. | +| `run_name` | `Optional str` . If specified, W&B creates the run with the specified name. | ## Start training -Once you have added the `WandbLogger` to your spaCy training config you can run `spacy train` as usual. +With the `WandbLogger` added to your spaCy training config, you can run `spacy train` as usual and W&B captures the run automatically. 
-```python +```bash python -m spacy train \ config.cfg \ --output ./output \ @@ -117,4 +122,4 @@ python -m spacy train \ -When training begins, a link to your training run's [W&B page](/models/runs/) will be output which will take you to this run's experiment tracking [dashboard](/models/track/workspaces/) in the W&B web UI. +When training begins, spaCy outputs a link to your training run's [W&B page](/models/runs/), which takes you to this run's experiment tracking [dashboard](/models/track/workspaces/) in the W&B web UI. diff --git a/models/integrations/stable-baselines-3.mdx b/models/integrations/stable-baselines-3.mdx index bc846e1edf..2d7ab0771d 100644 --- a/models/integrations/stable-baselines-3.mdx +++ b/models/integrations/stable-baselines-3.mdx @@ -1,20 +1,23 @@ --- description: "Integrate W&B with Stable Baselines3 to track reinforcement learning experiments and log training performance." title: Stable Baselines 3 PyTorch +keywords: ["SB3", "PPO training", "WandbCallback RL"] --- -[Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) \(SB3\) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. W&B's SB3 integration: +[Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. The W&B SB3 integration: * Records metrics such as losses and episodic returns. * Uploads videos of agents playing the games. * Saves the trained model. * Logs the model's hyperparameters. -* Logs the model gradient histograms. +* Logs the model's gradient histograms. Review an [example SB3 training run](https://wandb.ai/wandb/sb3/runs/1jyr6z10). 
## Log your SB3 experiments +To log SB3 training to W&B, pass `WandbCallback` to your model's `learn` method: + ```python from wandb.integration.sb3 import WandbCallback @@ -25,18 +28,20 @@ model.learn(..., callback=WandbCallback()) Stable Baselines 3 training with W&B -## WandbCallback Arguments +## `WandbCallback` arguments + +The following table describes the arguments you can pass to `WandbCallback`: | Argument | Usage | | :--- | :--- | -| `verbose` | The verbosity of sb3 output | -| `model_save_path` | Path to the folder where the model will be saved, The default value is \`None\` so the model is not logged | -| `model_save_freq` | Frequency to save the model | -| `gradient_save_freq` | Frequency to log gradient. The default value is 0 so the gradients are not logged | +| `verbose` | The verbosity of SB3 output. | +| `model_save_path` | Path to the folder where the model is saved. The default is `None`, so the model isn't logged. | +| `model_save_freq` | Frequency to save the model. | +| `gradient_save_freq` | Frequency to log gradients. The default is `0`, so gradients aren't logged. | ## Basic example -The W&B SB3 integration uses the logs output from TensorBoard to log your metrics +The W&B SB3 integration uses the logs output from TensorBoard to log your metrics. ```python import gym diff --git a/models/integrations/tensorboard.mdx b/models/integrations/tensorboard.mdx index 151c4f6e1d..921eb0ad5f 100644 --- a/models/integrations/tensorboard.mdx +++ b/models/integrations/tensorboard.mdx @@ -1,6 +1,7 @@ --- title: TensorBoard description: "Sync TensorBoard logs to W&B for cloud-hosted visualization, sharing, and centralized analysis alongside system metrics." +keywords: ["wandb.tensorboard.patch", "tb log import", "cloud tensorboard"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; @@ -10,7 +11,7 @@ import { ColabLink } from '/snippets/_includes/colab-link.mdx'; W&B supports embedded TensorBoard for W&B Multi-tenant Cloud. 
-Upload your TensorBoard logs to the cloud, quickly share your results among colleagues and classmates and keep your analysis in one centralized location. +This page shows how to sync TensorBoard logs to W&B so you can upload your TensorBoard logs to the cloud, share your results among colleagues and classmates, and keep your analysis in one centralized location. This integration is for users who already log to TensorBoard and want cloud-hosted visualization, sharing, and side-by-side comparison with W&B system metrics. TensorBoard integration code @@ -18,6 +19,8 @@ Upload your TensorBoard logs to the cloud, quickly share your results among coll ## Get started +To enable TensorBoard syncing, set `sync_tensorboard=True` when you initialize a W&B run. W&B automatically uploads any TensorBoard events your training code emits. + ```python import wandb @@ -30,25 +33,27 @@ wandb.init(project="my-project", sync_tensorboard=True) as run: Review an [example TensorBoard integration run](https://wandb.ai/rymc/simple-tensorboard-example/runs/oab614zf/tensorboard). -Once your run finishes, you can access your TensorBoard event files in W&B and you can visualize your metrics in native W&B charts, together with additional useful information like the system's CPU or GPU utilization, the `git` state, the terminal command the run used, and more. +After your run finishes, you can access your TensorBoard event files in W&B and visualize your metrics in native W&B charts. W&B also captures additional information such as system CPU or GPU utilization, the `git` state, and the terminal command the run used. -W&B supports TensorBoard with all versions of TensorFlow. W&B also supports TensorBoard 1.14 and higher with PyTorch as well as TensorBoardX. +W&B supports TensorBoard with all versions of TensorFlow. W&B also supports TensorBoard 1.14 and later with PyTorch as well as TensorBoardX. 
## Frequently asked questions +The following sections answer common questions about customizing the TensorBoard integration, including logging extra metrics, configuring the patch, syncing historical runs, and using notebook environments. + ### How can I log metrics to W&B that aren't logged to TensorBoard? -If you need to log additional custom metrics that aren't being logged to TensorBoard, you can call `wandb.Run.log()` in your code `run.log({"custom": 0.8})` +If you need to log additional custom metrics that aren't logged to TensorBoard, you can call `wandb.Run.log()` in your code: `run.log({"custom": 0.8})`. -Setting the step argument in `run.log()` is turned off when syncing Tensorboard. If you'd like to set a different step count, you can log the metrics with a step metric as: +Setting the step argument in `run.log()` is turned off when syncing TensorBoard. If you'd like to set a different step count, you can log the metrics with a step metric as: `run.log({"custom": 0.8, "global_step": global_step})` -### How do I configure Tensorboard when I'm using it with `wandb`? +### How do I configure TensorBoard when I'm using it with `wandb`? -If you want more control over how TensorBoard is patched you can call `wandb.tensorboard.patch()` instead of passing `sync_tensorboard=True` to `wandb.init()`. +If you want more control over how W&B patches TensorBoard, call `wandb.tensorboard.patch()` instead of passing `sync_tensorboard=True` to `wandb.init()`. ```python import wandb @@ -60,9 +65,9 @@ run = wandb.init() run.finish() ``` -You can pass `tensorboard_x=False` to this method to ensure vanilla TensorBoard is patched, if you're using TensorBoard > 1.14 with PyTorch you can pass `pytorch=True` to ensure it's patched. Both of these options have smart defaults depending on what versions of these libraries have been imported. +To patch vanilla TensorBoard, pass `tensorboard_x=False` to this method. 
If you're using TensorBoard later than 1.14 with PyTorch, pass `pytorch=True` to patch it. Both of these options have sensible defaults depending on what versions of these libraries you've imported. -By default, we also sync the `tfevents` files and any `.pbtxt` files. This enables us to launch a TensorBoard instance on your behalf. You will see a [TensorBoard tab](https://www.wandb.com/articles/hosted-tensorboard) on the run page. This behavior can be turned off by passing `save=False` to `wandb.tensorboard.patch` +By default, W&B also syncs the `tfevents` files and any `.pbtxt` files. This lets W&B launch a TensorBoard instance on your behalf. You see a [TensorBoard tab](https://www.wandb.com/articles/hosted-tensorboard) on the run page. To turn off this behavior, pass `save=False` to `wandb.tensorboard.patch`. ```python import wandb @@ -75,7 +80,7 @@ run.finish() ``` -You must call either `wandb.init()` or `wandb.tensorboard.patch()` **before** calling `tf.summary.create_file_writer()` or constructing a `SummaryWriter` via `torch.utils.tensorboard`. +You must call either `wandb.init()` or `wandb.tensorboard.patch()` before calling `tf.summary.create_file_writer()` or constructing a `SummaryWriter` via `torch.utils.tensorboard`. ### How do I sync historical TensorBoard runs? @@ -84,7 +89,7 @@ If you have existing `tfevents` files stored locally and you would like to impor ### How do I use Google Colab or Jupyter with TensorBoard? -If running your code in a Jupyter or Colab notebook, make sure to call `wandb.Run.finish()` and the end of your training. This will finish the wandb run and upload the tensorboard logs to W&B so they can be visualized. This is not necessary when running a `.py` script as wandb finishes automatically when a script finishes. +If you run your code in a Jupyter or Colab notebook, make sure to call `wandb.Run.finish()` at the end of your training. 
This finishes the `wandb` run and uploads the TensorBoard logs to W&B so they can be visualized. This isn't necessary when you run a `.py` script, because `wandb` finishes automatically when a script finishes. To run shell commands in a notebook environment, you must prepend a `!`, as in `!wandb sync directoryname`. @@ -99,7 +104,7 @@ with wandb.init(project="my-project", sync_tensorboard=True) as run: ### Can I sync tfevents files stored in the cloud? -`wandb` 0.20.0 and above supports syncing `tfevents` files stored in S3, GCS or Azure. `wandb` uses the default credentials for each cloud provider, corresponding to the commands in the following table: +`wandb` 0.20.0 and later supports syncing `tfevents` files stored in S3, GCS, or Azure. `wandb` uses the default credentials for each cloud provider. The following table lists the command to configure credentials and the expected logging directory format for each provider: | Cloud provider | Credentials | Logging directory format | | -------------- | --------------------------------------- | ------------------------------------- | diff --git a/models/integrations/tensorflow.mdx b/models/integrations/tensorflow.mdx index 0c2a708c64..5d59b73b88 100644 --- a/models/integrations/tensorflow.mdx +++ b/models/integrations/tensorflow.mdx @@ -1,14 +1,17 @@ --- title: TensorFlow description: "Integrate W&B with TensorFlow for logging custom metrics, using estimator hooks, and TensorBoard log synchronization." +keywords: ["tf.estimator", "WandbEstimatorHook", "keras tensorboard"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; +This page shows how to integrate W&B with TensorFlow to track experiments, log metrics, and synchronize TensorBoard logs. Follow these patterns to capture training data from TensorFlow models, customize what you log through estimator hooks or manual logging, and reuse your existing TensorBoard workflows with W&B's centralized dashboard. 
This page targets TensorFlow users who want richer experiment tracking than TensorBoard alone provides. + ## Get started -If you're already using TensorBoard, it's easy to integrate with wandb. +If you already use TensorBoard, you can integrate with W&B. Import both libraries to make the W&B and TensorFlow APIs available in your script. ```python import tensorflow as tf @@ -17,18 +20,22 @@ import wandb ## Log custom metrics -If you need to log additional custom metrics that aren't being logged to TensorBoard, you can call `run.log()` in your code `run.log({"custom": 0.8}) ` +This section covers how to log metrics that TensorBoard doesn't already capture, so you can track additional values alongside your standard TensorBoard summaries. + +If you need to log additional custom metrics that TensorBoard doesn't log, you can call `run.log()` in your code, for example `run.log({"custom": 0.8})`. -Setting the step argument in `run.log()` is turned off when syncing Tensorboard. If you'd like to set a different step count, you can log the metrics with a step metric as: +W&B turns off the step argument in `run.log()` when syncing TensorBoard. To set a different step count, log the metrics with a step metric as: -``` python +```python with wandb.init(config=tf.flags.FLAGS, sync_tensorboard=True) as run: run.log({"custom": 0.8, "global_step":global_step}, step=global_step) ``` ## TensorFlow estimators hook -If you want more control over what gets logged, wandb also provides a hook for TensorFlow estimators. It will log all `tf.summary` values in the graph. +This section describes the W&B hook for TensorFlow estimators, which gives you fine-grained control over what W&B captures during estimator training. + +If you want more control over what gets logged, W&B also provides a hook for TensorFlow estimators. It logs all `tf.summary` values in the graph. 
```python import tensorflow as tf @@ -42,7 +49,9 @@ run.finish() ## Log manually -The simplest way to log metrics in TensorFlow is by logging `tf.summary` with the TensorFlow logger: +If you're not using estimators or want to log specific summaries explicitly, this section shows how to send `tf.summary` values to W&B directly. + +One way to log metrics in TensorFlow is to log `tf.summary` with the TensorFlow logger: ```python import wandb @@ -52,7 +61,7 @@ with tf.Session() as sess: wandb.tensorflow.log(tf.summary.merge_all()) ``` -With TensorFlow 2, the recommended way of training a model with a custom loop is via using `tf.GradientTape`. You can read more in the [TensorFlow custom training walkthrough](https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough). If you want to incorporate `wandb` to log metrics in your custom TensorFlow training loops you can follow this snippet: +With TensorFlow 2, the recommended way to train a model with a custom loop is to use `tf.GradientTape`. For more information, see the [TensorFlow custom training walkthrough](https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough). To incorporate W&B to log metrics in your custom TensorFlow training loops, follow this snippet: ```python with tf.GradientTape() as tape: @@ -71,24 +80,26 @@ With TensorFlow 2, the recommended way of training a model with a custom loop is A [full example of customizing training loops in TensorFlow 2](https://www.wandb.com/articles/wandb-customizing-training-loops-in-tensorflow-2) is available. -## How is W&B different from TensorBoard? +## Differences between W&B and TensorBoard + +If you're evaluating whether to adopt W&B alongside or in place of TensorBoard, this section highlights the key differences. -When the cofounders started working on W&B, they were inspired to build a tool for the frustrated TensorBoard users at OpenAI. 
Here are a few things we've focused on improving: +W&B was built to address common limitations TensorBoard users encountered. Here are areas where W&B differs: -1. **Reproduce models**: W&B is good for experimentation, exploration, and reproducing models later. We capture not just the metrics, but also the hyperparameters and version of the code, and we can save your version-control status and model checkpoints for you so your project is reproducible. -2. **Automatic organization**: Whether you're picking up a project from a collaborator, coming back from a vacation, or dusting off an old project, W&B makes it easy to see all the models that have been tried so no one wastes hours, GPU cycles, or carbon re-running experiments. -3. **Fast, flexible integration**: Add W&B to your project in 5 minutes. Install our free open-source Python package and add a couple of lines to your code, and every time you run your model you'll have nice logged metrics and records. -4. **Persistent, centralized dashboard**: No matter where you train your models, whether on your local machine, in a shared lab cluster, or on spot instances in the cloud, your results are shared to the same centralized dashboard. You don't need to spend your time copying and organizing TensorBoard files from different machines. -5. **Powerful tables**: Search, filter, sort, and group results from different models. It's easy to look over thousands of model versions and find the best performing models for different tasks. TensorBoard isn't built to work well on large projects. -6. **Tools for collaboration**: Use W&B to organize complex machine learning projects. It's easy to share a link to W&B, and you can use private teams to have everyone sending results to a shared project. We also support collaboration via reports— add interactive visualizations and describe your work in markdown. This is a great way to keep a work log, share findings with your supervisor, or present findings to your lab or team. 
+- **Reproduce models**: W&B supports experimentation, exploration, and reproducing models later. W&B captures metrics, hyperparameters, and the code version, and can save your version-control status and model checkpoints so your project is reproducible. +- **Automatic organization**: When you're picking up a project from a collaborator, returning after time away, or revisiting an old project, W&B lets you see the models you've tried so you don't re-run experiments unnecessarily. +- **Flexible integration**: Add W&B to your project by installing the open-source Python package and adding a few lines to your code. Each run produces logged metrics and records. +- **Persistent, centralized dashboard**: Whether you train your models on your local machine, a shared lab cluster, or spot instances in the cloud, W&B sends your results to the same centralized dashboard. You don't need to copy and organize TensorBoard files from different machines. +- **Tables**: Search, filter, sort, and group results from different models. You can review model versions and find the best-performing models for different tasks. +- **Tools for collaboration**: Use W&B to organize machine learning projects. Share a link to W&B, or use private teams to send results to a shared project. Reports support collaboration through interactive visualizations and Markdown descriptions, which you can use to keep a work log, share findings with your supervisor, or present findings to your lab or team. -Get started with a [free account](https://wandb.ai) +To try W&B, [create a free account](https://wandb.ai). 
## Examples -We've created a few examples for you to see how the integration works: +To see these integration patterns applied to complete projects, explore the following examples: -* [Example on Github](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-estimator-mnist/mnist.py): MNIST example Using TensorFlow Estimators -* [Example on Github](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-cnn-fashion/train.py): Fashion MNIST example Using Raw TensorFlow -* [Wandb Dashboard](https://app.wandb.ai/l2k2/examples-tf-estimator-mnist/runs/p0ifowcb): View result on W&B -* Customizing Training Loops in TensorFlow 2 - [Article](https://www.wandb.com/articles/wandb-customizing-training-loops-in-tensorflow-2) | [Dashboard](https://app.wandb.ai/sayakpaul/custom_training_loops_tf) \ No newline at end of file +* [MNIST example using TensorFlow Estimators](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-estimator-mnist/mnist.py). +* [Fashion MNIST example using raw TensorFlow](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-cnn-fashion/train.py). +* [W&B dashboard for the MNIST example](https://app.wandb.ai/l2k2/examples-tf-estimator-mnist/runs/p0ifowcb). +* Customizing training loops in TensorFlow 2: [article](https://www.wandb.com/articles/wandb-customizing-training-loops-in-tensorflow-2) and [dashboard](https://app.wandb.ai/sayakpaul/custom_training_loops_tf). \ No newline at end of file diff --git a/models/integrations/torchtune.mdx b/models/integrations/torchtune.mdx index c88252d373..fd1fef0bca 100644 --- a/models/integrations/torchtune.mdx +++ b/models/integrations/torchtune.mdx @@ -1,24 +1,29 @@ --- title: PyTorch torchtune description: "Use W&B logging in PyTorch torchtune for tracking LLM fine-tuning experiments with the WandBLogger metric logger." 
+keywords: ["Meta torchtune", "WandbLogger config", "LLM SFT"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -[torchtune](https://meta-pytorch.org/torchtune/stable/index.html) is a PyTorch-based library designed to streamline the authoring, fine-tuning, and experimentation processes for large language models (LLMs). Additionally, torchtune has built-in support for [logging with W&B](https://meta-pytorch.org/torchtune/stable/deep_dives/wandb_logging.html), enhancing tracking and visualization of training processes. +[torchtune](https://meta-pytorch.org/torchtune/stable/index.html) is a PyTorch-based library that streamlines authoring, fine-tuning, and experimentation for LLMs. torchtune also has built-in support for [logging with W&B](https://meta-pytorch.org/torchtune/stable/deep_dives/wandb_logging.html), which enhances tracking and visualization of training processes. + +This guide shows you how to enable W&B logging in torchtune recipes, configure the `WandBLogger` metric logger, understand which metrics torchtune tracks by default, and save model checkpoints to W&B Artifacts. It's for practitioners who fine-tune LLMs with torchtune and want to track experiments in W&B. - TorchTune training dashboard + torchtune training dashboard Check the W&B blog post on [Fine-tuning Mistral 7B using torchtune](https://wandb.ai/capecape/torchtune-mistral/reports/torchtune-The-new-PyTorch-LLM-fine-tuning-library---Vmlldzo3NTUwNjM0). -## W&B logging at your fingertips +## Enable W&B logging + +You can enable W&B logging in two ways: override arguments at launch from the command line, or edit the recipe's config file. Choose whichever fits your workflow. -Override command line arguments at launch: +Override command-line arguments at launch: ```bash tune run lora_finetune_single_device --config llama3/8B_lora_single_device \ @@ -44,7 +49,7 @@ log_every_n_steps: 5 Enable W&B logging on the recipe's config file by modifying the `metric_logger` section. 
Change the `_component_` to `torchtune.utils.metric_logging.WandBLogger` class. You can also pass a `project` name and `log_every_n_steps` to customize the logging behavior. -You can also pass any other `kwargs` as you would to the [wandb.init()](/models/ref/python/functions/init) method. For example, if you are working on a team, you can pass the `entity` argument to the `WandBLogger` class to specify the team name. +You can also pass any other `kwargs` as you would to the [wandb.init()](/models/ref/python/functions/init) method. For example, if you work on a team, you can pass the `entity` argument to the `WandBLogger` class to specify the team name. @@ -59,7 +64,7 @@ metric_logger: log_every_n_steps: 5 ``` - + ```shell tune run lora_finetune_single_device --config llama3/8B_lora_single_device \ metric_logger._component_=torchtune.utils.metric_logging.WandBLogger \ @@ -72,14 +77,14 @@ tune run lora_finetune_single_device --config llama3/8B_lora_single_device \ -## What is logged? +## Logged data -You can explore the W&B dashboard to see the logged metrics. By default W&B logs all of the hyperparameters from the config file and the launch overrides. +After you enable W&B logging, you can explore the W&B dashboard to see the logged metrics. By default, W&B logs all of the hyperparameters from the config file and the launch overrides, so you have a record of each run's configuration alongside its metrics. W&B captures the resolved config on the **Overview** tab. W&B also stores the config in YAML format on the [Files tab](https://wandb.ai/capecape/torchtune/runs/joyknwwa/files). - TorchTune configuration + torchtune configuration ### Logged metrics @@ -88,17 +93,17 @@ Each recipe has its own training loop. 
Check each individual recipe to see its l | Metric | Description | | --- | --- | -| `loss` | The loss of the model | -| `lr` | The learning rate | -| `tokens_per_second` | The tokens per second of the model | -| `grad_norm` | The gradient norm of the model | -| `global_step` | Corresponds to the current step in the training loop. Takes into account gradient accumulation, basically every time an optimizer step is taken, the model is updated, the gradients are accumulated and the model is updated once every `gradient_accumulation_steps` | +| `loss` | The loss of the model. | +| `lr` | The learning rate. | +| `tokens_per_second` | The tokens per second of the model. | +| `grad_norm` | The gradient norm of the model. | +| `global_step` | The number of optimizer steps taken so far. Accounts for gradient accumulation: gradients accumulate across batches, and the model updates once every `gradient_accumulation_steps` batches. | -`global_step` is not the same as the number of training steps. It corresponds to the current step in the training loop. Takes into account gradient accumulation, basically every time an optimizer step is taken the `global_step` is incremented by 1. For example, if the dataloader has 10 batches, gradient accumulation steps is 2 and run for 3 epochs, the optimizer will step 15 times, in this case `global_step` will range from 1 to 15. +`global_step` isn't the same as the number of training steps. It counts optimizer steps and therefore accounts for gradient accumulation: each time an optimizer step runs, `global_step` increments by 1. For example, if the dataloader has 10 batches, gradient accumulation steps is 2, and you run for 3 epochs, the optimizer steps 15 times (10 / 2 = 5 steps per epoch, times 3 epochs), so `global_step` ranges from 1 to 15. -The streamlined design of torchtune allows to easily add custom metrics or modify the existing ones.
It suffices to modify the corresponding [recipe file](https://github.com/meta-pytorch/torchtune/tree/main/recipes), for example, computing one could log `current_epoch` as a percentage of the total number of epochs as following: +The design of torchtune lets you add custom metrics or modify existing ones. Modify the corresponding [recipe file](https://github.com/meta-pytorch/torchtune/tree/main/recipes). For example, you can log `current_epoch` as a percentage of the total number of epochs like this: ```python # inside `train.py` function in the recipe file @@ -109,23 +114,25 @@ self._metric_logger.log_dict( ``` -This is a fast evolving library, the current metrics are subject to change. If you want to add a custom metric, you should modify the recipe and call the corresponding `self._metric_logger.*` function. +The set of logged metrics can change between torchtune releases. To add a custom metric, modify the recipe and call the corresponding `self._metric_logger.*` function. ## Save and load checkpoints -The torchtune library supports various [checkpoint formats](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html). Depending on the origin of the model you are using, you should switch to the appropriate [checkpointer class](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html). +Save checkpoints to W&B Artifacts to version model weights alongside the metrics and configuration of each run, so you can reproduce results and compare model versions later. + +The torchtune library supports several [checkpoint formats](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html). Depending on the origin of the model you use, you must switch to the appropriate [checkpointer class](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html). 
-If you want to save the model checkpoints to [W&B Artifacts](/models/artifacts/), the simplest solution is to override the `save_checkpoint` functions inside the corresponding recipe. +To save the model checkpoints to [W&B Artifacts](/models/artifacts/), the recommended approach is to override the `save_checkpoint` functions inside the corresponding recipe. -Here is an example of how you can override the `save_checkpoint` function to save the model checkpoints to W&B Artifacts. +The following example shows how to override the `save_checkpoint` function to save the model checkpoints to W&B Artifacts. ```python def save_checkpoint(self, epoch: int) -> None: ... - ## Let's save the checkpoint to W&B - ## depending on the Checkpointer Class the file will be named differently - ## Here is an example for the full_finetune case + ## Save the checkpoint to W&B. + ## The file name depends on the Checkpointer Class. + ## The following is an example for the full_finetune case. checkpoint_file = Path.joinpath( self._checkpointer._output_dir, f"torchtune_model_{epoch}" ).with_suffix(".pt") diff --git a/models/integrations/ultralytics.mdx b/models/integrations/ultralytics.mdx index ba0f7362a2..6566e9750a 100644 --- a/models/integrations/ultralytics.mdx +++ b/models/integrations/ultralytics.mdx @@ -1,44 +1,53 @@ --- title: Ultralytics YOLO description: "Use W&B with Ultralytics YOLO models for experiment tracking, model checkpointing, and computer vision visualization." +keywords: ["YOLOv8", "YOLOv11", "ultralytics settings.yaml"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -[Ultralytics](https://github.com/ultralytics/ultralytics) is the home for cutting-edge, state-of-the-art computer vision models for tasks like image classification, object detection, image segmentation, and pose estimation. 
Not only it hosts [YOLOv8](https://docs.ultralytics.com/models/yolov8/), the latest iteration in the YOLO series of real-time object detection models, but other powerful computer vision models such as [SAM (Segment Anything Model)](https://docs.ultralytics.com/models/sam/#introduction-to-sam-the-segment-anything-model), [RT-DETR](https://docs.ultralytics.com/models/rtdetr/), [YOLO-NAS](https://docs.ultralytics.com/models/yolo-nas/), etc. Besides providing implementations of these models, Ultralytics also provides us with out-of-the-box workflows for training, fine-tuning, and applying these models using an easy-to-use API. +[Ultralytics](https://github.com/ultralytics/ultralytics) provides computer vision models for tasks like image classification, object detection, image segmentation, and pose estimation. It hosts [YOLOv8](https://docs.ultralytics.com/models/yolov8/), an iteration in the YOLO series of real-time object detection models, along with other computer vision models such as [SAM (Segment Anything Model)](https://docs.ultralytics.com/models/sam/#introduction-to-sam-the-segment-anything-model), [RT-DETR](https://docs.ultralytics.com/models/rtdetr/), and [YOLO-NAS](https://docs.ultralytics.com/models/yolo-nas/). Ultralytics also provides ready-to-use workflows for training, fine-tuning, and applying these models through an API. + +This page shows computer vision practitioners how to integrate W&B with Ultralytics so that W&B automatically tracks and visualizes experiment metrics, model checkpoints, and predictions on validation or inference images. It covers installation, a training and validation workflow, and an inference-only workflow. ## Get started -1. Install `ultralytics` and `wandb`. +To use the integration, you must first install both `ultralytics` and `wandb` and confirm you're using a supported version of `ultralytics`. 
+ +Install `ultralytics` and `wandb`: - - - ```shell - pip install --upgrade ultralytics==8.0.238 wandb + + +```shell +pip install --upgrade ultralytics==8.0.238 wandb - # or - # conda install ultralytics - ``` - - - ```bash - !pip install --upgrade ultralytics==8.0.238 wandb - ``` - - +# or +# conda install ultralytics +``` + + +```bash +!pip install --upgrade ultralytics==8.0.238 wandb +``` + + - The development team has tested the integration with `ultralyticsv8.0.238` and below. To report any issues with the integration, create a [GitHub issue](https://github.com/wandb/wandb/issues/new?template=sdk-bug.yml) with the tag `yolov8`. + +The development team tested the integration with `ultralytics` v8.0.238 and below. To report any issues with the integration, create a [GitHub issue](https://github.com/wandb/wandb/issues/new?template=sdk-bug.yml) with the tag `yolov8`. + + +With both packages installed, you can move on to instrumenting an Ultralytics workflow with W&B. ## Track experiments and visualize validation results -This section demonstrates a typical workflow of using an [Ultralytics](https://docs.ultralytics.com/modes/predict/) model for training, fine-tuning, and validation and performing experiment tracking, model-checkpointing, and visualization of the model's performance using [W&B](https://wandb.ai/site). +This section demonstrates a typical workflow that uses an [Ultralytics](https://docs.ultralytics.com/modes/predict/) model for training, fine-tuning, and validation, and that performs experiment tracking, model checkpointing, and visualization of the model's performance using [W&B](https://wandb.ai/site). 
-You can also check out about the integration in this report: [Supercharging Ultralytics with W&B](https://wandb.ai/geekyrakshit/ultralytics/reports/Supercharging-Ultralytics-with-Weights-Biases--Vmlldzo0OTMyMDI4) +For more information about the integration, see [Supercharging Ultralytics with W&B](https://wandb.ai/geekyrakshit/ultralytics/reports/Supercharging-Ultralytics-with-Weights-Biases--Vmlldzo0OTMyMDI4). -To use the W&B integration with Ultralytics, import the `wandb.integration.ultralytics.add_wandb_callback` function. +To use the W&B integration with Ultralytics, import the `wandb.integration.ultralytics.add_wandb_callback` function. This callback is the entry point that registers W&B logging with the Ultralytics model. ```python import wandb @@ -47,7 +56,7 @@ from wandb.integration.ultralytics import add_wandb_callback from ultralytics import YOLO ``` -Initialize the `YOLO` model of your choice, and invoke the `add_wandb_callback` function on it before performing inference with the model. This ensures that when you perform training, fine-tuning, validation, or inference, it automatically saves the experiment logs and the images, overlaid with both ground-truth and the respective prediction results using the [interactive overlays for computer vision tasks](/models/track/log/media/#image-overlays-in-tables) on W&B along with additional insights in a [`wandb.Table`](/models/tables/). +Next, initialize the `YOLO` model of your choice, and invoke the `add_wandb_callback` function on it before performing inference with the model. Attaching the callback before training enables automatic logging during each epoch. 
This ensures that when you perform training, fine-tuning, validation, or inference, W&B automatically saves the experiment logs and the images, overlaid with both ground-truth and the respective prediction results using the [interactive overlays for computer vision tasks](/models/track/log/media/#image-overlays-in-tables), along with additional insights in a [`wandb.Table`](/models/tables/). ```python with wandb.init(project="ultralytics", job_type="train") as run: @@ -65,7 +74,9 @@ with wandb.init(project="ultralytics", job_type="train") as run: model.train(project="ultralytics", data="coco128.yaml", epochs=5, imgsz=640) ``` -Here's how experiments tracked using W&B for an Ultralytics training or fine-tuning workflow looks like: +With the callback attached and training started, your run now streams training metrics, model checkpoints, and per-epoch validation visualizations to your W&B project. + +Here's how experiments tracked using W&B for an Ultralytics training or fine-tuning workflow look:
YOLO Fine-tuning Experiments
@@ -79,13 +90,11 @@ Here's how epoch-wise validation results are visualized using a [W&B Table](/mod -This section demonstrates a typical workflow of using an [Ultralytics](https://docs.ultralytics.com/modes/predict/) model for inference and visualizing the results using [W&B](https://wandb.ai/site). +This section demonstrates a typical workflow that uses an [Ultralytics](https://docs.ultralytics.com/modes/predict/) model for inference and visualizes the results using [W&B](https://wandb.ai/site). You can try out the code in Google Colab: [Open in Colab](https://wandb.me/ultralytics-inference). -You can also check out about the integration in this report: [Supercharging Ultralytics with W&B](https://wandb.ai/geekyrakshit/ultralytics/reports/Supercharging-Ultralytics-with-Weights-Biases--Vmlldzo0OTMyMDI4) - -In order to use the W&B integration with Ultralytics, we need to import the `wandb.integration.ultralytics.add_wandb_callback` function. +As with the training workflow, to use the W&B integration with Ultralytics, import the `wandb.integration.ultralytics.add_wandb_callback` function. ```python import wandb @@ -94,7 +103,7 @@ from wandb.integration.ultralytics import add_wandb_callback from ultralytics.engine.model import YOLO ``` -Download a few images to test the integration on. You can use still images, videos, or camera sources. For more information on inference sources, check out the [Ultralytics docs](https://docs.ultralytics.com/modes/predict/). +Next, download a few images to test the integration on. You can use still images, videos, or camera sources. For more information about inference sources, see the [Ultralytics docs](https://docs.ultralytics.com/modes/predict/). ```bash !wget https://raw.githubusercontent.com/wandb/examples/ultralytics/colabs/ultralytics/assets/img1.png @@ -103,7 +112,7 @@ Download a few images to test the integration on. 
You can use still images, vide !wget https://raw.githubusercontent.com/wandb/examples/ultralytics/colabs/ultralytics/assets/img5.png ``` -Initialize a W&B [run](/models/runs/) using `wandb.init()`. Next, Initialize your desired `YOLO` model and invoke the `add_wandb_callback` function on it before you perform inference with the model. This ensures that when you perform inference, it automatically logs the images overlaid with your [interactive overlays for computer vision tasks](/models/track/log/media/#image-overlays-in-tables) along with additional insights in a [`wandb.Table`](/models/tables/). +With the test images in place, initialize a W&B [run](/models/runs/) using `wandb.init()`. Next, initialize your desired `YOLO` model and invoke the `add_wandb_callback` function on it before you perform inference with the model. This ensures that when you perform inference, W&B automatically logs the images overlaid with your [interactive overlays for computer vision tasks](/models/track/log/media/#image-overlays-in-tables), along with additional insights in a [`wandb.Table`](/models/tables/). ```python # Initialize W&B Run @@ -126,14 +135,18 @@ with wandb.init(project="ultralytics", job_type="inference") as run: ) ``` -You do not need to explicitly initialize a run using `wandb.init()` in case of a training or fine-tuning workflow. However, if the code involves only prediction, you must explicitly create a run. + +You don't need to explicitly initialize a run using `wandb.init()` for a training or fine-tuning workflow. However, if the code involves only prediction, you must explicitly create a run. + + +After you run inference, W&B logs the predicted bounding boxes and segmentation masks to your W&B run as interactive overlays. -Here's how the interactive bbox overlay looks: +Here's how the interactive bounding box overlay looks:
WandB Image Overlay
-For more details, see the [W&B image overlays guide](/models/track/log/media/#image-overlays). +For more information, see the [W&B image overlays guide](/models/track/log/media/#image-overlays). ## More resources diff --git a/models/integrations/w-and-b-for-julia.mdx b/models/integrations/w-and-b-for-julia.mdx index ad0acbb859..9a29db64d8 100644 --- a/models/integrations/w-and-b-for-julia.mdx +++ b/models/integrations/w-and-b-for-julia.mdx @@ -1,11 +1,12 @@ --- description: "Integrate W&B with Julia to track experiments, log metrics, and visualize model performance from Julia programs." title: W&B for Julia +keywords: ["wandb.jl", "avik-pal", "Flux.jl"] --- -For those running machine learning experiments in the Julia programming language, a community contributor has created an unofficial set of Julia bindings called [wandb.jl](https://github.com/avik-pal/Wandb.jl) that you can use. +If you run machine learning experiments in Julia, you can use [`wandb.jl`](https://github.com/avik-pal/Wandb.jl), an unofficial set of Julia bindings created by a community contributor. -You can find examples [in the documentation](https://github.com/avik-pal/Wandb.jl/tree/main/docs/src/examples) on the wandb.jl repository. Their "Getting Started" example is here: +For more examples, see the [`wandb.jl` examples directory](https://github.com/avik-pal/Wandb.jl/tree/main/docs/src/examples). The following code is the getting started example from the `wandb.jl` repository: ```julia using Wandb, Dates, Logging diff --git a/models/integrations/xgboost.mdx b/models/integrations/xgboost.mdx index 290c29f5c8..65b7d55b29 100644 --- a/models/integrations/xgboost.mdx +++ b/models/integrations/xgboost.mdx @@ -1,12 +1,15 @@ --- description: "Integrate W&B with XGBoost to log gradient boosting metrics, feature importance, and model performance automatically." 
title: XGBoost +keywords: ["XGBClassifier", "xgb plot importance", "boosting rounds"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -The `wandb` library has a `WandbCallback` callback for logging metrics, configs and saved boosters from training with XGBoost. Here you can see a [live W&B Dashboard](https://wandb.ai/morg/credit_scorecard) with outputs from the XGBoost `WandbCallback`. +This page shows you how to use the W&B integration with XGBoost to automatically log gradient boosting metrics, model configurations, feature importance, and trained boosters so you can track, compare, and reproduce your XGBoost experiments. + +The `wandb` library has a `WandbCallback` callback that logs metrics, configs, and saved boosters from training with XGBoost. See a [live W&B Dashboard](https://wandb.ai/morg/credit_scorecard) with outputs from the XGBoost `WandbCallback`. W&B Dashboard using XGBoost @@ -14,7 +17,7 @@ The `wandb` library has a `WandbCallback` callback for logging metrics, configs ## Get started -Logging XGBoost metrics, configs and booster models to W&B is as easy as passing the `WandbCallback` to XGBoost: +To log XGBoost metrics, configs, and booster models to W&B, pass the `WandbCallback` to XGBoost: ```python from wandb.integration.xgboost import WandbCallback @@ -28,41 +31,45 @@ with wandb.init() as run: bst.fit(X_train, y_train, callbacks=[WandbCallback(log_model=True)]) ``` -You can open [this notebook](https://wandb.me/xgboost) for a comprehensive look at logging with XGBoost and W&B +For a comprehensive look at logging with XGBoost and W&B, see the [XGBoost and W&B logging notebook](https://wandb.me/xgboost). 
 ## `WandbCallback` reference

 ### Functionality
-Passing `WandbCallback` to a XGBoost model will:
-- log the booster model configuration to W&B
-- log evaluation metrics collected by XGBoost, such as rmse, accuracy etc to W&B
-- log training metrics collected by XGBoost (if you provide data to eval_set)
-- log the best score and the best iteration
-- save and upload your trained model to W&B Artifacts (when `log_model = True`)
-- log feature importance plot when `log_feature_importance=True` (default).
-- Capture the best eval metric in `wandb.Run.summary` when `define_metric=True` (default).
+
+Passing `WandbCallback` to an XGBoost model does the following:
+- Logs the booster model configuration to W&B.
+- Logs evaluation metrics collected by XGBoost, such as `rmse`, accuracy, and so on to W&B.
+- Logs training metrics collected by XGBoost (if you provide data to `eval_set`).
+- Logs the best score and the best iteration.
+- Saves and uploads your trained model to W&B Artifacts (when `log_model = True`).
+- Logs the feature importance plot when `log_feature_importance=True` (default).
+- Captures the best eval metric in `wandb.Run.summary` when `define_metric=True` (default).

 ### Arguments
-- `log_model`: (boolean) if True save and upload the model to W&B Artifacts
-- `log_feature_importance`: (boolean) if True log a feature importance bar plot
+- `log_model`: (boolean) if True, saves and uploads the model to W&B Artifacts.

-- `importance_type`: (str) one of `{weight, gain, cover, total_gain, total_cover}` for tree model. weight for linear model.
+- `log_feature_importance`: (boolean) if True, logs a feature importance bar plot.

-- `define_metric`: (boolean) if True (default) capture model performance at the best step, instead of the last step, of training in your `run.summary`.
+- `importance_type`: (str) one of `{weight, gain, cover, total_gain, total_cover}` for tree model. `weight` for linear model.
+- `define_metric`: (boolean) if True (default), captures model performance at the best step, instead of the last step, of training in your `run.summary`. -You can review the [source code for WandbCallback](https://github.com/wandb/wandb/blob/main/wandb/integration/xgboost/xgboost.py). -For additional examples, check out the [repository of examples on GitHub](https://github.com/wandb/examples/tree/master/examples/boosting-algorithms). +Review the [source code for WandbCallback](https://github.com/wandb/wandb/blob/main/wandb/integration/xgboost/xgboost.py). + +For more examples, see the [repository of examples on GitHub](https://github.com/wandb/examples/tree/master/examples/boosting-algorithms). ## Tune your hyperparameters with Sweeps -Attaining the maximum performance out of models requires tuning hyperparameters, like tree depth and learning rate. W&B [Sweeps](/models/sweeps/) is a powerful toolkit for configuring, orchestrating, and analyzing large hyperparameter testing experiments. +W&B [Sweeps](/models/sweeps/) is a toolkit for configuring, orchestrating, and analyzing hyperparameter testing experiments. This section shows how to combine the XGBoost integration with W&B Sweeps to search across hyperparameter configurations. + +To improve model performance, tune hyperparameters like tree depth and learning rate. -You can also try this [XGBoost & Sweeps Python script](https://github.com/wandb/examples/blob/master/examples/wandb-sweeps/sweeps-xgboost/xgboost_tune.py). +You can also try this [XGBoost and Sweeps Python script](https://github.com/wandb/examples/blob/master/examples/wandb-sweeps/sweeps-xgboost/xgboost_tune.py). 
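As a rough illustration of the Sweeps paragraph above, the following sketch defines a sweep over tree depth and learning rate. The project name, parameter ranges, and the metric name `validation_0-logloss` are assumptions chosen for illustration, and `launch_sweep` is a hypothetical helper, not part of the integration. The training function you pass in would create a run and fit the model with `WandbCallback`.

```python
# Hypothetical sweep configuration; the metric name and ranges are assumptions.
sweep_config = {
    "method": "random",  # sample configurations at random
    "metric": {"name": "validation_0-logloss", "goal": "minimize"},
    "parameters": {
        "max_depth": {"values": [3, 6, 9]},
        "learning_rate": {"distribution": "uniform", "min": 0.01, "max": 0.3},
    },
}


def launch_sweep(train_fn, project="xgboost-sweeps", count=10):
    """Register the sweep and run `count` trials of `train_fn` (hypothetical helper)."""
    import wandb  # imported lazily so the config can be inspected without wandb installed

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, project=project, count=count)
    return sweep_id
```

Inside `train_fn`, the sampled values would typically be read from the run's config and passed to the XGBoost model.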
XGBoost performance comparison diff --git a/models/integrations/yolov5.mdx b/models/integrations/yolov5.mdx index 198889c4ff..d133a39f20 100644 --- a/models/integrations/yolov5.mdx +++ b/models/integrations/yolov5.mdx @@ -1,50 +1,52 @@ --- title: YOLOv5 description: "Use the built-in W&B integration in YOLOv5 for experiment tracking, model versioning, and prediction visualization." +keywords: ["train.py", "yolov5 weights", "detect.py"] --- import { ColabLink } from '/snippets/_includes/colab-link.mdx'; -[Ultralytics' YOLOv5](https://www.ultralytics.com/yolo) ("You Only Look Once") model family enables real-time object detection with convolutional neural networks without all the agonizing pain. +[Ultralytics' YOLOv5](https://www.ultralytics.com/yolo) ("You Only Look Once") model family enables real-time object detection with convolutional neural networks. -[W&B](https://wandb.com) is directly integrated into YOLOv5, providing experiment metric tracking, model and dataset versioning, rich model prediction visualization, and more. **It's as easy as running a single `pip install` before you run your YOLO experiments.** +[W&B](https://wandb.com) is directly integrated into YOLOv5, providing experiment metric tracking, model and dataset versioning, rich model prediction visualization, and more. **Run a single `pip install` before you run your YOLO experiments.** All W&B logging features are compatible with data-parallel multi-GPU training, such as with [PyTorch DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html). ## Track core experiments -Simply by installing `wandb`, you'll activate the built-in W&B [logging features](/models/track/log/): system metrics, model metrics, and media logged to interactive [Dashboards](/models/track/workspaces/). -```python +To get started, install `wandb` alongside YOLOv5 and run training as usual. 
By installing `wandb`, you activate the built-in W&B [logging features](/models/track/log/): system metrics, model metrics, and media logged to interactive [Dashboards](/models/track/workspaces/). + +```bash pip install wandb git clone https://github.com/ultralytics/yolov5.git python yolov5/train.py # train a small network on a small dataset ``` -Just follow the links printed to the standard out by wandb. +Follow the links that wandb prints to standard out. - All these charts and more. + W&B dashboard showing YOLOv5 training metrics and system charts. ## Customize the integration -By passing a few simple command line arguments to YOLO, you can take advantage of even more W&B features. +Once experiment tracking is working, you can enable additional W&B features (such as model versioning, dataset versioning, and prediction visualization) by passing a few command-line arguments to YOLO. * If you pass a number to `--save_period`, W&B saves a [model version](/models/registry/) at the end of every `save_period` epochs. The model version includes the model weights and tags the best-performing model in the validation set. -* Turning on the `--upload_dataset` flag will also upload the dataset for data versioning. -* Passing a number to `--bbox_interval` will turn on [data visualization](../). At the end of every `bbox_interval` epochs, the outputs of the model on the validation set will be uploaded to W&B. +* Turning on the `--upload_dataset` flag also uploads the dataset for data versioning. +* Passing a number to `--bbox_interval` turns on data visualization. At the end of every `bbox_interval` epochs, W&B uploads the outputs of the model on the validation set. - -```python + +```bash python yolov5/train.py --epochs 20 --save_period 1 ``` - -```python + +```bash python yolov5/train.py --epochs 20 --save_period 1 \ --upload_dataset --bbox_interval 1 ``` @@ -66,5 +68,5 @@ Here's what that looks like. 
-With data and model versioning, you can resume paused or crashed experiments from any device, no setup necessary. Check out [the Colab ](https://wandb.me/yolo-colab) for details. +With data and model versioning, you can resume paused or crashed experiments from any device, no setup necessary. See [the Colab](https://wandb.me/yolo-colab) for details. \ No newline at end of file diff --git a/models/integrations/yolox.mdx b/models/integrations/yolox.mdx index 12da9e6839..cf08c17ca7 100644 --- a/models/integrations/yolox.mdx +++ b/models/integrations/yolox.mdx @@ -1,11 +1,14 @@ --- description: "Integrate W&B with YOLOX to track object detection model training, log metrics, and visualize detection results." title: YOLOX +keywords: ["anchor-free YOLO", "Megvii", "COCO detection"] --- import ApiKeyCreateStreamlined from "/snippets/_includes/api-key-create-streamlined.mdx"; -[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) is an anchor-free version of YOLO with strong performance for object detection. You can use the YOLOX W&B integration to turn on logging of metrics related to training, validation, and the system, and you can interactively validate predictions with a single command-line argument. +[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) is an anchor-free version of YOLO for object detection. You can use the YOLOX W&B integration to turn on logging of metrics related to training, validation, and the system, and to interactively validate predictions with a single command-line argument. + +This guide shows you how to authenticate with W&B, install the integration, and enable W&B logging when you train a YOLOX object detection model so you can track metrics and inspect predictions in the W&B UI. ## Sign up and create an API key @@ -21,17 +24,15 @@ An API key authenticates your machine to W&B. You can generate an API key from y To install the `wandb` library locally and log in: - + 1. 
Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash - export WANDB_API_KEY= + export WANDB_API_KEY=[YOUR-API-KEY] ``` 1. Install the `wandb` library and log in. - - ```shell pip install wandb @@ -59,31 +60,45 @@ wandb.login() ## Log metrics -Use the `--logger wandb` command line argument to turn on logging with wandb. Optionally you can also pass all of the arguments that [`wandb.init()`](/models/ref/python/functions/init) expects; prepend each argument with `wandb-`. +With the `wandb` library installed and your machine authenticated, you can enable W&B logging from the YOLOX training script. + +Use the `--logger wandb` command-line argument to turn on logging with `wandb`. Optionally, you can also pass all of the arguments that [`wandb.init()`](/models/ref/python/functions/init) expects. Prepend each argument with `wandb-`. -`num_eval_imges` controls the number of validation set images and predictions that are logged to W&B tables for model evaluation. +`num_eval_imges` controls the number of validation set images and predictions that W&B logs to tables for model evaluation. + +Replace the following placeholders before you run the command: + +- `[PROJECT-NAME]`: The name of your W&B project. +- `[ENTITY]`: Your W&B entity (username or team name). +- `[RUN-NAME]`: A name for this training run. +- `[RUN-ID]`: A unique identifier for this run. +- `[SAVE-DIR]`: The directory where YOLOX saves checkpoints and logs. +- `[NUM-IMAGES]`: The number of validation images to log. +- `[BOOL]`: Whether to log checkpoints (`true` or `false`). ```shell -# login to wandb +# Log in to W&B wandb login -# call your yolox training script with the `wandb` logger argument +# Call your YOLOX training script with the wandb logger argument python tools/train.py .... 
 --logger wandb \
- wandb-project \
- wandb-entity
- wandb-name \
- wandb-id \
- wandb-save_dir \
- wandb-num_eval_imges \
- wandb-log_checkpoints
+ wandb-project [PROJECT-NAME] \
+ wandb-entity [ENTITY]
+ wandb-name [RUN-NAME] \
+ wandb-id [RUN-ID] \
+ wandb-save_dir [SAVE-DIR] \
+ wandb-num_eval_imges [NUM-IMAGES] \
+ wandb-log_checkpoints [BOOL]
 ```

 ## Example

-[Example dashboard with YOLOX training and validation metrics ->](https://wandb.ai/manan-goel/yolox-nano/runs/3pzfeom)
+After your training run starts, YOLOX streams training, validation, and system metrics to your W&B project, where you can compare runs and inspect predictions. See the following example for what a populated dashboard looks like.
+
+See the [example dashboard with YOLOX training and validation metrics](https://wandb.ai/manan-goel/yolox-nano/runs/3pzfeom).

 YOLOX training dashboard

-Any questions or issues about this W&B integration? Open an issue in the [YOLOX repository](https://github.com/Megvii-BaseDetection/YOLOX).
+If you have questions or issues about this W&B integration, open an issue in the [YOLOX repository](https://github.com/Megvii-BaseDetection/YOLOX).
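The "prepend each argument with `wandb-`" convention from the Log metrics section could be sketched as follows. `build_wandb_opts` is a hypothetical helper written for illustration, not part of YOLOX or `wandb`, and the project and run names are made up. It assumes the training script accepts the options as alternating key and value tokens, as in the command above.

```python
import shlex
from typing import Dict, List, Optional


def build_wandb_opts(init_kwargs: Dict[str, Optional[object]]) -> List[str]:
    """Flatten keyword arguments into alternating `wandb-<key>` and value tokens."""
    opts: List[str] = []
    for key, value in init_kwargs.items():
        if value is None:
            continue  # skip arguments that were not set
        opts.extend([f"wandb-{key}", str(value)])
    return opts


if __name__ == "__main__":
    # Illustrative values only; replace them with your own project and run name.
    opts = build_wandb_opts({"project": "yolox-demo", "name": "baseline", "id": None})
    print(shlex.join(["--logger", "wandb", *opts]))
```

The resulting tokens would be appended to the training command after the script's own arguments.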