`.
+
+**StatInference**
+: The shared submodule for datacard creation and limit/fit tooling (used by the HH analyses).
+
+**Task**
+: One stage of the pipeline with defined inputs, outputs and a run step. The unit LAW schedules.
+ See [Tasks & LAW](concepts/tasks-and-law.md) and the [Task reference](reference/tasks.md).
+
+**Version**
+: The `--version` label that namespaces a run's outputs so productions and tests don't collide.
+
+**WLCG**
+: The Worldwide LHC Computing Grid — the federation of sites (`T1_*`, `T2_*`, `T3_*`) where CMS
+ data and FLAF outputs are stored.
+
+**Workflow**
+: A task that splits into many branches, runnable `local` or on `htcondor`.
diff --git a/docs/hh_bbtautau.md b/docs/hh_bbtautau.md
deleted file mode 100644
index c82346a5f..000000000
--- a/docs/hh_bbtautau.md
+++ /dev/null
@@ -1,78 +0,0 @@
-# HH->bb$\tau$$\tau$ analysis steps
-
-**Commands below assume that AnaTuples have already been produced. If not, please produce them following the instruction in the analysis section.**
-
-Remember that:
-
-- `ERA` variable is set. E.g.
- ```sh
- ERA=Run2_2016
- ```
- Alternatively you can add `ERA=Run2_2016; ...` in front of each command.
- Run2 possible eras are: `Run2_2016`,`Run2_2016_HIPM`,`Run2_2017` and `Run2_2018`
-
-- when expliciting `VERSION_NAME` variable, its name contains explicitly the deepTau version: `VERSION_NAME= vXX_deepTauYY_ZZZ`, where:
- - XX is the anaTuple version (if not the first production it can be useful to have `v1,v2,..`),
- - YY is the deepTau version (`2p1` or `2p5`)
- - ZZZ are other eventual addition (e.g. if only tauTau channel `_onlyTauTau` or if `Zmumu` ntuples `_Zmumu`..)
-
-- `--workflow` can be `htcondor` or `local`. It is recommended to develop and test locally and then switch to `htcondor` for production. In examples below `--workflow local` is used for illustration purposes.
-- when running on `htcondor` it is recommended to add `--transfer-logs` to the command to transfer logs to local.
-- `--customisations` argument is used to pass custom parameters to the task in form param1=value1,param2=value2,...
- **IMPORTANT for HHbbTauTau analysis:** if running using deepTau 2p5 add `--customisations deepTauVersion=2p5`
-- if you want to run only on few files, you can specify list of branches to run using `--branches` argument. E.g. `--branches 2,7-10,17`.
-- to get status, use `--print-stauts N,K` where N is depth for task dependencies, K is depths for file dependencies. E.g. `--print-status 3,1`.
-- to remove task output use `--remove-output N,a`, where N is depth for task dependencies. E.g. `--remove-output 0,a`.
-- it is highly recommended to limitate the maximum number of parallel jobs running adding `--parallel-jobs M` where M is the number of the parallel jobs (e.g. M=100)
-
-## Create anaCacheTuple
-
-For each Anatuple, an anaCacheTuple (storing observables which are computationally heavier) will be created.
-
-```sh
-law run AnaCacheTupleTask --period ${ERA} --version ${VERSION_NAME}
-```
-**Note**: at the `AnaCacheTupleTask` stage, the addition of customisation for specifying the version is still needed. For the other tasks, it won't be needed anymore.
-
-
-#### Merge data in anaCache tuples
-
-```sh
-law run DataCacheMergeTask --period ${ERA} --version ${VERSION_NAME}
-```
-
-
-### Histograms Production
-
-This has to be run after AnaTupleTask but **not necessairly** after AnaCacheTupleTask, if the variable to plot is not stored inside AnaCacheTuples.
-
-These task will produce histograms with observables that need to be specified inside the `Analysis/tasks.py` file, specifically inside the `vars_to_plot` list.
-
-The tasks to run are the following:
-
-1. `HistProducerFileTask`: for each AnaTuple an histogram of the corresponding variable will be created.
- ```sh
- law run HistProducerFileTask --period $ERA --version ${VERSION_NAME}
- ```
-1. `HistProducerSampleTask`: all the histogram belonging to a specific sample will be merged in one histogram.
- ```sh
- law run HistProducerSampleTask --period $ERA --version ${VERSION_NAME}
- ```
-1. `MergeTask`: all the histogram will be merged from samples to only one histograms under the folder `${HISTOGRAMS}/all_histograms/` to a specific sample will be merged in one histogram. At this stage, for each norm/shape uncertainty (+ central scenario) will be created one histogram.
- ```sh
- law run MergeTask --period $ERA --version ${VERSION_NAME}
- ```
- Each histograms will be named as: `all_histograms_UNCERTAINTY.root` where uncertainty can be [Central, TauES_DM0, ecc....]
-
-1. `HaddMergedTask`: all the merged histograms (produced separately for each uncertainty) will be merged in only one file.
- ```sh
- law run HaddMergedTask --period $ERA --version ${VERSION_NAME}
- ```
- Tip: It's very fast so it can be convenient to run this task in local.
- The final histogram will be named as: `all_histograms_Hadded.root`
-
-## How to run HHbtag training skim ntuple production
-```sh
-python Studies/HHBTag/CreateTrainingSkim.py --inFile $CENTRAL_STORAGE/prod_v1/nanoAOD/2018/GluGluToBulkGravitonToHHTo2B2Tau_M-350.root --outFile output/skim.root --mass 350 --sample GluGluToBulkGraviton --year 2018 >& EventInfo.txt
-python Common/SaveHisto.txt --inFile $CENTRAL_STORAGE/prod_v1/nanoAOD/2018/GluGluToBulkGravitonToHHTo2B2Tau_M-350.root --outFile output/skim.root
-```
diff --git a/docs/index.md b/docs/index.md
index 397c4f5cb..25a1c7f50 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,50 +1,87 @@
# FLAF
-FLAF - Flexible LAW-based Analysis Framework.
-Task workflow managed is done via [LAW](https://github.com/riga/law) (Luigi Analysis Framework).
-
-## How to install
-1. Setup ssh keys:
- - On GitHub [settings/keys](https://github.com/settings/keys)
- - On CERN GitLab [profile/keys](https://gitlab.cern.ch/-/profile/keys)
-
-1. Clone the repository:
- ```sh
- git clone --recursive git@github.com:cms-flaf/Framework.git FLAF
- ```
-
-1. Create a user customisation file `config/user_custom.yaml`. It should contain all user-specific modifications that you don't want to be committed to the central repository. Below is example of minimal content of the file (replace `USER_NAME` and `ANA_FOLDER` with your values):
- ```yaml
- fs_default:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/'
- fs_anaCache:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/'
- fs_anaTuple:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/'
- fs_anaCacheTuple:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/'
- fs_histograms:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/histograms/'
- fs_json:
- - 'T3_CH_CERNBOX:/store/user/USER_NAME/ANA_FOLDER/jsonFiles/'
- analysis_config_area: config/HH_bbtautau
- compute_unc_variations: true
- store_noncentral: true
- ```
-
-## How to load environment
-1. Following command activates the framework environment:
- ```sh
- source env.sh
- ```
-
-1. For the new installation or after you implement new law tasks, you need to update the law index:
- ```sh
- law index --verbose
- ```
-
-1. Initialize voms proxy:
- ```sh
- voms-proxy-init -voms cms -rfc -valid 192:00
- ```
+**FLAF** — the **F**lexible **LA**W-based Analysis **F**ramework — is the shared software
+framework behind several CMS Higgs-sector analyses at CERN. It turns CMS
+[NanoAOD](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD) files into the
+analysis ntuples, histograms, plots and statistical results that go into a physics paper.
+FLAF organises this work as a chain of **tasks** managed by
+[LAW](https://github.com/riga/law) (the Luigi Analysis Workflow). You describe *what* you want
+(for example, "the final plots for the 2022 data"); LAW figures out *which* intermediate steps
+are needed, runs only those, and can dispatch them to the CERN HTCondor batch system.
+
+!!! tip "New here? You are in the right place."
+ These docs assume **no prior experience** with LAW, Luigi or batch computing. If your
+ background is physics rather than software engineering, start with
+ [Key terms](getting-started/key-terms.md) and the [Getting started](getting-started/prerequisites.md)
+ track — every concept is introduced from scratch.
+
+## The big picture
+
+FLAF is shared by three analyses — **HH→bb̄ττ**, **HH→bb̄WW** and **H→μμ** — which all run the
+same pipeline. From CMS NanoAOD to final results, the stages are:
+
+```mermaid
+flowchart TD
+ NANO[CMS NanoAOD<br/>on DAS / WLCG] --> IFT[InputFileTask<br/>resolve the file list]
+ IFT --> ATF[AnaTupleFileTask<br/>produce analysis ntuples]
+ ATF --> ATM[AnaTupleMergeTask<br/>merge per dataset]
+ ATM --> HTP[HistTupleProducerTask<br/>compute analysis observables]
+ HTP --> HFN[HistFromNtupleProducerTask<br/>fill histograms]
+ HFN --> HM[HistMergerTask<br/>merge histograms]
+ HM --> HP[HistPlotTask<br/>make plots]
+ HP --> STAT[Statistical inference<br/>datacards, limits, scans]
+```
+
+Each box is a LAW **task**. You normally run only the *last* task you care about — LAW pulls in
+everything upstream automatically. The whole pipeline is explained step by step in the
+[full-workflow walkthrough](workflow/walkthrough.md).
+
+## How to read these docs
+
+The documentation is organised so you can enter at the level you need.
+
+
+
+- :material-rocket-launch: **I'm new — get me running**
+
+ Follow the [Getting started](getting-started/prerequisites.md) track: prerequisites →
+ installation → your first run. Then skim [Key terms](getting-started/key-terms.md).
+
+- :material-lightbulb-on: **I want to understand how it works**
+
+ Read [Concepts](concepts/architecture.md): the architecture, what a LAW task is, the data
+ flow, the configuration system, eras, storage and the environment.
+
+- :material-cog: **I need to run the analysis**
+
+ Use the [Full workflow](workflow/walkthrough.md) walkthrough and the
+ [Command arguments](workflow/arguments.md) cheat-sheet. Scale up with
+ [Running on HTCondor](workflow/htcondor.md).
+
+- :material-book-open-variant: **I'm looking something up**
+
+ Jump to the [Task reference](reference/tasks.md), the [Configuration guide](configuration/user-custom.md),
+ the [Glossary](glossary.md) or [Troubleshooting](troubleshooting.md).
+
+
+
+## What FLAF is — and is not
+
+- FLAF is a **framework**, not an analysis. It lives in the [`cms-flaf/FLAF`](https://github.com/cms-flaf/FLAF)
+ repository and is included as a **git submodule** inside each analysis repository. You never
+ clone FLAF on its own to run an analysis — you clone an analysis repository (which brings FLAF
+ with it). See [Architecture](concepts/architecture.md) and [Installation](getting-started/installation.md).
+- FLAF provides the **common machinery**: the task definitions, the configuration system, the
+ environment setup, the storage abstraction and the CI. The **physics specifics** (which
+ signals, which observables, which categories) live in each analysis repository and are
+ documented there — see [Analyses](analyses.md).
+
+## Getting help
+
+- **Something went wrong?** Check [Troubleshooting](troubleshooting.md) first — it collects the
+ most common pitfalls and their fixes.
+- **An unfamiliar word?** The [Glossary](glossary.md) translates framework vocabulary into
+ analyst terms.
+- **Found a docs problem?** Use the :material-pencil: edit icon on any page to open a pull
+ request, or open an issue on [GitHub](https://github.com/cms-flaf/FLAF/issues).
diff --git a/docs/reference/tasks.md b/docs/reference/tasks.md
new file mode 100644
index 000000000..f8cac1501
--- /dev/null
+++ b/docs/reference/tasks.md
@@ -0,0 +1,80 @@
+# Task reference
+
+A concise reference for every FLAF task: what it does, what it branches over, and its task-specific
+parameters. The **common** parameters (`--version`, `--period`, `--workflow`, `--branches`,
+`--test`, …) apply to all of them and are documented in [Command arguments](../workflow/arguments.md).
+
+Production tasks live in `FLAF/AnaProd/tasks.py` (invoke as `FLAF.AnaProd.tasks.<TaskName>`); analysis
+tasks live in `FLAF/Analysis/tasks.py` (invoke as `FLAF.Analysis.tasks.<TaskName>`). For the order in
+which they run, see the [walkthrough](../workflow/walkthrough.md) and
+[data flow](../concepts/data-flow.md).
+
+## Production tasks (`AnaProd`)
+
+### `InputFileTask`
+Resolves the concrete list of NanoAOD files for the requested datasets and era (from DAS). Runs
+locally (it is a `LocalWorkflow`, not submitted to HTCondor) and is cheap. Every downstream task
+depends on it, so it runs first.
+
+### `AnaTupleFileTask`
+Runs the analysis producer (`AnaProd/anaTupleProducer.py`, inside CMSSW) over input files to create
+**anaTuples**. **Branches over input files** (one branch per NanoAOD file) — the workflow you most
+often submit to HTCondor.
+
+### `AnaTupleFileListBuilderTask` / `AnaTupleFileListTask`
+Helper workflows that assemble the lists of per-file anaTuples to be merged. Normally pulled in
+automatically as dependencies of the merge step; you rarely call them directly.
+
+### `AnaTupleMergeTask`
+Merges the per-file anaTuples into one anaTuple per dataset (data merged across runs).
+
+- **Parameter:** `--delete-inputs-after-merge` (bool, default `false`) — remove the per-file
+ inputs once the merge succeeds, to save space.
+
+## Analysis tasks (`Analysis`)
+
+### `HistTupleProducerTask`
+Reads merged anaTuples and computes the analysis **observables** (the configured "payload
+producers"), writing **histTuples**.
+
+### `HistFromNtupleProducerTask`
+Fills **histograms** of the requested variables from the histTuples, including systematic
+variations. **Branches over variables.**
+
+- **Parameters:** `--variables` (string; restrict which variables), `--n-var-batches` (int,
+ default `10`; how variables are grouped into branches).
+
+### `HistMergerTask`
+Merges the per-piece histograms into per-process histograms ready for plotting and fitting.
+
+- **Parameter:** `--variables` (string; restrict which variables).
+
+### `AnalysisCacheTask`
+Pre-computes a per-event payload that later stages reuse — most importantly the **b-tag shape**
+weights in HH→bb̄WW. Pulled in automatically when an analysis needs it.
+
+- **Parameter:** `--producer-to-run` (which cached payload producer to run).
+- **Caveat:** on a cold cache this can be **time-consuming** (≈ 1 h per branch). Reuse it across
+ runs via a [per-task version override](../workflow/arguments.md#per-task-version-overrides).
+
+### `AnalysisCacheAggregationTask`
+Aggregates the cached payloads produced by `AnalysisCacheTask` into the form the histogram stages
+consume.
+
+- **Parameter:** `--producer-to-aggregate`.
+
+### `HistPlotTask`
+Produces the final **plots**. **Branches over variables** (one branch per variable).
+
+- **Parameter:** `--variables` (string; restrict which variables).
+
+## Statistical-inference tasks
+
+The limit/fit tasks (e.g. `PlotResonantLimits`, `PlotPullsAndImpacts`) come from the
+`StatInference` and `inference`/`dhi` submodules and run inside CMSSW/Combine. They are
+analysis-specific — see each HH analysis's **Statistical inference** page (via
+[Analyses](../analyses.md)) and the [walkthrough](../workflow/walkthrough.md#stage-5-statistical-inference).
+
+!!! tip "Discover parameters from the command line"
+ `law run <TaskName> --help` lists every parameter a task accepts, including the ones inherited from
+ the base classes.
diff --git a/docs/stat_inference.md b/docs/stat_inference.md
deleted file mode 100644
index 39223de62..000000000
--- a/docs/stat_inference.md
+++ /dev/null
@@ -1,36 +0,0 @@
-## How to run limits
-1. As a temporary workaround, if you want to run multiplie commands, to avoid delays to load environment each time run:
- ```sh
- cmbEnv /bin/zsh # or /bin/bash
- ```
- Alternatively add `cmbEnv` in front of each command. E.g.
- ```sh
- cmbEnv python3 -c 'print("hello")'
- ```
-
-1. Create datacards.
- ```sh
- python3 StatInference/dc_make/create_datacards.py --input PATH_TO_SHAPES --output PATH_TO_CARDS --config PATH_TO_CONFIG
- ```
- Available configurations:
- - For X->HH>bbtautau Run 2: [StatInference/config/x_hh_bbtautau_run2.yaml](https://github.com/cms-flaf/StatInference/blob/main/config/x_hh_bbtautau_run2.yaml)
- - For X->HH->bbWW Run 3: [StatInference/config/x_hh_bbww_run3.yaml](https://github.com/cms-flaf/StatInference/blob/main/config/x_hh_bbww_run3.yaml)
-
-1. Run limits.
- ```sh
- law run PlotResonantLimits --version dev --datacards 'PATH_TO_CARDS/*.txt' --xsec fb --y-log
- ```
- Hints:
- - use `--workflow htcondor` to submit on HTCondor (by default it runs locally)
- - add `--remove-output 4,a,y` to remove previous output files
- - add `--print-status 0` to get status of the workflow (where `0` is a depth). Useful to get the output file name.
- - for more details see [cms-hh inference documentation](https://cms-hh.web.cern.ch/tools/inference/)
-
-2. Plot Pulls and Impacts
- ```sh
- PlotPullsAndImpacts --version dev --datacards "PATH_TO_CARDS/specific_card.txt" --hh-model NO_STR --parameter-values r=1 --parameter-ranges r,-100,100 --method robust --PlotPullsAndImpacts-order-by-impact True --mc-stats True --PullsAndImpacts-custom-args="--expectSignal=1"
- ```
- Hints:
- - Don't use datacards as *.txt because pulls should be done for each mass point separately
- - add `--remove-output 4,a,y` to remove previous output files
- - add `--print-status 0` to get status of the workflow (where `0` is a depth). Useful to get the output file name.
\ No newline at end of file
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 000000000..c89f83797
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,98 @@
+# Troubleshooting & FAQ
+
+The most common ways a FLAF run goes wrong, and how to fix them. If your symptom is not here, check
+the job logs (run with `--transfer-logs` on HTCondor) and the task status
+(`--print-status 3,1`).
+
+## `law: command not found`
+You did not `source env.sh` in this shell. Every new terminal needs it once
+([Installation](getting-started/installation.md)).
+
+## Import errors / empty submodule directories
+You cloned without `--recursive`, so submodules (FLAF, PlotKit, physics tools) are empty. Fix:
+
+```sh
+git submodule update --init --recursive
+```
+
+## A run unexpectedly drops into `InputFileTask` / DAS errors
+For a from-scratch production, `InputFileTask` running first is normal. But if a run that should
+reuse existing outputs keeps re-resolving inputs, or fails here, the cause is almost always a
+**wrong `--period` or `--version`** (so the expected upstream outputs aren't found and LAW falls
+back to regenerating them), or an **expired proxy**. Double-check the era/version, and:
+
+```sh
+voms-proxy-info # is it still valid?
+voms-proxy-init -voms cms -rfc -valid 192:00
+```
+
+## "Permission denied" / "file not found" on storage
+Usually an **expired VOMS proxy** — grid/EOS access needs a valid one. Re-run `voms-proxy-init`. If
+it persists, confirm your `fs_*` paths in `user_custom.yaml` are correct and writable
+([Storage](concepts/storage.md)).
+
+## "Task not found" after adding a task
+LAW's index is stale. Re-run:
+
+```sh
+law index --verbose
+```
+
+Needed after **adding/renaming/moving** a task class (not after editing an existing one's body).
+
+## EOS read-after-write lag
+EOS is eventually consistent: a file you just wrote can be briefly invisible to an existence check
+(seconds, occasionally longer). In normal pipeline use FLAF tolerates this. If **your own** script
+checks for freshly written outputs and intermittently "can't find" them, don't trust a single
+`exists()` — list the parent directory and retry a few times with a short delay.
+
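The retry advice above can be sketched as a small shell function. This is a minimal illustration, not part of FLAF: `wait_for_output` is a hypothetical helper name, and the default attempt count and delay are arbitrary assumptions to tune for your storage.

```sh
# Hypothetical helper (not part of FLAF): wait until a freshly written
# file shows up in a listing of its parent directory.
# Usage: wait_for_output PATH [ATTEMPTS] [DELAY_SECONDS]
wait_for_output() {
    local path="$1"
    local attempts="${2:-5}"
    local delay="${3:-2}"
    local dir name i
    dir=$(dirname -- "$path")
    name=$(basename -- "$path")
    i=1
    while [ "$i" -le "$attempts" ]; do
        # List the parent directory instead of trusting one existence check.
        if ls -1A -- "$dir" 2>/dev/null | grep -Fxq -- "$name"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

A script would then call, for example, `wait_for_output "$OUT_FILE" 5 2 || echo "output still not visible" >&2`, where `OUT_FILE` is whatever path the script has just written.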
+## Cross-analysis environment contamination
+The environment caches paths in variables (`FLAF_PATH`, `ANALYSIS_PATH`, `ANALYSIS_SOFT_PATH`, …).
+Reusing a shell that already set up a *different* analysis can pick up the wrong `flaf_env` and
+produce baffling failures.
+
+- **Interactive:** use a **fresh shell per analysis** and `source env.sh` there.
+- **Scripted/background runs:** unset the FLAF/analysis variables before sourcing, but **keep**
+ `LD_LIBRARY_PATH`, `HOME` and `PATH`:
+
+```sh
+unset FLAF_ENVIRONMENT_PATH ANALYSIS_SOFT_PATH LAW_HOME LAW_CONFIG_FILE \
+ ANALYSIS_PATH ANALYSIS_DATA_PATH FLAF_PATH FLAF_CMSSW_BASE \
+ FLAF_CMSSW_ARCH FLAF_CMSSW_VERSION FLAF_COMBINE_PATH \
+ X509_USER_PROXY VIRTUAL_ENV PYTHONPATH
+cd /path/to/
+source env.sh
+```
+
+## ROOT/cling library or JIT errors in a background run
+You launched the environment under `env -i`, which strips `LD_LIBRARY_PATH` (ROOT/cling needs it).
+Preserve it (and `HOME`, `PATH`) when starting a clean shell. See
+[The environment](concepts/environment.md#sharp-edges).
+
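One way to start from a clean environment without losing the variables that must survive is to pass them to `env -i` explicitly. The following is a sketch under assumptions: `run_clean` is a hypothetical wrapper name, not something FLAF provides.

```sh
# Hypothetical wrapper (not part of FLAF): run a command in an otherwise
# empty environment, keeping only HOME, PATH and LD_LIBRARY_PATH.
run_clean() {
    env -i \
        HOME="${HOME:-}" \
        PATH="$PATH" \
        LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}" \
        "$@"
}
```

For a background run this could be used as `run_clean bash run_job.sh`, where `run_job.sh` is a hypothetical script file that changes into the analysis directory, sources `env.sh` and then calls `law run`. Variables such as `FLAF_PATH` left over from another analysis are not inherited by that script.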
+## `source env.sh` sets the wrong path / fails to locate itself
+You sourced it via `bash -c "source env.sh"`. That breaks `BASH_SOURCE` self-detection and sets the
+wrong `ANALYSIS_PATH`. Source it directly in your interactive shell, or put your commands in a
+**script file** and run that file.
+
+## HH→bb̄WW: the run sits in `AnalysisCacheTask` for a long time
+Expected on a cold cache: `AnalysisCacheTask` computes the b-tag shape weights and can take roughly
+an hour per branch. Reuse an existing cache across runs with a
+[per-task version override](workflow/arguments.md#per-task-version-overrides) instead of
+recomputing it every time.
+
+## A backgrounded `law run` won't stop when I kill it
+Killing the parent leaves child `law`/job processes alive. Kill by pattern, and remove batch jobs:
+
+```sh
+pkill -f "version=<your_version>"
+condor_rm <cluster_id> # if you submitted to HTCondor
+```
+
+## My edits to FLAF/Corrections are ignored
+You edited the submodule copy but the run used a different one — or vice-versa. The run uses
+`FLAF_PATH`/`CORRECTIONS_PATH`; set them to your edited copy **before** `source env.sh`. See
+[Developing shared submodules](concepts/environment.md#developing-shared-submodules).
+
+## The first `source env.sh` takes forever
+Expected: the first time it builds CMSSW and Combine (tens of minutes, a few GB under `soft/`).
+Subsequent sources are quick. Don't interrupt the first build.
diff --git a/docs/workflow/arguments.md b/docs/workflow/arguments.md
new file mode 100644
index 000000000..2b54dd1cc
--- /dev/null
+++ b/docs/workflow/arguments.md
@@ -0,0 +1,93 @@
+# Command arguments
+
+A reference for the options you pass to `law run`. The **common** ones are defined on FLAF's base
+task classes (`FLAF/run_tools/law_customizations.py`), so they work on **every** FLAF task. LAW
+also provides built-in options for status and cleanup.
+
+!!! note "Underscores become dashes on the command line"
+ A parameter named `transfer_logs` in the code is `--transfer-logs` on the CLI;
+ `anaTuple_version` is `--anaTuple-version`, and so on.
+
+## Common task options
+
+| Option | Default | Meaning |
+|---|---|---|
+| `--version` | *(required)* | Label that namespaces this run's outputs. Different versions never collide. |
+| `--period` | *(required)* | The [era](../concepts/eras.md), e.g. `Run3_2022`. |
+| `--workflow` | `local` | `local` (this machine) or `htcondor` (batch). See [HTCondor](htcondor.md). |
+| `--branches` | *(all)* | Which branches to run, e.g. `0`, `0,2`, `5-7`. Restricts only the launched task, not its dependencies. |
+| `--test` | `-1` | Process only N events per input file (`-1` = all). Great for smoke tests. |
+| `--process` | `""` | Restrict to one process (e.g. `custom_CI_Signal`). |
+| `--dataset` | `""` | Restrict to one dataset. |
+| `--model` | `""` | Override the physics model for this run. |
+| `--customisations` | `""` | Ad-hoc `key=value,key=value` overrides (see below). |
+| `--user-custom` | `""` | Path to an extra `user_custom`-style YAML, loaded last (see below). |
+
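To see which branch indices a `--branches` value selects before launching anything, the spec can be expanded by hand. This is a minimal sketch, not part of FLAF or LAW: `expand_branches` is a hypothetical helper, and it assumes only the two forms shown in the table (single indices `N` and closed ranges `N-M`).

```sh
# Hypothetical helper (not part of FLAF or LAW): expand a branch spec such
# as "0,2,5-7" into a space-separated list of branch indices.
expand_branches() {
    local spec="$1" part start end i out=""
    local IFS=,
    for part in $spec; do
        case "$part" in
            *-*) start=${part%-*}; end=${part#*-} ;;  # closed range N-M
            *)   start=$part; end=$part ;;            # single index N
        esac
        i=$start
        while [ "$i" -le "$end" ]; do
            out="${out:+$out }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}
```

For instance, `expand_branches 0,2,5-7` prints `0 2 5 6 7`. The helper only mirrors the documented syntax; LAW itself remains the authority on what it accepts.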
+## HTCondor options (on every workflow task)
+
+| Option | Default | Meaning |
+|---|---|---|
+| `--transfer-logs` | off | Bring job logs back to `data/`. Recommended. |
+| `--parallel-jobs` | *(unbounded)* | Cap concurrent branches, e.g. `--parallel-jobs 100`. |
+| `--max-runtime` | *(task default)* | Per-job wall-clock limit. |
+| `--n-cpus` | `1` | CPUs requested per job. |
+| `--priority` | `0` | Job priority. |
+| `--bundle` | off | Ship a code/environment tarball to the worker. See [HTCondor → bundles](htcondor.md#bundles-shipping-the-code-to-workers). |
+| `--htcondor-spool` | off | Spool job files to the schedd. |
+
+## Status & cleanup (LAW built-ins)
+
+| Option | Meaning |
+|---|---|
+| `--print-status N,K` | Show the dependency tree status to task depth `N`, file-collection depth `K`. Also prints output paths. `--print-status 3,1` is a good default. |
+| `--print-deps N` | Print the dependency tree to depth `N` without checking outputs. |
+| `--remove-output N,a,y` | Remove outputs to depth `N` (`a` = all branches, `y` = no prompt). Forces a recompute. **Deletes real files** — check the version first. |
+
+## `--customisations`
+
+Pass analysis-specific overrides as a comma-separated list:
+
+```sh
+--customisations key1=value1,key2=value2
+```
+
+!!! info "HH→bb̄ττ: select the DeepTau version"
+ To run with DeepTau 2.5, add `--customisations deepTauVersion=2p5`. (See the HH_bbtautau docs.)
+
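When the overrides are assembled in a script, building the comma-separated string from separate `key=value` words avoids stray spaces or missing commas. A minimal sketch; `join_customisations` is a hypothetical helper name, not a FLAF command.

```sh
# Hypothetical helper (not part of FLAF): join key=value words with commas,
# producing the string expected by --customisations.
join_customisations() {
    local IFS=,
    # "$*" joins all arguments with the first character of IFS.
    printf '%s\n' "$*"
}
```

It could then be used as `--customisations "$(join_customisations deepTauVersion=2p5)"`, with further `key=value` words appended as needed.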
+## `--user-custom`: per-run config overlay
+
+`--user-custom <path>` loads an extra YAML **on top of** your `config/user_custom.yaml` (loaded
+last, so its values win). Use an absolute path or one relative to `$ANALYSIS_PATH`. It is the
+cleanest way to change settings for a single run without editing your committed file:
+
+```sh
+law run FLAF.Analysis.tasks.HistPlotTask \
+ --version test --period Run3_2022 --workflow local --branches 0 --test 1000 \
+ --user-custom /afs/.../config/user_custom/test_local/HH_bbtautau.yaml
+```
+
+See [`user_custom.yaml`](../configuration/user-custom.md).
+
+## Per-task version overrides
+
+Every task carries its own `--version`, so you can make one run **read** an existing upstream
+production while **writing** its downstream outputs under a new version. Override an upstream task's
+version with `--<TaskName>-version`:
+
+```sh
+law run FLAF.Analysis.tasks.HistTupleProducerTask \
+ --version my_dev \
+ --AnaTupleMergeTask-version v2605 \
+ --AnaTupleFileListTask-version v2605 \
+ --period Run3_2022EE --workflow local
+```
+
+Here the anaTuples are reused from the central `v2605` production, while the histTuples are written
+under `my_dev`. This is the key to fast, parallel development: many people can share one upstream
+production without recomputing it. The base task also exposes related shortcuts
+(`--anaTuple-version`, `--anaCache-version`, `--ana-version`) used by some stages.
+
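When several upstream tasks should all read from the same production, the per-task flags follow a regular pattern and can be generated. This is a sketch under assumptions: `upstream_version_flags` is a hypothetical helper, not part of FLAF, and it simply applies the `--<TaskName>-version` pattern from the example above.

```sh
# Hypothetical helper (not part of FLAF): print one "--<TaskName>-version V"
# flag pair for every task name given after the version.
# Usage: upstream_version_flags VERSION TASK [TASK...]
upstream_version_flags() {
    local version="$1" task flags=""
    shift
    for task in "$@"; do
        flags="${flags:+$flags }--${task}-version ${version}"
    done
    printf '%s\n' "$flags"
}
```

Left unquoted so the shell splits it into words, `$(upstream_version_flags v2605 AnaTupleMergeTask AnaTupleFileListTask)` expands to the two override flags used in the command above.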
+!!! tip "`--<TaskName>-<parameter>` works generally"
+ LAW lets you set *any* parameter of *any* task in the dependency tree by prefixing it with the
+ task's class name. Version overrides are the most common case, but the same mechanism applies
+ to other parameters.
diff --git a/docs/workflow/htcondor.md b/docs/workflow/htcondor.md
new file mode 100644
index 000000000..50be05760
--- /dev/null
+++ b/docs/workflow/htcondor.md
@@ -0,0 +1,75 @@
+# Running on HTCondor
+
+Producing ntuples and histograms for a full era means processing thousands of files — far too much
+for one machine. FLAF tasks are **workflows** ([Tasks & LAW](../concepts/tasks-and-law.md)), so
+their branches can be submitted to CERN's **HTCondor** batch system. The recommended pattern is to
+**develop and test with `--workflow local`, then switch to `--workflow htcondor` for production** —
+the command is otherwise the same.
+
+## Submit a task to the batch system
+
+```sh
+law run FLAF.AnaProd.tasks.AnaTupleFileTask \
+ --period Run3_2022 --version prod \
+ --workflow htcondor \
+ --transfer-logs \
+ --parallel-jobs 100
+```
+
+| Option | Why you want it |
+|---|---|
+| `--workflow htcondor` | Submit branches as batch jobs instead of running locally. |
+| `--transfer-logs` | Bring each job's stdout/stderr back to your `data/` area. **Highly recommended** — without it, debugging a failed job is painful. |
+| `--parallel-jobs 100` | Cap how many jobs are in flight at once. Be a good citizen on the shared pool; very large uncapped submissions are discouraged. |
+| `--branches 0-99` | Submit only a subset (e.g. to retry a range). |
+
+Other HTCondor parameters available on every workflow task: `--max-runtime`, `--n-cpus`,
+`--priority`, `--htcondor-spool`. See [Command arguments](arguments.md).
+
+## Monitor and resume
+
+LAW tracks which branches have finished (by checking their outputs), so a re-run only resubmits the
+missing ones — batch jobs fail and time out, and resuming is normal. Check progress with:
+
+```sh
+law run FLAF.AnaProd.tasks.AnaTupleFileTask \
+ --period Run3_2022 --version prod --print-status 1,1
+```
+
+Standard `condor_q` / `condor_status` work for the underlying jobs.
+
+## Bundles: shipping the code to workers
+
+A batch worker needs your code and environment. FLAF supports two modes:
+
+- **Non-bundle jobs** rely on the shared AFS area being mounted on the worker: the job receives
+ `FLAF_PATH`/`CORRECTIONS_PATH` and runs the code straight from AFS (including any edits you made
+ via the [dev overlay](../concepts/environment.md#developing-shared-submodules)).
+- **Bundle jobs** ship a tarball of the code/environment to the worker (the `--bundle` flag and the
+ `BundleTask` machinery). The worker runs from the tarball and never reaches back to AFS, so it is
+ deliberately *not* given `FLAF_PATH`/`CORRECTIONS_PATH`. Bundles also set `FLAF_NO_INSTALL=1` so
+ the worker never tries to build the environment.
+
+For most work the defaults are correct; you only think about bundles when a stage explicitly needs
+one (e.g. it declares a CMSSW bundle flavour) or when AFS is not available on the target pool.
+
+!!! tip "Your edits to FLAF *do* reach the workers"
+ Thanks to the dev overlay, non-bundle jobs run your edited `FLAF`/`Corrections`, and bundle
+ jobs include them in the tarball — so testing framework changes on HTCondor works without
+ committing first. See [Contributing](../contributing.md).
+
+## Caveats
+
+!!! warning "Keep your proxy valid for the whole run"
+ Jobs that outlive your VOMS proxy lose grid access mid-flight. Create a long-lived proxy
+ (`-valid 192:00`) before a big submission, and refresh it for long campaigns.
+
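A pre-submission check on the remaining proxy lifetime can catch this early. The sketch below is an illustration, not part of FLAF: `proxy_has_time_left` is a hypothetical helper, the 24-hour default is an arbitrary assumption, and the usage line assumes `voms-proxy-info -timeleft` prints the remaining lifetime in seconds.

```sh
# Hypothetical helper (not part of FLAF): succeed only if the remaining
# proxy lifetime (in seconds) covers at least MIN_HOURS hours.
# Usage: proxy_has_time_left SECONDS_LEFT [MIN_HOURS]
proxy_has_time_left() {
    local seconds_left="${1:-0}"
    local min_hours="${2:-24}"
    [ "$seconds_left" -ge $((min_hours * 3600)) ]
}
```

Before a large submission one might run `proxy_has_time_left "$(voms-proxy-info -timeleft)" 48 || voms-proxy-init -voms cms -rfc -valid 192:00`, which renews the proxy only when fewer than 48 hours remain.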
+!!! warning "Killing a background `law` leaves its jobs/children"
+ Pressing `Ctrl-C` or `kill`-ing a backgrounded `law` process does not necessarily stop the
+ branches it spawned. To stop everything for a run, match the processes by pattern, e.g.
+ `pkill -f "version=prod"`, and `condor_rm` the submitted jobs if needed.
+
+!!! note "Test small, then scale"
+ Validate a task with `--workflow local --branches 0 --test 1000` before submitting the full
+ workflow to HTCondor. A bug found on one local branch is far cheaper than one found across a
+ thousand batch jobs.
diff --git a/docs/workflow/walkthrough.md b/docs/workflow/walkthrough.md
new file mode 100644
index 000000000..797558ae4
--- /dev/null
+++ b/docs/workflow/walkthrough.md
@@ -0,0 +1,159 @@
+# Full workflow walkthrough
+
+This is the end-to-end tour of the pipeline: every stage, in order, with the command that runs it.
+Read it once to understand the chain; in day-to-day work you usually run only the *last* stage you
+need and let LAW produce the rest (see [the shortcut](#the-shortcut-just-ask-for-the-end)).
+
+The commands use `FLAF.AnaProd.tasks.*` for production stages and `FLAF.Analysis.tasks.*` for
+analysis stages — the fully-qualified task paths the framework registers.
+
+## Setup recap
+
+```sh
+cd HH_bbtautau # your analysis repository
+source env.sh # once per shell
+voms-proxy-info # confirm a valid proxy
+
+# Pick a data-taking era and a label for this production:
+ERA=Run3_2022
+VER=dev
+```
+
+Throughout, `--period $ERA` selects the [era](../concepts/eras.md) and `--version $VER` namespaces
+the [outputs](../concepts/data-flow.md#versions-keep-productions-apart). Add `--workflow local`
+to run on this machine; switch to `--workflow htcondor` to scale up
+([HTCondor guide](htcondor.md)).
+
+---
+
+## Stage 0 — Resolve the input files
+
+`InputFileTask` turns "the datasets for this era" into a concrete list of NanoAOD files (from DAS).
+Everything else depends on it, so it runs first — automatically when you launch a later stage, or
+explicitly:
+
+```sh
+law run FLAF.AnaProd.tasks.InputFileTask --period $ERA --version $VER --workflow local
+```
+
+It is fast and cheap. If a from-scratch run unexpectedly stalls in `InputFileTask` or fails there,
+suspect a wrong `--period`/`--version` or an expired proxy.
+
+## Stage 1 — Produce and merge analysis ntuples (anaTuples)
+
+`AnaTupleFileTask` runs the analysis producer (`AnaProd/anaTupleProducer.py`, inside CMSSW) over
+each NanoAOD file — **one branch per file** — applying the object selections and
+[corrections](../concepts/architecture.md#common-vs-analysis-specific) and writing a slimmed
+**anaTuple**. `AnaTupleMergeTask` then merges the per-file pieces into one anaTuple per dataset.
+
+```sh
+# Produce per-file anaTuples (heavy; normally on HTCondor):
+law run FLAF.AnaProd.tasks.AnaTupleFileTask --period $ERA --version $VER --workflow local
+
+# Merge them per dataset:
+law run FLAF.AnaProd.tasks.AnaTupleMergeTask --period $ERA --version $VER --workflow local
+```
+
+!!! tip "Test on a few files first"
+ `--branches 0,1,2` runs only the first three input files, and `--test 1000` processes only
+ 1000 events per file. Combine both to smoke-test ntuple production quickly.
+
+## Stage 2 — Compute analysis observables (histTuples)
+
+`HistTupleProducerTask` reads the merged anaTuples and computes the heavier analysis
+**observables** (the "payload producers" configured in `global.yaml`), writing **histTuples**:
+
+```sh
+law run FLAF.Analysis.tasks.HistTupleProducerTask --period $ERA --version $VER --workflow local
+```
+
+!!! note "HH→bb̄WW: a caching step runs here first"
+ In HH→bb̄WW, `AnalysisCacheTask` (and `AnalysisCacheAggregationTask`) pre-compute and aggregate
+ per-event payloads — notably the **b-tag shape** weights — before histogramming. They are
+ pulled in automatically and can be **time-consuming** (budget roughly an hour per branch on a
+ cold cache). See the [HH_bbWW docs](../analyses.md) and the [Task reference](../reference/tasks.md).
+
+## Stage 3 — Fill and merge histograms
+
+`HistFromNtupleProducerTask` fills **histograms** of the requested variables from the histTuples —
+**one branch per variable** — including systematic variations. `HistMergerTask` merges the pieces
+into per-process histograms ready for plotting and fitting.
+
+```sh
+# Fill histograms (restrict variables with --variables, batch with --n-var-batches):
+law run FLAF.Analysis.tasks.HistFromNtupleProducerTask --period $ERA --version $VER --workflow local
+
+# Merge them:
+law run FLAF.Analysis.tasks.HistMergerTask --period $ERA --version $VER --workflow local
+```
+
+Which variables are produced is controlled by the analysis config and can be narrowed with the
+`--variables` parameter or the `variables:` list in `user_custom.yaml`.
+
+## Stage 4 — Make the plots
+
+`HistPlotTask` produces the final plots — **one branch per variable**:
+
+```sh
+law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local
+# one variable only:
+law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local --branches 0
+```
+
+This is the task you most often launch directly: asking for the plots makes LAW produce every
+upstream product that is missing.
+
+## Stage 5 — Statistical inference
+
+The two HH analyses turn the merged histograms into datacards and then run limits and diagnostics
+with [Combine](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/), via the
+`StatInference` and `inference` submodules. H→μμ does not include this stage.
+
+Because these commands run inside CMSSW/Combine, prefix them with `cmsEnv` (or open a `cmsEnv`
+subshell once):
+
+```sh
+# 1) Create datacards from the produced shapes:
+cmsEnv python3 StatInference/dc_make/create_datacards.py \
+ --input \
+ --output \
+ --config # e.g. StatInference/config/x_hh_bbww_run3.yaml
+
+# 2) Run resonant limits:
+law run PlotResonantLimits --version $VER --datacards '/*.txt' --xsec fb --y-log
+
+# 3) Pulls & impacts (per mass point — point at a single card):
+law run PlotPullsAndImpacts --version $VER --datacards "/.txt" ...
+```
+
+The exact configs and options are analysis-specific — see each analysis's **Statistical
+inference** page (linked from [Analyses](../analyses.md)) and the
+[cms-hh inference docs](https://cms-hh.web.cern.ch/tools/inference/).
+
+---
+
+## The shortcut: just ask for the end
+
+You rarely run the stages one by one. Because every task knows its dependencies, launching a late
+stage runs all missing upstream stages automatically:
+
+```sh
+law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local
+```
+
+Run the individual stages explicitly only when you want to **stop at** an intermediate product
+(e.g. produce anaTuples for someone else to use), or to inspect/debug one stage.
+
+## See progress and redo selectively
+
+```sh
+# Status of the whole tree (task depth 3, file depth 1) — also prints output paths:
+law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --print-status 3,1
+
+# Force one stage to be recomputed:
+law run FLAF.Analysis.tasks.HistMergerTask --period $ERA --version $VER --remove-output 0,a,y
+```
+
+See [Command arguments](arguments.md) for the full option list, and [Running on HTCondor](htcondor.md)
+to take any of these commands to the batch system by swapping `--workflow local` for
+`--workflow htcondor`.
diff --git a/mkdocs.yml b/mkdocs.yml
index 0b8ce6fb1..f3cd8003c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,13 +1,19 @@
site_name: FLAF
-repo_name: GitHub
-repo_url: https://github.com/cms-flaf/Framework
+site_description: >-
+ FLAF — the Flexible LAW-based Analysis Framework. Documentation for the shared
+ CMS analysis framework used by HH->bbtautau, HH->bbWW and H->mumu.
+repo_name: cms-flaf/FLAF
+repo_url: https://github.com/cms-flaf/FLAF
+edit_uri: edit/main/docs/
+
theme:
name: material
font: false
features:
- content.action.edit
- content.action.view
- - navigation.expand
+ - content.code.copy
+ - content.code.annotate
- navigation.footer
- navigation.indexes
- navigation.sections
@@ -26,6 +32,8 @@ markdown_extensions:
- def_list
- footnotes
- meta
+ - md_in_html
+ - tables
- toc:
permalink: true
# Python Markdown Extensions
@@ -38,7 +46,8 @@ markdown_extensions:
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- - pymdownx.highlight
+ - pymdownx.highlight:
+ anchor_linenums: true
- pymdownx.inlinehilite
- pymdownx.keys
- pymdownx.mark
@@ -53,26 +62,42 @@ markdown_extensions:
- pymdownx.tasklist:
custom_checkbox: true
- pymdownx.tilde
- - pymdownx.blocks.admonition
- - pymdownx.blocks.details
- - pymdownx.blocks.tab
plugins:
- search
-extra_css:
- - stylesheets/fonts.css
-
extra_javascript:
- https://unpkg.com/mermaid@9.3/dist/mermaid.min.js
- - https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
-
nav:
- Home: index.md
- - Analysis: analysis.md
- - HH->bbtautau: hh_bbtautau.md
- - Statistical inference: stat_inference.md
-
-#theme: readthedocs
\ No newline at end of file
+ - Getting started:
+ - Prerequisites: getting-started/prerequisites.md
+ - Installation: getting-started/installation.md
+ - Your first run: getting-started/first-run.md
+ - Key terms: getting-started/key-terms.md
+ - Concepts:
+ - Architecture: concepts/architecture.md
+ - Tasks & LAW: concepts/tasks-and-law.md
+ - Data flow: concepts/data-flow.md
+ - Configuration system: concepts/configuration.md
+ - Eras & periods: concepts/eras.md
+ - Storage & filesystems: concepts/storage.md
+ - The environment: concepts/environment.md
+ - Full workflow:
+ - Walkthrough: workflow/walkthrough.md
+ - Running on HTCondor: workflow/htcondor.md
+ - Command arguments: workflow/arguments.md
+ - Configuration guide:
+ - user_custom.yaml: configuration/user-custom.md
+ - Datasets: configuration/datasets.md
+ - Processes & models: configuration/processes-and-models.md
+ - Task reference: reference/tasks.md
+ - CI / CD:
+ - GitHub Actions: ci/github-actions.md
+ - Integration pipeline: ci/integration-pipeline.md
+ - Troubleshooting: troubleshooting.md
+ - Glossary: glossary.md
+ - Contributing: contributing.md
+ - Analyses: analyses.md