Skip to content

ryZhangHason/critband

Repository files navigation

critband — Critical Bandwidth Analysis of Multimodal Distributions

PyPI version Python versions arXiv License

critband is the release name for v0.2.3 on GitHub and PyPI.

critband is a Python package for analyzing critical bandwidth and related multimodality structure in kernel density estimation (KDE).

It finds the smallest bandwidth where a KDE transitions from k-modal to fewer modes, and it also exposes related summaries such as bimodality_strength() and excess_mass().

Paper: Zhang, R., & Wang, Q. (2026). critband: Critical Bandwidth Analysis for Multimodal Distributions in Python. arXiv:2605.18686. https://arxiv.org/abs/2605.18686

Quick Start

pip install critband
import numpy as np
from critband import critical_bandwidth

# Generate bimodal data
x = np.concatenate([np.random.normal(-2, 0.5, 200),
                    np.random.normal( 2, 0.5, 200)])

h_crit, success = critical_bandwidth(x)
print(f"Critical bandwidth: {h_crit:.4f}")
print(f"Converged: {success}")
# Output: Critical bandwidth: ~0.97
#         Converged: True

Features

Core Algorithm

Function Description
silverman_bandwidth(x) Silverman's rule-of-thumb bandwidth: 1.06 · min(σ, IQR/1.34) · n^(-1/5)
gaussian_kde(x, grid, h, use_fft=None) KDE via direct O(n·g) or FFT O(g·log(g)) computation; auto-selects FFT for n > 5000
critical_bandwidth(x, k=2, return_ci=False) Critical bandwidth for mode-count detection; auto/binary/brent methods; optionally returns bootstrap CI
find_modes(x, h) Detect all KDE modes at bandwidth h; returns ModeResult with per-mode position, height, width, prominence
find_trough(x, h) Find valley between the two highest KDE peaks; returns x-coordinate or None
detect_components(x) Decompose bimodal distribution into two Gaussian components; returns BimodalDecomposition
bimodality_strength(x) Comprehensive bimodality assessment; returns interpretable strength label + score in BimodalityStrength
silverman_test(x) Silverman's bootstrap test for bimodality; returns SilvermanTestResult dataclass
excess_mass(x) Müller & Sawitzki excess mass test for multimodality; detects any number of modes, returns ExcessMassResult

The critical_bandwidth function uses automatic method selection: Brent's method for well-separated data (converges in fewer iterations), binary search for weak/small-n cases. All three methods (auto, binary, brent) produce consistent results. Hartigan's dip test is included as a complementary unimodality check, not as a core differentiator of the package.

Methodology

Critical Bandwidth

The critical bandwidth is the smallest bandwidth $h_{\text{crit}}$ at which the kernel density estimate $\hat{f}(x; h)$ transitions from being bimodal to unimodal. When $h < h_{\text{crit}}$ the estimate has two modes; for $h \ge h_{\text{crit}}$ it is unimodal. A large critical bandwidth relative to the data scale indicates strong bimodality — a wide smoothing window is needed to merge the two modes.

Silverman's Rule of Thumb

Silverman (1986) proposed a reference bandwidth for Gaussian KDE:

$$h = 1.06 \cdot \min(\sigma, \text{IQR} / 1.34) \cdot n^{-1/5}$$

This is used as the baseline for the critical bandwidth search: the solver searches for $h_{\text{crit}}$ in the interval $[h/20, 10h]$, which captures the transition for typical bimodal distributions.

Critical Bandwidth Solver

The critical bandwidth is found via binary search on KDE mode counts (for any k). The solver:

  1. Automatically computes bounds from Silverman's rule: $[h_{\text{silverman}}/20, 10 \cdot h_{\text{silverman}}]$
  2. Uses method="auto" (default): Brent's method for large, well-separated data (fewer iterations); binary search for weak/small-n cases
  3. For k > 2, critical_bandwidth(x, k=3) finds the bandwidth where trimodality disappears, etc.
  4. Optionally returns a bootstrap confidence interval via return_ci=True

The dip_ratio (via _trough_ratio) is available as a descriptive bimodality strength measure but is not used as the optimizer objective. Hartigan's dip test is included as a complementary unimodality check, not as a core differentiator of the package.

Excess Mass Test

The excess_mass function implements the Müller & Sawitzki (1991) test for multimodality. It measures the amount of probability mass above threshold levels in a KDE, and uses bootstrap calibration to estimate the number of modes — unlike the Silverman test, it can detect any number of modes in a single call.

Related Work

  • Silverman (1981) — The critical bandwidth test for multimodality in kernel density estimates
  • Hartigan & Hartigan (1985) — The Dip Test of Unimodality, a complementary non-parametric approach
  • Müller & Sawitzki (1991) — Excess mass estimates and tests for multimodality
  • Hall & York (2001) — On the calibration of Silverman's test for multimodality, refinements to the original method

References

  1. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall.
  2. Hartigan, J.A. & Hartigan, P.M. (1985). The Dip Test of Unimodality. The Annals of Statistics, 13(1), 70-84.
  3. Müller, D.W. & Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. JASA, 86(415), 738-746.
  4. Cheng, M.-Y. & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B, 60(3), 579-589.

Installation

# From PyPI
pip install critband

# Or using uv
uv add critband

# Development
git clone git@github.com:ryZhangHason/critband.git
cd critband
uv sync
uv run python -m pytest tests/ -v

Requirements

  • Python >= 3.10
  • Runtime: numpy, scipy, openpyxl, xlrd, python-docx, pdfplumber
  • Test only (not needed at runtime): pytest, pytest-cov, xlwt, reportlab
  • Visualization (optional): matplotlib >= 3.5

Current validation status

WS1 calibration has been run on the fixed design grid and is kept as a validation artifact only. The repository now passes the strict full-suite acceptance run, but WS1 remains descriptive rather than threshold-setting.

Why critband?

  • Critical bandwidth analysis: One function call computes the critical bandwidth for mode-count structure, and you can use bimodality_strength() as a descriptive score or excess_mass() to estimate how many modes your data has.
  • Mode-count detection: critical_bandwidth(x, k=3) detects the bandwidth where trimodality disappears and generalizes the mode-count parameter.
  • Any-file input: 9 formats from a single API. CSV, Excel, PDF, Word, JSON, HTML, Markdown — just point read_data at the file and go.
  • One-command install: pip install critband installs everything. No system packages, no manual steps, no Tesseract OCR.
  • Pure Python dependencies: All 4 additional libraries (openpyxl, xlrd, python-docx, pdfplumber) are pure Python — no compiled extensions.
  • Built for production and research: Works in CLI scripts and Jupyter notebooks equally well.

Example: Bimodality Test

import numpy as np
from critband import critical_bandwidth, bimodality_strength, excess_mass

# Unimodal data — the function tells you it's already unimodal
unimodal = np.random.normal(0, 1, 500)
h_crit, success = critical_bandwidth(unimodal)
print(f"Unimodal: h_crit={h_crit:.4f}, converged={success}")
# → success may be False (already unimodal at minimum bandwidth)

# Bimodal data — full pipeline
bimodal = np.concatenate([
    np.random.normal(-3, 0.5, 300),
    np.random.normal( 3, 0.5, 300)
])

h_crit, success = critical_bandwidth(bimodal)
print(f"Bimodal: h_crit={h_crit:.4f}, converged={success}")

# Interpretable strength assessment
s = bimodality_strength(bimodal)
print(f"Strength: {s.strength} (score={s.strength_score:.2f})")

# Excess mass test — detects number of modes
e = excess_mass(bimodal, n_boot=199)
print(f"Estimated modes: {e.n_modes_estimated}")

# Trimodal data with mode-count detection
trimodal = np.concatenate([
    np.random.normal(-4, 0.3, 150),
    np.random.normal( 0, 0.3, 150),
    np.random.normal( 4, 0.3, 150),
])
h_k3, ok = critical_bandwidth(trimodal, k=3)
print(f"Trimodal (k=3): h_crit={h_k3:.4f}")

Visualization Examples

Generate 3 matplotlib figures that illustrate the core algorithms:

uv sync --group viz
uv run python examples/bandwidth_demo.py

The demo uses well-separated bimodal data from the benchmark suite and produces:

  • kde_bandwidth_sweep.png — 2×2 subplots showing KDE at 4 bandwidths (too small, Silverman, critical, too large), with peaks and troughs marked.
  • critical_bandwidth_transition.png — Three overlaid KDE curves: bimodal (h_crit × 0.85), critical (h_crit), and unimodal (h_crit × 1.15), with the trough position annotated.
  • component_decomposition.png — Histogram, KDE, detected Gaussian components (scaled by weight), and the trough-based separation point.

matplotlib is an optional dependency — the core package does not require it.

Jupyter Notebook Demo

An interactive Jupyter Notebook is available for hands-on exploration:

uv sync --group viz --group jupyter
uv run jupyter notebook examples/bandwidth_demo.ipynb

The notebook contains 10 cells that walk through the same three visualizations with explanatory markdown, and includes an extension exercise that compares dip ratios across all 7 benchmark cases.

Performance Benchmarking

Measure execution time of core functions across data sizes:

uv run python examples/benchmark_performance.py

Customize the number of runs and data sizes:

uv run python examples/benchmark_performance.py --runs 10 --sizes 100 500 2000
uv run python examples/benchmark_performance.py --output results.csv

Historical Validation: R Package Comparison

This section records validation against R's multimode package across 12 benchmark cases.

Feature Comparison

Feature critband multimode
Critical bandwidth ✅ (k-based mode-count test, k >= 2) ✅ (mod0-based mode-count test, mod0 >= 1)
Mode-count detection ✅ (threshold parameter k) ✅ (threshold parameter mod0)
Bimodality strength ✅ (interpretable)
Excess mass test
Silverman's bootstrap test
Component decomposition
9-format I/O
Dependencies Pure Python R + compiled

Quantitative Results ($h_{\text{crit}}$ Accuracy)

Case n critband h_crit R modetest p Agreement
Well-separated 400 1.8650 0.000 ✅ Both detect bimodality
Moderate separation 500 1.0964 0.000 ✅ Both detect bimodality
Barely separated 600 0.2791 0.251 ✅ critband flags weak bimodality, R agrees (n.s.)
Unequal variance 400 1.7849 0.000 ✅ Both detect bimodality
Unequal weights 500 1.2591 0.000 ✅ critband correct; R yields spurious 772 modes
Extreme separation 400 4.6987 0.000 ✅ Both detect bimodality
Trimodal 450 1.3824 0.000 ✅ critband finds 3 modes; R detects multimodality
Skewed bimodal 500 1.1417 0.000 ✅ Both detect bimodality
Wide-component bimodal 400 2.7109 0.000 ✅ Both detect bimodality
Near unimodal 600 0.4186 0.055 ✅ critband flags weak; R agrees (n.s.)
Small sample bimodal 60 1.8608 0.000 ✅ Both detect bimodality (small n)
Overlapping variances 500 0.4598 0.045 ✅ critband flags weak; R agrees (n.s.)

Across these benchmark cases, critband achieves <0.5% mean absolute relative error vs high-precision reference values.

Performance Comparison (median runtime per case, Apple M1 chip)

Operation critband R (multimode) Comparison
critical_bandwidth 0.04–0.79 s 0.82–1.58 s 3–10× difference
find_modes (mode counting) <0.01 s 0.03–0.05 s Comparable

The runtime difference reflects critband's pure-Python numerical stack (NumPy/SciPy) and adaptive grid sizing, compared to R's compiled package dispatch overhead with bootstrap calibration (modetest(B = 199)). These timing numbers were collected on Apple M1-class Apple Silicon hardware.

Multi-Format Data Loading

Read data from 9 file formats without learning separate tools:

from critband.io import read_data

# Auto-detect — just point at any supported file
x = read_data("measurements.csv")
x = read_data("survey.xlsx", sheet="Sheet1")
x = read_data("results.pdf", column="Score")
y = read_data("report.docx", column=0, return_all=False)
Format Extension Library
CSV .csv Built-in csv (zero dependencies)
TSV / TXT .tsv, .txt Built-in (zero dependencies)
JSON .json Built-in json (zero dependencies)
Markdown (tables) .md Built-in parser (zero dependencies)
HTML (tables) .html, .htm Built-in html.parser (zero dependencies)
Excel .xlsx openpyxl
Excel (legacy) .xls xlrd
Word (tables) .docx python-docx
PDF (tables) .pdf pdfplumber

All dependencies are declared in pyproject.tomlpip install critband installs everything automatically. No system-level tools required.

Column / Sheet Selection

from critband.io import read_data, read_buffer

# By sheet name (Excel, PDF, Markdown multi-section)
x = read_data("data.xlsx", sheet="Measurements")

# By index
x = read_data("data.xlsx", sheet=0)

# By column name
x = read_data("data.csv", column="Value")

# By column index
x = read_data("data.csv", column=1)

# Get all numerical columns as a dict
cols = read_data("data.csv", return_all=True)
# cols = {"x": array([...]), "y": array([...])}

Buffer API

Works with in-memory file buffers for pipeline and service integration:

from critband.io import read_buffer

# From a BytesIO object
buf = io.BytesIO(uploaded_file.read())
x = read_buffer(buf, filename="data.csv")

License

Apache-2.0

About

Python package for critical bandwidth analysis and multimodality diagnostics in kernel density estimation

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors