質 dqt

A Data Questioning Tool that tells you the what and surfaces the why.

64 detectors across 5 families (drift, outlier, time series, distribution, rule) · best F1 0.933 (holt_winters / wasserstein_1) · full results

質 (shitsu) - quality, substance, the inner nature of a thing. The kanji points to what something truly is, not how it appears. dqt is meant to work the same way: concerned with the truth of the data, not its surface. The mark is also a quiet acknowledgment of a tradition I have learned much from, one whose most distinguishing characteristic is quality, and in which craft and precision are understood to be the same thing. - Anton Barr

Unifies your scattered data into one source of truth. Upgrades your existing models, dashboards, and queries into a causal semantic layer you didn't have to write. Picks up on trends and surfaces business insights, all wrapped in a quality harness that puts guardrails on the AI so the reports it generates stay on-spec.

The problem it solves

Without dqt: orders.amount null_fraction >= 0.05 -- threshold exceeded. Now what? Go dig through git log, dbt docs, warehouse history.

With dqt:

orders.amount null_fraction = 12.4% (baseline 0.3%)
Lineage: stg_payments -> orders -> revenue
Schema break in stg_payments 6h ago.
Causal candidate: stg_payments -> orders.amount (E-value 3.2, pending human review)
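
The lineage line in that output can be read as a small directed graph. As a rough sketch of the idea (not dqt's own lineage code, which is not shown in this README), a breadth-first walk over a hypothetical edge map lists everything downstream of a broken node. The `LINEAGE` dict and the `blast_radius` name are illustrative assumptions.

```python
from collections import deque

# Hypothetical edge map built from the example output:
# stg_payments -> orders -> revenue
LINEAGE = {
    "stg_payments": ["orders"],
    "orders": ["revenue"],
    "revenue": [],
}


def blast_radius(graph, start):
    """Return every node reachable downstream of `start`, in BFS order."""
    seen = set()
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


print(blast_radius(LINEAGE, "stg_payments"))  # ['orders', 'revenue']
```

A real lineage graph would be column-level and much larger, but the traversal keeps the same shape.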

Four layers

Statistical detectors - MAD, double-MAD, isolation forest, KS, STL residuals, adjusted boxplot fences. Plus completeness, validity, freshness, schema-change, and SQL-assertion checks. Every detector returns (verdict, score, plain_english).
Column-level lineage - walks your dbt manifest and warehouse DDL with sqlglot. From any incident, automatic blast radius across downstream tables and metrics.
LLM Wiki + Semantic layer - dump tickets, SQL, and BI reports into raw/. Point Claude Code at the vault. It synthesises dataset descriptions, metric definitions, and causal edges into wiki/ from the artifacts your team already has. Based on Karpathy's LLM Wiki pattern.
Causal discovery - Granger causality, PCMCI+, Transfer Entropy across your metric time series. Edges are proposed, human-reviewed, then enter the production DAG annotated with lag, confidence, and E-values.
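
To make the (verdict, score, plain_english) return shape concrete, here is a minimal standalone sketch of a MAD-based outlier-fraction check. It is not dqt's implementation: the function name, the 3.5 cutoff on the modified z-score, the 1% warn threshold, and the handling of a zero MAD are all assumptions chosen for illustration.

```python
from statistics import median


def mad_outlier_fraction(values, cutoff=3.5, warn_fraction=0.01):
    """Return (verdict, score, plain_english); score is the outlier fraction.

    Assumed convention: a value is an outlier when its modified z-score,
    0.6745 * |x - median| / MAD, exceeds `cutoff`.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Modified z-score is undefined; as one possible fallback,
        # count every value that differs from the median.
        outliers = sum(1 for v in values if v != med)
    else:
        outliers = sum(
            1 for v in values if 0.6745 * abs(v - med) / mad > cutoff
        )
    score = outliers / len(values)
    verdict = "pass" if score <= warn_fraction else "warn"
    relation = "within" if verdict == "pass" else "above"
    plain_english = (
        f"{score:.2%} of values are outliers -- "
        f"{relation} the {warn_fraction:.0%} warn threshold"
    )
    return verdict, score, plain_english


verdict, score, text = mad_outlier_fraction([10, 11, 9, 10, 12, 10, 11, 9, 10, 100])
print(verdict, text)
```

In this sample only the value 100 is flagged, so one value in ten counts as an outlier and the check warns.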

Quick start

pip install dqtlib

from dqt import Check, Runner, MemoryStore

check = Check(
    schema_name="public",
    table_name="orders",
    column_name="amount",
    detector_slug="mad_outlier_fraction",
)

# `adapter` is a warehouse adapter instance you construct separately (see the Adapters doc)
result = Runner(MemoryStore()).run(check, adapter)
print(result.plain_english)
# "0.82% of values are outliers -- within the 1% warn threshold"

# Or from YAML
dqt run checks.yaml

# Exit codes: 0 = all pass, 2 = one or more failed
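
The exit codes make it possible to gate a pipeline step on the check run. The sketch below wraps the `dqt run checks.yaml` command shown above; only codes 0 and 2 are documented here, so treating any other code as an unexpected error is an assumption, and the helper names are hypothetical.

```python
import subprocess


def interpret_exit_code(code):
    """Map a `dqt run` exit code to a short status string."""
    if code == 0:
        return "all checks passed"
    if code == 2:
        return "one or more checks failed"
    # Not documented in this README: treated as an unexpected error.
    return f"unexpected exit code {code}"


def run_checks(path="checks.yaml"):
    """Run `dqt run <path>` and return (exit_code, status)."""
    completed = subprocess.run(["dqt", "run", path])
    return completed.returncode, interpret_exit_code(completed.returncode)


# Example use in a CI step (requires dqt on PATH):
#   code, status = run_checks()
#   print(status)
#   raise SystemExit(code)
```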

Installation

pip install dqtlib                # core library + CLI
pip install "dqtlib[wiki]"        # + LLM Wiki synthesis (Anthropic Claude)
pip install "dqtlib[dashboard]"   # + local browser dashboard
pip install "dqtlib[reports]"     # + HTML profiling reports
pip install "dqtlib[causal]"      # + PCMCI+ causal discovery
pip install "dqtlib[all]"         # everything

Requires Python >= 3.12.

Warehouse support

Built for ClickHouse and BigQuery first. Snowflake and Databricks are WIP; the table below has the current status of each engine.

Engine	Status
ClickHouse	Supported
BigQuery	Supported
PostgreSQL	Supported
DuckDB / CSV / Parquet	Supported
Snowflake	WIP
Databricks SQL	WIP

All adapters are cost-guarded (dryRun/EXPLAIN before any query) and read-only.
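
The estimate-before-execute pattern can be sketched independently of any warehouse. This is not the dqt adapter protocol: `guarded_run`, `GuardError`, the two callables, and the byte limit are hypothetical, and the keyword check is a deliberately naive stand-in for real read-only enforcement.

```python
class GuardError(Exception):
    """Raised when a query is rejected before execution."""


def guarded_run(sql, estimate_bytes, execute, max_bytes=1_000_000_000):
    """Estimate first; run only if the query looks read-only and cheap enough.

    `estimate_bytes` and `execute` are hypothetical callables an adapter
    would supply (for example, a dry run or EXPLAIN, and the real query).
    """
    first_word = sql.lstrip().split(None, 1)[0].lower() if sql.strip() else ""
    if first_word not in ("select", "with"):
        raise GuardError("read-only guard: only SELECT/WITH statements allowed")
    estimated = estimate_bytes(sql)
    if estimated > max_bytes:
        raise GuardError(
            f"cost guard: estimated {estimated} bytes exceeds limit {max_bytes}"
        )
    return execute(sql)


# Fake callables stand in for a warehouse client.
rows = guarded_run(
    "SELECT amount FROM orders",
    estimate_bytes=lambda sql: 2_048,
    execute=lambda sql: [(19.99,), (5.00,)],
)
print(rows)
```

Keeping the estimator and executor as injected callables means the same guard can wrap different engines, at the cost of trusting each estimator to be accurate.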

Integrations

dbt - reads manifest.json and semantic_models.yml directly
Airflow / Dagster / Prefect - runs as one Python task
OpenLineage - ingests events from any non-dbt pipeline
Claude Code - Context7 plugin for live dqt docs, Superpowers for agentic check-suite builds

Screenshots

Overview - fleet KPIs, dataset health table with sparklines, live activity feed

Incident detail - statistical evidence, distribution overlay, causal trace, AI explanation

Documentation

Doc	Description
Getting started	First check in 5 min, drift detection, CLI, dashboard, quick-reference slug table
Detectors reference	All detectors with parameters and examples
YAML check format	Complete YAML config reference
CLI reference	All CLI commands including `dqt wiki`, `dqt report`
Python API	Check model, CheckScope, Runner, MemoryStore
LLM Wiki	Semantic layer synthesis from raw docs
Adapters	Warehouse adapter protocol
Local dashboard	Browser UI for check results
Benchmarks	F1, recall, precision across 30 trials
Architecture	System design, module boundaries, project layout
Comparison	dqt vs GE, Soda, Elementary, Dataplex
Release notes	Per-version changelog

About

Anton Barr is an engineer and data geek with 25+ years building data systems. A student of 質 (shitsu): quality, substance, the inner nature of a thing. dqt is a personal project built by a practitioner who believes craft and precision are the same thing - and got tired of tools that answer what but never why.

License

MIT - see LICENSE.

