A dose-contextualized consumer product safety platform. Currently in active development.
The dose makes the poison. Most platforms rate hazard in isolation. TheDose rates products at actual dose, using regulatory data from CosIng (EU), CIR (US), SCCS (EU), FDA, and Prop 65.
Every major consumer product safety platform (EWG, Yuka, Think Dirty) rates ingredient hazard, not product safety. 79% of toxicologists say EWG overstates risk because hazard ratings ignore dose, formulation context, and product type. No credible science-based free alternative exists.
TheDose differs from EWG on five axes:
- Rate products, not ingredients. Dose exposure depends on the product, not just the molecule.
- Expose conflicts of interest. Affiliate-sponsored ratings are distinguished from methodology-driven ratings.
- Track manufacturing byproducts. Contamination data is a separate axis, not folded into ingredient hazard.
- Don't penalize effective preservation. A working preservative system is safer than an unpreserved product that grows pathogens.
- Keep data current. Reformulations are tracked; ratings update.
Four-tier scoring with a separate confidence dimension:
- VERIFIED_SAFE — multiple independent jurisdictions, dose-margin verified
- GENERALLY_SAFE — consensus regulatory finding, sufficient evidence
- INSUFFICIENT_DATA — first-class verdict, not a fallback
- CONCERNS_IDENTIFIED — documented hazard at exposure-relevant dose
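As a rough illustration of the four tiers plus a separate confidence dimension, the sketch below models them as two independent enums. The tier names come from the list above; the `Confidence` levels and the `Score` dataclass are hypothetical names chosen for this example, not the project's actual types.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The four scoring tiers listed above."""
    VERIFIED_SAFE = "VERIFIED_SAFE"
    GENERALLY_SAFE = "GENERALLY_SAFE"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"  # a real verdict, not a fallback
    CONCERNS_IDENTIFIED = "CONCERNS_IDENTIFIED"


class Confidence(Enum):
    """Hypothetical confidence levels (assumption for this sketch)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Score:
    """Verdict and confidence are stored side by side, never merged."""
    verdict: Verdict
    confidence: Confidence


# High confidence that the evidence is insufficient is a valid combination.
score = Score(Verdict.INSUFFICIENT_DATA, Confidence.HIGH)
```

Keeping the two fields separate means a verdict never has to be downgraded to express doubt; the doubt lives in `confidence`.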
URL-level provenance on every data point: source URL, retrieval date, retrieval method, verifier identity. The platform refuses to score incomplete products. Soft signals (sparse evidence, stale research) feed the confidence dimension; hard signals (missing coverage, identity conflicts) block the operation.
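One way to picture the provenance record and the soft/hard signal split is sketched below. The four provenance fields mirror the ones named above; `Provenance`, `HardSignalError`, `gate`, and the signal strings are hypothetical names for illustration, and counting soft signals as a penalty is an assumption rather than the platform's real confidence formula.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Provenance attached to a single data point."""
    source_url: str
    retrieved_on: date
    retrieval_method: str
    verifier: str


class HardSignalError(Exception):
    """Raised when a hard signal blocks the scoring operation."""


def gate(hard_signals: list[str], soft_signals: list[str]) -> int:
    """Block on any hard signal; otherwise return a soft-signal penalty.

    The penalty is simply the number of soft signals (an assumption).
    """
    if hard_signals:
        raise HardSignalError(", ".join(hard_signals))
    return len(soft_signals)


# Soft signals lower confidence but do not stop scoring.
penalty = gate([], ["sparse_evidence", "stale_research"])
```

A call such as `gate(["missing_coverage"], [])` raises instead of returning, so an incomplete product never receives a score.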
The research pipeline has three layers, with editorial judgment never automated:
- Layer 1 (auto): data retrieval — pull regulatory records, parse INCI lists, fetch PubMed abstracts
- Layer 2 (auto with mandatory review): analysis drafting — apply scoring algorithm, flag anomalies
- Layer 3 (never automated): editorial judgment — final safety verdicts, "insufficient data" vs "likely safe" calls
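The boundary between Layer 2 and Layer 3 can be sketched as a type-level rule: automated code may only produce a draft, and a final verdict cannot be constructed without a named human reviewer. All names here (`Draft`, `FinalVerdict`, `finalize`) are hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    """Layer 2 output: a proposed verdict plus any flagged anomalies."""
    ingredient: str
    proposed_verdict: str
    anomalies: tuple[str, ...] = ()


@dataclass(frozen=True)
class FinalVerdict:
    """Layer 3 output: always carries the identity of the human who decided."""
    ingredient: str
    verdict: str
    decided_by: str


def finalize(draft: Draft, verdict: str, reviewer: str) -> FinalVerdict:
    """Turn a draft into a final verdict; refuse if no reviewer is named."""
    if not reviewer.strip():
        raise ValueError("a final verdict requires a named human reviewer")
    return FinalVerdict(draft.ingredient, verdict, reviewer)


draft = Draft("Aqua", "GENERALLY_SAFE")
# The reviewer may disagree with the draft; the human call is what is recorded.
final = finalize(draft, "INSUFFICIENT_DATA", reviewer="Jin Choi")
```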
The admissibility matrix and other load-bearing interfaces are frozen during the Grind Phase. Edge cases discovered in research are logged to deferred_judgment/, not re-classified in the moment. Any matrix change requires full re-verification of all affected prior packets, making rule changes deliberately expensive (cheap to defer, costly to enact).
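The "cheap to defer" half of that rule could look like the append-only logger below. This is a minimal sketch: the `deferred_judgment/` directory is from the text, but the `log.jsonl` file name, the JSON Lines format, and the `log_deferred` function are assumptions made for the example.

```python
import json
from datetime import date
from pathlib import Path


def log_deferred(root: Path, packet_id: str, note: str, today: date) -> Path:
    """Append one edge case to the deferred log without reclassifying anything."""
    log_dir = root / "deferred_judgment"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "log.jsonl"  # hypothetical file name
    entry = {"packet": packet_id, "note": note, "logged_on": today.isoformat()}
    # Append mode keeps earlier entries intact for the batched review.
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return log_file
```

Calling `log_deferred(Path("."), "parfum", "mixture with undisclosed components", date.today())` would add one line to the log and leave the matrix and the packet untouched.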
Clean architecture with strict layer-import rules:
- `domain/`: pure Python, zero framework imports
- `application/`: use cases plus abstract port interfaces plus plain dataclass DTOs
- `infrastructure/`: SQLAlchemy ORM, Pydantic validation, concrete repositories
- `cli/`: click commands, composition root, output formatters
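A layer-import rule like this can be checked mechanically. The sketch below uses the standard-library `ast` module to find imports that point at a forbidden layer. The `ALLOWED` table assumes the conventional inward-pointing dependency direction; the authoritative rules live in CLAUDE.md and may differ. It only checks imports between layers of the `thedose` package and does not cover the "zero framework imports" rule for `domain/`.

```python
import ast

# Hypothetical rule table: layer -> other layers it may import from.
ALLOWED = {
    "domain": set(),
    "application": {"domain"},
    "infrastructure": {"domain", "application"},
    "cli": {"domain", "application", "infrastructure"},
}


def violations(layer: str, source: str, package: str = "thedose") -> list[str]:
    """Return absolute imports in `source` that `layer` may not use."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            # Only imports of the form package.<layer>... are subject to the rule.
            if len(parts) >= 2 and parts[0] == package and parts[1] in ALLOWED:
                if parts[1] != layer and parts[1] not in ALLOWED[layer]:
                    found.append(name)
    return found


# A domain module reaching into infrastructure is reported.
bad = violations("domain", "from thedose.infrastructure.orm import Base\n")
```

The module name `thedose.infrastructure.orm` is made up for the example. Relative imports are skipped here for brevity; a real check would need to resolve them as well.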
Tech stack: Python 3.12 (uv), SQLAlchemy 2.0, Alembic, Pydantic v2 (infrastructure boundary only), pytest, click. SQLite for now; Postgres planned. Web layer in Next.js 16 + React 19.
See CLAUDE.md for the full architecture conventions.
Every PR must pass .github/workflows/ci.yml:
- `uv run pytest tests/ -q` (all tests green)
- `git status --porcelain` over `web/data/` and `web/verification.json` after pytest (test-isolation regression guard)
- `uv run thedose verify-projection --json-output` against a freshly bootstrapped DB (zero drifted packets)
- `uv run thedose audit --json-output` (zero errors; warnings are by-design honest-insufficiency for ingredients like Aqua and Parfum)
- `research/` and `web/data/` stay in sync after `npm run build` (porcelain drift detection)
- Frontend lint and build pass
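The porcelain drift checks boil down to "run `git status --porcelain` on a few paths and fail if anything is listed". A minimal sketch of that idea follows; `drifted_paths` and `check_drift` are hypothetical helpers, not the contents of `ci.yml`, and the parser handles only the simple two-character status form of the porcelain v1 format.

```python
import subprocess


def drifted_paths(porcelain_output: str) -> list[str]:
    """Extract paths from `git status --porcelain` output.

    Each line is two status characters, a space, then the path.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def check_drift(*paths: str) -> list[str]:
    """Run git over the given paths and return any that changed."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return drifted_paths(result.stdout)


# Parsing a captured sample; an empty list would mean no drift.
sample = " M web/data/a.json\n?? web/verification.json\n"
changed = drifted_paths(sample)
```

In a CI step, `check_drift("web/data", "web/verification.json")` returning a non-empty list after the test run would indicate that a test wrote into tracked output.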
| Path | What it is |
|---|---|
| `src/thedose/` | Backend: clean-architecture domain/application/infrastructure/cli layers |
| `web/` | Next.js 16 frontend; static-generated from `web/data/*.json` |
| `research/` | Canonical ingredient research packets (273 ingredients) and class templates |
| `tests/` | 418 tests; unit + integration |
| `migrations/` | Alembic schema migrations |
| `docs/ops/` | Operational runbooks, audit scripts, CI baselines |
| `docs/ops/scripts/` | Bounded-audit scripts (`audit_phase0_pmids.py`, `audit_qrt_cap_omission.py`, `verify_sources.py`) |
| `docs/ops/matrix_checkpoint_log.md` | Append-only log of every admissibility-matrix review checkpoint |
| `deferred_judgment/` | Edge cases logged for batched review; the "show your work" layer |
| `calibration/` | Sonnet vs. Opus benchmark with stratified sample + dual review |
| `.github/workflows/ci.yml` | CI gate (pytest + integrity + web build + sync drift) |
MIT — code is permissive. The corpus is published under the same terms; attribution welcome.
This is a one-person reference work in active grind toward launch. Issues are welcome (see CONTRIBUTING.md). Pull requests are not currently accepted; the methodology and admissibility matrix are frozen during the Grind Phase to maintain audit integrity.
Built by Jin Choi.