gorajing/the-dose

TheDose

A dose-contextualized consumer product safety platform. Currently in active development.

The dose makes the poison. Most platforms rate hazard in isolation. TheDose rates products at actual dose, using regulatory data from CosIng (EU), CIR (US), SCCS (EU), FDA, and Prop 65.

Why this exists

Every major consumer product safety platform (EWG, Yuka, Think Dirty) rates ingredient hazard, not product safety. 79% of toxicologists say EWG overstates risk because hazard ratings ignore dose, formulation context, and product type. No credible science-based free alternative exists.

TheDose differs from EWG on five axes:

  1. Rate products, not ingredients. Dose exposure depends on the product, not just the molecule.
  2. Expose conflicts of interest. Distinguish affiliate-sponsored ratings from methodology-driven ratings.
  3. Track manufacturing byproducts. Contamination data is a separate axis, not folded into ingredient hazard.
  4. Don't penalize effective preservation. A working preservative system is safer than an unpreserved product that grows pathogens.
  5. Keep data current. Reformulations are tracked; ratings update.
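
To make the first axis concrete, here is a minimal sketch of a margin-of-safety calculation in the style of the SCCS Notes of Guidance (systemic exposure dose = amount applied × concentration × dermal absorption; margin of safety = point of departure / exposure). This is not necessarily TheDose's scoring algorithm. The function names and every number below are illustrative assumptions, not data from the corpus.

```python
def systemic_exposure_dose(applied_mg_per_kg_day: float,
                           concentration_pct: float,
                           dermal_absorption_pct: float) -> float:
    """Systemic exposure dose (mg/kg bw/day) for one ingredient in one product."""
    return (applied_mg_per_kg_day
            * (concentration_pct / 100)
            * (dermal_absorption_pct / 100))


def margin_of_safety(pod_mg_per_kg_day: float, sed_mg_per_kg_day: float) -> float:
    """Ratio of the toxicological point of departure to the actual exposure."""
    return pod_mg_per_kg_day / sed_mg_per_kg_day


# Hypothetical ingredient: same molecule, same concentration, two product types.
POD = 100.0            # assumed point of departure, mg/kg bw/day
CONCENTRATION = 1.0    # assumed 1% in the formula
ABSORPTION = 50.0      # assumed 50% dermal absorption

# Assumed daily amounts retained on skin, in mg product per kg body weight.
leave_on_sed = systemic_exposure_dose(120.0, CONCENTRATION, ABSORPTION)
rinse_off_sed = systemic_exposure_dose(3.0, CONCENTRATION, ABSORPTION)

leave_on_mos = margin_of_safety(POD, leave_on_sed)
rinse_off_mos = margin_of_safety(POD, rinse_off_sed)

print(f"leave-on  MoS: {leave_on_mos:.0f}")
print(f"rinse-off MoS: {rinse_off_mos:.0f}")
```

The hazard of the molecule is identical in both cases; only the product-dependent exposure changes, which is why the margin differs by more than an order of magnitude in this made-up example.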

How it works

Four-tier scoring with a separate confidence dimension:

  • VERIFIED_SAFE — multiple independent jurisdictions, dose-margin verified
  • GENERALLY_SAFE — consensus regulatory finding, sufficient evidence
  • INSUFFICIENT_DATA — first-class verdict, not a fallback
  • CONCERNS_IDENTIFIED — documented hazard at exposure-relevant dose

URL-level provenance on every data point: source URL, retrieval date, retrieval method, verifier identity. The platform refuses to score incomplete products. Soft signals (sparse evidence, stale research) feed the confidence dimension; hard signals (missing coverage, identity conflicts) block the operation.
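
The tiers, provenance fields, and the soft/hard signal split described above could be modeled roughly as follows. This is a sketch under stated assumptions: the class names, field names, signal identifiers, and the high/medium/low confidence labels are hypothetical and are not taken from the codebase.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Verdict(Enum):
    VERIFIED_SAFE = "VERIFIED_SAFE"
    GENERALLY_SAFE = "GENERALLY_SAFE"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"      # a verdict in its own right
    CONCERNS_IDENTIFIED = "CONCERNS_IDENTIFIED"


@dataclass(frozen=True)
class Provenance:
    """The four provenance fields attached to every data point."""
    source_url: str
    retrieved_on: date
    retrieval_method: str
    verifier: str


# Hypothetical signal names for illustration.
HARD_SIGNALS = frozenset({"missing_coverage", "identity_conflict"})
SOFT_SIGNALS = frozenset({"sparse_evidence", "stale_research"})


class ScoringBlocked(Exception):
    """Raised when a hard signal makes scoring inadmissible."""


def confidence_for(signals: set[str]) -> str:
    """Block on any hard signal; otherwise let soft signals lower confidence."""
    hard = sorted(HARD_SIGNALS & signals)
    if hard:
        raise ScoringBlocked(", ".join(hard))
    soft_count = len(SOFT_SIGNALS & signals)
    # Assumed mapping: 0 soft signals -> high, 1 -> medium, 2 or more -> low.
    return ("high", "medium", "low")[min(soft_count, 2)]
```

Keeping confidence separate from the verdict means a sparse evidence base lowers confidence without silently shifting the tier, while a hard signal stops the operation before any tier is assigned.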

The research pipeline has three layers, with editorial judgment never automated:

  • Layer 1 (auto): data retrieval — pull regulatory records, parse INCI lists, fetch PubMed abstracts
  • Layer 2 (auto with mandatory review): analysis drafting — apply scoring algorithm, flag anomalies
  • Layer 3 (never automated): editorial judgment — final safety verdicts, "insufficient data" vs "likely safe" calls
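
One way to picture the three layers is a packet that automation can fill in and draft, but that only a named human can finalize. The sketch below is illustrative only; `Packet`, the function names, and the callables passed in are assumptions, not the project's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Packet:
    ingredient: str
    records: list[str] = field(default_factory=list)   # filled by Layer 1
    draft_verdict: Optional[str] = None                # filled by Layer 2
    draft_reviewed: bool = False                       # set by a human reviewer
    final_verdict: Optional[str] = None                # set only in Layer 3
    signed_off_by: Optional[str] = None


def layer1_retrieve(packet: Packet, fetch: Callable[[str], list[str]]) -> None:
    """Automated: pull source records for the ingredient."""
    packet.records = fetch(packet.ingredient)


def layer2_draft(packet: Packet, score: Callable[[list[str]], str]) -> None:
    """Automated, but the draft stays unreviewed until a human checks it."""
    packet.draft_verdict = score(packet.records)
    packet.draft_reviewed = False


def mark_reviewed(packet: Packet) -> None:
    """Records that the mandatory Layer 2 review happened."""
    packet.draft_reviewed = True


def layer3_finalize(packet: Packet, verdict: str, editor: str) -> None:
    """Never automated: the verdict and the editor's name are supplied by a person."""
    if not packet.draft_reviewed:
        raise PermissionError("Layer 2 draft has not been reviewed")
    if not editor.strip():
        raise ValueError("a named human editor must sign off")
    packet.final_verdict = verdict
    packet.signed_off_by = editor
```

Note that `layer3_finalize` takes the verdict as an argument rather than copying `draft_verdict`, so the editor can overrule the draft, for example by choosing INSUFFICIENT_DATA where the algorithm suggested a safe tier.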

Method freeze (Grind Phase)

The admissibility matrix and other load-bearing interfaces are frozen during the Grind Phase. Edge cases discovered in research are logged to deferred_judgment/, not re-classified in the moment. Any matrix change requires full re-verification of all affected prior packets, making rule changes deliberately expensive (cheap to defer, costly to enact).
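
As a rough illustration of "cheap to defer", logging an edge case can be a single file write that leaves the matrix untouched. The file naming scheme and JSON fields below are hypothetical; the actual layout of deferred_judgment/ is not specified here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def defer_judgment(root: Path, ingredient: str, question: str) -> Path:
    """Write one edge case to the deferral directory without re-classifying anything."""
    root.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    slug = ingredient.lower().replace(" ", "_")
    path = root / f"{now.strftime('%Y%m%dT%H%M%SZ')}_{slug}.json"
    entry = {
        "ingredient": ingredient,
        "question": question,
        "logged_at": now.isoformat(),
        "status": "deferred",   # resolved later, in a batched review
    }
    path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
    return path
```

A call such as `defer_judgment(Path("deferred_judgment"), "Parfum", "Undisclosed composition: which class template applies?")` records the question and moves on; enacting a rule change instead would trigger re-verification of every affected packet.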

Architecture

Clean architecture with strict layer-import rules:

  • domain/ — pure Python, zero framework imports
  • application/ — use cases plus abstract port interfaces plus plain dataclass DTOs
  • infrastructure/ — SQLAlchemy ORM, Pydantic validation, concrete repositories
  • cli/ — click commands, composition root, output formatters
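
A layer-import rule such as "domain/ has zero framework imports" can be checked mechanically. The following is a minimal sketch using the standard-library `ast` module; the forbidden-package set is an assumption based on the stack listed in this README, and the project may enforce the rule differently.

```python
import ast

# Assumed set of framework packages that must not appear in domain/.
FORBIDDEN_IN_DOMAIN = frozenset({"sqlalchemy", "pydantic", "click", "alembic"})


def forbidden_imports(source: str,
                      forbidden: frozenset[str] = FORBIDDEN_IN_DOMAIN) -> list[str]:
    """Return the absolute imports in `source` whose top-level package is forbidden."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import within the package
        found.extend(name for name in names if name.split(".")[0] in forbidden)
    return sorted(found)
```

Running this over every file under domain/ in a test and asserting an empty result would turn the convention into a failing build rather than a review comment.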

Tech stack: Python 3.12 (uv), SQLAlchemy 2.0, Alembic, Pydantic v2 (infrastructure boundary only), pytest, click. SQLite for now; Postgres planned. Web layer in Next.js 16 + React 19.

See CLAUDE.md for the full architecture conventions.

CI invariants

Every PR must pass .github/workflows/ci.yml:

  • uv run pytest tests/ -q (all tests green)
  • git status --porcelain over web/data/ and web/verification.json reports no changes after pytest (test-isolation regression guard)
  • uv run thedose verify-projection --json-output against a freshly bootstrapped DB (zero drifted packets)
  • uv run thedose audit --json-output (zero errors; warnings are expected by design, since they honestly report insufficient data for ingredients like Aqua and Parfum)
  • research/ and web/data/ stay in sync after npm run build (porcelain drift detection)
  • Frontend lint and build pass
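
The porcelain-based drift guards reduce to "run git status --porcelain on a few paths and fail if anything is listed". A sketch of that check in Python follows; the helper names are hypothetical, and the real workflow may implement it directly in shell.

```python
import subprocess


def drifted_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is two status characters, a space, then the path.
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def check_clean(paths: list[str]) -> list[str]:
    """Return any modified or untracked files under `paths` (empty list means clean)."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", *paths],
        capture_output=True, text=True, check=True,
    )
    return drifted_paths(result.stdout)


if __name__ == "__main__":
    dirty = check_clean(["web/data/", "web/verification.json"])
    if dirty:
        raise SystemExit("drift detected:\n" + "\n".join(dirty))
```

Run after pytest, a non-empty result means a test wrote into tracked data files; run after npm run build, it means research/ and web/data/ have fallen out of sync.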

What's in the repo

Path                               What it is
src/thedose/                       Backend: clean-architecture domain/application/infrastructure/cli layers
web/                               Next.js 16 frontend; static-generated from web/data/*.json
research/                          Canonical ingredient research packets (273 ingredients) and class templates
tests/                             418 tests; unit + integration
migrations/                        Alembic schema migrations
docs/ops/                          Operational runbooks, audit scripts, CI baselines
docs/ops/scripts/                  Bounded-audit scripts (audit_phase0_pmids.py, audit_qrt_cap_omission.py, verify_sources.py)
docs/ops/matrix_checkpoint_log.md  Append-only log of every admissibility-matrix review checkpoint
deferred_judgment/                 Edge cases logged for batched review (the "show your work" layer)
calibration/                       Sonnet vs. Opus benchmark with stratified sample + dual review
.github/workflows/ci.yml           CI gate (pytest + integrity + web build + sync drift)

Live site

the-dose.vercel.app

License

MIT. The code is permissively licensed, and the corpus is published under the same terms; attribution is welcome.

Contributing

This is a one-person reference work in active grind toward launch. Issues are welcome (see CONTRIBUTING.md). Pull requests are not currently accepted; the methodology and admissibility matrix are frozen during the Grind Phase to maintain audit integrity.


Built by Jin Choi.

About

Science-based skincare ingredient safety. Methodology open. Receipts on every claim.
