15 changes: 10 additions & 5 deletions AGENTS.md
@@ -40,11 +40,16 @@ unread — **may have ordered, never auto-retried**); `challenge`.

**Screenshot tags** (suffix on files in the shots dir, written on each failed
step) tell you which stage died: `product` / `no_price` / `unavailable` /
`guard_block` / `no_buy_button` / `no_place_order` / `signin_*` / `challenge_*`
/ `submitted_unconfirmed` / `confirmation` / `review` / `timeout` / `crash` /
`dump`. `verify-selectors` and `dump-dom` also write `*_dom.html` (rendered
page) and `*_probe.txt` (per-selector match counts) — `Read` those to find the
real selector instead of guessing.
`guard_block` / `no_buy_button` / `no_place_order` / `cart_mismatch` (the
cart-singleton guard saw more than the intended item — NOT placed) / `signin_*`
/ `challenge_*` / `blocked_*` / `left_checkout` / `submitted_unconfirmed` /
`confirmation` / `review` / `timeout` / `crash` / `dump`. Diagnostic tags are
captured full-page (below-the-fold banners included); the `review`/`confirmation`
/`dump` shots stay header-only. `verify-selectors` and `dump-dom` also write
`*_dom.html` (rendered page) and `*_probe.txt` (per-selector match counts) —
`Read` those to find the real selector instead of guessing. The shots dir is
pruned automatically (worker) and via `roomieorder prune-shots`
(`ROOMIEORDER_SHOTS_RETENTION_DAYS`, default 30).

## 1. Green CI does not mean the buy flow works

3 changes: 2 additions & 1 deletion README.md
@@ -39,9 +39,10 @@ Intake is always-on; execution needs a live graphical session. Requests sit in t
- [`dry-run ITEM_KEY --provider costco|amazon`](./src/roomieorder/cli.py) — navigate one store to checkout and screenshot without placing the order
- [`dump-dom ITEM_KEY --provider costco|amazon`](./src/roomieorder/cli.py) — read-only DOM + selector probe for bring-up
- [`verify-selectors [ITEM_KEY] --provider costco|amazon`](./src/roomieorder/cli.py) — probe live product pages and report PASS/MISS per item for the price + add-to-cart selectors (operator-run; hits the store, never orders)
- [`doctor`](./src/roomieorder/cli.py) — one-shot, read-only health check of config, Chrome, the graphical session, per-store profiles, the DB/queue, and the catalog
- [`doctor [--check-login]`](./src/roomieorder/cli.py) — one-shot, read-only health check of config, Chrome, the graphical session, per-store profiles, the DB/queue, and the catalog; `--check-login` also relaunches each store profile to report whether it's still signed in
- [`failures [--limit N]`](./src/roomieorder/cli.py) — list recent failed/blocked orders with their notes and the newest screenshots to open
- [`retry ROW_ID [--resume]`](./src/roomieorder/cli.py) — re-enqueue a failed row (refuses rows that may already have placed an order)
- [`prune-shots [--days N]`](./src/roomieorder/cli.py) — delete old screenshots/DOM dumps from the shots dir (the worker also prunes automatically)

## Configuration

4 changes: 4 additions & 0 deletions examples/env.example
@@ -30,6 +30,10 @@ ROOMIEORDER_DB=data/state.sqlite
# Sign into each with `roomieorder login --provider costco|amazon`.
ROOMIEORDER_PROFILE_DIR=data/profile
ROOMIEORDER_SHOTS_DIR=data/shots
# Delete screenshots / DOM dumps older than this many days (the worker prunes at
# startup and after each order; `roomieorder prune-shots` runs it by hand). The
# shots dir grows unbounded otherwise. 0 disables pruning.
ROOMIEORDER_SHOTS_RETENTION_DAYS=30

# ─────────── Stores ───────────
# Costco is tried first; Amazon is the fallback when Costco is sold out, not
71 changes: 64 additions & 7 deletions src/roomieorder/cli.py
@@ -12,7 +12,9 @@
* ``dry-run KEY`` — drive one item to its review page and screenshot, no order.
* ``dump-dom KEY`` — read-only DOM dump + selector probe for bring-up.
* ``verify-selectors`` — probe live pages for stale buy-flow selectors.
* ``doctor`` — one-shot, read-only health check of every subsystem.
* ``doctor`` — one-shot, read-only health check of every subsystem
(``--check-login`` adds a per-store signed-in probe).
* ``prune-shots`` — delete old screenshots/DOM dumps from the shots dir.
* ``failures`` — list recent failed/blocked orders and their screenshots.
* ``retry ID`` — re-enqueue a failed row for another attempt.
* ``resume`` / ``pause`` / ``status`` — manage the worker-pause flag.
@@ -33,6 +35,7 @@
from roomieorder.config import Config, load_config
from roomieorder.guards import check_price_ceiling, check_spend_cap
from roomieorder.notify import build_notifier
from roomieorder.retention import prune_shots
from roomieorder.sheets import build_sheets
from roomieorder.store import Store

@@ -431,14 +434,25 @@ def verify_selectors(item_key: Optional[str], provider: str) -> None:


@main.command()
def doctor() -> None:
@click.option(
"--check-login",
is_flag=True,
help="Also launch each store profile read-only and report whether it's still "
"signed in (needs a graphical session; slower).",
)
def doctor(check_login: bool) -> None:
"""Print a one-shot, read-only health check of every subsystem.

Never launches a browser or touches a store, so it's safe and instant.
Reports config/anti-bot, the graphical session the worker needs, the
per-store profiles, the DB/queue, and the catalog. Exits non-zero when a
hard check fails (a pinned Chrome that doesn't exist, an unopenable DB, an
By default never launches a browser or touches a store, so it's safe and
instant. Reports config/anti-bot, the graphical session the worker needs, the
per-store profiles, the DB/queue, and the catalog. Exits non-zero when a hard
check fails (a pinned Chrome that doesn't exist, an unopenable DB, an
unparseable catalog), so it doubles as a smoke test.

``--check-login`` adds a read-only session probe: it relaunches each store's
saved profile and reports LOGGED-IN / LOGGED-OUT (reusing the buy flow's
``verify_session``) so an expired session is caught here instead of at the
next real order. It opens a browser and needs a graphical session.
"""
config = load_config()
hard_fail = False
@@ -488,9 +502,26 @@ def line(state: str, label: str, detail: str) -> None:
for label, path in (("costco", config.costco_profile_dir), ("amazon", config.amazon_profile_dir)):
if path.exists():
stamp = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()
line("ok", f"profile/{label}", f"present, mtime {stamp} (login unverified — run dump-dom)")
present = "present" if check_login else "present (login unverified — run dump-dom)"
line("ok", f"profile/{label}", f"{present}, mtime {stamp}")
else:
line("warn", f"profile/{label}", f"missing {path} — run `roomieorder login --provider {label}`")
continue
if not check_login:
continue
# Read-only session probe: relaunch the saved profile and report whether
# it reloads signed in. Best-effort — a launch failure (no display, no
# Chrome) is a warn, not a hard fail, so the rest of doctor still reports.
try:
logged_in = _purchaser_for(config, label).verify_session() # type: ignore[attr-defined]
except Exception as exc: # noqa: BLE001 — surface, don't crash the check
line("warn", f"login/{label}", f"probe failed: {(str(exc).splitlines() or [type(exc).__name__])[0][:80]}")
continue
line(
"ok" if logged_in else "warn",
f"login/{label}",
"LOGGED-IN" if logged_in else f"LOGGED-OUT — run `roomieorder login --provider {label}`",
)

# ── DB / queue ──
try:
@@ -522,6 +553,32 @@ def line(state: str, label: str, detail: str) -> None:
raise SystemExit(1)


@main.command(name="prune-shots")
@click.option(
"--days",
type=int,
default=None,
help="Delete shots older than this many days (default: ROOMIEORDER_SHOTS_RETENTION_DAYS).",
)
def prune_shots_cmd(days: Optional[int]) -> None:
"""Delete old screenshots / DOM dumps from the shots dir.

The buy flow writes a PNG (and dump-dom an HTML + probe) on every attempt
with no rotation, so the shots dir grows unbounded. The worker prunes
automatically; this runs the same sweep by hand. ``--days`` overrides the
configured retention window; a window of 0 or less disables pruning.
"""
config = load_config()
retention = days if days is not None else config.shots_retention_days
if retention <= 0:
click.echo(
"retention disabled — pass --days N or set ROOMIEORDER_SHOTS_RETENTION_DAYS > 0"
)
return
removed = prune_shots(config.shots_dir, retention)
click.echo(f"pruned {removed} file(s) older than {retention}d from {config.shots_dir}")


@main.command()
@click.option("--limit", default=10, show_default=True, help="Max rows / screenshots to show.")
def failures(limit: int) -> None:
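
`roomieorder.retention.prune_shots` is imported by this diff, but its file is not part of the visible hunks. The following is a minimal sketch of a sweep with the same call shape (`prune_shots(shots_dir, retention_days)` returning a count), assuming it deletes regular files by mtime and treats a non-positive window as disabled. The body and the `now` parameter are hypothetical, not the PR's code.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Optional


def prune_shots(shots_dir: Path, retention_days: int, now: Optional[float] = None) -> int:
    """Delete files in shots_dir older than retention_days; return the number removed.

    Hypothetical stand-in for roomieorder.retention.prune_shots.
    """
    if retention_days <= 0 or not shots_dir.is_dir():
        return 0  # assumed: 0 disables pruning, a missing dir is a no-op
    cutoff = (time.time() if now is None else now) - retention_days * 86400
    removed = 0
    for path in shots_dir.iterdir():
        try:
            # Only top-level regular files are considered (PNG, *_dom.html, *_probe.txt).
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed += 1
        except OSError:
            continue  # best-effort: skip a file that vanished or cannot be removed
    return removed
```

Returning the count matches how `prune_shots_cmd` reports `pruned {removed} file(s)`; swallowing per-file `OSError` is one way to keep the sweep from interrupting the worker, though the worker also wraps the call itself.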
6 changes: 6 additions & 0 deletions src/roomieorder/config.py
@@ -82,6 +82,11 @@ class Config(BaseModel):
db_path: Path = Path("data/state.sqlite")
profile_dir: Path = Path("data/profile")
shots_dir: Path = Path("data/shots")
# Delete screenshots / DOM dumps older than this many days (the worker prunes
# at startup and after each order; `roomieorder prune-shots` runs it by hand).
# The shots dir is the systemd StateDirectory and grows unbounded otherwise.
# 0 disables pruning.
shots_retention_days: int = Field(default=30, ge=0)

# Stores
costco_domain: str = "costco.com"
@@ -180,6 +185,7 @@ def load_config() -> Config:
db_path=Path(_env_str("ROOMIEORDER_DB", "data/state.sqlite")),
profile_dir=Path(_env_str("ROOMIEORDER_PROFILE_DIR", "data/profile")),
shots_dir=Path(_env_str("ROOMIEORDER_SHOTS_DIR", "data/shots")),
shots_retention_days=_env_int("ROOMIEORDER_SHOTS_RETENTION_DAYS", 30),
costco_domain=_env_str("ROOMIEORDER_COSTCO_DOMAIN", "costco.com"),
amazon_domain=_env_str("ROOMIEORDER_AMAZON_DOMAIN", "amazon.com"),
costco_store_id=_env_str("ROOMIEORDER_COSTCO_STORE_ID", "10301"),
38 changes: 38 additions & 0 deletions src/roomieorder/logutil.py
@@ -0,0 +1,38 @@
"""Correlation logging — tag a buy's log lines so they grep together.

A single order produces log lines across the worker loop (``main.py``) and the
Playwright buy flow (``purchase.py``), plus screenshots and a Sheet row. Without
a shared token, tracing one buy through a busy journal is manual. This wraps a
stdlib logger in a :class:`logging.LoggerAdapter` that prefixes every record
with a short ``key=value`` correlation token — the same ``provider``/``item``
that already names the screenshot files (``{ts}_{provider}_{item}_{tag}.png``),
so logs ↔ shots ↔ Sheet rows line up under one grep.
"""

from __future__ import annotations

import logging
from typing import Any, MutableMapping


class _CorrelatedLogger(logging.LoggerAdapter): # type: ignore[type-arg]
"""A LoggerAdapter that prefixes each message with its correlation token."""

def process(
self, msg: Any, kwargs: MutableMapping[str, Any]
) -> tuple[Any, MutableMapping[str, Any]]:
corr = self.extra.get("corr") if self.extra else ""
return (f"[{corr}] {msg}" if corr else msg), kwargs


def correlated(logger: logging.Logger, **fields: object) -> _CorrelatedLogger:
"""Wrap ``logger`` so every line is prefixed with a ``key=value`` token.

Empty/None field values are dropped, so ``correlated(log, provider="costco",
item="paper_towels")`` prefixes ``[provider=costco item=paper_towels]`` and
``correlated(log, row=7, item="dish_soap")`` prefixes ``[row=7 item=dish_soap]``.
The adapter forwards every logging method (info/warning/exception/…) to the
wrapped logger unchanged apart from the prefix.
"""
corr = " ".join(f"{k}={v}" for k, v in fields.items() if v not in (None, ""))
return _CorrelatedLogger(logger, {"corr": corr})
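
A short usage sketch of the adapter above. It restates `correlated` so it runs standalone and attaches a hypothetical in-memory handler (`_ListHandler`, not part of the PR) to show the prefix that lands on each record. The logger name `roomieorder.demo` is also an assumption.

```python
import logging


class _CorrelatedLogger(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        corr = self.extra.get("corr") if self.extra else ""
        return (f"[{corr}] {msg}" if corr else msg), kwargs


def correlated(logger, **fields):
    corr = " ".join(f"{k}={v}" for k, v in fields.items() if v not in (None, ""))
    return _CorrelatedLogger(logger, {"corr": corr})


class _ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


base = logging.getLogger("roomieorder.demo")
base.setLevel(logging.INFO)
base.propagate = False
handler = _ListHandler()
base.addHandler(handler)

correlated(base, row=7, item="dish_soap").warning("worker paused: %s", "challenge")
correlated(base, provider="costco", item=None).info("no item yet")
correlated(base).info("no token at all")

for text in handler.lines:
    print(text)
# [row=7 item=dish_soap] worker paused: challenge
# [provider=costco] no item yet
# no token at all
```

Because the prefix is spliced into the message before `%`-formatting, a literal `%` inside a field value would be read as a format directive whenever the call also passes arguments. Catalog-style keys such as `dish_soap` do not trigger this.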
26 changes: 21 additions & 5 deletions src/roomieorder/main.py
@@ -27,9 +27,11 @@
from roomieorder.catalog import Catalog, CatalogError, CatalogItem, load_catalog
from roomieorder.config import Config, load_config
from roomieorder.guards import check_intake
from roomieorder.logutil import correlated
from roomieorder.notify import Notifier, build_notifier
from roomieorder.orchestrator import Orchestrator
from roomieorder.purchase import PurchaseResult
from roomieorder.retention import prune_shots
from roomieorder.sheets import SheetsClient, build_sheets
from roomieorder.store import QueueRow, Store

@@ -168,6 +170,9 @@ def stop_worker(self) -> None:
self._thread.join(timeout=10.0)

def _worker_loop(self) -> None:
# Sweep stale shots once at startup so a long-idle service still reclaims
# disk even before the next order; each order re-prunes in the `finally` below.
self._prune_shots()
while not self._stop.is_set():
if self.store.is_paused():
self._stop.wait(_WORKER_POLL_SECONDS)
@@ -179,13 +184,25 @@ def _worker_loop(self) -> None:
try:
self._process(row)
except Exception: # noqa: BLE001 — a crash must not kill the loop
_logger.exception("worker failed processing row %d", row.id)
correlated(_logger, row=row.id, item=row.item_key).exception(
"worker failed processing row"
)
self.store.mark(row.id, "failed", notes="worker crashed")
self.store.set_paused(True, f"worker crashed on row {row.id}")
finally:
self._prune_shots()

def _prune_shots(self) -> None:
"""Best-effort shots retention sweep — never disrupts the worker loop."""
try:
prune_shots(self.config.shots_dir, self.config.shots_retention_days)
except Exception: # noqa: BLE001 — disk hygiene must never crash the loop
_logger.exception("shots prune failed")

# ─────────── per-row processing ───────────

def _process(self, row: QueueRow) -> None:
log = correlated(_logger, row=row.id, item=row.item_key)
item = self.catalog.get(row.item_key)
if item is None:
self.store.mark(row.id, "failed", notes="item_key not in catalog")
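
The sweep ordering in `_worker_loop` can be sketched in isolation: one sweep at startup, then one after every row, including rows whose processing raises. The names below (`prune`, `process`, `worker_loop`, `calls`) are illustrative stand-ins, not the worker's real methods.

```python
calls = []


def prune():
    # Stand-in for Worker._prune_shots (best-effort, never raises).
    calls.append("prune")


def process(row):
    # Stand-in for Worker._process; row 2 simulates a crash.
    calls.append(f"process {row}")
    if row == 2:
        raise RuntimeError("boom")


def worker_loop(rows):
    prune()  # startup sweep, before any order is handled
    for row in rows:
        try:
            process(row)
        except Exception:
            calls.append(f"failed {row}")  # the real loop marks the row and pauses
        finally:
            prune()  # runs whether the row succeeded or crashed


worker_loop([1, 2])
print(calls)
# ['prune', 'process 1', 'prune', 'process 2', 'failed 2', 'prune']
```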
@@ -215,7 +232,7 @@ def _process(self, row: QueueRow) -> None:

if result.status in _PAUSE_STATUSES:
self.store.set_paused(True, result.message)
_logger.warning("worker paused: %s", result.message)
log.warning("worker paused: %s", result.message)
elif result.status == "placed":
self._enforce_recorded_cap()

@@ -236,9 +253,8 @@ def _maybe_auto_retry(self, row: QueueRow, result: PurchaseResult) -> bool:
return False
self._transient_attempts[row.item_key] = count + 1
new_id = self.store.enqueue(row.item_key, row.requester)
_logger.info(
"auto-retry %s: transient pre-cart failure (%d/%d) — re-enqueued as #%d",
row.item_key,
correlated(_logger, row=row.id, item=row.item_key).info(
"auto-retry: transient pre-cart failure (%d/%d) — re-enqueued as #%d",
count + 1,
self.config.auto_retry_max,
new_id,
40 changes: 37 additions & 3 deletions src/roomieorder/purchase.py
@@ -49,6 +49,7 @@
from roomieorder.catalog import AmazonSource, CatalogItem, CostcoSource
from roomieorder.config import Config
from roomieorder.guards import GuardResult
from roomieorder.logutil import correlated
from roomieorder.store import Status

# Each purchaser drives exactly one store's source shape; bind it so the buy
@@ -104,6 +105,34 @@ def _playwright_api() -> object:

_JSONLD_SELECTOR = "script[type='application/ld+json']"

# Diagnostic screenshot tags worth a full-page capture: the error banner / the
# mismatched cart line / the disabled control you need for triage often sits
# below the fold, which a header-only shot crops out. The happy-path `review`
# and `confirmation` shots and the `dump` bring-up shot stay header-only — they
# go out over the notifier, where a tall full-page PNG is just bulk. The
# blocked_/challenge_/signin_ families carry a `_{where}` suffix, so they match
# by prefix.
_FULL_PAGE_TAGS = frozenset(
{
"no_price",
"no_buy_button",
"no_place_order",
"unavailable",
"guard_block",
"cart_mismatch",
"submitted_unconfirmed",
"left_checkout",
"timeout",
"crash",
}
)
_FULL_PAGE_TAG_PREFIXES = ("blocked_", "challenge_", "signin_")


def _is_full_page_tag(tag: str) -> bool:
"""True when a screenshot ``tag`` is a diagnostic worth capturing full-page."""
return tag in _FULL_PAGE_TAGS or tag.startswith(_FULL_PAGE_TAG_PREFIXES)

# First number-ish run in a blob: digits with optional grouping/decimal
# separators, e.g. "24.99", "1,234.56", "11,99".
_PRICE_RE = re.compile(r"[0-9][0-9.,]*[0-9]|[0-9]")
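
A standalone sketch of how the tag classification in this hunk behaves. It restates the set, the prefixes, and `_is_full_page_tag` from the diff; the sample tags `signin_cart` and `challenge_checkout` use made-up `_{where}` suffixes for illustration.

```python
_FULL_PAGE_TAGS = frozenset(
    {
        "no_price",
        "no_buy_button",
        "no_place_order",
        "unavailable",
        "guard_block",
        "cart_mismatch",
        "submitted_unconfirmed",
        "left_checkout",
        "timeout",
        "crash",
    }
)
_FULL_PAGE_TAG_PREFIXES = ("blocked_", "challenge_", "signin_")


def _is_full_page_tag(tag: str) -> bool:
    # str.startswith accepts a tuple and matches if any prefix matches.
    return tag in _FULL_PAGE_TAGS or tag.startswith(_FULL_PAGE_TAG_PREFIXES)


samples = ("no_price", "signin_cart", "challenge_checkout", "challenge",
           "review", "confirmation", "dump", "product")
results = {tag: _is_full_page_tag(tag) for tag in samples}
for tag, full_page in results.items():
    print(f"{tag:20} full_page={full_page}")
```

Two consequences of the exact-or-prefix rule: a bare `challenge` tag with no `_{where}` suffix is not captured full-page, and `product` stays at the default capture size because it is in neither the set nor the prefix list.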
@@ -459,6 +488,10 @@ def buy(

url = self._resolve_url(source)
title = item.title
# Correlation token shared with the screenshot filenames
# ({ts}_{provider}_{item}_{tag}.png) and the worker/Sheet row, so one
# buy's log lines grep together.
log = correlated(_logger, provider=self.PROVIDER, item=item_key)

with api.sync_playwright() as pw: # type: ignore[attr-defined]
context = self._launch_context(pw)
@@ -751,11 +784,11 @@ def buy(
# A programmer error (bad attr/type/name, missing override, …).
# Screenshot for context, then re-raise so it can't hide as
# "store flakiness" — the worker loop records it and pauses.
_logger.exception("buy flow hit a programmer error for %s", item_key)
log.exception("buy flow hit a programmer error")
self._screenshot(page, item_key, "crash")
raise
except Exception as exc: # noqa: BLE001 — convert any flake to a safe result
_logger.exception("buy flow crashed for %s", item_key)
log.exception("buy flow crashed")
detail = f"buy flow error: {exc}".split("\n")[0]
if submitted:
return self._submitted_unconfirmed(
@@ -1406,8 +1439,9 @@ def _signin_required(self, page: "Page", item_key: str, where: str) -> PurchaseR

def _screenshot(self, page: "Page", item_key: str, tag: str) -> Optional[Path]:
path = self._shot_path(item_key, tag)
full_page = _is_full_page_tag(tag)
try:
page.screenshot(path=str(path), full_page=False)
page.screenshot(path=str(path), full_page=full_page)
return path
except Exception as exc: # noqa: BLE001
_logger.warning("screenshot failed (%s): %s", tag, exc)