Skip to content

feat(vendor-charge): ingest ChittyScrape portal charges into the books#134

Open
chitcommit wants to merge 1 commit into
feat/comptroller-cost-bridgefrom
feat/vendor-charge-ingest
Open

feat(vendor-charge): ingest ChittyScrape portal charges into the books#134
chitcommit wants to merge 1 commit into
feat/comptroller-cost-bridgefrom
feat/vendor-charge-ingest

Conversation

@chitcommit

@chitcommit chitcommit commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What

Canonical "ChittyScrape feeds the cost flow" wiring. Turns a ChittyScrape-extracted vendor charge (registered-agent fee, utility bill, mortgage statement) into an idempotent ChittyFinance expense transaction, reusing the PR #133 comptroller pattern.

Based on feat/comptroller-cost-bridge (#133, unmerged) because the partial unique index transactions_tenant_external_idx and getAccountByExternalId land there. Merge after / rebase onto main once #133 merges.

Changes

  • storage.upsertVendorCharge — idempotent by external_id = scrape:{portalId}:{period}:{vendor}, type='expense', partial-index arbiter. amount REQUIRED (>0), never defaults to 0; exact USD + paymentStatus + category preserved in metadata.
  • POST /api/vendor-charge/ingest (server/routes/vendor-charge.ts) — zod-validates envelope+charge; rejects success=false (422) and missing/zero amount (400); maps vendor category → real COA; books against account by external_id (default chittyos-infra); audit-logs via ledgerLog.

COA mapping (real codes, verified on Neon solitary-rice-14149088 — no invented codes)

category COA name
registered-agent 5050 Legal & Professional Fees (private vendor service fee, not a govt license)
utility electric/gas/water/trash/internet 5100/5110/5120/5130/5140 Utilities - *
mortgage 5300 Mortgage Interest (principal is liability 2500, not an expense)

nw-registered-agent → ingest field mapping

scraper (NWAgentResult) ingest contract
annualFeeUsd charge.amountUsd (required)
paymentStatus 'ok'|'failed' charge.paymentStatus (recorded, does not gate the expense)
portal nw-registered-agent category registered-agent → COA 5050
filing year charge.period
envelope.scrapedAt charge.date fallback

(Scraper side — annualFeeUsd extraction — is chittyos/chittyscrape#21.)

Verification — what was actually exercised

  • Storage upsert + partial-index conflict behavior: executed on real Neon (solitary-rice-14149088). Ran the exact upsertVendorCharge INSERT…ON CONFLICT SQL with a real Northwest Registered Agent $125/yr charge → row created (COA 5050, type=expense); re-ran → same row id, 1 row, no duplicate (idempotency proven). The verification row was then deleted — it was a fixture, not a real NW bill, and must not persist in IT CAN BE LLC's books.
  • Route handler + zod guards (400 missing-amount / 422 failed-scrape / category→COA branches): verified by typecheck only, not executed (needs full system-mode env). npm run check clean.

Credential-gated follow-up (chico/ChittyConnect)

The live portal scrape (NW login) is credential-gated. Registration + ingest wiring + mapping are real and verified per above; the live scrape→ingest hop needs ChittyConnect creds.

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 712c7cb4-b582-4754-bab3-a6662d3a1f83

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/vendor-charge-ingest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

@coderabbitai review

Please evaluate:

  • Security implications
  • Credential exposure risk
  • Dependency supply chain concerns
  • Breaking API changes

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Code Review — feat(vendor-charge): ingest ChittyScrape portal charges into the books

Overall the wiring is clean and well-motivated. The Zod validation, idempotent upsert pattern, and COA-justification comments are all solid. A few issues need addressing before merge, one of them security-critical.


🔴 Critical — Endpoint is unauthenticated

/api/vendor-charge/ingest is not in protectedPrefixes in server/app.ts. None of the auth + tenant middleware (hybridAuth, tenantMiddleware) applies to it.

// server/app.ts — protectedPrefixes does NOT include '/api/vendor-charge'
const protectedPrefixes = [
  '/api/accounts', '/api/transactions', ...
  // '/api/vendor-charge' is missing
];

In practice c.get('tenantId') resolves to undefined without the middleware, so getAccountByExternalId('chittyos-infra', undefined) returns null and the route returns a 400 before any DB write — so it's not directly exploitable for data injection today. But the endpoint is completely open: no authentication check, no tenant resolution. Any caller can probe it and observe internal error messages (No account with external_id='…').

Fix: add '/api/vendor-charge' to protectedPrefixes, matching the pattern all other resource routes use. Note: the parent branch feat/comptroller-cost-bridge has the same gap for /api/comptroller — worth fixing there too before this lands.


🟠 Bug — ...input.metadata spread can silently override hardcoded fields

// server/storage/system.ts
const metadata = {
  source: 'chittyscrape',
  portalId: input.portalId,
  ...
  ...input.metadata,          // ← comes LAST — clobbers the fields above
};

If the caller passes metadata: { source: 'untrusted-source' } the spread wins. Reorder so explicit fields come last (or omit the spread entirely and use a structured type for metadata).


🟡 Fragile idempotency key — vendor name can drift

const externalId = `scrape:${input.portalId}:${input.period}:${input.vendor}`;

input.vendor is a human-readable string from the scraper (e.g. "Northwest Registered Agent"). Trivial variation — extra comma, trailing LLC, casing change — breaks idempotency and creates duplicate rows despite the partial index. portalId + period already uniquely identifies "which portal's charge for which billing cycle"; the vendor name is redundant and risky. Consider scrape:${input.portalId}:${input.period} as the key, keeping vendor as a display field only.


🟡 utility fallback maps to internet (5140) — misleading

utility: '5140', // generic utility fallback (internet/cable) when unspecified

5140 is Utility - Internet. A caller sending category: 'utility' (meaning "some unspecified utility") gets silently miscategorised as internet. Either remove this fallback and require callers to be specific, or add a dedicated catch-all code (if one exists in the COA) and document it explicitly.


🟡 Mortgage amount — total payment vs interest-only

The route maps category: 'mortgage' → COA 5300 (Mortgage Interest) and books the full amountUsd. Typical mortgage statements carry total payment (principal + interest + escrow). If the scraper is extracting the full payment the expense amount will be wrong — principal belongs on liability 2500 and escrow may need its own treatment. If the scraper already extracts interest-only, add that invariant as a comment here so reviewers don't have to re-derive it.


🔵 Minor

  • ledgerLog errors are swallowed. It's fire-and-forget, so audit-log failures are silent. Consistent with the comptroller pattern, but worth noting if audit completeness is required.
  • No test coverage. This is financial write path — upsert correctness and the idempotency behaviour are the most important things to exercise. Even a single integration test (real SQLite, insert then re-upsert, assert 1 row) would significantly reduce risk.
  • accountExternalId leaks in error messages. "No account with external_id='chittyos-infra' for this tenant" confirms internal account names to unauthenticated callers (today) or any authenticated caller (after the auth fix). Consider a generic account_missing message without the id value in production.

What's working well

  • Zod schema with .positive() on amountUsd — the "never default to 0" invariant is correctly enforced at both the route and storage layers.
  • COA mapping with accounting rationale in comments (registered-agent → 5050 not 6040, mortgage interest-only) is exactly the right level of comment for non-obvious domain decisions.
  • onConflictDoUpdate with the targetWhere partial-index predicate is correct Drizzle idiom.
  • Separating the ingest contract from the raw scrape data shape is the right call — portal data formats diverge, the charge envelope normalises them.

Must fix before merge: the missing protectedPrefixes entry (#1) and the metadata spread clobber (#2). The idempotency key fragility (#3) and utility fallback (#4) are strong-prefer-fix. The mortgage amount question (#5) needs a comment at minimum confirming the scraper's contract.

Canonical "ChittyScrape feeds the cost flow" wiring. ChittyScrape
(scrape.chitty.cc) extracts vendor charges from portals with no API
(registered-agent fees, utilities, mortgage). This adds a real ChittyFinance
ingest that turns a ChittyScrape result envelope + resolved charge into an
idempotent expense transaction, reusing the PR #133 comptroller pattern.

- storage.upsertVendorCharge: idempotent by external_id
  `scrape:{portalId}:{period}:{vendor}`, type='expense', enforced by the
  partial unique index transactions_tenant_external_idx (external_id NOT NULL).
  amount is REQUIRED (> 0) and decimal(12,2); exact USD + paymentStatus +
  category preserved in metadata. onConflictDoUpdate targetWhere matches the
  partial arbiter.
- POST /api/vendor-charge/ingest: validates the envelope + charge (zod),
  rejects success=false (422) and a missing/zero amount (400, never defaults
  to 0), maps vendor category -> real COA code, books against the account by
  external_id (default chittyos-infra), audit-logs via ledgerLog.
- COA mapping (verified against seeded COA on Neon solitary-rice-14149088, no
  invented codes): registered-agent -> 5050 Legal & Professional Fees;
  utilities -> 5100/5110/5120/5130/5140; mortgage -> 5300 Mortgage Interest.

nw-registered-agent mapping: scraper annualFeeUsd -> charge.amountUsd,
paymentStatus ('ok'|'failed') -> charge.paymentStatus, portal
'nw-registered-agent' -> category 'registered-agent' -> COA 5050, period =
filing year.

Verified end-to-end on real Neon (solitary-rice-14149088): upserted a real
Northwest Registered Agent $125/yr charge via the exact storage SQL, selected
it back (COA 5050, type expense), re-ran the upsert — same row id, 1 row, no
duplicate (idempotency proven). npm run check clean.

NOTE: the live portal scrape is credential-gated (NW login via ChittyConnect);
the registration + ingest wiring + mapping are real and verified, but the
live scrape -> ingest hop is the chico/ChittyConnect follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 993a28dea4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/app.ts
app.route('/', classificationRoutes);
app.route('/', emailRoutes);
app.route('/', comptrollerRoutes);
app.route('/', vendorChargeRoutes);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add vendor-charge to protected API middleware

Mounting this route without adding /api/vendor-charge to protectedPrefixes leaves POST /api/vendor-charge/ingest outside the auth/tenant/storage middleware that the handler assumes. I checked server/app.ts: only the prefixes in lines 102-107 get storageMiddleware, hybridAuth, and tenantMiddleware, so a valid ingest request reaches storage.getAccountByExternalId(...) with storage/tenantId unset and fails instead of booking the charge; it also bypasses the intended service-token/tenant checks for a route that writes transactions.

Useful? React with 👍 / 👎.

Comment thread server/storage/system.ts
classifiedAt: now,
metadata,
})
.onConflictDoUpdate({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent scraper upserts from changing reconciled rows

When the same scrape:{portal}:{period}:{vendor} charge already exists, this conflict branch updates the transaction unconditionally, including amount, coaCode, and metadata. If the finance team reconciles that transaction and the scrape cron later replays or extracts a corrected amount/category, the locked row is mutated without the reconciled-row guard used by trust-path updates, corrupting a reconciled period.

Useful? React with 👍 / 👎.

);
}

const category = charge.category ?? PORTAL_DEFAULT_CATEGORY[envelope.portal];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not trust the scraper-supplied category as authoritative

Because charge.category takes precedence over the portal default and is immediately mapped into an authoritative coaCode, any caller that reaches this ingest can book a known portal under a different COA (for example, posting category: 'mortgage' for nw-registered-agent). Even with the route protected, this delegates classification authority to the scrape payload rather than validating the category against the portal or requiring an executor role, so a bad scrape result can misclassify tax/reporting data.

Useful? React with 👍 / 👎.

Comment thread server/storage/system.ts

const [row] = await this.db
.insert(schema.transactions)
.values({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve property attribution for portal expenses

The new ingest path supports utilities and mortgage statements, but the inserted transaction never carries a propertyId (and the request schema has no way to pass one). Property P&L is built from getPropertyTransactions(propertyId, ...), and Schedule E puts no-property transactions into entity-level handling, so scraped property-specific bills will be missing from the property's financials/tax columns even though they were booked.

Useful? React with 👍 / 👎.

@chitcommit chitcommit force-pushed the feat/vendor-charge-ingest branch from 993a28d to 456ef72 Compare June 11, 2026 01:07
@github-actions

Copy link
Copy Markdown
Contributor

@coderabbitai review

Please evaluate:

  • Security implications
  • Credential exposure risk
  • Dependency supply chain concerns
  • Breaking API changes

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Code Review — feat(vendor-charge): ingest ChittyScrape portal charges into the books

Overall this is a clean, well-scoped PR. The idempotency approach, Zod validation, and tenant-scoping pattern all follow established project conventions. A few issues to address before merge.


Bugs / Correctness

utility generic COA mapping silently picks Internet/Cable (5140)

utility: '5140', // generic utility fallback (internet/cable) when unspecified

A caller passing category: 'utility' gets billed under Internet/Cable, which is wrong for electric, gas, or water. This is a silent misclassification — no error, no warning. Better to fail loudly:

// Remove the 'utility' fallback entry entirely.
// If the portal default or explicit category is 'utility', the unknown_category
// guard below already returns 400 with the full list of known codes.

If a true parent-level "utility catch-all" COA code exists in the chart, add it with the right code and a clear comment; otherwise let the 400 surface the problem.


ExternalId uses vendor name verbatim — vendor-name drift creates duplicate rows

const externalId = `scrape:${input.portalId}:${input.period}:${input.vendor}`;

If the scraper emits "Northwest Registered Agent" on one run and "Northwest Registered Agent LLC" on another (different capitalisation, trailing suffix, etc.), a second row is silently created instead of the upsert firing. Since the PR's stated goal is idempotency, the caller should normalise vendor names, or the key should omit the vendor and use only portalId + period (which is likely sufficient for uniqueness per tenant + portal + billing cycle):

const externalId = `scrape:${input.portalId}:${input.period}`;

Worth an explicit decision here — at minimum document the normalisation contract.


Upsert does not update date, payee, or description on re-run

The onConflictDoUpdate.set updates amount, coaCode, metadata, updatedAt but leaves the original date, payee, and description frozen. If a re-scrape carries a corrected date or a fixed vendor name, those fields silently stay stale. Intentional "preserve the original booking date" behaviour is fine, but payee and description should probably follow the latest scrape. At a minimum, add a comment explaining the deliberate freeze.


Security

No explicit auth contract for ChittyCommand → this endpoint

The PR description says the route is "driven by whatever dispatches the scrape (ChittyCommand cron)". The route sits behind authAndContext, which is correct for user-facing calls, but automated cron callers typically need a service token. It is not obvious from this PR whether hybridAuth handles a service token issued to ChittyCommand or whether a user JWT is expected. Clarify in the PR description (or a short inline comment) how ChittyCommand authenticates — this is the kind of thing that bites during an on-call at 2 am.


Code Quality / Conventions

Comment density exceeds CLAUDE.md guidance

CLAUDE.md: "Only add [a comment] when the WHY is non-obvious." The top-of-file architectural narrative and the multi-paragraph storage docblock mostly duplicate the PR description. The partial-index targetWhere comment is the one genuinely non-obvious piece and belongs; everything else can be trimmed. Specifically:

  • The 17-line opening block in vendor-charge.ts should be removed (it's already in the PR body).
  • The 10-line docblock above upsertVendorCharge in system.ts can collapse to one line calling out the idempotency key and the "never default to $0" invariant.

COA comments in VENDOR_CATEGORY_COA

The inline comments on each COA code (explaining why 5050 vs 6040, etc.) are the right kind of non-obvious context — keep those.


Performance / Minor

  • The two sequential DB calls (getAccountByExternalIdupsertVendorCharge) are fine; no N+1 concern for a low-frequency cron endpoint.
  • ledgerLog fire-and-forget via waitUntil is correct and matches the comptroller route pattern ✓
  • Zod .positive() on amountUsd correctly enforces > 0 at the schema layer ✓

Test Coverage

Route handler + zod branches are typecheck-only (acknowledged). Before this ships to production, the following cases should have at minimum a happy-path integration test:

  1. First ingest creates the row (returns recorded: true).
  2. Re-ingest with the same key updates amount without creating a duplicate.
  3. success=false envelope → 422.
  4. Zero/missing amount → 400.

Dependency Note

This is stacked on unmerged #133. The getAccountByExternalId method, INFRA_ACCOUNT_EXTERNAL_ID constant, and transactions_tenant_external_idx partial index all land in #133. Merge order must be #133#134; CI on this branch will remain red until #133 merges.


Summary

utility5140 silent fallback Fix before merge
Vendor name drift in externalId Decide + document
Upsert field freeze Document or fix payee/description
Service auth contract for ChittyCommand Clarify in PR
Comment density Trim to match CLAUDE.md standard
Test coverage Add before production promotion

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 456ef7250b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

);
}

const date = charge.date ? new Date(charge.date) : new Date(envelope.scrapedAt);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require a period date before booking scraped charges

When charge.date is omitted, this records the transaction on envelope.scrapedAt even though charge.period is the billing period. A late or backfilled scrape such as period: "2025" scraped on 2026-01-15 will be filtered into the 2026 tax/reporting window because reports use transactions.date rather than the metadata period, so the expense is omitted from the intended year. Require charge.date for periodized bills or derive the transaction date from charge.period before inserting.

Useful? React with 👍 / 👎.

Comment thread server/storage/system.ts
paymentStatus?: string; // scraped 'ok' | 'failed' — recorded, does not gate the expense
metadata?: Record<string, unknown>;
}) {
const externalId = `scrape:${input.portalId}:${input.period}:${input.vendor}`;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include account identity in the scrape idempotency key

For portals that can have more than one bill in the same period, such as two ComEd utility accounts or two mortgage accounts, both ingests use the same scrape:{portalId}:{period}:{vendor} key even when the caller passes different accountExternalIds. The second charge conflicts with the first and updates its amount/metadata instead of creating a separate transaction, so one account's bill is lost from the books; include an account, meter, or statement identifier in the key for multi-account vendors.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant