feat: canonical model and synthetic data#1

Merged
ImadCreates merged 2 commits into main from feat/model-and-data
Jun 21, 2026

Conversation

ImadCreates (Owner) commented Jun 21, 2026

FleetBridge phase 2: common model, seeded generator, three raw provider formats, data tests

Summary by CodeRabbit

  • New Features
    • Added a synthetic fleet data generator that produces a canonical ground-truth dataset plus provider-specific raw ping files.
    • Introduced a shared core data model for providers, vehicles, locations, and safety events.
    • Added a deterministic random number generator utility for reproducible data generation.
  • Tests
    • Added data-integrity checks validating canonical datasets, safety event consistency, and raw/provider completeness.
  • Chores
    • Updated TypeScript configuration to include Node.js type definitions.

Add canonical model (model.ts), seeded PRNG (rng.ts), and a deterministic generator (scripts/generate.ts) producing truth.json plus three incompatible raw provider formats (northwind/haulix/tracpoint) under src/data. Add data integrity tests. Expose node types to the app tsconfig so fs-based tests type-check.
Copilot AI review requested due to automatic review settings June 21, 2026 01:49
coderabbitai Bot commented Jun 21, 2026

Walkthrough

Adds a deterministic fleet data generator (scripts/generate.ts) that writes a canonical truth dataset and three provider-specific raw ping files to src/data/. Introduces a core TypeScript data model (src/core/model.ts), a seeded mulberry32 RNG (src/core/rng.ts), and a Vitest fixture validation suite (src/__tests__/data.test.ts). Updates tsconfig.app.json to include the node type package.
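
For readers unfamiliar with mulberry32, the following is a minimal sketch of a seeded generator using the helper names listed in this walkthrough. The actual `src/core/rng.ts` is not shown in this conversation, so the function bodies, the `DEFAULT_SEED` value, and the inclusive upper bound of `randInt` are assumptions made for illustration.

```typescript
// Sketch only: names mirror the walkthrough; bodies and the seed value are assumed.
type Rng = () => number

const DEFAULT_SEED = 42 // hypothetical value, not taken from the PR

/** mulberry32: a small 32-bit PRNG that returns floats in [0, 1). */
function createRng(seed: number = DEFAULT_SEED): Rng {
  let a = seed | 0
  return () => {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), 1 | a)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

/** Uniform float in [min, max). */
function randRange(rng: Rng, min: number, max: number): number {
  return min + rng() * (max - min)
}

/** Uniform integer in [min, max]; the inclusive upper bound is an assumption. */
function randInt(rng: Rng, min: number, max: number): number {
  return Math.floor(randRange(rng, min, max + 1))
}
```

Because all state lives in the closure, two generators created with the same seed yield the same sequence, which is what makes the generated fixtures reproducible.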

Changes

Fleet Data Generator and Validation

| Layer / File(s) | Summary |
| --- | --- |
| Core data model and deterministic RNG: src/core/model.ts, src/core/rng.ts, tsconfig.app.json | Defines Provider, Vehicle, Location, SafetyEventType, and SafetyEvent interfaces plus provider constants (NORTHWIND, HAULIX, TRACPOINT, PROVIDERS). Adds mulberry32-backed Rng type with DEFAULT_SEED, createRng, randRange, and randInt helpers. Adds node to compilerOptions.types. |
| Generator setup, corridor config, and geospatial helpers: scripts/generate.ts (lines 1–178) | Imports Node fs/path, model, and RNG modules; declares LatLng tuple type; defines GTA corridor waypoint arrays, the eight-vehicle FLEET assignment table across providers and corridors, provider code mapping, timing constants (SAMPLE_INTERVAL_S, SAMPLES, BASE_TIME), and math helpers for haversine distance, bearing, degree/radian conversion, and rounding. |
| Speed profile, track construction, and safety-event derivation: scripts/generate.ts (lines 179–347) | Generates per-vehicle speed profiles with ramp-up/ramp-down phases, injects random idle windows, guarantees two >100 km/h speeding bursts; samples corridor polylines using arc-length integration; derives GPS headings from successive points; builds Location[] records and deduplicates SafetyEvent[] (harsh accel/brake, speeding, idling) sorted by timestamp. |
| Provider encoders and generator main routine: scripts/generate.ts (lines 348–527) | Encodes canonical locations into Northwind (mph + kind strings), Haulix (km/h + event codes), and TracPoint (VIN + position/velocity/numeric codes) ping shapes with unit conversions; implements main() to seed the RNG, generate all vehicles/tracks/events, partition pings by provider, write src/data/truth.json and three raw JSON files under src/data/raw, and log aggregate ping counts. |
| Data fixture validation tests: src/__tests__/data.test.ts | Loads generated truth.json and three raw provider fixtures. Asserts that the truth dataset is non-empty, per-vehicle timestamps are strictly increasing, all coordinates lie within GTA bounds, and speeds are non-negative. Also asserts that safety events exist, fall within their vehicle's timestamp range, and reference valid vehicle IDs, and that the sum of raw ping counts equals the truth location count. |
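
The geospatial helpers named in the table above (haversine distance, bearing, degree/radian conversion) are standard formulas. The sketch below shows one common way to write them; the function names, the `[lat, lng]` tuple order, and the Earth radius constant are assumptions, since the corresponding part of `scripts/generate.ts` is not quoted in this conversation.

```typescript
// Sketch only: helper names and the [lat, lng] tuple order are assumed.
type LatLng = [number, number]

const EARTH_RADIUS_M = 6371000 // mean Earth radius in metres

const toRad = (deg: number): number => (deg * Math.PI) / 180
const toDeg = (rad: number): number => (rad * 180) / Math.PI

/** Great-circle distance between two points, in metres. */
function haversineM(a: LatLng, b: LatLng): number {
  const [lat1, lng1] = a
  const [lat2, lng2] = b
  const dLat = toRad(lat2 - lat1)
  const dLng = toRad(lng2 - lng1)
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h))
}

/** Initial bearing from a to b, in degrees clockwise from north, in [0, 360). */
function bearingDeg(a: LatLng, b: LatLng): number {
  const phi1 = toRad(a[0])
  const phi2 = toRad(b[0])
  const dLng = toRad(b[1] - a[1])
  const y = Math.sin(dLng) * Math.cos(phi2)
  const x =
    Math.cos(phi1) * Math.sin(phi2) -
    Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLng)
  return (toDeg(Math.atan2(y, x)) + 360) % 360
}
```

A heading derived from successive track points, as the walkthrough describes, would then be `bearingDeg(previousPoint, currentPoint)`.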

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 Hop along the GTA roads so wide,
Eight little trucks on corridors glide,
A seeded RNG plants the trail,
Northwind, Haulix, Tracpoint detail—
Truth.json blooms where the test files bide!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.43%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: canonical model and synthetic data' clearly and concisely summarizes the main changes: introduction of core data model interfaces and a synthetic data generation system. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot AI left a comment

Pull request overview

This PR adds FleetBridge “phase 2” data foundations: a canonical telematics model plus deterministic, seeded synthetic data generation that emits a ground-truth dataset and incompatible raw provider fixtures, along with tests that sanity-check the generated data.

Changes:

  • Introduces canonical model types (Provider, Vehicle, Location, SafetyEvent) and provider constants.
  • Adds a deterministic PRNG and a scripts/generate.ts generator to produce reproducible truth + raw provider datasets.
  • Adds data integrity tests and updates TypeScript config to support Node APIs used by the tests.
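
To make the idea of "incompatible raw provider fixtures" concrete, here is a hedged sketch of one encoder and its inverse. The `Location` shape matches the interface shown in this PR; the `NorthwindPing` field names are hypothetical, because the raw Northwind schema is not quoted anywhere in this conversation. Only the km/h to mph conversion is taken from the walkthrough.

```typescript
// Canonical shape, as shown for src/core/model.ts in this PR.
interface Location {
  vehicleId: string
  timestamp: string
  lat: number
  lng: number
  speedKmh: number
  headingDeg: number
}

// Hypothetical raw shape: the real Northwind field names are not shown in this PR.
interface NorthwindPing {
  unit: string
  ts: string
  latitude: number
  longitude: number
  mph: number
  heading: number
}

const KM_PER_MILE = 1.609344

/** Encode a canonical location into the (assumed) Northwind shape, converting km/h to mph. */
function encodeNorthwind(loc: Location): NorthwindPing {
  return {
    unit: loc.vehicleId,
    ts: loc.timestamp,
    latitude: loc.lat,
    longitude: loc.lng,
    mph: loc.speedKmh / KM_PER_MILE,
    heading: loc.headingDeg,
  }
}

/** Inverse mapping, as a later normalization step might implement it. */
function decodeNorthwind(ping: NorthwindPing): Location {
  return {
    vehicleId: ping.unit,
    timestamp: ping.ts,
    lat: ping.latitude,
    lng: ping.longitude,
    speedKmh: ping.mph * KM_PER_MILE,
    headingDeg: ping.heading,
  }
}
```

Keeping the canonical dataset alongside the raw files means a decoder like this can be checked against ground truth, up to floating-point rounding.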

Reviewed changes

Copilot reviewed 5 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tsconfig.app.json | Adds Node type definitions so tests under src/ can use node:* imports. |
| src/data/raw/tracpoint.json | Adds TracPoint raw fixture data produced by the generator. |
| src/core/rng.ts | Adds a seeded PRNG helper (mulberry32) plus randRange/randInt. |
| src/core/model.ts | Adds canonical FleetBridge data model interfaces and provider constants. |
| src/__tests__/data.test.ts | Adds sanity checks for truth dataset invariants and raw-vs-truth counts. |
| scripts/generate.ts | Adds deterministic synthetic data generator and raw provider encoders. |
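
One of the truth dataset invariants checked by the data test is that timestamps strictly increase per vehicle. A minimal sketch of that check as a pure function is shown here; the function name and the `TimedPoint` type are hypothetical, and the real test may be structured differently.

```typescript
// Sketch only: the helper name and TimedPoint type are illustrative assumptions.
interface TimedPoint {
  vehicleId: string
  /** ISO 8601 timestamp. */
  timestamp: string
}

/** Returns the IDs of vehicles whose timestamps are not strictly increasing in input order. */
function vehiclesWithUnorderedTimestamps(points: TimedPoint[]): string[] {
  const last = new Map<string, number>()
  const bad = new Set<string>()
  for (const p of points) {
    const t = Date.parse(p.timestamp)
    const prev = last.get(p.vehicleId)
    if (prev !== undefined && t <= prev) bad.add(p.vehicleId)
    last.set(p.vehicleId, t)
  }
  return Array.from(bad)
}
```

In a Vitest suite this could be asserted as `expect(vehiclesWithUnorderedTimestamps(truth.locations)).toEqual([])`, which reports the offending vehicle IDs on failure.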

Comment thread scripts/generate.ts Outdated
Comment on lines +220 to +225
  // Guarantee a couple of clear speeding bursts above 100 km/h.
  const burst1 = randInt(rng, ramp + 2, Math.floor(n / 2))
  const burst2 = randInt(rng, Math.floor(n / 2), n - decel - 2)
  for (const b of [burst1, burst2]) {
    if (!inAnyIdle(idleWindows, b)) speed[b] = randRange(rng, 103, 109)
  }

coderabbitai Bot left a comment

🧹 Nitpick comments (2)
scripts/generate.ts (1)

220-225: 💤 Low value

Comment says "Guarantee" but bursts can be skipped.

If a randomly selected burst index falls within an idle window, the burst is silently skipped (line 224). While the probability of both bursts being skipped is very low (~0.4%), the comment "Guarantee a couple of clear speeding bursts" is technically inaccurate.

This is unlikely to cause test failures since the cruise speed (98-110 km/h) will trigger speeding events anyway, but consider either:

  • Retrying burst placement until it lands outside idle windows, or
  • Adjusting the comment to "Attempt to inject..."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/generate.ts` around lines 220 - 225, The comment "Guarantee a couple
of clear speeding bursts above 100 km/h" is inaccurate because the burst
placement logic in the loop iterating over burst1 and burst2 can silently skip
bursts if they fall within idle windows (via the inAnyIdle check). Either
implement retry logic to guarantee burst placement by repeatedly calling randInt
until inAnyIdle returns false, or update the comment to reflect that bursts are
only "attempted" rather than "guaranteed" to be placed outside idle windows.
src/__tests__/data.test.ts (1)

7-11: ⚡ Quick win

Consider adding validation for the events array.

The Truth interface includes events: SafetyEvent[], and events are loaded at module scope but never tested. Adding basic validation (non-empty check, timestamp alignment with locations, vehicleId references) would improve coverage of the canonical data integrity.

🧪 Suggested test case
   it('never reports a negative speed', () => {
     for (const loc of truth.locations) {
       expect(loc.speedKmh).toBeGreaterThanOrEqual(0)
     }
   })
+
+  it('includes valid safety events', () => {
+    expect(truth.events.length).toBeGreaterThan(0)
+    const vehicleIds = new Set(truth.vehicles.map((v) => v.id))
+    const locationTimestamps = new Set(truth.locations.map((l) => l.timestamp))
+    for (const event of truth.events) {
+      expect(vehicleIds.has(event.vehicleId)).toBe(true)
+      expect(locationTimestamps.has(event.timestamp)).toBe(true)
+    }
+  })
 })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/__tests__/data.test.ts` around lines 7 - 11, The Truth interface's events
array property is not being validated in tests. Add test cases that verify the
canonical events data integrity by checking that the events array is not empty,
that event timestamps align with the location data, and that all vehicleId
references in events correspond to valid vehicles in the vehicles array. This
will ensure the SafetyEvent array maintains data consistency with the other
properties of the Truth interface.

📥 Commits

Reviewing files that changed from the base of the PR and between cf33b61 and aad61db.

📒 Files selected for processing (10)
  • scripts/generate.ts
  • src/__tests__/data.test.ts
  • src/core/model.ts
  • src/core/rng.ts
  • src/data/raw/.gitkeep
  • src/data/raw/haulix.json
  • src/data/raw/northwind.json
  • src/data/raw/tracpoint.json
  • src/data/truth.json
  • tsconfig.app.json

Replace the burst placement with a pickBurst helper that retries randInt to find an index outside all idle windows (linear-scan then range-start fallback) and always writes both bursts, so a burst is no longer silently skipped. Regenerate committed data. Add truth.events tests: non-empty, every event within its vehicle's location time span, and every referenced vehicleId exists.
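
The retry strategy described in this commit message (random retries, then a linear scan, then a range-start fallback) can be sketched as follows. This is an illustration under assumptions, not the committed code: the `IdleWindow` shape, the `maxTries` parameter, and the inclusive bounds are hypothetical, and only the names `pickBurst`, `inAnyIdle`, and `randInt` come from the conversation.

```typescript
// Sketch only: IdleWindow shape, maxTries, and inclusive bounds are assumptions.
type Rng = () => number
type IdleWindow = { start: number; end: number } // inclusive sample indices

function randInt(rng: Rng, min: number, max: number): number {
  return Math.floor(min + rng() * (max - min + 1))
}

function inAnyIdle(windows: IdleWindow[], i: number): boolean {
  return windows.some((w) => i >= w.start && i <= w.end)
}

/** Pick an index in [lo, hi] that lies outside every idle window where possible. */
function pickBurst(
  rng: Rng,
  windows: IdleWindow[],
  lo: number,
  hi: number,
  maxTries = 20,
): number {
  // 1. Random retries keep burst placement varied between vehicles.
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const i = randInt(rng, lo, hi)
    if (!inAnyIdle(windows, i)) return i
  }
  // 2. A linear scan finds a free index if one exists at all.
  for (let i = lo; i <= hi; i++) {
    if (!inAnyIdle(windows, i)) return i
  }
  // 3. Fallback: the whole range is idle, so use the range start.
  return lo
}
```

Since `pickBurst` always returns an index, the caller can write the burst speed unconditionally, which is what removes the silent skip flagged in review. The number of RNG draws now depends on how many retries are needed, so the committed fixtures have to be regenerated, as this commit does.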
ImadCreates requested a review from Copilot on June 21, 2026 at 02:04

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/__tests__/data.test.ts`:
- Around line 69-70: The variable `span` is declared twice in the same scope
within the test file, causing a TypeScript compile error. Remove the duplicate
declaration of the Map variable `span` that appears at lines 69-70 in the
data.test.ts file, keeping only one declaration of the span variable with the
type Map<string, { min: number; max: number }>.
📥 Commits

Reviewing files that changed from the base of the PR and between aad61db and 76e38a5.

📒 Files selected for processing (6)
  • scripts/generate.ts
  • src/__tests__/data.test.ts
  • src/data/raw/haulix.json
  • src/data/raw/northwind.json
  • src/data/raw/tracpoint.json
  • src/data/truth.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/generate.ts

Comment on lines +69 to +70
const span = new Map<string, { min: number; max: number }>()
for (const loc of truth.locations) {

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove duplicate span declaration (compile-time blocker).

span is declared twice in the same scope, which causes a TypeScript compile error and prevents the test suite from running.

Suggested fix
   it('places every event within its vehicle location time span', () => {
-    const span = new Map<string, { min: number; max: number }>()
     const span = new Map<string, { min: number; max: number }>()
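
For context on what the single `span` map is for, here is a minimal sketch of the time-span check as standalone functions. The helper names and the `Stamped` type are hypothetical; only the `Map<string, { min: number; max: number }>` shape and the loop over locations are taken from the quoted test lines.

```typescript
// Sketch only: helper names and the Stamped type are illustrative assumptions.
interface Stamped {
  vehicleId: string
  /** ISO 8601 timestamp. */
  timestamp: string
}

type Span = { min: number; max: number }

/** Build the earliest and latest location time (ms since epoch) per vehicle. */
function buildSpans(locations: Stamped[]): Map<string, Span> {
  const span = new Map<string, Span>()
  for (const loc of locations) {
    const t = Date.parse(loc.timestamp)
    const s = span.get(loc.vehicleId)
    if (!s) {
      span.set(loc.vehicleId, { min: t, max: t })
    } else {
      s.min = Math.min(s.min, t)
      s.max = Math.max(s.max, t)
    }
  }
  return span
}

/** Events whose vehicle is unknown or whose time falls outside that vehicle's span. */
function eventsOutsideSpan(locations: Stamped[], events: Stamped[]): Stamped[] {
  const span = buildSpans(locations)
  return events.filter((e) => {
    const s = span.get(e.vehicleId)
    const t = Date.parse(e.timestamp)
    return !s || t < s.min || t > s.max
  })
}
```

Declaring the map once inside a helper also avoids the duplicate `const` in the test body, since a second `const span` in the same block is a redeclaration error in TypeScript.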

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 10 changed files in this pull request and generated 3 comments.

Comment thread scripts/generate.ts
Comment on lines +35 to +37

const CORRIDORS: Record<string, Corridor> = {
hwy401: {
Comment thread src/core/model.ts
Comment on lines +16 to +24
export interface Location {
  vehicleId: string
  /** ISO 8601 timestamp, e.g. "2026-06-15T13:00:10.000Z". */
  timestamp: string
  lat: number
  lng: number
  speedKmh: number
  headingDeg: number
}
Comment thread src/core/model.ts
Comment on lines +42 to +46
export const NORTHWIND: Provider = { id: 'northwind', name: 'Northwind' }
export const HAULIX: Provider = { id: 'haulix', name: 'Haulix' }
export const TRACPOINT: Provider = { id: 'tracpoint', name: 'TracPoint' }

export const PROVIDERS: Provider[] = [NORTHWIND, HAULIX, TRACPOINT]
ImadCreates merged commit a1b41df into main on Jun 21, 2026
2 checks passed
2 participants