127-Fix inflated Code Output LoC + add net new AI LoC by TamasBoncz · Pull Request #131 · microsoft/AI-Engineering-Coach

TamasBoncz · 2026-06-16T09:54:23Z

Fix inflated Code Output LoC + add net new AI LoC

Fixes #127

Problem

The Code Output metric (totalAiLoc / "AI-Generated LoC") massively over-counted lines for VS Code Copilot agent sessions. Most counted lines were never produced by the model — they were unchanged lines that VS Code re-serializes into whole-file edit snapshots after every small patch.

Mechanism:

The agent edits a file with a small apply_patch diff (a few lines).
After each patch, VS Code writes the entire resulting file as a textEdit op into chatEditingSessions/<sid>/state.json → timeline.operations.
The parser counted \n in every textEdit payload and summed them with no dedup or diffing, so a file edited N times had its full line count counted N times.
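The over-count can be reproduced with a small sketch, assuming each textEdit payload is the full file text after a patch. `naiveLoc` and the sample snapshots are hypothetical illustrations, not code from the repository.

```typescript
// Hypothetical illustration: sum the '\n' characters of every payload,
// with no dedup or diffing between snapshots of the same file.
function naiveLoc(payloads: string[]): number {
  let total = 0;
  for (const text of payloads) {
    for (let i = 0; i < text.length; i++) {
      if (text.charCodeAt(i) === 10) total++;
    }
  }
  return total;
}

// The agent writes a 4-line file, then patches it twice, adding one line each time.
// Each element is the whole-file snapshot written after that step.
const v1 = "a\nb\nc\nd\n";
const v2 = "a\nb\nc\nd\ne\n";
const v3 = "a\nb\nc\nd\ne\nf\n";

// 4 + 5 + 6 = 15 lines counted, although only 6 distinct lines were ever produced.
const counted = naiveLoc([v1, v2, v3]);
```

In this toy case the inflation grows with both the file size and the number of patches, which matches the "counted N times" behaviour described above.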

The bias also ran the other way: ranged string-replace edits were under-counted. The direction of the error tracked the edit tool, which differs by model family.

Model family	Edit tool	Counted vs real
OpenAI / Codex	`apply_patch` (whole-file rewrite)	1.48× over
Anthropic	`replace_string_in_file` (ranged)	0.87× under
Mixed (copilot/auto)	~68% apply_patch	1.06×

At the workspace level, this inflated one heavy workspace from a true ~94k to ~149k LoC: roughly 55k spurious lines, or ~37% of the reported total (about 1.6× the true figure).

Fix

New module src/core/edit-loc-diff.ts replaces the naive newline-sum with incremental, tool-agnostic line counting:

Reconstructs each file version from its baseline and counts only lines that are new versus the previous version of the same file.
Tracks both added and removed lines per (request, file).
Diff is a linear multiset comparison of line hashes (djb2, no substring allocation), so it stays O(payload chars) — the same asymptotic class as the old scan.
Fast paths: single-write files (~74%) and first snapshots skip the diff entirely; only repeated whole-file snapshots (the over-count case) pay the full cost.
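The multiset comparison described above can be sketched as follows. This is a simplified illustration rather than the code in `edit-loc-diff.ts`: it keys the multiset on the line strings themselves instead of djb2 hashes, and the names `lineCounts`, `diffLines`, and `LineDelta` are hypothetical.

```typescript
interface LineDelta {
  added: number;
  removed: number;
}

// Build a multiset (line -> occurrence count) for one file version.
function lineCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of text.split("\n")) {
    counts.set(line, (counts.get(line) ?? 0) + 1);
  }
  return counts;
}

// Compare two versions as multisets: line order is ignored, duplicates are respected.
function diffLines(prev: string, next: string): LineDelta {
  const before = lineCounts(prev);
  const after = lineCounts(next);
  let added = 0;
  let removed = 0;
  // Lines that occur more often in the new version are additions.
  after.forEach((n, line) => {
    const old = before.get(line) ?? 0;
    if (n > old) added += n - old;
  });
  // Lines that occur more often in the old version are removals.
  before.forEach((n, line) => {
    const now = after.get(line) ?? 0;
    if (n > now) removed += n - now;
  });
  return { added, removed };
}

// A whole-file re-save with one appended line contributes a single added line.
const resave = diffLines("a\nb\nc\n", "a\nb\nc\nd\n"); // { added: 1, removed: 0 }
// Replacing one line counts as one addition and one removal.
const replaced = diffLines("a\nb\nc", "a\nx\nc"); // { added: 1, removed: 1 }
```

One consequence of comparing multisets in this sketch is that a line moved within the file counts as unchanged, since only occurrence counts are compared.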

This removes the apply_patch inflation and corrects the Anthropic under-count in one pass. Benchmarked extra cost across an entire install: ~57 ms total — negligible against multi-second parsing.

New: Net AI LoC

Because we now track removals, the Code Output tab gains a net view of the lasting footprint of AI edits:

New metrics: totalRemovedAiLoc, totalNetAiLoc, plus daily removedLoc and per-dimension dailyRemovedBy{Workspace,Model,Harness}.
Daily AI Code Output chart now shows net new lines per day (each edit compared to the previous file version, so re-saves and rewrites aren't double-counted).
New Net Code Output charts diverge added lines (above zero) against removed lines (below zero) with a net overlay, broken down by Model, Workspace, and Harness.
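A rough sketch of how the net figures could be derived from daily added and removed counts, assuming net is simply added minus removed. The `DailyLoc` and `NetSummary` shapes, the `addedLoc` field, and the sample numbers are hypothetical; only `removedLoc` mirrors a field name from this PR.

```typescript
// Hypothetical per-day record; `addedLoc` is an assumed field name.
interface DailyLoc {
  date: string;
  addedLoc: number;
  removedLoc: number;
}

interface NetSummary {
  totalAdded: number;
  totalRemoved: number;
  totalNet: number;
  dailyNet: { date: string; net: number }[];
}

// Assumes net = added - removed, both per day and in total.
function summarizeNet(days: DailyLoc[]): NetSummary {
  let totalAdded = 0;
  let totalRemoved = 0;
  const dailyNet = days.map((d) => {
    totalAdded += d.addedLoc;
    totalRemoved += d.removedLoc;
    return { date: d.date, net: d.addedLoc - d.removedLoc };
  });
  return { totalAdded, totalRemoved, totalNet: totalAdded - totalRemoved, dailyNet };
}

// Made-up sample data for two days.
const summary = summarizeNet([
  { date: "2026-06-15", addedLoc: 120, removedLoc: 30 },
  { date: "2026-06-16", addedLoc: 10, removedLoc: 45 },
]);
// summary.totalNet is 55; the second day has a net of -35 (more removed than added).
```

A negative daily net, as on the second sample day, is what would plot below zero in a diverging chart.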

Changes

src/core/edit-loc-diff.ts — new incremental diff/counting engine (+ tests).
src/core/parser-vscode.ts — count via the new engine; track added/removed.
src/core/analyzer-production.ts — surface totalRemovedAiLoc / totalNetAiLoc and daily/per-dimension removed series.
src/core/types/analytics-types.ts — new net/removed fields.
src/webview/page-output.ts — Daily AI Code Output (net) + Net Code Output charts.
docs/content/measure/output.md — updated metric docs.
Tests added across edit-loc-diff.test.ts, parser-vscode.test.ts, analyzer.test.ts, summary-export.test.ts.

Validation

New unit tests cover whole-file re-serialization (no double-count), ranged edits (no under-count), additions, removals, and net.
npm run build and npx vitest run pass.

- Implemented `edit-loc-diff.ts` to handle incremental line counting and track added/removed lines during editing sessions.
- Introduced functions to apply text edits, count lines, and determine added/removed lines using a multiset approach.
- Created unit tests in `edit-loc-diff.test.ts` to validate the functionality of line counting and edit operations.
- Ensured compatibility with different editing models (e.g., apply_patch and ranged edits) to accurately reflect changes.

github-actions · 2026-06-16T09:54:41Z

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot

Pull request overview

This PR fixes inflated “AI-Generated LoC” for VS Code Copilot agent edit sessions by replacing naive whole-payload newline summation with an incremental per-file diff that counts only newly introduced lines, and it introduces removed/net AI LoC metrics surfaced through analytics, UI, docs, and tests.

Changes:

Add a new incremental edit LoC diff engine to deduplicate repeated whole-file snapshots and track added vs removed lines.
Thread new {added, removed} edit LoC through parsing, caching/serialization, analyzers, and analytics types (including net totals and daily breakdowns).
Update the Output webview and documentation to show net-oriented charts and clarify metric meaning, with expanded unit test coverage.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

File	Description
src/webview/page-output.ts	Updates Code Output UI to include net metrics and added/removed/net charts with new tab sets.
src/webview/page-dashboard.ts	Extends dashboard’s empty `CodeProductionData` shape to include removed/net fields.
src/core/warm-up-worker.ts	Updates warm-up worker typing to accept the new `EditLocIndex` shape.
src/core/types/analytics-types.ts	Adds removed/net fields to `CodeProductionData` and daily breakdown records.
src/core/summary-export.test.ts	Updates summary export test fixtures to include removed/net production fields.
src/core/parser.ts	Switches parse pipeline and worker payload typing to the new `EditLocIndex` entry shape.
src/core/parser-vscode.ts	Replaces naive edit newline counting with `accumulateEditLoc` + initial-content resolver support.
src/core/parser-vscode.test.ts	Adds tests for `parseEditState` incremental counting, corrupt JSON, and baseline seeding.
src/core/parser-shared.ts	Updates `ParseContext.editLocIndex` type to `EditLocIndex`.
src/core/edit-loc-diff.ts	New module implementing incremental line diffing and edit application for VS Code timelines.
src/core/edit-loc-diff.test.ts	New unit tests covering diff semantics, baselines, request attribution, and removed lines.
src/core/cache.ts	Bumps cache version and updates serialization/deserialization for `{added, removed}` edit LoC.
src/core/analyzer.ts	Updates analyzer facade to use the new `EditLocIndex` type.
src/core/analyzer.test.ts	Adds coverage for deduplicated edit LoC integration and removed/net rollups in production analytics.
src/core/analyzer-production.ts	Surfaces `totalRemovedAiLoc` / `totalNetAiLoc` and daily/per-dimension removed series.
src/core/analyzer-patterns.ts	Adjusts estimated LoC computation to use `loc.added` from edit LoC cells.
src/core/analyzer-config.ts	Updates constructor typing to accept `EditLocIndex`.
src/core/analyzer-base.ts	Updates shared request LoC calculation to use `v.added` from edit LoC cells.
scripts/benchmark-reload-stability.ts	Updates worker payload typing for the new edit LoC cell shape.
docs/content/measure/output.md	Updates metric documentation to explain net output and the new charts.

TamasBoncz · 2026-06-16T11:19:10Z

+/** Invokes `cb` once per newline-delimited segment (matching `split('\n')`), including a trailing empty segment. */
+function forEachLineHash(text: string, cb: (h: number) => void): void {
+  let segStart = 0;
+  for (let i = 0; i < text.length; i++) {
+    if (text.charCodeAt(i) === NEWLINE) {
+      cb(hashLineSlice(text, segStart, i));
+      segStart = i + 1;
+    }
+  }
+  cb(hashLineSlice(text, segStart, text.length));
+}
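The body of `hashLineSlice` is not shown in this thread. As a guess at what a djb2 hash over a slice could look like without allocating a substring, here is a minimal sketch; the name `djb2Slice` is hypothetical and the real helper may differ.

```typescript
// Hypothetical djb2 hash over text[start, end), reading characters in place
// so that no substring is allocated per line.
function djb2Slice(text: string, start: number, end: number): number {
  let h = 5381;
  for (let i = start; i < end; i++) {
    // h * 33 + charCode, wrapped to 32 bits on every step.
    h = ((h << 5) + h + text.charCodeAt(i)) | 0;
  }
  // Normalize to an unsigned 32-bit value.
  return h >>> 0;
}

// Two identical lines at different offsets hash to the same value.
const sample = "ab\nab";
const first = djb2Slice(sample, 0, 2);
const second = djb2Slice(sample, 3, 5);
```

Because equal lines produce equal hashes regardless of where they sit in the payload, the hashes can serve as multiset keys. Distinct lines can still collide on a 32-bit hash, so counts built this way are approximate in rare cases.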


TamasBoncz · 2026-06-16T11:19:16Z

+/** Parses an edit-state JSON payload and accumulates AI-produced LoC into `editLocIndex`. */
+export function parseEditState(raw: string, editLocIndex: ParseContext['editLocIndex'], stateDir: string): void {
+  if (!raw.includes('"textEdit"')) return;
+  let state: EditState;
+  try { state = JSON.parse(raw) as EditState; } catch (e) {
+    warnCore('parser-vscode', `Corrupt state payload`, e);
+    return;
  }
+  accumulateEditLoc(state.timeline, editLocIndex, makeInitialContentResolver(state, stateDir));


aymenfurter

Built this branch and tested it in a clean VS Code profile. Works as expected.

One optimization idea while I was in the edit-loc diff code, not blocking this PR.

Incremental reconstruct + hash. Each edit op currently rebuilds the whole file string and re-hashes every line, which dominates the cost. Touch only the changed line region and splice the hash array, falling back to full rebuild for out-of-range edits (whole-file replaces, column overshoot). Expected ~2x faster diff with identical output. Can send as a follow-up.
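To make the splice idea concrete, here is a rough sketch under simplifying assumptions: the edit replaces whole lines only (no column offsets), and the per-file state is just an array of line hashes. `hashLine` and `spliceLineHashes` are hypothetical names, not functions from this branch.

```typescript
// Hypothetical stand-in for a per-line hash (djb2 over the line's characters).
function hashLine(line: string): number {
  let h = 5381;
  for (let i = 0; i < line.length; i++) {
    h = ((h << 5) + h + line.charCodeAt(i)) | 0;
  }
  return h >>> 0;
}

// Replace whole lines [startLine, endLine) with the lines of newText, in place.
// Returns false when the range does not fit the current file, so the caller
// can fall back to a full rebuild and re-hash.
function spliceLineHashes(
  hashes: number[],
  startLine: number,
  endLine: number,
  newText: string,
): boolean {
  if (startLine < 0 || endLine < startLine || endLine > hashes.length) return false;
  const replacement = newText === "" ? [] : newText.split("\n").map(hashLine);
  hashes.splice(startLine, endLine - startLine, ...replacement);
  return true;
}

// Replace line 1 of a three-line file with two new lines.
const fileHashes = ["a", "b", "c"].map(hashLine);
const applied = spliceLineHashes(fileHashes, 1, 2, "x\ny");
```

Only the replacement lines are hashed here; handling edits that start or end mid-line would also need the text of the boundary lines, which this sketch leaves out.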

TamasBoncz requested a review from Copilot June 16, 2026 09:54

Copilot started reviewing on behalf of TamasBoncz June 16, 2026 09:55

Copilot AI reviewed Jun 16, 2026

Tamas Boncz added 2 commits June 16, 2026 13:18

Fix Code Review Findings

9f18a8c

Fix logic in line counting and improve error logging for corrupted state payloads

Merge remote-tracking branch 'origin/main' into 127-bug-inflated-code…

4b2632f

…-output-loc-from-vs-code-edit-session-payloads # Conflicts: # cspell.json

aymenfurter approved these changes Jun 19, 2026

TamasBoncz wants to merge 3 commits into main from 127-bug-inflated-code-output-loc-from-vs-code-edit-session-payloads
