127-Fix inflated Code Output LoC + add net new AI LoC #131
- Implemented `edit-loc-diff.ts` to handle incremental line counting and track added/removed lines during editing sessions.
- Introduced functions to apply text edits, count lines, and determine added/removed lines using a multiset approach.
- Created unit tests in `edit-loc-diff.test.ts` to validate the functionality of line counting and edit operations.
- Ensured compatibility with different editing models (e.g., apply_patch and ranged edits) to accurately reflect changes.
Dependency Review ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: None
Pull request overview
This PR fixes inflated “AI-Generated LoC” for VS Code Copilot agent edit sessions by replacing naive whole-payload newline summation with an incremental per-file diff that counts only newly introduced lines, and it introduces removed/net AI LoC metrics surfaced through analytics, UI, docs, and tests.
Changes:
- Add a new incremental edit LoC diff engine to deduplicate repeated whole-file snapshots and track added vs removed lines.
- Thread new `{added, removed}` edit LoC through parsing, caching/serialization, analyzers, and analytics types (including net totals and daily breakdowns).
- Update the Output webview and documentation to show net-oriented charts and clarify metric meaning, with expanded unit test coverage.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/webview/page-output.ts | Updates Code Output UI to include net metrics and added/removed/net charts with new tab sets. |
| src/webview/page-dashboard.ts | Extends dashboard’s empty CodeProductionData shape to include removed/net fields. |
| src/core/warm-up-worker.ts | Updates warm-up worker typing to accept the new EditLocIndex shape. |
| src/core/types/analytics-types.ts | Adds removed/net fields to CodeProductionData and daily breakdown records. |
| src/core/summary-export.test.ts | Updates summary export test fixtures to include removed/net production fields. |
| src/core/parser.ts | Switches parse pipeline and worker payload typing to the new EditLocIndex entry shape. |
| src/core/parser-vscode.ts | Replaces naive edit newline counting with accumulateEditLoc + initial-content resolver support. |
| src/core/parser-vscode.test.ts | Adds tests for parseEditState incremental counting, corrupt JSON, and baseline seeding. |
| src/core/parser-shared.ts | Updates ParseContext.editLocIndex type to EditLocIndex. |
| src/core/edit-loc-diff.ts | New module implementing incremental line diffing and edit application for VS Code timelines. |
| src/core/edit-loc-diff.test.ts | New unit tests covering diff semantics, baselines, request attribution, and removed lines. |
| src/core/cache.ts | Bumps cache version and updates serialization/deserialization for {added, removed} edit LoC. |
| src/core/analyzer.ts | Updates analyzer facade to use the new EditLocIndex type. |
| src/core/analyzer.test.ts | Adds coverage for deduplicated edit LoC integration and removed/net rollups in production analytics. |
| src/core/analyzer-production.ts | Surfaces totalRemovedAiLoc / totalNetAiLoc and daily/per-dimension removed series. |
| src/core/analyzer-patterns.ts | Adjusts estimated LoC computation to use loc.added from edit LoC cells. |
| src/core/analyzer-config.ts | Updates constructor typing to accept EditLocIndex. |
| src/core/analyzer-base.ts | Updates shared request LoC calculation to use v.added from edit LoC cells. |
| scripts/benchmark-reload-stability.ts | Updates worker payload typing for the new edit LoC cell shape. |
| docs/content/measure/output.md | Updates metric documentation to explain net output and the new charts. |
    /** Invokes `cb` once per newline-delimited segment (matching `split('\n')`), including a trailing empty segment. */
    function forEachLineHash(text: string, cb: (h: number) => void): void {
      let segStart = 0;
      for (let i = 0; i < text.length; i++) {
        if (text.charCodeAt(i) === NEWLINE) {
          cb(hashLineSlice(text, segStart, i));
          segStart = i + 1;
        }
      }
      cb(hashLineSlice(text, segStart, text.length));
    }
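To illustrate the doc comment's claim that the scan matches `split('\n')`, here is a minimal self-contained sketch. `hashSlice` is a hypothetical stand-in for `hashLineSlice` (an FNV-1a style hash over UTF-16 code units, chosen only for illustration; the PR's actual hash is not shown in this excerpt), and `lineHashes` collects the callback results into an array.

```typescript
const NEWLINE = 10; // char code of "\n"

// Hypothetical stand-in for hashLineSlice: FNV-1a style hash over UTF-16 code units.
function hashSlice(text: string, start: number, end: number): number {
  let h = 0x811c9dc5;
  for (let i = start; i < end; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Same scan shape as forEachLineHash, but returning the hashes as an array.
function lineHashes(text: string): number[] {
  const out: number[] = [];
  let segStart = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === NEWLINE) {
      out.push(hashSlice(text, segStart, i));
      segStart = i + 1;
    }
  }
  // Trailing segment: empty when the text ends with "\n", and for "" itself.
  out.push(hashSlice(text, segStart, text.length));
  return out;
}

const sample = "a\nb\n";
const viaScan = lineHashes(sample);
const viaSplit = sample.split("\n").map((s) => hashSlice(s, 0, s.length));
// Both arrays have three entries: "a", "b", and the trailing empty segment.
```

The scan avoids allocating one substring per line, which is presumably why the excerpt hashes slices in place rather than calling `split`.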
    /** Parses an edit-state JSON payload and accumulates AI-produced LoC into `editLocIndex`. */
    export function parseEditState(raw: string, editLocIndex: ParseContext['editLocIndex'], stateDir: string): void {
      if (!raw.includes('"textEdit"')) return;
      let state: EditState;
      try { state = JSON.parse(raw) as EditState; } catch (e) {
        warnCore('parser-vscode', `Corrupt state payload`, e);
        return;
      }
      accumulateEditLoc(state.timeline, editLocIndex, makeInitialContentResolver(state, stateDir));
Fix logic in line counting and improve error logging for corrupted state payloads
…-output-loc-from-vs-code-edit-session-payloads # Conflicts: # cspell.json
aymenfurter
left a comment
Built this branch and tested it in a clean VS Code profile. Works as expected.
One optimization idea while I was in the edit-loc diff code, not blocking this PR.
Incremental reconstruct + hash. Each edit op currently rebuilds the whole file string and re-hashes every line, which dominates the cost. Touch only the changed line region and splice the hash array, falling back to full rebuild for out-of-range edits (whole-file replaces, column overshoot). Expected ~2x faster diff with identical output. Can send as a follow-up.
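A rough sketch of the splice idea, under simplifying assumptions: edits are whole-line replacements (no column handling), `LineEdit` and `applyLineEdit` are hypothetical names, and `hashLine` is an illustrative hash rather than the module's own. An out-of-range edit returns `false` so the caller can fall back to a full rebuild.

```typescript
// Hypothetical whole-line edit: replace lines [startLine, endLine) with newLines (0-based).
interface LineEdit {
  startLine: number;
  endLine: number;
  newLines: string[];
}

// Illustrative per-line hash (FNV-1a style over UTF-16 code units).
function hashLine(line: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < line.length; i++) {
    h ^= line.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Mutates `lines` and `hashes` in place, re-hashing only the inserted lines.
// Returns false (and changes nothing) when the range is out of bounds.
function applyLineEdit(lines: string[], hashes: number[], edit: LineEdit): boolean {
  if (edit.startLine < 0 || edit.endLine < edit.startLine || edit.endLine > lines.length) {
    return false; // caller falls back to a full rebuild
  }
  const removeCount = edit.endLine - edit.startLine;
  lines.splice(edit.startLine, removeCount, ...edit.newLines);
  hashes.splice(edit.startLine, removeCount, ...edit.newLines.map(hashLine));
  return true;
}

const docLines = ["a", "b", "c"];
const docHashes = docLines.map(hashLine);
const applied = applyLineEdit(docLines, docHashes, { startLine: 1, endLine: 2, newLines: ["B", "B2"] });
const rejected = applyLineEdit(docLines, docHashes, { startLine: 2, endLine: 9, newLines: [] });
```

After the first call `docLines` is `["a", "B", "B2", "c"]` and `docHashes` stays aligned with it; only two lines were hashed instead of four.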
Fix inflated Code Output LoC + add net new AI LoC
Fixes #127
Problem
The Code Output metric (`totalAiLoc` / "AI-Generated LoC") massively over-counted lines for VS Code Copilot agent sessions. Most counted lines were never produced by the model; they were unchanged lines that VS Code re-serializes into whole-file edit snapshots after every small patch.

Mechanism:
1. The model emits an `apply_patch` diff (a few lines).
2. VS Code re-serializes the file as a whole-file `textEdit` op into `chatEditingSessions/<sid>/state.json` → `timeline.operations`.
3. The parser counted `\n` in every `textEdit` payload and summed them with no dedup or diffing, so a file edited N times had its full line count counted N times.

The bias also ran the other way for ranged string-replace edits: the count tracked the edit tool, which differs by model family.
- `apply_patch` (whole-file rewrite)
- `replace_string_in_file` (ranged)

At the workspace level this inflated one heavy workspace from a true ~94k to ~149k LoC (roughly 37% of the reported total was over-count).
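A minimal reproduction of the over-count, assuming three whole-file snapshots of a four-line file where each edit changes one line. `naiveNewlineSum` is a hypothetical name standing in for the old scan, not the project's code.

```typescript
// Hypothetical stand-in for the old behaviour: sum "\n" over every payload, no dedup.
function naiveNewlineSum(payloads: string[]): number {
  let total = 0;
  for (const payload of payloads) {
    for (let i = 0; i < payload.length; i++) {
      if (payload.charCodeAt(i) === 10) total++;
    }
  }
  return total;
}

// Three whole-file snapshots: initial write, then two one-line changes.
const v1 = "a\nb\nc\nd\n";
const v2 = "a\nB\nc\nd\n";
const v3 = "a\nB\nc\nD\n";

const naiveTotal = naiveNewlineSum([v1, v2, v3]);
// naiveTotal is 12, although only 4 + 1 + 1 = 6 lines were ever newly written.
```

The unchanged lines in `v2` and `v3` are counted again on every snapshot, which is the N-times effect described in step 3.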
Fix
New module `src/core/edit-loc-diff.ts` replaces the naive newline-sum with incremental, tool-agnostic line counting:
- `O(payload chars)`: the same asymptotic class as the old scan.

This removes the `apply_patch` inflation and corrects the Anthropic under-count in one pass. Benchmarked extra cost across an entire install: ~57 ms total, negligible against multi-second parsing.
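The counting idea can be sketched as a multiset diff between consecutive file states. This is a simplified illustration, not the module's implementation: it compares whole snapshots by line text (the real engine also applies ranged edits and, per the review excerpt, hashes lines), and `diffLines` and `accumulateSnapshots` are hypothetical names.

```typescript
interface LocDelta {
  added: number;
  removed: number;
}

// Multiset of lines: line text -> number of occurrences.
function lineCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of text.split("\n")) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return counts;
}

// Lines present more often in `after` count as added; more often in `before` as removed.
function diffLines(before: string, after: string): LocDelta {
  const prev = lineCounts(before);
  const next = lineCounts(after);
  let added = 0;
  let removed = 0;
  next.forEach((n, line) => {
    const p = prev.get(line) || 0;
    if (n > p) added += n - p;
  });
  prev.forEach((p, line) => {
    const n = next.get(line) || 0;
    if (p > n) removed += p - n;
  });
  return { added, removed };
}

// Walks snapshots in order, diffing each against the previous state.
function accumulateSnapshots(snapshots: string[], baseline: string = ""): LocDelta {
  const total: LocDelta = { added: 0, removed: 0 };
  let current = baseline;
  for (const snapshot of snapshots) {
    const delta = diffLines(current, snapshot);
    total.added += delta.added;
    total.removed += delta.removed;
    current = snapshot;
  }
  return total;
}

const result = accumulateSnapshots(["a\nb\nc\nd\n", "a\nB\nc\nd\n", "a\nB\nc\nD\n"]);
// result is { added: 6, removed: 2 }
```

Because each snapshot is compared against the previous state, a repeated whole-file payload contributes nothing, and a ranged edit contributes only the lines it actually changes, regardless of which tool produced it.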
New: Net AI LoC
Because we now track removals, the Code Output tab gains a net view of the lasting footprint of AI edits:
- `totalRemovedAiLoc`, `totalNetAiLoc`, plus daily `removedLoc` and per-dimension `dailyRemovedBy{Workspace,Model,Harness}`.

Changes
- `src/core/edit-loc-diff.ts`: new incremental diff/counting engine (+ tests).
- `src/core/parser-vscode.ts`: count via the new engine; track added/removed.
- `src/core/analyzer-production.ts`: surface `totalRemovedAiLoc` / `totalNetAiLoc` and daily/per-dimension removed series.
- `src/core/types/analytics-types.ts`: new net/removed fields.
- `src/webview/page-output.ts`: Daily AI Code Output (net) + Net Code Output charts.
- `docs/content/measure/output.md`: updated metric docs.
- Tests: `edit-loc-diff.test.ts`, `parser-vscode.test.ts`, `analyzer.test.ts`, `summary-export.test.ts`.

Validation
`npm run build` and `npx vitest run` pass.
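For reference, the net metrics described under "New: Net AI LoC" can be pictured as a simple rollup over `{added, removed}` cells. This sketch assumes net is added minus removed and that `totalAiLoc` is the sum of added lines; `EditLocCell`, `ProductionTotals`, and `rollUp` are hypothetical names, and only the three total field names come from the PR text.

```typescript
// Assumed cell shape, based on the "{added, removed}" edit LoC mentioned in the review summary.
interface EditLocCell {
  added: number;
  removed: number;
}

interface ProductionTotals {
  totalAiLoc: number;
  totalRemovedAiLoc: number;
  totalNetAiLoc: number;
}

// Sums added and removed lines across cells; net is assumed to be added minus removed.
function rollUp(cells: EditLocCell[]): ProductionTotals {
  let added = 0;
  let removed = 0;
  for (const cell of cells) {
    added += cell.added;
    removed += cell.removed;
  }
  return {
    totalAiLoc: added,
    totalRemovedAiLoc: removed,
    totalNetAiLoc: added - removed,
  };
}

const totals = rollUp([
  { added: 6, removed: 2 },
  { added: 10, removed: 0 },
]);
// totals is { totalAiLoc: 16, totalRemovedAiLoc: 2, totalNetAiLoc: 14 }
```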