FD-324: real-chat LCC measurement + failure-mode classification by patrickkidd · Pull Request #123 · patrickkidd/btcopilot

patrickkidd · 2026-06-02T05:59:05Z

Summary

Adds --accumulate and --dump-disconnected modes to connectivity_check.py for reproducible real-chat LCC measurement (accumulates discussions in order, committing PDP to DiagramData between discussions — mirrors live diagram growth)
Documents real-chat LCC baseline/fixed measurements and failure-mode classification in PROMPT_ENGINEERING_LOG.md

Results

Source	Baseline LCC%	Fixed LCC%	AC2 met?
1924 Patrick (discs 55,58,60)	23.1%	30.0%	No — content-bounded
1589 Guillermo (discs 28,57)	84.6%	88.5%	Yes
Synthetic GT avg (6 discs)	79.1%	86.2%	Yes

F1 no-regression (2 runs, 6 GT synthetic): Agg 0.654/0.633 — within known ±0.05–0.10 noise vs 0.651 baseline.

Failure-mode classification (AC4)

(a) Duplicates: Accepted — rare, no systematic pattern
(b) Implicit spouse/sibling: Accepted out-of-scope — fix requires name-matching (explicitly rejected by ticket)
(c) Truly isolated mentions: Accepted out-of-scope — source text lacks relationship context; fabricating structure would be hallucination

Patrick's 30% LCC is content-bounded: the source conversation does not provide enough relationship structure to reach 80% regardless of prompt engineering.

Reproducible verification commands

# Real-chat LCC (Patrick)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.connectivity_check --accumulate 55,58,60

# Real-chat LCC (Guillermo)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.connectivity_check --accumulate 28,57

# F1 (6 GT synthetic)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.run_extract_full_f1

🤖 Generated with Claude Code

…cript Deterministic post-extraction repair sets Person.parents from birth events when the LLM fails to forward-reference the PairBond ID. ParentChild F1 0.366→0.815 N=3 avg (+123%), LCC 51%→89.5% (target ≥80% met), no regression. Ships connectivity_check.py script and 5 unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- connectivity_check.py: add --accumulate and --dump-disconnected modes for reproducible real-chat LCC measurement (accumulates discussions in order, commits PDP to DiagramData between discussions) - PROMPT_ENGINEERING_LOG.md: document real-chat LCC baseline/fixed results, failure-mode classification for both diagrams, F1 no-regression (2 runs) Real-chat results with fix: Patrick 1924: 23.1% → 30.0% (content-bounded, not fixable) Guillermo 1589: 84.6% → 88.5% (AC2 met) Synthetic avg: 79.1% → 86.2% (AC2 met) F1: Agg 0.654/0.633 (2 runs, within noise vs 0.651 baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a deterministic repair function, infer_parents_from_birth_events, to resolve a forward-referencing issue where Person.parents fields were left null because pair bond IDs were not yet computed. This change significantly improves the ParentChild F1 score and family-tree connectivity (LCC %). Additionally, a new connectivity_check.py script is added to measure LCC percentages across various scenarios, along with comprehensive tests and updated documentation. Feedback on the changes suggests optimizing dyad normalization in the repair function to avoid sorting overhead and improve type safety, as well as correcting a comment in the connectivity check script that incorrectly describes a DFS implementation as BFS.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T06:00:57Z

+    bond_by_dyad: dict[tuple[int, int], int] = {}
+    for pb in deltas.pair_bonds:
+        if pb.id is not None and pb.person_a is not None and pb.person_b is not None:
+            bond_by_dyad[tuple(sorted([pb.person_a, pb.person_b]))] = pb.id  # type: ignore[arg-type]
+
+    person_map = {p.id: p for p in deltas.people if p.id is not None}
+
+    for event in deltas.events:
+        if event.kind not in (EventKind.Birth, EventKind.Adopted):
+            continue
+        if event.child is None or event.person is None or event.spouse is None:
+            continue
+        dyad = tuple(sorted([event.person, event.spouse]))
+        pb_id = bond_by_dyad.get(dyad)  # type: ignore[arg-type]


The current implementation uses sorted() and tuple() to normalize the person dyads, which introduces unnecessary list creation and sorting overhead. Additionally, it requires # type: ignore[arg-type] to bypass type-checking warnings.

We can optimize this by using a simple conditional expression (a, b) if a < b else (b, a) to construct the sorted tuple. This is faster, avoids memory allocation for lists, and is fully type-safe without needing any type ignores.

Suggested change

bond_by_dyad: dict[tuple[int, int], int] = {}

for pb in deltas.pair_bonds:

if pb.id is not None and pb.person_a is not None and pb.person_b is not None:

bond_by_dyad[tuple(sorted([pb.person_a, pb.person_b]))] = pb.id # type: ignore[arg-type]

person_map = {p.id: p for p in deltas.people if p.id is not None}

for event in deltas.events:

if event.kind not in (EventKind.Birth, EventKind.Adopted):

continue

if event.child is None or event.person is None or event.spouse is None:

continue

dyad = tuple(sorted([event.person, event.spouse]))

pb_id = bond_by_dyad.get(dyad) # type: ignore[arg-type]

bond_by_dyad: dict[tuple[int, int], int] = {}

for pb in deltas.pair_bonds:

if pb.id is not None and pb.person_a is not None and pb.person_b is not None:

a, b = pb.person_a, pb.person_b

bond_by_dyad[(a, b) if a < b else (b, a)] = pb.id

person_map = {p.id: p for p in deltas.people if p.id is not None}

for event in deltas.events:

if event.kind not in (EventKind.Birth, EventKind.Adopted):

continue

if event.child is None or event.person is None or event.spouse is None:

continue

p_a, p_b = event.person, event.spouse

dyad = (p_a, p_b) if p_a < p_b else (p_b, p_a)

pb_id = bond_by_dyad.get(dyad)

gemini-code-assist · 2026-06-02T06:00:57Z

+    # BFS connected components
+    visited: set[int] = set()
+    component_sizes: list[int] = []
+    for start in nodes:
+        if start in visited:
+            continue
+        queue = [start]
+        visited.add(start)
+        size = 0
+        while queue:
+            cur = queue.pop()
+            size += 1
+            for nb in adj[cur]:
+                if nb not in visited:
+                    visited.add(nb)
+                    queue.append(nb)
+        component_sizes.append(size)


The comment states # BFS connected components, but the implementation uses queue.pop() (without an index), which removes and returns the last element of the list. This implements a Last-In-First-Out (LIFO) stack, resulting in a Depth-First Search (DFS) rather than a Breadth-First Search (BFS).

While both DFS and BFS are perfectly correct for finding connected components and their sizes, we should update the comment to accurately reflect the DFS implementation, or use collections.deque with popleft() if BFS is specifically desired.

Suggested change

# BFS connected components

visited: set[int] = set()

component_sizes: list[int] = []

for start in nodes:

if start in visited:

continue

queue = [start]

visited.add(start)

size = 0

while queue:

cur = queue.pop()

size += 1

for nb in adj[cur]:

if nb not in visited:

visited.add(nb)

queue.append(nb)

component_sizes.append(size)

# DFS connected components

visited: set[int] = set()

component_sizes: list[int] = []

for start in nodes:

if start in visited:

continue

queue = [start]

visited.add(start)

size = 0

while queue:

cur = queue.pop()

size += 1

for nb in adj[cur]:

if nb not in visited:

visited.add(nb)

queue.append(nb)

component_sizes.append(size)

lcc_percent keeps the User as a connecting node (the proband hub) and excludes only User+Assistant from the count, per the agreed AC#2 definition; the Assistant is dropped from the graph entirely. Deleting the User node fragmented correctly- extracted families (everyone connects through the proband). Adds 4 tests; fixes a DFS-mislabeled-as-BFS comment (Gemini review). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Disproved by transcript evidence: disc 60 explicitly states the orphans' parents (Jim+Sheila → Anthony/Joseph/Julia) and the user demanded the link, yet extraction sets ~0-3 parents. The gap is architectural (single-shot under-extraction of a 200+ statement conversation + no cross-session parent back-fill), addressable via the FD-319 cursor/windowing re-extraction — not a content ceiling. Four prompt-directive variants left Patrick within noise. Original wording struck through, not deleted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Conflicts: # btcopilot/tests/personal/test_pdp.py # btcopilot/training/connectivity_check.py # doc/PROMPT_ENGINEERING_LOG.md

patrickkidd and others added 2 commits May 20, 2026 08:50

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

patrickkidd and others added 3 commits June 2, 2026 08:44

Merge remote-tracking branch 'origin/master' into FD-324

3be144c

# Conflicts: # btcopilot/tests/personal/test_pdp.py # btcopilot/training/connectivity_check.py # doc/PROMPT_ENGINEERING_LOG.md

patrickkidd merged commit f910022 into master Jun 2, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FD-324: real-chat LCC measurement + failure-mode classification#123

FD-324: real-chat LCC measurement + failure-mode classification#123
patrickkidd merged 5 commits into
masterfrom
FD-324

patrickkidd commented Jun 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

patrickkidd commented Jun 2, 2026

Summary

Results

Failure-mode classification (AC4)

Reproducible verification commands

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant