Skip to content

FD-324: real-chat LCC measurement + failure-mode classification#123

Merged
patrickkidd merged 5 commits into
masterfrom
FD-324
Jun 2, 2026
Merged

FD-324: real-chat LCC measurement + failure-mode classification#123
patrickkidd merged 5 commits into
masterfrom
FD-324

Conversation

@patrickkidd
Copy link
Copy Markdown
Owner

Summary

  • Adds --accumulate and --dump-disconnected modes to connectivity_check.py for reproducible real-chat LCC measurement (accumulates discussions in order, committing PDP to DiagramData between discussions — mirrors live diagram growth)
  • Documents real-chat LCC baseline/fixed measurements and failure-mode classification in PROMPT_ENGINEERING_LOG.md

Results

Source Baseline LCC% Fixed LCC% AC2 met?
1924 Patrick (discs 55,58,60) 23.1% 30.0% No — content-bounded
1589 Guillermo (discs 28,57) 84.6% 88.5% Yes
Synthetic GT avg (6 discs) 79.1% 86.2% Yes

F1 no-regression (2 runs, 6 GT synthetic): Agg 0.654/0.633 — within known ±0.05–0.10 noise vs 0.651 baseline.

Failure-mode classification (AC4)

  • (a) Duplicates: Accepted — rare, no systematic pattern
  • (b) Implicit spouse/sibling: Accepted out-of-scope — fix requires name-matching (explicitly rejected by ticket)
  • (c) Truly isolated mentions: Accepted out-of-scope — source text lacks relationship context; fabricating structure would be hallucination

Patrick's 30% LCC is content-bounded: the source conversation does not provide enough relationship structure to reach 80% regardless of prompt engineering.

Reproducible verification commands

# Real-chat LCC (Patrick)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.connectivity_check --accumulate 55,58,60

# Real-chat LCC (Guillermo)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.connectivity_check --accumulate 28,57

# F1 (6 GT synthetic)
FDSERVER_PROMPTS_PATH=~/worktrees/FD-324/fdserver/prompts/private_prompts.py \
  GOOGLE_GEMINI_API_KEY=<key> \
  uv run python -m btcopilot.training.run_extract_full_f1

🤖 Generated with Claude Code

patrickkidd and others added 2 commits May 20, 2026 08:50
…cript

Deterministic post-extraction repair sets Person.parents from birth events
when the LLM fails to forward-reference the PairBond ID. ParentChild F1
0.366→0.815 N=3 avg (+123%), LCC 51%→89.5% (target ≥80% met), no regression.
Ships connectivity_check.py script and 5 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- connectivity_check.py: add --accumulate and --dump-disconnected modes for
  reproducible real-chat LCC measurement (accumulates discussions in order,
  commits PDP to DiagramData between discussions)
- PROMPT_ENGINEERING_LOG.md: document real-chat LCC baseline/fixed results,
  failure-mode classification for both diagrams, F1 no-regression (2 runs)

Real-chat results with fix:
  Patrick 1924: 23.1% → 30.0% (content-bounded, not fixable)
  Guillermo 1589: 84.6% → 88.5% (AC2 met)
  Synthetic avg: 79.1% → 86.2% (AC2 met)
F1: Agg 0.654/0.633 (2 runs, within noise vs 0.651 baseline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a deterministic repair function, infer_parents_from_birth_events, to resolve a forward-referencing issue where Person.parents fields were left null because pair bond IDs were not yet computed. This change significantly improves the ParentChild F1 score and family-tree connectivity (LCC %). Additionally, a new connectivity_check.py script is added to measure LCC percentages across various scenarios, along with comprehensive tests and updated documentation. Feedback on the changes suggests optimizing dyad normalization in the repair function to avoid sorting overhead and improve type safety, as well as correcting a comment in the connectivity check script that incorrectly describes a DFS implementation as BFS.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread btcopilot/pdp.py
Comment on lines +187 to +200
bond_by_dyad: dict[tuple[int, int], int] = {}
for pb in deltas.pair_bonds:
if pb.id is not None and pb.person_a is not None and pb.person_b is not None:
bond_by_dyad[tuple(sorted([pb.person_a, pb.person_b]))] = pb.id # type: ignore[arg-type]

person_map = {p.id: p for p in deltas.people if p.id is not None}

for event in deltas.events:
if event.kind not in (EventKind.Birth, EventKind.Adopted):
continue
if event.child is None or event.person is None or event.spouse is None:
continue
dyad = tuple(sorted([event.person, event.spouse]))
pb_id = bond_by_dyad.get(dyad) # type: ignore[arg-type]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation uses sorted() and tuple() to normalize the person dyads, which introduces unnecessary list creation and sorting overhead. Additionally, it requires # type: ignore[arg-type] to bypass type-checking warnings.

We can optimize this by using a simple conditional expression (a, b) if a < b else (b, a) to construct the sorted tuple. This is faster, avoids memory allocation for lists, and is fully type-safe without needing any type ignores.

Suggested change
bond_by_dyad: dict[tuple[int, int], int] = {}
for pb in deltas.pair_bonds:
if pb.id is not None and pb.person_a is not None and pb.person_b is not None:
bond_by_dyad[tuple(sorted([pb.person_a, pb.person_b]))] = pb.id # type: ignore[arg-type]
person_map = {p.id: p for p in deltas.people if p.id is not None}
for event in deltas.events:
if event.kind not in (EventKind.Birth, EventKind.Adopted):
continue
if event.child is None or event.person is None or event.spouse is None:
continue
dyad = tuple(sorted([event.person, event.spouse]))
pb_id = bond_by_dyad.get(dyad) # type: ignore[arg-type]
bond_by_dyad: dict[tuple[int, int], int] = {}
for pb in deltas.pair_bonds:
if pb.id is not None and pb.person_a is not None and pb.person_b is not None:
a, b = pb.person_a, pb.person_b
bond_by_dyad[(a, b) if a < b else (b, a)] = pb.id
person_map = {p.id: p for p in deltas.people if p.id is not None}
for event in deltas.events:
if event.kind not in (EventKind.Birth, EventKind.Adopted):
continue
if event.child is None or event.person is None or event.spouse is None:
continue
p_a, p_b = event.person, event.spouse
dyad = (p_a, p_b) if p_a < p_b else (p_b, p_a)
pb_id = bond_by_dyad.get(dyad)

Comment on lines +108 to +124
# BFS connected components
visited: set[int] = set()
component_sizes: list[int] = []
for start in nodes:
if start in visited:
continue
queue = [start]
visited.add(start)
size = 0
while queue:
cur = queue.pop()
size += 1
for nb in adj[cur]:
if nb not in visited:
visited.add(nb)
queue.append(nb)
component_sizes.append(size)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment states # BFS connected components, but the implementation uses queue.pop() (without an index), which removes and returns the last element of the list. This implements a Last-In-First-Out (LIFO) stack, resulting in a Depth-First Search (DFS) rather than a Breadth-First Search (BFS).

While both DFS and BFS are perfectly correct for finding connected components and their sizes, we should update the comment to accurately reflect the DFS implementation, or use collections.deque with popleft() if BFS is specifically desired.

Suggested change
# BFS connected components
visited: set[int] = set()
component_sizes: list[int] = []
for start in nodes:
if start in visited:
continue
queue = [start]
visited.add(start)
size = 0
while queue:
cur = queue.pop()
size += 1
for nb in adj[cur]:
if nb not in visited:
visited.add(nb)
queue.append(nb)
component_sizes.append(size)
# DFS connected components
visited: set[int] = set()
component_sizes: list[int] = []
for start in nodes:
if start in visited:
continue
queue = [start]
visited.add(start)
size = 0
while queue:
cur = queue.pop()
size += 1
for nb in adj[cur]:
if nb not in visited:
visited.add(nb)
queue.append(nb)
component_sizes.append(size)

patrickkidd and others added 3 commits June 2, 2026 08:44
lcc_percent keeps the User as a connecting node (the proband hub) and excludes
only User+Assistant from the count, per the agreed AC#2 definition; the Assistant
is dropped from the graph entirely. Deleting the User node fragmented correctly-
extracted families (everyone connects through the proband). Adds 4 tests; fixes a
DFS-mislabeled-as-BFS comment (Gemini review).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Disproved by transcript evidence: disc 60 explicitly states the orphans' parents
(Jim+Sheila → Anthony/Joseph/Julia) and the user demanded the link, yet extraction
sets ~0-3 parents. The gap is architectural (single-shot under-extraction of a 200+
statement conversation + no cross-session parent back-fill), addressable via the
FD-319 cursor/windowing re-extraction — not a content ceiling. Four prompt-directive
variants left Patrick within noise. Original wording struck through, not deleted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Conflicts:
#	btcopilot/tests/personal/test_pdp.py
#	btcopilot/training/connectivity_check.py
#	doc/PROMPT_ENGINEERING_LOG.md
@patrickkidd patrickkidd merged commit f910022 into master Jun 2, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant