Skip to content

Stop auto-merging same-titled records (tasks, events, etc.)#62

Merged
marcelsamyn merged 3 commits into
mainfrom
claude/gracious-knuth-s9xfpa
Jun 10, 2026
Merged

Stop auto-merging same-titled records (tasks, events, etc.)#62
marcelsamyn merged 3 commits into
mainfrom
claude/gracious-knuth-s9xfpa

Conversation

@marcelsamyn

Copy link
Copy Markdown
Owner

Problem

Asking the assistant to create a task for each day of the week produced one node with seven DUE_ON links instead of seven separate tasks — and completing one day marked them all done.

There's no unique constraint on titles, and createCommitment correctly creates a distinct Task node per call. The culprit is automatic merging by canonical label, via two paths:

  1. dedupSweep (dedup-sweep.ts) — a background job (runs after ingestion/cleanup) that mechanically merges any nodes sharing (nodeType, canonicalLabel, scope). It fused the seven same-titled tasks into one node, rewiring all seven DUE_ON claims onto the survivor. A single HAS_TASK_STATUS then governs the whole node, so completing one "day" completes everything.
  2. LLM graph cleanup (cleanup-graph.ts) — its prompt instructs the model to merge_nodes whenever two nodes have "similar labels … compatible types", which could merge same-titled tasks too.

Both treated record/occurrence nodes the same as nominal entities.

Fix

A single source of truth in types/graph.ts:

  • LABEL_MERGEABLE_NODE_TYPES — the node types whose canonical label denotes one real-world referent and are therefore safe to collapse by label: Person, Location, Object, Emotion, Concept, Media, Temporal.
  • isLabelMergeableNodeType(nodeType) — predicate consulted by both auto-merge paths.

Record/occurrence types — Task, Event, Idea, Document, Conversation, AssistantDream, Feedback, Atlas — can legitimately recur with identical names (a task per day, a weekly standup Event, two files named notes.md) and are now never merged by label.

Applied to both paths:

  • dedup-sweep — only nominal-entity types enter the grouping query.
  • cleanup-operationsmergeNodesOp hard-refuses any merge touching a protected type (with a logged skip), a guarantee even if the LLM ignores the prompt; the cleanup prompt is also updated to forbid merging record/occurrence nodes.

Explicit, user-initiated merges via POST /node/merge call mergeNodes directly and are intentionally not gated.

Scope / caveats

  • This prevents future over-merging; it does not un-merge nodes already collapsed (that history is gone). A one-off repair could be written separately if needed.

Tests

  • New src/types/graph.test.ts — unit tests for the classifier, including a guard that every NodeType enum member is deliberately bucketed as mergeable or protected.
  • New dedup-sweep test — same-titled Task/Event nodes are left untouched.
  • build:check (tsc + structured-output schema check) and lint pass locally. The DB-backed dedup-sweep test bodies are gated on a reachable Postgres and execute in CI.

https://claude.ai/code/session_017rsfWZfZhDsfcMnSdPSBwR


Generated by Claude Code

claude added 2 commits June 10, 2026 09:48
The deterministic dedup-sweep merged any nodes sharing
(nodeType, canonicalLabel, scope), and the LLM graph cleanup was
prompted to merge nodes with "similar labels, compatible types".
Both treated record/occurrence nodes the same as nominal entities, so
N tasks created with the same title (e.g. one per day for a week)
were fused into a single node carrying every DUE_ON link — completing
one day's task then completed them all.

Introduce a single source of truth, LABEL_MERGEABLE_NODE_TYPES /
isLabelMergeableNodeType, classifying which node types denote one
real-world referent (Person, Location, Object, Emotion, Concept,
Media, Temporal) and are therefore safe to collapse by label. Apply it
to both automatic paths:

- dedup-sweep: only nominal-entity types enter the grouping query.
- cleanup-operations: mergeNodesOp refuses any merge involving a
  protected type, with a logged skip; the cleanup prompt is also told
  never to merge record/occurrence nodes.

Explicit user merges via POST /node/merge call mergeNodes directly and
are intentionally not gated.

Adds unit tests for the classifier and a dedup-sweep test asserting
same-titled Task/Event nodes are left untouched.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restricts automatic node merges (both deterministic dedup sweeps and LLM-driven graph cleanups) to only nominal-entity node types, protecting record and occurrence types (like Tasks, Events, and Documents) from being incorrectly collapsed. The reviewer identified a critical issue in cleanup-operations.ts where mergeNodesOp queries the database using useDatabase() directly instead of using the active database/transaction context, which could lead to transaction isolation issues when processing newly created nodes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +522 to +527
const db = await useDatabase();
const involvedIds = [keepId, ...removeIds];
const typeRows = await db
.select({ id: nodes.id, nodeType: nodes.nodeType })
.from(nodes)
.where(and(eq(nodes.userId, userId), inArray(nodes.id, involvedIds)));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The mergeNodesOp function currently queries the database using useDatabase() directly instead of using the active database/transaction context (database or tx) passed to applyCleanupOperations.

If applyCleanupOperations is executed within a transaction (e.g., when databaseOverride is provided), any nodes created in the same batch (via create_node) will not be visible to useDatabase() due to transaction isolation. This can cause the type check to fail to find the newly created node, potentially bypassing the protection or causing unexpected behavior.

To fix this, we should pass the database context to mergeNodesOp and use it for the query.

Suggested Refactoring

  1. Update the signature of mergeNodesOp to accept database:
export async function mergeNodesOp(
  database: DbOrTx,
  userId: string,
  op: MergeNodesOp,
  resolveTempId: TempIdResolver,
): Promise<{ survivorId: TypeId<"node">; mergedIds: TypeId<"node">[] } | null> {
  1. Update the call site in runOne:
    case "merge_nodes": {
      const merged = await mergeNodesOp(database, userId, op, resolveTempId);
  1. Update the query inside mergeNodesOp to use the passed database instead of useDatabase():
  const involvedIds = [keepId, ...removeIds];
  const typeRows = await database
    .select({ id: nodes.id, nodeType: nodes.nodeType })
    .from(nodes)
    .where(and(eq(nodes.userId, userId), inArray(nodes.id, involvedIds)));

Copy link
Copy Markdown
Owner Author

Re: the mergeNodesOp DB-context comment — I looked into this and kept useDatabase() deliberately, because the suggested refactor would actually be less consistent here:

  • merge_nodes runs outside the dispatcher transaction by design (see the applyCleanupOperations docstring: "merge_nodes runs outside the dispatcher's transaction because mergeNodes manages its own"). mergeNodes in node.ts doesn't accept a context — it reads and writes via the same root useDatabase() singleton.
  • So the type-gate must query the same context the merge uses (root), not the dispatcher's database/tx. If I gated against an override transaction while mergeNodes operated on root, the check and the merge would diverge.
  • In the exact scenario raised (a create_node in an uncommitted tx, then merged): mergeNodes reads via root, wouldn't see the uncommitted node, and returns null — the merge no-ops. The gate, reading the same root context, also doesn't see it, so it can't be tricked into bypassing protection. The two stay consistent.
  • useDatabase() is a singleton over a single pg.Client, and the production caller passes await useDatabase() as the override, so in practice database === useDatabase() regardless.

I've added a comment on that block documenting the rationale (05d22f0) so it's clear the context choice is intentional.


Generated by Claude Code

@marcelsamyn marcelsamyn merged commit 9986745 into main Jun 10, 2026
1 check passed
@marcelsamyn marcelsamyn deleted the claude/gracious-knuth-s9xfpa branch June 10, 2026 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants