Stop auto-merging same-titled records (tasks, events, etc.) #62
The deterministic dedup-sweep merged any nodes sharing (nodeType, canonicalLabel, scope), and the LLM graph cleanup was prompted to merge nodes with "similar labels, compatible types". Both treated record/occurrence nodes the same as nominal entities, so N tasks created with the same title (e.g. one per day for a week) were fused into a single node carrying every DUE_ON link; completing one day's task then completed them all.

Introduce a single source of truth, LABEL_MERGEABLE_NODE_TYPES / isLabelMergeableNodeType, classifying which node types denote one real-world referent (Person, Location, Object, Emotion, Concept, Media, Temporal) and are therefore safe to collapse by label. Apply it to both automatic paths:

- dedup-sweep: only nominal-entity types enter the grouping query.
- cleanup-operations: mergeNodesOp refuses any merge involving a protected type, with a logged skip; the cleanup prompt is also told never to merge record/occurrence nodes.

Explicit user merges via POST /node/merge call mergeNodes directly and are intentionally not gated.

Adds unit tests for the classifier and a dedup-sweep test asserting same-titled Task/Event nodes are left untouched.
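As a rough illustration of the classifier described above, here is a minimal sketch. It assumes the node types are the fifteen names mentioned in this PR; the real NodeType enum in types/graph.ts may differ, and canAutoMerge is a hypothetical helper standing in for the check that the automatic paths perform.

```typescript
// Hypothetical mirror of the node types named in this PR; the real NodeType
// definition lives in types/graph.ts and may not match this list.
const NODE_TYPES = [
  "Person", "Location", "Object", "Emotion", "Concept", "Media", "Temporal",
  "Task", "Event", "Idea", "Document", "Conversation", "AssistantDream",
  "Feedback", "Atlas",
] as const;
type NodeType = (typeof NODE_TYPES)[number];

// Nominal-entity types: one canonical label denotes one real-world referent,
// so collapsing same-labelled nodes is safe.
const LABEL_MERGEABLE_NODE_TYPES: ReadonlySet<NodeType> = new Set<NodeType>([
  "Person", "Location", "Object", "Emotion", "Concept", "Media", "Temporal",
]);

function isLabelMergeableNodeType(nodeType: NodeType): boolean {
  return LABEL_MERGEABLE_NODE_TYPES.has(nodeType);
}

// Hypothetical gate for automatic merges: every node involved (survivor and
// removed nodes alike) must be of a label-mergeable type.
function canAutoMerge(involvedTypes: NodeType[]): boolean {
  return involvedTypes.length > 0 && involvedTypes.every((t) => isLabelMergeableNodeType(t));
}
```

Under this sketch, canAutoMerge(["Person", "Person"]) is allowed while any merge that touches a Task or Event is refused, which is the behaviour the commit message describes for both automatic paths.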
Code Review
This pull request restricts automatic node merges (both deterministic dedup sweeps and LLM-driven graph cleanups) to only nominal-entity node types, protecting record and occurrence types (like Tasks, Events, and Documents) from being incorrectly collapsed. The reviewer identified a critical issue in cleanup-operations.ts where mergeNodesOp queries the database using useDatabase() directly instead of using the active database/transaction context, which could lead to transaction isolation issues when processing newly created nodes.
    const db = await useDatabase();
    const involvedIds = [keepId, ...removeIds];
    const typeRows = await db
      .select({ id: nodes.id, nodeType: nodes.nodeType })
      .from(nodes)
      .where(and(eq(nodes.userId, userId), inArray(nodes.id, involvedIds)));
The mergeNodesOp function currently queries the database using useDatabase() directly instead of using the active database/transaction context (database or tx) passed to applyCleanupOperations.
If applyCleanupOperations is executed within a transaction (e.g., when databaseOverride is provided), any nodes created in the same batch (via create_node) will not be visible to useDatabase() due to transaction isolation. This can cause the type check to fail to find the newly created node, potentially bypassing the protection or causing unexpected behavior.
To fix this, we should pass the database context to mergeNodesOp and use it for the query.
Suggested Refactoring
- Update the signature of mergeNodesOp to accept database:

    export async function mergeNodesOp(
      database: DbOrTx,
      userId: string,
      op: MergeNodesOp,
      resolveTempId: TempIdResolver,
    ): Promise<{ survivorId: TypeId<"node">; mergedIds: TypeId<"node">[] } | null> {

- Update the call site in runOne:

    case "merge_nodes": {
      const merged = await mergeNodesOp(database, userId, op, resolveTempId);

- Update the query inside mergeNodesOp to use the passed database instead of useDatabase():

    const involvedIds = [keepId, ...removeIds];
    const typeRows = await database
      .select({ id: nodes.id, nodeType: nodes.nodeType })
      .from(nodes)
      .where(and(eq(nodes.userId, userId), inArray(nodes.id, involvedIds)));
Re: the
I've added a comment on that block documenting the rationale (05d22f0) so it's clear the context choice is intentional.
Generated by Claude Code
Problem
Asking the assistant to create a task for each day of the week produced one node with seven DUE_ON links instead of seven separate tasks, and completing one day marked them all done.

There's no unique constraint on titles, and createCommitment correctly creates a distinct Task node per call. The culprit is automatic merging by canonical label, via two paths:

- dedupSweep (dedup-sweep.ts): a background job (runs after ingestion/cleanup) that mechanically merges any nodes sharing (nodeType, canonicalLabel, scope). It fused the seven same-titled tasks into one node, rewiring all seven DUE_ON claims onto the survivor. A single HAS_TASK_STATUS then governs the whole node, so completing one "day" completes everything.
- LLM graph cleanup (cleanup-graph.ts): its prompt instructs the model to merge_nodes whenever two nodes have "similar labels … compatible types", which could merge same-titled tasks too.

Both treated record/occurrence nodes the same as nominal entities.
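The failure mode can be reproduced in miniature. The sketch below is not the project's dedupSweep; it assumes a simplified, hypothetical node shape (GraphNode with a dueOn array) and a hypothetical naiveLabelDedup function, purely to show how grouping on (nodeType, canonicalLabel, scope) folds a week of same-titled tasks into one node.

```typescript
// Hypothetical, simplified node shape; the real schema is not shown in this PR.
interface GraphNode {
  id: string;
  nodeType: string;
  canonicalLabel: string;
  scope: string;
  dueOn: string[]; // stands in for the node's DUE_ON links
}

// Group nodes by (nodeType, canonicalLabel, scope) and fold each group into
// its first member, the way a label-only dedup would.
function naiveLabelDedup(nodes: GraphNode[]): GraphNode[] {
  const survivors = new Map<string, GraphNode>();
  for (const node of nodes) {
    const key = JSON.stringify([node.nodeType, node.canonicalLabel, node.scope]);
    const survivor = survivors.get(key);
    if (survivor === undefined) {
      // First node seen for this key becomes the survivor (copied to avoid
      // mutating the input).
      survivors.set(key, { ...node, dueOn: [...node.dueOn] });
    } else {
      // Later nodes are absorbed: their due dates are rewired onto the survivor.
      survivor.dueOn.push(...node.dueOn);
    }
  }
  return [...survivors.values()];
}

// Seven distinct tasks with the same title, one per day (illustrative dates).
const week = [
  "2025-06-02", "2025-06-03", "2025-06-04", "2025-06-05",
  "2025-06-06", "2025-06-07", "2025-06-08",
];
const tasks: GraphNode[] = week.map((day, i) => ({
  id: `task_${i}`,
  nodeType: "Task",
  canonicalLabel: "water the plants",
  scope: "personal",
  dueOn: [day],
}));

const merged = naiveLabelDedup(tasks);
```

Here merged contains a single node whose dueOn holds all seven dates, so any per-node status (such as a completion flag) would apply to the whole week at once.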
Fix
A single source of truth in types/graph.ts:

- LABEL_MERGEABLE_NODE_TYPES: the node types whose canonical label denotes one real-world referent and which are therefore safe to collapse by label: Person, Location, Object, Emotion, Concept, Media, Temporal.
- isLabelMergeableNodeType(nodeType): predicate consulted by both auto-merge paths.

Record/occurrence types (Task, Event, Idea, Document, Conversation, AssistantDream, Feedback, Atlas) can legitimately recur with identical names (a task per day, a weekly standup Event, two files named notes.md) and are now never merged by label.

Applied to both paths:

- dedup-sweep: only nominal-entity types enter the grouping query.
- cleanup-operations: mergeNodesOp hard-refuses any merge touching a protected type (with a logged skip), a guarantee that holds even if the LLM ignores the prompt; the cleanup prompt is also updated to forbid merging record/occurrence nodes.

Explicit, user-initiated merges via POST /node/merge call mergeNodes directly and are intentionally not gated.

Scope / caveats
Tests
- src/types/graph.test.ts: unit tests for the classifier, including a guard that every NodeType enum member is deliberately bucketed as mergeable or protected.
- dedup-sweep test: same-titled Task/Event nodes are left untouched.
- build:check (tsc + structured-output schema check) and lint pass locally. The DB-backed dedup-sweep test bodies are gated on a reachable Postgres and execute in CI.

https://claude.ai/code/session_017rsfWZfZhDsfcMnSdPSBwR
Generated by Claude Code