fix(disambiguate): map by content instead of array index (#78) #86
Reviewer's Guide
Refactors the disambiguation result mapping to match response items to paragraphs by text content using a consumption queue, and preserves predictions for paragraphs that the backend omits, instead of relying on brittle array index-based mapping that could misalign labels or lose annotations.
Pull request overview
This PR fixes incorrect paragraph-to-disambiguation-result alignment in the /anonymizer/disambiguate flow by matching backend response items back to the original paragraphs by paragraph content (text) instead of array index, and it avoids losing existing annotations when the backend drops paragraphs from the response.
Changes:
- Replace index-based mapping (paragraphs[i]) with a per-text “consumption queue” to match response items to the first unconsumed paragraph with the same text.
- Ensure returned PredictLabel.paragraphId is always the stable paragraph id (not the raw paragraph text from the backend).
- Preserve original predictions for any paragraph omitted by the backend response.
@@ -207,10 +225,18 @@ export const disambiguate = (file: DocFile) =>
             aymurai_label_instance: l.attrs.aymurai_label_instance ?? null,
             aymurai_disambiguation: l.attrs.aymurai_disambiguation ?? null,
           },
-          paragraphId: paragraph?.id ?? item.document,
+          paragraphId: paragraph.id,
         } satisfies PredictLabel;
       });
     });

+    // Preserve raw predictions for any paragraph the backend did not return,
+    // so a backend count drop never silently erases annotations.
+    const unmatched = predictions.filter(
+      (p) => !matchedParagraphIds.has(p.paragraphId),
+    );

+    return [...disambiguated, ...unmatched];
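The preservation step in the hunk above can be tried in isolation. The following is a minimal sketch, not the project's code: `LabelSketch` and `mergeWithUnmatched` are hypothetical names, the label shape is reduced to two fields, and `matchedParagraphIds` is assumed to be the set of ids for paragraphs the backend actually returned.

```typescript
// Hypothetical, simplified stand-in for PredictLabel (the real type has more fields).
interface LabelSketch {
  paragraphId: string;
  text: string;
}

// Append the original predictions of every paragraph that the backend did not
// return, so a shorter response never drops existing annotations.
function mergeWithUnmatched(
  predictions: LabelSketch[],
  disambiguated: LabelSketch[],
  matchedParagraphIds: Set<string>,
): LabelSketch[] {
  const unmatched = predictions.filter(
    (p) => !matchedParagraphIds.has(p.paragraphId),
  );
  return [...disambiguated, ...unmatched];
}

// Example: the backend returned only paragraph p1 and omitted p2.
const predictions: LabelSketch[] = [
  { paragraphId: "p1", text: "Juan" },
  { paragraphId: "p2", text: "Córdoba" },
];
const disambiguated: LabelSketch[] = [
  { paragraphId: "p1", text: "Juan (disambiguated)" },
];
const merged = mergeWithUnmatched(predictions, disambiguated, new Set(["p1"]));
console.log(merged.map((l) => `${l.paragraphId}:${l.text}`));
```

Passing the matched ids explicitly, rather than deriving them from the disambiguated labels, keeps a paragraph that came back with zero labels from being treated as omitted.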
The /anonymizer/disambiguate endpoint returns an array of processed paragraphs. The previous code assumed that the backend responded in the same order and with the same number of elements as the request, and mapped results by index. If the backend reordered or dropped a paragraph, labels were assigned to the wrong paragraph, or paragraphId was set to the paragraph's plain text (which, after the fix in PR #85, no longer matches any ID in the Redux store).

Changes:
- Replaces flatMap((item, i) => paragraphs[i]) with a consumption queue keyed by text: each response item is resolved by finding the first unconsumed paragraph with that text, which tolerates reordering and duplicate paragraphs.

Partially resolves #78.
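The consumption queue described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from this PR: the `Paragraph`, `ResponseItem`, and `MatchedItem` shapes and the `buildQueues` and `matchByText` helpers are hypothetical, and labels are reduced to plain strings. The only detail taken from the PR is that `item.document` carries the paragraph text echoed back by the backend.

```typescript
// Hypothetical minimal shapes; the real project types may differ.
interface Paragraph {
  id: string;
  text: string;
}

interface ResponseItem {
  document: string; // paragraph text echoed back by the backend
  labels: string[]; // simplified stand-in for the real label objects
}

interface MatchedItem {
  paragraphId: string;
  labels: string[];
}

// Build one FIFO queue of paragraphs per distinct text, in request order.
function buildQueues(paragraphs: Paragraph[]): Map<string, Paragraph[]> {
  const queues = new Map<string, Paragraph[]>();
  for (const p of paragraphs) {
    const queue = queues.get(p.text);
    if (queue) {
      queue.push(p);
    } else {
      queues.set(p.text, [p]);
    }
  }
  return queues;
}

// Resolve each response item to the first unconsumed paragraph with the same
// text. Items whose text matches no remaining paragraph are skipped.
function matchByText(paragraphs: Paragraph[], items: ResponseItem[]): MatchedItem[] {
  const queues = buildQueues(paragraphs);
  const matched: MatchedItem[] = [];
  for (const item of items) {
    const paragraph = queues.get(item.document)?.shift();
    if (!paragraph) continue;
    matched.push({ paragraphId: paragraph.id, labels: item.labels });
  }
  return matched;
}

const paragraphs: Paragraph[] = [
  { id: "p1", text: "Juan Pérez" },
  { id: "p2", text: "Buenos Aires" },
  { id: "p3", text: "Juan Pérez" }, // duplicate text, different id
];

// The backend reorders the items and echoes text, not ids.
const response: ResponseItem[] = [
  { document: "Buenos Aires", labels: ["LOC"] },
  { document: "Juan Pérez", labels: ["PER"] },
  { document: "Juan Pérez", labels: ["PER"] },
];

const result = matchByText(paragraphs, response);
console.log(result.map((m) => m.paragraphId)); // p2, p1, p3
```

Because each match removes the paragraph from its queue, two paragraphs with identical text receive distinct ids in request order, and an index-based lookup is never needed.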
Summary by Sourcery
Align disambiguation results with their original paragraphs based on paragraph content instead of response index, and preserve predictions for paragraphs omitted by the backend.
Bug Fixes: