Add ability to compose multi-value Machinery translations #4236
mathjazz wants to merge 9 commits
For Fluent and MF2-handled formats (Android, Gettext, WebExt, Xcode, Xliff), the Machinery panel only matched on the first input field (via `getPlainMessage()`), so attributes and selector variants were never surfaced. This mirrors Pretranslation's complex string composition in Machinery, adding directly-pasteable composed suggestions alongside the existing results.

More details:

1. Parameterize Pretranslation with `mt_provider`/`mt_service_name`/`mt_supported`, and move entity-walking into `Pretranslation.walk_entity()` so Machinery can reuse the composition pipeline.
2. Add a `/machinery-composed/` endpoint that walks the entity, looks up each value in TM, and falls back to the requested MT service for any remaining value. It returns the composed string plus the actual mix of services used (TM badge + MT badge for hybrid results).
3. The frontend fires composed requests in parallel with the existing fetches when the entity format can have multiple values. Composed results dedupe through the existing `addResults()` merge.
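The composition step in item 2 could look roughly like the sketch below. It is not the PR's code: `compose`, `tm_lookup` and `mt_translate` are hypothetical stand-ins for the entity walk, the Translation Memory lookup and the requested MT service, and the composed string is reduced to a plain dictionary of field values.

```python
from typing import Callable, Optional


def compose(
    values: dict[str, str],
    tm_lookup: Callable[[str], Optional[str]],
    mt_translate: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[tuple[dict[str, str], list[str]]]:
    """Translate every value of an entity, preferring TM over MT.

    Returns the translated values plus the services actually used,
    or None if any value could not be translated.
    """
    translated: dict[str, str] = {}
    services: list[str] = []

    for name, source in values.items():
        # Try Translation Memory first.
        result = tm_lookup(source)
        service = "tm"
        # Fall back to MT only for values that TM could not serve.
        if result is None and mt_translate is not None:
            result = mt_translate(source)
            service = "mt"
        if result is None:
            # A partial composition is not pasteable, so give up.
            return None
        translated[name] = result
        if service not in services:
            services.append(service)

    return translated, services


# Hypothetical data: one value is in TM, the other needs MT.
tm = {"Save": "Shrani"}
composed = compose(
    {"value": "Save", "title": "Save the file"},
    tm_lookup=tm.get,
    mt_translate=lambda text: f"[mt] {text}",
)
print(composed)
```

Returning the list of services alongside the values is what would let the panel show both a TM badge and an MT badge for a hybrid result.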
on the entity having more than one translatable input, reusing the editor's field-counting logic.
2. Surface a quality badge. When every value is a 100% TM match, the composed string is a perfect TM match, so return quality 100 and pass it through to the panel.
3. Render composed (multi-value) suggestions as labeled fields, the same representation as the original string panel.
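The quality rule from item 2 can be stated in a few lines. This is only an illustration: `composed_quality` and its input shape (one TM match quality per value, or `None` when the value came from MT) are assumptions, not names from the PR.

```python
from typing import Optional


def composed_quality(match_qualities: list[Optional[int]]) -> Optional[int]:
    """Return 100 only if every value was served by a 100% TM match.

    Each item is the TM match quality for one value, or None if the
    value came from machine translation instead.
    """
    if match_qualities and all(q == 100 for q in match_qualities):
        return 100
    # Hybrid or MT-only compositions carry no quality badge.
    return None


print(composed_quality([100, 100]))   # every value is a perfect TM match
print(composed_quality([100, None]))  # one value fell back to MT
```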
across all input fields, reusing the field-building logic that is already used by the History panel. The plain message is recorded as `machinery.translation` so that source attribution still matches the saved translation on submit.
string isn't suggested back to itself. The composed path didn't, so a composed TM result could be reconstructed from the entity's own translation after it was translated. Add an opt-in `exclude_entity` flag to Pretranslation that excludes the entity's own TM entries from per-value lookups, and enable it from the Machinery composed view. A value that can only be served by the entity's own translation then has no TM match, so a TM-only composition relying on it is no longer produced. Pretranslation behavior is unchanged.
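A minimal sketch of the opt-in exclusion described above, assuming an in-memory list in place of Pontoon's TM table. `TMEntry` and `tm_lookup` are hypothetical; only the `exclude_entity` name comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    entity_id: int  # entity whose translation produced this TM entry
    source: str
    target: str


def tm_lookup(
    memory: list[TMEntry],
    source: str,
    exclude_entity: Optional[int] = None,
) -> Optional[str]:
    """Return the first exact TM match, optionally skipping one entity's own entries."""
    for entry in memory:
        if entry.source != source:
            continue
        if exclude_entity is not None and entry.entity_id == exclude_entity:
            # The entity's own translation must not be suggested back to it.
            continue
        return entry.target
    return None


memory = [TMEntry(entity_id=1, source="Save", target="Shrani")]
print(tm_lookup(memory, "Save"))                    # default: own entry matches
print(tm_lookup(memory, "Save", exclude_entity=1))  # opt-in: no match left
```

Because the flag defaults to `None`, callers that do not pass it see the same results as before, which matches the note that Pretranslation behavior is unchanged.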
Re-applying a composed Machinery suggestion (or restoring history) after editing a field took two clicks: the first did nothing. Typing updates only CodeMirror's internal doc and EditorResult, not EditorData.state.fields, so TranslationForm doesn't re-render and each EditField keeps a stale `defaultValue`. EditField re-syncs its document only in `useEffect(() => setValue(defaultValue), [defaultValue])`. distributeEntrySource builds new fields with placeholder handles while the on-screen editors stay bound, via their React key, to the previous fields' live handles. The re-applied value for the edited field equals its stale `defaultValue`, so the effect doesn't fire and the field isn't updated. A later re-render refreshes `defaultValue`, which is why the second click works. Push the distributed values straight into the live handles, matched by field id, the same way clearEditor does, so one click suffices.
The "Refine using AI" dropdown doesn't work on composed (multi-field/plural) suggestions: the loader never shows, the refined result never updates the UI, and copying dumps the raw Fluent source into the first field. The backend /gpt-transform/ endpoint also refines a single string, so it can't preserve the entry structure (e.g. it returns 2 plural forms instead of 4). Hide the dropdown for composed suggestions so they behave like a plain Google Translate source. Proper composed support is left as a follow-up.
When a suggestion combines multiple sources (e.g. GOOGLE TRANSLATE and TRANSLATION MEMORY), the source titles ran together with no separator.
eemeli left a comment:
Looked at the Python parts only thus far.
    COMPOSED_MT_SERVICES = {
        "google-translate": (
            lambda text, locale, preserve_placeables: get_google_translate_data(
                text=text, locale=locale, preserve_placeables=preserve_placeables
            ),
            "google_translate_code",
        ),
        "microsoft-translator": (
            lambda text, locale, preserve_placeables: get_microsoft_translator_data(
                text, locale.ms_translator_code
            ),
            "ms_translator_code",
        ),
    }
I'd prefer not separately defining the lambdas like this, and to have the calls directly in the code below.
    COMPOSED_FORMATS = {
        Resource.Format.FLUENT,
        Resource.Format.ANDROID,
        Resource.Format.GETTEXT,
        Resource.Format.WEBEXT,
        Resource.Format.XCODE,
        Resource.Format.XLIFF,
    }
The behaviour should not be gated on the format, but on whether the Entity.value and .properties represent a single-pattern or multi-pattern message.
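One way to read this suggestion is to count patterns across the value and its properties instead of checking the resource format. The sketch below uses simplified stand-in classes rather than the real `moz.l10n` message types, and `pattern_count` and `is_multi_pattern` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatternMessage:
    """Stand-in for a message with exactly one pattern."""
    pattern: list


@dataclass
class SelectMessage:
    """Stand-in for a message with selector variants (e.g. plurals)."""
    variants: dict = field(default_factory=dict)


def pattern_count(message) -> int:
    """Number of translatable patterns a single message carries."""
    if isinstance(message, SelectMessage):
        return len(message.variants)
    return 1


def is_multi_pattern(value, properties: Optional[dict] = None) -> bool:
    """True if the value and properties together hold more than one pattern."""
    messages = [value, *(properties or {}).values()]
    return sum(pattern_count(m) for m in messages if m is not None) > 1


plain = PatternMessage(["Save"])
plural = SelectMessage({("one",): ["1 file"], ("other",): ["many files"]})
print(is_multi_pattern(plain))                    # single pattern
print(is_multi_pattern(plain, {"title": plain}))  # value plus one attribute
print(is_multi_pattern(plural))                   # selector variants
```

With a check like this, a Gettext string without plurals would skip the composed request even though its format can carry multi-pattern messages.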
    Return a composed multi-value translation for a Fluent / MF2 entity.

    Each translatable leaf (Fluent value/attribute, MF2 variant) is looked up in
    Translation Memory; leaves without a 100% TM match fall back to the requested
    MT service. Mirrors the Pretranslation pipeline so the Machinery panel can
    surface a directly-pasteable composed translation alongside the per-leaf
    results.
The direct references to the formats here are misleading, as I presume "MF2" is encompassing all the formats (like Android and Gettext) that are internally represented using MF2 syntax?
    if service == "translation-memory":
        mt_provider = None
        mt_service_name = "tm"
        mt_supported = False
    elif service in COMPOSED_MT_SERVICES:
See comment above; this should be a match statement handling all the valid service values.
    if entity.resource.format == Resource.Format.FLUENT:
        entry = fluent_parse_entry(entity.string, with_linepos=False)
        if entry.value:
            self.message(entry.value)
        accesskeys: list[tuple[str, Message]] = []
        for key, prop in entry.properties.items():
            if key.endswith("accesskey"):
                accesskeys.append((key, prop))
            else:
                self.message(prop)
        for key, prop in accesskeys:
            set_accesskey(entry, key, prop)
        return FluentSerializer().serialize_entry(
            fluent_astify_entry(entry, escape_syntax=False)
        )

    if entity.resource.format in {
        Resource.Format.ANDROID,
        Resource.Format.GETTEXT,
        Resource.Format.WEBEXT,
        Resource.Format.XCODE,
        Resource.Format.XLIFF,
    }:
        format = Format.mf2
        msg = parse_message(format, entity.string)
    else:
        format = None
        msg = PatternMessage([entity.string])
    self.message(msg)
    return serialize_message(format, msg)
Include `from moz.l10n.message import message_from_json` above, and then do something like this, with the serialization done later:
    -if entity.resource.format == Resource.Format.FLUENT:
    -    entry = fluent_parse_entry(entity.string, with_linepos=False)
    -    if entry.value:
    -        self.message(entry.value)
    -    accesskeys: list[tuple[str, Message]] = []
    -    for key, prop in entry.properties.items():
    -        if key.endswith("accesskey"):
    -            accesskeys.append((key, prop))
    -        else:
    -            self.message(prop)
    -    for key, prop in accesskeys:
    -        set_accesskey(entry, key, prop)
    -    return FluentSerializer().serialize_entry(
    -        fluent_astify_entry(entry, escape_syntax=False)
    -    )
    -if entity.resource.format in {
    -    Resource.Format.ANDROID,
    -    Resource.Format.GETTEXT,
    -    Resource.Format.WEBEXT,
    -    Resource.Format.XCODE,
    -    Resource.Format.XLIFF,
    -}:
    -    format = Format.mf2
    -    msg = parse_message(format, entity.string)
    -else:
    -    format = None
    -    msg = PatternMessage([entity.string])
    -self.message(msg)
    -return serialize_message(format, msg)
    +value = message_from_json(entity.value)
    +if value:
    +    self.message(entry.value)
    +properties = {
    +    key: message_from_json(prop)
    +    for key, prop in entity.properties.items()
    +} if entity.properties else {}
    +accesskeys: list[tuple[str, Message]] = []
    +for key, prop in properties.items():
    +    if key.endswith("accesskey"):
    +        accesskeys.append((key, prop))
    +    else:
    +        self.message(prop)
    +for key, prop in accesskeys:
    +    set_accesskey(entry, key, prop)
    +return value, properties
    self.mt_provider = mt_provider or get_google_translate_data
    self.mt_service_name = mt_service_name
Are "mt_provider" and "mt_service" effectively synonymous, or somehow different? In any case, it seems weird to fall back here for one, but not the other.
Thanks for the comments. Please note that I've intentionally not flagged anyone for code review yet, because I'd like to first get feedback on the functionality. The code is still deployed to https://pontoon.allizom.org/.
Fix #2886.