Add ability to compose multi-value Machinery translations #4236

Open
mathjazz wants to merge 9 commits into mozilla:main from mathjazz:machinery-compose-multi-value-2886

@mathjazz

Collaborator

Fix #2886.

mathjazz added 9 commits June 17, 2026 17:29
For Fluent and MF2-handled formats (Android, Gettext, WebExt, Xcode,
Xliff), the Machinery panel only matched on the first input field
(via `getPlainMessage()`), so attributes and selector variants were never
surfaced.

This mirrors Pretranslation's complex string composition in Machinery,
adding directly-pasteable composed suggestions alongside the
existing results.

More details:

1. Parameterize Pretranslation with mt_provider/mt_service_name/
mt_supported, and move entity-walking into `Pretranslation.walk_entity()`
so Machinery can reuse the composition pipeline.

2. Add `/machinery-composed/` endpoint that walks the entity, looks up
each value in TM, and falls back to the requested MT service for any
remaining value. Returns the composed string + the actual mix of
services used (TM badge + MT badge for hybrid results).

3. Frontend fires composed requests in parallel with the existing
fetches when the entity format can have multiple values. Composed
results dedupe through the existing `addResults()` merge.
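The per-value lookup described in item 2 could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `compose`, `ComposedResult`, `tm_lookup` and `mt_translate` are hypothetical names, and the choice to return nothing when any value stays untranslated is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ComposedResult:
    values: list[str]    # translated values, in source order
    services: list[str]  # distinct services used, in first-use order


def compose(
    sources: list[str],
    tm_lookup: Callable[[str], Optional[str]],
    mt_translate: Optional[Callable[[str], Optional[str]]],
    mt_service_name: str,
) -> Optional[ComposedResult]:
    """Translate each value via TM first, falling back to MT.

    Assumption: if any value cannot be translated, no composed
    result is offered at all (None is returned).
    """
    values: list[str] = []
    services: list[str] = []
    for source in sources:
        translation = tm_lookup(source)
        service = "tm"
        if translation is None and mt_translate is not None:
            translation = mt_translate(source)
            service = mt_service_name
        if translation is None:
            return None
        values.append(translation)
        if service not in services:
            services.append(service)
    return ComposedResult(values, services)


# Hypothetical usage: one value served by TM, one by a fake MT callable.
tm = {"Save": "Shrani"}
result = compose(["Save", "Save file"], tm.get, lambda s: f"[MT] {s}", "gt")
```

A result whose `services` list holds both entries is what would get the hybrid TM + MT badges mentioned above.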

on the entity having more than one translatable input, reusing the editor's
field-counting logic.

2. Surface a quality badge. When every value is a 100% TM match, the composed
string is a perfect TM match, so return quality 100 and pass it through to
the panel.

3. Render composed (multi-value) suggestions as labeled fields, the same
representation as the original string panel.
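The quality rule from item 2 above might look something like this. It is a sketch only: `composed_quality` is a hypothetical helper, and representing MT-served values as `None` is an assumption made for the example.

```python
from typing import Optional


def composed_quality(match_qualities: list[Optional[int]]) -> Optional[int]:
    """Return 100 only when every value was a 100% TM match.

    Each entry is the TM match quality for one value; None stands
    for a value that was served by MT instead (assumed encoding).
    """
    if match_qualities and all(q == 100 for q in match_qualities):
        return 100
    return None
```

Any MT-served or partial-match value means no quality badge is shown for the composed suggestion.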

across all input fields, reusing the field-building logic that
is already used by the History panel.

The plain message is recorded as `machinery.translation` so that source
attribution still matches the saved translation on submit.

string isn't suggested back to itself. The composed path didn't, so a
composed TM result could be reconstructed from the entity's own
translation after it was translated.

Add an opt-in `exclude_entity` flag to Pretranslation that excludes the
entity's own TM entries from per-value lookups, and enable it from the
Machinery composed view. A value that can only be served by the entity's
own translation then has no TM match, so a TM-only composition relying
on it is no longer produced. Pretranslation behavior is unchanged.
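As a rough illustration of the `exclude_entity` behavior, assuming a simple in-memory list instead of the real TM query (`TMEntry` and `tm_lookup` are hypothetical names, not Pontoon's API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    entity_id: int
    source: str
    target: str


def tm_lookup(
    entries: list[TMEntry],
    source: str,
    exclude_entity_id: Optional[int] = None,
) -> Optional[str]:
    """Return the first exact TM match, skipping the excluded entity's own entries."""
    for entry in entries:
        if entry.source != source:
            continue
        if exclude_entity_id is not None and entry.entity_id == exclude_entity_id:
            continue
        return entry.target
    return None


entries = [TMEntry(entity_id=1, source="Save", target="Shrani")]
```

With the flag off, entity 1 would get its own translation back; with `exclude_entity_id=1` the lookup finds nothing, so a TM-only composition for that entity is not produced.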

Re-applying a composed Machinery suggestion (or restoring history)
after editing a field took two clicks: the first did nothing.

Typing updates only CodeMirror's internal doc and EditorResult, not
EditorData.state.fields, so TranslationForm doesn't re-render and each
EditField keeps a stale `defaultValue`. EditField re-syncs its document
only in `useEffect(() => setValue(defaultValue), [defaultValue])`.
distributeEntrySource builds new fields with placeholder handles while
the on-screen editors stay bound, via their React key, to the previous
fields' live handles. The re-applied value for the edited field equals
its stale `defaultValue`, so the effect doesn't fire and the field
isn't updated. A later re-render refreshes `defaultValue`, which is why
the second click works.

Push the distributed values straight into the live handles, matched by
field id, the same way clearEditor does, so one click suffices.

The "Refine using AI" dropdown doesn't work on composed (multi-field/
plural) suggestions: the loader never shows, the refined result never
updates the UI, and copying dumps the raw Fluent source into the first
field. The backend /gpt-transform/ endpoint also refines a single
string, so it can't preserve the entry structure (e.g. returns 2 plural
forms instead of 4).

Hide the dropdown for composed suggestions so they behave like a plain
Google Translate source. Proper composed support is left as a follow-up.

When a suggestion combines multiple sources (e.g. GOOGLE TRANSLATE and
TRANSLATION MEMORY), the source titles ran together with no separator.

@codecov-commenter

codecov-commenter commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 84.48276% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.90%. Comparing base (cb5f5d8) to head (e11f61f).
⚠️ Report is 11 commits behind head on main.


@eemeli eemeli left a comment

Member

Looked at the Python parts only thus far.

Comment on lines +35 to +48
COMPOSED_MT_SERVICES = {
    "google-translate": (
        lambda text, locale, preserve_placeables: get_google_translate_data(
            text=text, locale=locale, preserve_placeables=preserve_placeables
        ),
        "google_translate_code",
    ),
    "microsoft-translator": (
        lambda text, locale, preserve_placeables: get_microsoft_translator_data(
            text, locale.ms_translator_code
        ),
        "ms_translator_code",
    ),
}

Member

I'd prefer not separately defining the lambdas like this, and to have the calls directly in the code below.

Comment on lines +54 to +61
COMPOSED_FORMATS = {
    Resource.Format.FLUENT,
    Resource.Format.ANDROID,
    Resource.Format.GETTEXT,
    Resource.Format.WEBEXT,
    Resource.Format.XCODE,
    Resource.Format.XLIFF,
}

Member

The behaviour should not be gated on the format, but on whether the Entity.value and .properties represent a single-pattern or multi-pattern message.
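One way such a structural check might look, as a sketch only. The `PatternMessage` and `SelectMessage` classes below are simplified local stand-ins for the message types in `moz.l10n`, not the real ones, and `is_multi_value` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class PatternMessage:  # stand-in: a single pattern
    pattern: list


@dataclass
class SelectMessage:  # stand-in: one pattern per variant key tuple
    variants: dict


Message = Union[PatternMessage, SelectMessage]


def is_multi_value(value: Optional[Message], properties: dict[str, Message]) -> bool:
    """True when the entity holds more than one translatable pattern."""
    count = 0
    for msg in [value, *properties.values()]:
        if isinstance(msg, SelectMessage):
            count += len(msg.variants)
        elif msg is not None:
            count += 1
    return count > 1
```

Under this sketch, a Fluent message with a value and one attribute, or any message with two or more variants, would qualify regardless of the resource format.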

Comment on lines +101 to +107
Return a composed multi-value translation for a Fluent / MF2 entity.

Each translatable leaf (Fluent value/attribute, MF2 variant) is looked up in
Translation Memory; leaves without a 100% TM match fall back to the requested
MT service. Mirrors the Pretranslation pipeline so the Machinery panel can
surface a directly-pasteable composed translation alongside the per-leaf
results.

Member

The direct references to the formats here are misleading, as I presume "MF2" is encompassing all the formats (like Android and Gettext) that are internally represented using MF2 syntax?

Comment on lines +134 to +138
if service == "translation-memory":
    mt_provider = None
    mt_service_name = "tm"
    mt_supported = False
elif service in COMPOSED_MT_SERVICES:

Member

See comment above; this should be a match statement handling all the valid service values.

Comment on lines +127 to +156
if entity.resource.format == Resource.Format.FLUENT:
    entry = fluent_parse_entry(entity.string, with_linepos=False)
    if entry.value:
        self.message(entry.value)
    accesskeys: list[tuple[str, Message]] = []
    for key, prop in entry.properties.items():
        if key.endswith("accesskey"):
            accesskeys.append((key, prop))
        else:
            self.message(prop)
    for key, prop in accesskeys:
        set_accesskey(entry, key, prop)
    return FluentSerializer().serialize_entry(
        fluent_astify_entry(entry, escape_syntax=False)
    )

if entity.resource.format in {
    Resource.Format.ANDROID,
    Resource.Format.GETTEXT,
    Resource.Format.WEBEXT,
    Resource.Format.XCODE,
    Resource.Format.XLIFF,
}:
    format = Format.mf2
    msg = parse_message(format, entity.string)
else:
    format = None
    msg = PatternMessage([entity.string])
self.message(msg)
return serialize_message(format, msg)

Member

Include

from moz.l10n.message import message_from_json

above, and then do something like this, with the serialization done later:

Suggested change
value = message_from_json(entity.value)
if value:
    self.message(entry.value)
properties = {
    key: message_from_json(prop)
    for key, prop in entity.properties.items()
} if entity.properties else {}
accesskeys: list[tuple[str, Message]] = []
for key, prop in properties.items():
    if key.endswith("accesskey"):
        accesskeys.append((key, prop))
    else:
        self.message(prop)
for key, prop in accesskeys:
    set_accesskey(entry, key, prop)
return value, properties

Comment on lines +105 to +106
self.mt_provider = mt_provider or get_google_translate_data
self.mt_service_name = mt_service_name

Member

Are "mt_provider" and "mt_service" effectively synonymous, or somehow different? In any case, it seems weird to fall back here for one, but not the other.

@mathjazz

mathjazz commented Jun 22, 2026

Collaborator Author

Thanks for the comments.

Please note that I've intentionally not flagged anyone for code review yet, because I'd like to first get feedback on the functionality. The code is still deployed to https://pontoon.allizom.org/.

Development

Successfully merging this pull request may close these issues.

Unify TM/MT used by Pretranslation and Machinery

3 participants