i18n: keep messages.pot location comments on one line to prevent merge conflicts#12903

Open
lokesh wants to merge 1 commit into
internetarchive:masterfrom
lokesh:12837/refactor/pot-disable-line-wrapping


@lokesh lokesh commented Jun 11, 2026

Collaborator

Closes #12837

Refactor. Stops messages.pot from generating spurious merge conflicts between unrelated PRs.

The problem, in plain terms

openlibrary/i18n/messages.pot is a generated file we commit. Every translatable string carries a #: "location comment" listing the files that use it:

#: search/authors.html search/lists.html search/subjects.html work_search.html
msgid "Checking Search Inside matches"

Babel wraps the text of those comments at 76 characters (the leading "#: " is added after wrapping and isn't counted). So when a PR adds one file to a string, the location list tips over the limit and Babel re-flows the whole block, rewriting filenames that didn't actually change:

-#: search/authors.html search/lists.html search/subjects.html work_search.html
+#: search/authors.html search/editions.html search/lists.html
+#: search/subjects.html work_search.html

subjects.html and work_search.html got pushed onto a new line even though nothing about them changed. Now if a second in-flight PR also touches one of those templates, both branches have rewritten the same physical lines → git reports a conflict, even though the two PRs added completely unrelated files. The wrapping is what turns a non-overlapping change into an overlapping one.
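The reflow can be reproduced approximately with Python's standard textwrap module. This is a sketch under one assumption: that Babel's comment wrapping behaves like textwrap.wrap(..., break_long_words=False) applied to the space-joined location list, with "#: " prefixed to each resulting line. wrap_locations is a hypothetical helper, not Babel API. The original list is 75 characters and fits within 76; adding search/editions.html makes it 96, so the greedy wrapper breaks after search/lists.html.

```python
import textwrap

COMMENT_WIDTH = 76  # Babel's fallback width for comments


def wrap_locations(files: list[str], width: int = COMMENT_WIDTH) -> list[str]:
    """Approximate Babel's rendering of one message's #: location comment.

    Hypothetical helper: assumes Babel wraps like textwrap.wrap with
    break_long_words=False and prefixes each wrapped line with "#: ".
    """
    text = " ".join(files)
    return ["#: " + line for line in textwrap.wrap(text, width, break_long_words=False)]


before = wrap_locations([
    "search/authors.html", "search/lists.html",
    "search/subjects.html", "work_search.html",
])
after = wrap_locations([
    "search/authors.html", "search/editions.html", "search/lists.html",
    "search/subjects.html", "work_search.html",
])

# Print the change as a diff: one "-" line is replaced by two "+" lines.
for line in before:
    print("-" + line)
for line in after:
    print("+" + line)
```

Running it prints the same three-line diff shown above: the unchanged search/subjects.html and work_search.html end up on a rewritten line.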

Why the originally-proposed fix doesn't work

The issue suggested write_po(..., width=0). I tried it against Babel 2.18 and it does the opposite of what we want. Babel deliberately copies xgettext's behaviour — it always wraps comments, even when wrapping is otherwise off. From Babel's source:

# xgettext always wraps comments even if --no-wrap is passed;
# provide the same behaviour
comment_width = width if width and width > 0 else 76

So width=0 leaves the #: location comments wrapped at 76 (the actual problem — untouched) and only unwraps long msgid/msgstr text into single long lines (the cosmetic downside — applied). It ships the cost without the benefit. There is no Babel option that single-lines the locations while keeping message text wrapped.
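A simplified model of the two widths involved makes the asymmetry concrete. The comment rule is the line quoted from Babel's source; the message rule assumes write_po's documented behaviour that a width of None, 0, or a negative number disables wrapping of message text. effective_widths is a hypothetical name used only for illustration.

```python
from typing import Optional, Tuple

BABEL_DEFAULT_WIDTH = 76


def effective_widths(width: Optional[int]) -> Tuple[int, Optional[int]]:
    """Return (comment_width, message_width) for a given write_po width.

    Hypothetical model; a message_width of None means "no wrapping at all".
    """
    enabled = width is not None and width > 0
    # Comments, including #: locations: the xgettext-compatible rule.
    comment_width = width if enabled else BABEL_DEFAULT_WIDTH
    # msgid/msgstr text: wrapping is switched off entirely.
    message_width = width if enabled else None
    return comment_width, message_width


print(effective_widths(76))  # default: comments and messages both wrapped at 76
print(effective_widths(0))   # width=0: comments still at 76, messages unwrapped
```

In this model, width=0 changes only the second value, which is exactly the "cost without the benefit" described above.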

What this PR does

After Babel writes the file, a small post-processing pass (_unwrap_location_comments) joins each message's wrapped #: block back onto a single line. Adding a file now just extends that one line:

-#: search/authors.html search/lists.html search/subjects.html work_search.html
+#: search/authors.html search/editions.html search/lists.html search/subjects.html work_search.html

Unrelated string changes no longer share any lines, so they stop conflicting. Message text (msgid/msgstr) is left wrapped at Babel's default width, so it stays readable in diffs.

For developers

  • One fewer source of "why is messages.pot conflicting again?" when rebasing or merging concurrent PRs that touch templates.
  • The location breadcrumbs (which templates use this string) are kept — we didn't drop them to solve this.
  • Heads-up: this PR's first regeneration reflows essentially the entire file (every wrapped location line collapses to one), so the diff is large but boring — it's a one-time formatting pass, not a content change. After this lands, the file stays stable. Best merged when no other .pot-touching PRs are mid-flight.

For translators

  • Nothing changes in what gets translated. The regenerated .pot has exactly the same set of msgids and exactly the same locations as before; only the wrapping of the location lines differs. No string was added, removed, or altered.
  • The #: "where is this string used" hints translators rely on for context are preserved.

Technical

  • openlibrary/i18n/__init__.py: write_po's output now goes to an in-memory buffer, is run through _unwrap_location_comments, and is then written to disk. The helper walks the file line by line and joins any run of consecutive #: lines (Babel emits a message's location comment as a contiguous block) into one. It's a pure string transform with no dependency on Babel internals beyond the stable #: gettext convention.
  • Why post-process rather than a Babel flag: there is no flag that does this. width=0 keeps comments wrapped (shown above); no_location=True would fix conflicts but drops the location breadcrumbs, which the issue explicitly wanted to keep.
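The transform described above fits in a few lines of plain Python. This is a hypothetical re-implementation written for illustration (named unwrap_location_comments, without the leading underscore), not the code in this PR; like the real helper, it assumes Babel emits each message's locations as a contiguous run of lines starting with "#:".

```python
def unwrap_location_comments(po_text: str) -> str:
    """Join each run of consecutive '#:' lines into a single '#:' line.

    Hypothetical re-implementation for illustration; the PR's
    _unwrap_location_comments may differ in its details.
    """
    out: list[str] = []
    locations: list[str] = []  # fragments of the '#:' run being collected

    def flush() -> None:
        # Emit the collected run (if any) as one space-joined '#:' line.
        if locations:
            out.append("#: " + " ".join(locations))
            locations.clear()

    for line in po_text.splitlines():
        if line.startswith("#:"):
            # Drop the '#:' marker and surrounding spaces, keep the filenames.
            locations.append(line[2:].strip())
        else:
            flush()
            out.append(line)
    flush()  # the text may end with a '#:' run

    # Preserve a trailing newline if the input had one.
    trailing = "\n" if po_text.endswith("\n") else ""
    return "\n".join(out) + trailing


wrapped = (
    "#: search/authors.html search/editions.html search/lists.html\n"
    "#: search/subjects.html work_search.html\n"
    'msgid "Checking Search Inside matches"\n'
    'msgstr ""\n'
)
print(unwrap_location_comments(wrapped))
```

Because the sketch matches only the "#:" prefix, translator notes (#.), flag lines such as #, fuzzy, and previous-msgid lines (#|) pass through untouched, and running it twice gives the same result as running it once.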

Testing

  • New unit tests in openlibrary/tests/core/test_i18n.py (Test_unwrap_location_comments): collapsing a wrapped block, leaving single-line comments untouched, not touching message text or #. translator notes, and the "add one file → clean one-line diff" scenario from #12663 (feat(search): add /search/editions.json API + /search/editions UI, Phase 1 & 2 of #7451).
  • Verified the regenerated messages.pot round-trips through read_po to an identical set of msgids and locations vs. the old file (semantic no-op).
  • Verified via git diff that the .pot change touches only #: lines — zero message-content lines changed.
  • pre-commit passes on the changed files, including mypy and the Generate POT hook (confirming the committed .pot matches what the hook regenerates).
  • pytest openlibrary/tests/core/test_i18n.py → 6 passed.
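The "only #: lines changed" check can also be approximated without git, using difflib from the standard library. This is a sketch assuming the old and new .pot contents are available as strings; changed_lines and only_location_lines_changed are hypothetical helpers, not part of the PR.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return every line removed from old_text or added in new_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.extend(old_lines[i1:i2])  # removed (or replaced) lines
            changed.extend(new_lines[j1:j2])  # added (or replacement) lines
    return changed


def only_location_lines_changed(old_text: str, new_text: str) -> bool:
    """True if every changed line is a '#:' location comment."""
    return all(line.startswith("#:") for line in changed_lines(old_text, new_text))


old = (
    "#: search/authors.html search/editions.html search/lists.html\n"
    "#: search/subjects.html work_search.html\n"
    'msgid "Checking Search Inside matches"\n'
)
new = (
    "#: search/authors.html search/editions.html search/lists.html"
    " search/subjects.html work_search.html\n"
    'msgid "Checking Search Inside matches"\n'
)
print(only_location_lines_changed(old, new))
```

autojunk is disabled so that frequently repeated lines (such as msgstr "") are still matched normally in a large .pot file.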

Possible follow-ups (out of scope)

  • One #: line per file would make even two PRs editing the same string's location list merge cleanly. It's the most conflict-proof but unconventional and makes the file taller; this PR keeps the conventional space-joined single line.
  • A .gitattributes merge driver that regenerates messages.pot on conflict, or having CI own the file outright (both noted in the issue).
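The first follow-up idea above (one #: line per file) would be a similarly small transform. A sketch, assuming locations are space-separated and contain no spaces themselves; split_location_comments is a hypothetical name, and this PR does not do this.

```python
def split_location_comments(po_text: str) -> str:
    """Rewrite each '#:' line as one '#: <location>' line per location.

    Hypothetical sketch of the follow-up idea; assumes locations are
    space-separated and contain no spaces themselves.
    """
    out: list[str] = []
    for line in po_text.splitlines():
        if line.startswith("#:"):
            out.extend("#: " + location for location in line[2:].split())
        else:
            out.append(line)
    trailing = "\n" if po_text.endswith("\n") else ""
    return "\n".join(out) + trailing


print(split_location_comments('#: search/authors.html work_search.html\nmsgid "x"\n'))
```

Because it splits every #: line, it works on both wrapped and single-line input; the trade-off, as noted, is a taller file.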

Stakeholders

@cdrini (i18n lead)

Babel wraps the #: location comments at 76 chars, so adding or removing one
file from a string's location list re-flows the whole block and rewrites
filenames that didn't change. That turns unrelated PRs into overlapping diffs
and causes spurious messages.pot merge conflicts (internetarchive#12837).

The issue proposed write_po(..., width=0), but Babel mirrors xgettext and
always wraps comments regardless of width, so width=0 only unwraps msgid text
(the downside) without unwrapping the location comments (the actual fix).

Instead, post-process write_po's output to collapse each #: block onto a
single line. Adding a file now just extends that line; message text stays
wrapped and readable. This first regeneration reflows the whole file as a
one-time change.

Closes internetarchive#12837
@lokesh lokesh marked this pull request as ready for review June 11, 2026 23:24
@lokesh lokesh marked this pull request as draft June 11, 2026 23:25
@lokesh lokesh marked this pull request as ready for review June 15, 2026 18:01
@mekarpeles mekarpeles self-assigned this Jun 15, 2026
@mekarpeles

Member

Important: Are there other types of comments (like fuzzy) that might break?


Development

Successfully merging this pull request may close these issues.

i18n: prevent messages.pot merge conflicts by disabling location-comment line wrapping
