Feat/lmstudio auto load by mvoof · Pull Request #1 · sc-localization/verse-translator

mvoof · 2026-06-23T12:29:26Z

No description provided.

- Use /api/v1/chat instead of /v1/chat/completions; pass context_length and max_output_tokens explicitly per request
- Auto-detect context_length from loaded_instances after model load
- Treat truncated output (no closing ']') as ContextTooLongError so batch splitting kicks in automatically
- Replace full-file assemble() with assemble_entries(): write cached entries once at start, then append each translated batch incrementally
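The truncation rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the name ContextTooLongError comes from the commit message, while parse_batch_reply, translate_with_split, and the call_model callable are hypothetical, and the reply is assumed to be a JSON array of strings.

```python
import json
from typing import Callable, List


class ContextTooLongError(Exception):
    """Raised when a batch does not fit the model's context or output budget."""


def parse_batch_reply(reply: str) -> List[str]:
    """Parse a JSON array reply; a missing closing ']' is treated as truncation."""
    text = reply.strip()
    if not text.endswith("]"):
        raise ContextTooLongError("reply was cut off before the closing ']'")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Assumption: malformed JSON is also handled by splitting the batch.
        raise ContextTooLongError(f"reply is not valid JSON: {exc}") from exc


def translate_with_split(
    batch: List[str], call_model: Callable[[List[str]], str]
) -> List[str]:
    """Translate a batch, halving it whenever the reply looks truncated."""
    try:
        return parse_batch_reply(call_model(batch))
    except ContextTooLongError:
        if len(batch) <= 1:
            raise  # a single entry cannot be split any further
        mid = len(batch) // 2
        left = translate_with_split(batch[:mid], call_model)
        right = translate_with_split(batch[mid:], call_model)
        return left + right
```

Halving keeps the recursion depth logarithmic in the batch size, and a single entry that still fails surfaces the error instead of looping.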

gemini-code-assist

Code Review

This pull request transitions the translation pipeline from cloud-based CLI backends to local LLMs (LM Studio and Ollama), introducing glossary support, progress tracking, and batch-splitting recovery. The review highlights critical issues in the pipeline's file-writing logic that lead to entry duplication and lost formatting, a regex flaw in fluff-data filtering, a potential infinite polling loop during model loading, and opportunities to improve error handling and resilience.

gemini-code-assist · 2026-06-23T12:31:56Z

+    all_entries = [line for line in parsed.lines if line.kind == LineKind.ENTRY]
+
+    # Write cached + untranslatable entries as the initial file
+    assemble_entries(all_entries, config.output_path, append=False)
+
+    if not misses:
+        save_cache(cache_path, cache)
+        logger.info("Cache saved to %s", cache_path)
+        return config.output_path


Filtering parsed.lines to only LineKind.ENTRY and writing it initially discards all comments, empty lines, and sections from the INI file. Furthermore, writing the entries before they are translated means they are written in English, and then appended in Russian later, leading to duplication. We should remove this partial write and instead write the full parsed.lines at once when there are no misses.

Suggested change

-    all_entries = [line for line in parsed.lines if line.kind == LineKind.ENTRY]
-
-    # Write cached + untranslatable entries as the initial file
-    assemble_entries(all_entries, config.output_path, append=False)
-
-    if not misses:
-        save_cache(cache_path, cache)
-        logger.info("Cache saved to %s", cache_path)
-        return config.output_path
+    if not misses:
+        assemble_entries(parsed.lines, config.output_path, append=False)
+        save_cache(cache_path, cache)
+        logger.info("Cache saved to %s", cache_path)
+        return config.output_path

gemini-code-assist · 2026-06-23T12:31:56Z

+            save_cache(cache_path, cache)
+            assemble_entries(batch, config.output_path, append=True)


Remove the incremental append assemble_entries(batch, config.output_path, append=True) which causes duplication of entries and breaks the INI file structure. The translation progress is already safely preserved in the cache after each batch.

Suggested change

-            save_cache(cache_path, cache)
-            assemble_entries(batch, config.output_path, append=True)
+            save_cache(cache_path, cache)
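To illustrate why a per-batch cache write is enough for crash recovery, here is a minimal sketch. It assumes the cache is a JSON file mapping source text to translated text; only the call shape save_cache(cache_path, cache) appears in the PR, and load_cache, find_misses, and the atomic-write detail are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_cache(cache_path: Path, cache: Dict[str, str]) -> None:
    """Write the cache atomically so a crash never leaves a half-written file."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cache, handle, ensure_ascii=False)
    os.replace(tmp_name, cache_path)  # atomic rename over the old cache


def load_cache(cache_path: Path) -> Dict[str, str]:
    """Return the saved cache, or an empty one on the first run."""
    if not cache_path.exists():
        return {}
    return json.loads(cache_path.read_text(encoding="utf-8"))


def find_misses(sources: List[str], cache: Dict[str, str]) -> List[str]:
    """Entries that still need translation after a restart."""
    return [text for text in sources if text not in cache]
```

On restart, only the entries returned by find_misses are sent to the model again, so the output file itself does not need to be appended to after each batch.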

gemini-code-assist · 2026-06-23T12:31:56Z

    save_cache(cache_path, cache)
    logger.info("Cache saved to %s", cache_path)


Add assemble_entries(parsed.lines, config.output_path, append=False) at the end of the run function to write the fully translated file with all comments, sections, and empty lines preserved.

Suggested change

-    save_cache(cache_path, cache)
-    logger.info("Cache saved to %s", cache_path)
+    assemble_entries(parsed.lines, config.output_path, append=False)
+    save_cache(cache_path, cache)
+    logger.info("Cache saved to %s", cache_path)
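A rough sketch of what a single final write over all parsed lines could look like, so that comments, sections, and blank lines survive in their original order. Everything except LineKind.ENTRY is an assumption made for illustration: the Line dataclass, the LineKind.OTHER member, render_lines, and a cache keyed by source text are not taken from the PR.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Iterable, List


class LineKind(Enum):
    ENTRY = auto()
    OTHER = auto()  # hypothetical: comments, section headers, blank lines


@dataclass
class Line:
    kind: LineKind
    raw: str         # original text of the line, without the newline
    key: str = ""    # set only for ENTRY lines
    value: str = ""  # source-language text, set only for ENTRY lines


def render_lines(lines: Iterable[Line], cache: Dict[str, str]) -> List[str]:
    """Render every line in original order, translating entries via the cache."""
    rendered = []
    for line in lines:
        if line.kind is LineKind.ENTRY:
            # Fall back to the source text when no translation is cached.
            translated = cache.get(line.value, line.value)
            rendered.append(f"{line.key}={translated}")
        else:
            rendered.append(line.raw)  # non-entry lines are copied as-is
    return rendered
```

Because the whole file is rendered once from the parsed lines, the output cannot contain an entry twice, which is the duplication the review comments describe.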

gemini-code-assist · 2026-06-23T12:31:56Z

 )

+# Matches SC fluff data: sequences of decimal numbers separated by spaces
+_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")


The regex (\d+\.\d+\s+){4,} requires a trailing space for the last decimal number in the sequence. If the fluff data has exactly 4 numbers and no trailing space at the end of the line (e.g., 1.2 3.4 5.6 7.8), it will not be matched and will be incorrectly treated as translatable. Using \d+\.\d+(?:\s+\d+\.\d+){3,} correctly matches 4 or more decimal numbers separated by spaces regardless of trailing whitespace.

Suggested change

-_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")
+_FLUFF_RE = re.compile(r"\d+\.\d+(?:\s+\d+\.\d+){3,}")
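The difference between the two patterns can be checked with a small comparison. The sample strings are made up, and the use of search() rather than match() or fullmatch() is an assumption about how the pipeline applies the pattern.

```python
import re

# Original pattern: every number, including the last, must be followed by whitespace.
OLD_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")
# Suggested pattern: one number, then at least three more, each preceded by whitespace.
NEW_FLUFF_RE = re.compile(r"\d+\.\d+(?:\s+\d+\.\d+){3,}")


def is_fluff(pattern: "re.Pattern[str]", text: str) -> bool:
    """True when the text contains a run of at least four decimal numbers."""
    return pattern.search(text) is not None


samples = [
    "1.2 3.4 5.6 7.8",   # exactly four numbers, no trailing space
    "1.2 3.4 5.6 7.8 ",  # exactly four numbers, trailing space
    "1.2 3.4 5.6",       # only three numbers
    "Version 1.2 released",
]

for text in samples:
    print(repr(text), is_fluff(OLD_FLUFF_RE, text), is_fluff(NEW_FLUFF_RE, text))
```

Only the first sample separates the two patterns: the original misses it, the suggested one catches it.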

gemini-code-assist · 2026-06-23T12:31:57Z

+        ctx = None
+        while True:
+            loaded, ctx = self._model_status()
+            if loaded:
+                break
+            time.sleep(_LOAD_POLL_INTERVAL)


The polling loop in ensure_model_loaded will run indefinitely if the model name is misspelled, not downloaded, or fails to load. Adding a maximum number of attempts/timeout prevents the application from hanging forever.

Suggested change

-        ctx = None
-        while True:
-            loaded, ctx = self._model_status()
-            if loaded:
-                break
-            time.sleep(_LOAD_POLL_INTERVAL)
+        ctx = None
+        max_attempts = 30  # 60 seconds total timeout
+        attempts = 0
+        while True:
+            loaded, ctx = self._model_status()
+            if loaded:
+                break
+            attempts += 1
+            if attempts >= max_attempts:
+                raise RuntimeError(f"Timed out waiting for model {self.model!r} to load in LM Studio.")
+            time.sleep(_LOAD_POLL_INTERVAL)
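An alternative to counting attempts is a wall-clock deadline, which keeps the timeout correct even if the poll interval changes later. The sketch below is a standalone illustration, not the PR's code: wait_until_loaded and its parameters are hypothetical, and model_status stands in for self._model_status(), assumed to return a (loaded, context_length) pair.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_loaded(
    model_status: Callable[[], Tuple[bool, Optional[int]]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Poll until the model reports loaded; return its context length."""
    deadline = clock() + timeout_s
    while True:
        loaded, ctx = model_status()
        if loaded:
            return ctx
        if clock() >= deadline:
            raise TimeoutError(f"model not loaded after {timeout_s:.0f} s")
        sleep(poll_interval_s)
```

Passing sleep and clock as parameters is only there to make the loop testable without real waiting; time.monotonic is used because it is unaffected by system clock adjustments.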

mvoof added 15 commits June 20, 2026 23:25

feat: add tqdm progress bar and startup info to pipeline

aa007ee

Shows input/output paths, cache location, and language pair before translation starts. Replaces manual logger progress with a live tqdm bar showing entries/s, ETA, and current batch.

refactor: switch to local-only backends (lmstudio + ollama), add --batch-size auto

6872392

fix: remove unused CLI backends (claude, gemini, codex) and add context-too-long auto-split

89e7ff7

refactor: remove auto batch size, batch_size is manual only

5f83199

feat: output only translated lines, show resume progress on startup

2074418

fix: write output after each batch for crash recovery

afecefa

fix: skip SC fluff data strings (long float sequences) from translation

a7d364d

fix: include all entries in output, copy untranslatable as-is

dc40d1b

fix: expand system prompt with SC proper nouns glossary to prevent inconsistent translation

85bc5ab

fix: add more SC corporations to glossary (Sakura Sun, Covalex, Shubin, etc)

e58f35d

feat: load glossary from glossary.txt and inject into system prompt

eaec9ba

fix: add rule to preserve SC abbreviations (HUD, VTOL, SHD, ESP, etc)

07d603c

feat(lmstudio): auto-load model via LM Studio API if not loaded

eca5785

Before the first batch, LMStudioBackend.ensure_model_loaded() checks GET /api/v1/models and calls POST /api/v1/models/load if the model is absent, then polls until it's ready.

feat(lmstudio): auto-load model via LM Studio API if not loaded

76a4830

Before the first batch, ensure_model_loaded() checks GET /api/v1/models and calls POST /api/v1/models/load if the model is absent, then polls until it is ready. Fixes load request field name: 'model' not 'identifier'.

gemini-code-assist Bot reviewed Jun 23, 2026

mvoof added 5 commits June 23, 2026 17:33

fix: exclude untranslated misses from initial output to prevent duplicates

ba0a7d8

fix: improve fluff regex to match sequences without trailing whitespace

8c296b7

fix: add timeout to model load polling loop to prevent infinite hang

891a7f8

fix: treat JSON decode errors as ContextTooLongError to trigger batch splitting

5a109a2

fix: rewrite output in original key order after all batches complete

1138eb0

mvoof wants to merge 20 commits into main from feat/lmstudio-auto-load
