Feat/lmstudio auto load #1
Shows input/output paths, cache location, and language pair before translation starts. Replaces manual logger progress with a live tqdm bar showing entries/s, ETA, and current batch.
…xt-too-long auto-split
…consistent translation
Before the first batch, LMStudioBackend.ensure_model_loaded() checks GET /api/v1/models and calls POST /api/v1/models/load if the model is absent, then polls until it's ready.
Before the first batch, ensure_model_loaded() checks GET /api/v1/models and calls POST /api/v1/models/load if the model is absent, then polls until it is ready. Fixes the load request field name: 'model', not 'identifier'.
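The load-if-absent flow described in these commits can be sketched with the HTTP calls abstracted away. In this sketch, is_loaded and request_load are hypothetical callables standing in for GET /api/v1/models and POST /api/v1/models/load. The PR does not show the response JSON, so the sketch does no parsing. The attempt cap comes from the bound suggested later in the review, not from the unbounded polling the commit describes.

```python
import time
from typing import Callable


def ensure_model_loaded(
    model: str,
    is_loaded: Callable[[str], bool],      # hypothetical stand-in for GET /api/v1/models
    request_load: Callable[[str], None],   # hypothetical stand-in for POST /api/v1/models/load
    poll_interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Load `model` if needed; return True if a load was requested, False if already loaded."""
    # Already present: nothing to do, no load request is sent.
    if is_loaded(model):
        return False
    # Absent: ask the server to load it (per the commit, the request field is 'model').
    request_load(model)
    # Poll until the server reports the model as loaded, but give up eventually.
    for _ in range(max_attempts):
        if is_loaded(model):
            return True
        sleep(poll_interval)
    raise RuntimeError(f"Timed out waiting for model {model!r} to load")
```

In a real backend, request_load would send a body such as {"model": model}. Injecting sleep lets the loop be exercised without actually waiting.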
- Use /api/v1/chat instead of /v1/chat/completions; pass context_length and max_output_tokens explicitly per request
- Auto-detect context_length from loaded_instances after model load
- Treat truncated output (no closing ']') as ContextTooLongError so batch splitting kicks in automatically
- Replace full-file assemble() with assemble_entries(): write cached entries once at start, then append each translated batch incrementally
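The truncation check and the batch splitting fit together into one recovery strategy. If a reply lacks its closing ']', it is treated like a context overflow, and the batch is halved until each half fits. This is a minimal sketch under a few assumptions: the model replies with a JSON-style array, translate is a hypothetical callable that raises ContextTooLongError, and the exception class here is a local stand-in for the project's own.

```python
from typing import Callable, List, Sequence


class ContextTooLongError(Exception):
    """Local stand-in for the project's exception of the same name."""


def ensure_complete(reply: str) -> str:
    """Raise ContextTooLongError if the reply was cut off before its closing ']'."""
    text = reply.rstrip()
    if not text.endswith("]"):
        raise ContextTooLongError("reply truncated: no closing ']'")
    return text


def translate_with_split(
    batch: Sequence[str],
    translate: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Translate a batch, halving it recursively whenever it does not fit."""
    try:
        return translate(batch)
    except ContextTooLongError:
        if len(batch) <= 1:
            raise  # a single entry that still overflows cannot be split further
        mid = len(batch) // 2
        left = translate_with_split(batch[:mid], translate)
        right = translate_with_split(batch[mid:], translate)
        return left + right  # order is preserved, so results line up with the input
```

Splitting in halves keeps the number of extra requests logarithmic in the batch size. A single oversized entry still surfaces as an error instead of looping.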
Code Review
This pull request transitions the translation pipeline from cloud-based CLI backends to local LLMs (LM Studio and Ollama), introducing glossary support, progress tracking, and batch-splitting recovery. The review highlights critical issues in the pipeline's file-writing logic that lead to entry duplication and lost formatting, a regex flaw in fluff-data filtering, a potential infinite polling loop during model loading, and opportunities to improve error handling and resilience.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
```python
all_entries = [line for line in parsed.lines if line.kind == LineKind.ENTRY]

# Write cached + untranslatable entries as the initial file
assemble_entries(all_entries, config.output_path, append=False)

if not misses:
    save_cache(cache_path, cache)
    logger.info("Cache saved to %s", cache_path)
    return config.output_path
```
Filtering parsed.lines down to LineKind.ENTRY lines and writing them up front discards every comment, empty line, and section from the INI file. Worse, the entries are written before they are translated. Untranslated entries therefore land in the file in English and are appended again in Russian later, which leaves duplicates. We should remove this partial write and instead write the full parsed.lines once when there are no misses.
```diff
-all_entries = [line for line in parsed.lines if line.kind == LineKind.ENTRY]
-# Write cached + untranslatable entries as the initial file
-assemble_entries(all_entries, config.output_path, append=False)
-if not misses:
-    save_cache(cache_path, cache)
-    logger.info("Cache saved to %s", cache_path)
-    return config.output_path
+if not misses:
+    assemble_entries(parsed.lines, config.output_path, append=False)
+    save_cache(cache_path, cache)
+    logger.info("Cache saved to %s", cache_path)
+    return config.output_path
```
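The suggested fix amounts to rendering every parsed line in its original order and writing the file once. This is a minimal sketch that assumes a hypothetical Line record with kind, raw, key, and value fields. Only LineKind.ENTRY appears in this PR, so the other kinds and the key=value output format are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Iterable, List


class LineKind(Enum):
    # Only ENTRY is confirmed by the PR; the other kinds are illustrative.
    ENTRY = auto()
    COMMENT = auto()
    SECTION = auto()
    BLANK = auto()


@dataclass
class Line:
    kind: LineKind
    raw: str         # the line exactly as it appeared in the source file
    key: str = ""    # only used for ENTRY lines
    value: str = ""  # source-language value for ENTRY lines


def render(lines: Iterable[Line], translations: Dict[str, str]) -> List[str]:
    """Emit every line in source order, substituting translated values for entries."""
    out: List[str] = []
    for line in lines:
        if line.kind is LineKind.ENTRY:
            # Fall back to the source text when no translation exists yet.
            out.append(f"{line.key}={translations.get(line.key, line.value)}")
        else:
            out.append(line.raw)  # comments, sections and blank lines pass through
    return out


def write_output(path: str, lines: Iterable[Line], translations: Dict[str, str]) -> None:
    """Write the whole file in one pass; mode "w" truncates, so reruns never duplicate."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(render(lines, translations)) + "\n")
```

Each entry is emitted exactly once, in either its translated or its source form. That removes the English-then-Russian duplication the comment describes.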
```python
save_cache(cache_path, cache)
assemble_entries(batch, config.output_path, append=True)
```
Remove the incremental append assemble_entries(batch, config.output_path, append=True): it duplicates entries and breaks the INI file structure. Translation progress is already preserved in the cache after each batch.
```diff
-save_cache(cache_path, cache)
-assemble_entries(batch, config.output_path, append=True)
+save_cache(cache_path, cache)
```
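Because the cache is saved after every batch, an interrupted run can resume by translating only the keys missing from the cache. The sketch below shows that resume path and assumes a JSON cache keyed by entry key; the PR's actual cache format is not shown. Writing to a temp file and then calling os.replace is a swapped-in technique so that a crash mid-write cannot corrupt the cache. This PR is not known to do that. All function names are hypothetical and differ from the PR's save_cache.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def save_cache_atomic(path: str, cache: Dict[str, str]) -> None:
    """Write the cache to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(cache, fh, ensure_ascii=False)
        os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)  # drop the partial temp file; the previous cache stays intact
        raise


def load_cache(path: str) -> Dict[str, str]:
    """Return the saved cache, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_misses(keys: Iterable[str], cache: Dict[str, str]) -> List[str]:
    """Keys that still need translating on this run."""
    return [key for key in keys if key not in cache]
```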
```python
save_cache(cache_path, cache)
logger.info("Cache saved to %s", cache_path)
```
Add assemble_entries(parsed.lines, config.output_path, append=False) at the end of the run function to write the fully translated file with all comments, sections, and empty lines preserved.
```diff
-save_cache(cache_path, cache)
-logger.info("Cache saved to %s", cache_path)
+assemble_entries(parsed.lines, config.output_path, append=False)
+save_cache(cache_path, cache)
+logger.info("Cache saved to %s", cache_path)
```
```python
)

# Matches SC fluff data: sequences of decimal numbers separated by spaces
_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")
```
The regex (\d+\.\d+\s+){4,} requires whitespace after every decimal number, including the last one in the sequence. Suppose the fluff data has exactly four numbers and nothing after the last one (e.g., 1.2 3.4 5.6 7.8). Then the pattern does not match, and the line is incorrectly treated as translatable. The pattern \d+\.\d+(?:\s+\d+\.\d+){3,} matches four or more whitespace-separated decimal numbers regardless of trailing whitespace.
```diff
-_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")
+_FLUFF_RE = re.compile(r"\d+\.\d+(?:\s+\d+\.\d+){3,}")
```
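The difference between the two patterns is easy to check directly. This sketch applies both with re.search. The PR does not show how _FLUFF_RE is actually applied (search, match, or fullmatch), so treat the results as illustrative.

```python
import re

OLD_FLUFF_RE = re.compile(r"(\d+\.\d+\s+){4,}")
NEW_FLUFF_RE = re.compile(r"\d+\.\d+(?:\s+\d+\.\d+){3,}")


def is_fluff(pattern: "re.Pattern[str]", text: str) -> bool:
    """True if the text contains a run of decimal numbers the pattern recognizes."""
    return pattern.search(text) is not None


samples = [
    "1.2 3.4 5.6 7.8",    # four numbers, no trailing space
    "1.2 3.4 5.6 7.8 ",   # four numbers, trailing space
    "1.2 3.4 5.6",        # only three numbers
    "Speed: 1.5 m/s",     # ordinary text
]

for text in samples:
    print(repr(text), is_fluff(OLD_FLUFF_RE, text), is_fluff(NEW_FLUFF_RE, text))
```

Only the first sample differs: the old pattern misses it and the new one catches it. Both patterns agree on the other three.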
```python
ctx = None
while True:
    loaded, ctx = self._model_status()
    if loaded:
        break
    time.sleep(_LOAD_POLL_INTERVAL)
```
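The polling loop in ensure_model_loaded runs forever if the model name is misspelled, the model is not downloaded, or it fails to load. Capping the number of attempts (or the total wait time) keeps the application from hanging. The suggested 30 attempts work out to roughly 60 seconds only if _LOAD_POLL_INTERVAL is about 2 seconds.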
```diff
-ctx = None
-while True:
-    loaded, ctx = self._model_status()
-    if loaded:
-        break
-    time.sleep(_LOAD_POLL_INTERVAL)
+ctx = None
+max_attempts = 30  # 60 seconds total timeout
+attempts = 0
+while True:
+    loaded, ctx = self._model_status()
+    if loaded:
+        break
+    attempts += 1
+    if attempts >= max_attempts:
+        raise RuntimeError(f"Timed out waiting for model {self.model!r} to load in LM Studio.")
+    time.sleep(_LOAD_POLL_INTERVAL)
```
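An attempt counter only approximates a time limit, because the real wait depends on _LOAD_POLL_INTERVAL and on how long each status request takes. Below is a sketch of a deadline-based alternative. It assumes check is a callable that returns the same (loaded, ctx) pair as self._model_status(); the name wait_until_loaded is hypothetical. Both clock and sleep are injectable, so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_loaded(
    check: Callable[[], Tuple[bool, Optional[int]]],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll check() until it reports loaded; return its ctx value or raise TimeoutError."""
    deadline = clock() + timeout  # monotonic clock: unaffected by wall-clock changes
    while True:
        loaded, ctx = check()
        if loaded:
            return ctx
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"model not loaded after {timeout} s")
        sleep(min(interval, remaining))  # never sleep past the deadline
```

The timeout is then stated in seconds directly. It stays correct if the poll interval changes or status requests are slow.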