v0.2.0: Cross-platform, i18n, native audio decoder, UX improvements #1

Merged
1vank1n merged 18 commits into main from fix/ci-cross-platform on Apr 8, 2026
@1vank1n 1vank1n commented Apr 8, 2026

Owner

v0.2.0

Cross-platform

  • CI builds for macOS ARM64, Windows x64, Linux x64 (GitHub Actions)
  • Platform-specific acceleration: Metal GPU on macOS, CPU-only on Windows/Linux
  • Cross-platform FFmpeg detection (Homebrew/apt/PATH/where.exe)
  • Bundle configs: DMG, NSIS installer, deb + AppImage
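
The FFmpeg detection order listed above (known install locations, then a PATH lookup via `which` or `where.exe`) could look roughly like the sketch below. It uses only the Rust standard library. The function names and the probed paths (`/opt/homebrew/bin`, `/usr/local/bin`, `/usr/bin`) are assumptions for illustration, not the project's actual code.

```rust
use std::path::PathBuf;
use std::process::Command;

/// OS lookup tool for PATH searches: `where.exe` on Windows, `which` elsewhere.
fn locator_command() -> &'static str {
    if cfg!(target_os = "windows") {
        "where.exe"
    } else {
        "which"
    }
}

/// Fixed locations to probe first (assumed: Homebrew prefixes on macOS,
/// the apt install path on Linux, nothing on Windows).
fn candidate_paths() -> Vec<PathBuf> {
    if cfg!(target_os = "macos") {
        vec![
            PathBuf::from("/opt/homebrew/bin/ffmpeg"),
            PathBuf::from("/usr/local/bin/ffmpeg"),
        ]
    } else if cfg!(target_os = "linux") {
        vec![PathBuf::from("/usr/bin/ffmpeg")]
    } else {
        Vec::new()
    }
}

/// Returns the first FFmpeg binary found, or `None` if it is not installed.
fn find_ffmpeg() -> Option<PathBuf> {
    // 1. Probe the well-known install locations.
    if let Some(path) = candidate_paths().into_iter().find(|p| p.is_file()) {
        return Some(path);
    }
    // 2. Fall back to a PATH lookup through the OS tool.
    let output = Command::new(locator_command()).arg("ffmpeg").output().ok()?;
    if !output.status.success() {
        return None;
    }
    // The lookup tool may print several matches; take the first non-empty line.
    String::from_utf8_lossy(&output.stdout)
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(PathBuf::from)
}
```

Because FFmpeg is only an optional fallback in this release, a `None` result would simply mean that AVI/MOV/WMA inputs cannot be decoded.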

Native audio decoder (no FFmpeg required)

  • Symphonia (pure Rust) replaces FFmpeg for MP3/WAV/FLAC/OGG/AAC/M4A/MP4/MKV/WebM
  • Rubato for high-quality resampling to 16kHz mono
  • FFmpeg kept as optional fallback for AVI/MOV/WMA only

GigaAM v3 fix

  • Fixed crash on audio longer than 30 s (ONNX dimension mismatch)
  • Auto-chunking: audio is split into 25 s chunks with 2 s overlap, and the per-chunk results are joined
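
As an illustration of the chunking arithmetic, the sketch below computes sample ranges for 25 s chunks with 2 s overlap at the 16 kHz rate mentioned above. `chunk_ranges` is a hypothetical helper, not the project's function, and how the overlapping text is merged when results are joined is not covered here.

```rust
/// Sample rate the decoder resamples to (16 kHz mono).
const SAMPLE_RATE: usize = 16_000;

/// Returns `(start, end)` sample ranges covering `total_samples`.
/// Each range is at most `chunk_secs` long, and consecutive ranges
/// share `overlap_secs` of audio.
fn chunk_ranges(
    total_samples: usize,
    chunk_secs: usize,
    overlap_secs: usize,
) -> Vec<(usize, usize)> {
    assert!(overlap_secs < chunk_secs, "overlap must be shorter than the chunk");
    let chunk = chunk_secs * SAMPLE_RATE;
    // Each new chunk starts `chunk - overlap` samples after the previous one.
    let step = (chunk_secs - overlap_secs) * SAMPLE_RATE;

    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + chunk).min(total_samples);
        ranges.push((start, end));
        if end == total_samples {
            break; // the last chunk reached the end of the audio
        }
        start += step;
    }
    ranges
}
```

With these parameters a 60 s file yields three chunks (0 to 25 s, 23 to 48 s, 46 to 60 s), so every chunk stays under the 30 s limit that triggered the crash.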

i18n

  • English (default) and Russian UI
  • Language switcher in Settings, persisted in localStorage

UX improvements

  • Progress bar with stage labels (Decoding/Resampling/Transcribing)
  • Live elapsed timer + ETA based on previous transcription speed
  • Indeterminate pulsing bar during whisper inference
  • Log panel (toggle via status bar) for debugging
  • Cancel button on in-progress files
  • Click-to-browse on drop zone (native file picker)
  • Re-transcribe with different model (retry button)
  • Audio track export for video files (WAV download)
  • Persist selected model + language across restarts
  • Fixed layout: scrollable content with fixed header/status bar
  • Min-height on transcription result for readable preview
  • App version displayed in header
  • App icon

Versioning

  • Version synced across package.json, tauri.conf.json, Cargo.toml
  • CI creates draft release on tag push

macOS note

  • Unsigned app: run xattr -cr /Applications/HandyFiles.app after install
  • Intel Mac (x64) build disabled (no ONNX prebuilt for x86_64-apple-darwin)

🤖 Generated with Claude Code

1vank1n and others added 18 commits April 8, 2026 08:23
- Add "cargo install tauri-cli" step — CI runners don't have it
- Remove duplicate transcribe-rs entries for Windows/Linux
  (Cargo merges features from all sections, so whisper-metal
   would incorrectly be enabled on non-macOS platforms)
- Keep Metal only in macOS target-specific section
- Add libclang-dev and cmake for Linux (needed by whisper-rs bindgen)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Linux: switch ubuntu-22.04 → ubuntu-24.04 (ONNX Runtime prebuilt
  requires glibc 2.38+, ubuntu-22.04 only has 2.35)
- macOS x64: temporarily disabled — ort-sys has no prebuilt ONNX
  Runtime for x86_64-apple-darwin (Intel Macs still work with
  Whisper-only, GigaAM needs ARM64)
- Use startsWith for Linux platform check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
App is unsigned, so macOS blocks it. Document xattr -cr fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add gotchas, tech debt, fix outdated architecture info.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Progress bar on files during decoding/resampling/transcription
  with percentage and stage labels
- Log panel (toggle via the "Лог" ("Log") button in the status bar) shows timestamped
  decode/model load/transcription messages for debugging
- Fix slow resampling: reduce sinc_len 256→64, oversampling 256→128,
  chunk_size 1024→4096 — significantly faster on long files
- Backend emits progress/log events via Tauri event system
- Store tracks progress/stage per file and last 200 log entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cancel button (stop icon) on in-progress files in queue
  Uses cancellation token checked between decode/load/transcribe steps
- Click-to-browse: clicking the drop zone opens native file picker
- Fix duplicate log entries: listeners now init once (not per render)
  by calling store.getState().initListeners() directly instead of
  through useEffect deps that trigger on every render in StrictMode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
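
A cancellation token that is checked between the decode, load, and transcribe steps could be sketched as follows. This is a minimal std-only illustration: `CancelToken`, `Outcome`, and `run_pipeline` are hypothetical names, and the real backend may structure this differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared flag: the UI side calls `cancel`, the worker polls `is_cancelled`.
#[derive(Clone, Default)]
struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Done(Vec<&'static str>),
    Cancelled { completed: Vec<&'static str> },
}

/// Runs the named steps in order and checks the token before each one.
/// A step that has already started is never interrupted.
fn run_pipeline<F: FnMut(&str)>(
    token: &CancelToken,
    steps: &[&'static str],
    mut run_step: F,
) -> Outcome {
    let mut completed = Vec::new();
    for name in steps {
        if token.is_cancelled() {
            return Outcome::Cancelled { completed };
        }
        run_step(name);
        completed.push(*name);
    }
    Outcome::Done(completed)
}
```

Since the token is only consulted between steps, a cancel request issued during a long blocking step takes effect once that step returns.
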
- Export button (music note icon with tooltip) in transcription result
  view, visible only for video files (mp4, mkv, mov, avi, webm)
- Decodes audio from source video on demand via Symphonia → writes WAV
  16kHz mono via hound — no temp files, no cleanup needed
- is_video flag on QueuedFile for frontend to show/hide export button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
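
The `is_video` flag could be derived from the file extension along these lines. This is a sketch assuming a case-insensitive match against the five extensions listed above; the `is_video` function shown here is illustrative, and the actual `QueuedFile` field may be computed differently.

```rust
use std::path::Path;

/// Extensions treated as video containers, per the list above.
const VIDEO_EXTENSIONS: [&str; 5] = ["mp4", "mkv", "mov", "avi", "webm"];

/// Returns true when the path has one of the video extensions
/// (compared case-insensitively); files without an extension are not video.
fn is_video(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| VIDEO_EXTENSIONS.iter().any(|v| ext.eq_ignore_ascii_case(v)))
        .unwrap_or(false)
}
```
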
- Use useRef guard to prevent double listener init in StrictMode
- Show indeterminate (pulsing) progress bar during transcription
  since whisper-rs is a blocking call with no intermediate progress
- Determinate progress bar shown only when actual progress > 0 (decoding/resampling)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Live ticking timer (0:00, 0:01, ...) during transcription proves
  the process is alive even when whisper-rs gives no intermediate progress
- ETA calculated from previous transcription speed ratio:
  after first completed file, shows "1:23 (~2:10)" format
- Speed ratio = audio_duration / processing_time, persisted in store
- Audio duration extracted from log messages for ETA calculation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
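
The ETA arithmetic described in this commit can be sketched as below. It is written in Rust for illustration even though the commit places this logic in the frontend store, and it assumes the parenthesised value in "1:23 (~2:10)" is the estimated total processing time. All function names are hypothetical.

```rust
/// Speed ratio = audio_duration / processing_time, from a completed file.
/// Returns `None` when either value is not positive.
fn speed_ratio(audio_secs: f64, processing_secs: f64) -> Option<f64> {
    if audio_secs > 0.0 && processing_secs > 0.0 {
        Some(audio_secs / processing_secs)
    } else {
        None
    }
}

/// Formats whole seconds as `m:ss`, e.g. 83 -> "1:23".
fn format_mmss(total_secs: u64) -> String {
    format!("{}:{:02}", total_secs / 60, total_secs % 60)
}

/// Builds the timer label: elapsed time alone until a ratio is known,
/// then "elapsed (~estimated total)".
fn timer_label(elapsed_secs: u64, audio_secs: f64, ratio: Option<f64>) -> String {
    match ratio {
        Some(r) if r > 0.0 => {
            // Estimated processing time for this file at the previous speed.
            let estimated = (audio_secs / r).round() as u64;
            format!("{} (~{})", format_mmss(elapsed_secs), format_mmss(estimated))
        }
        _ => format_mmss(elapsed_secs),
    }
}
```

For example, if a previous 600 s file took 120 s to process, the ratio is 5.0, so a 650 s file is estimated at 130 s and the label at 83 s elapsed reads "1:23 (~2:10)".
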
Critical fixes:
- GigaAM crashes on audio >30s due to ONNX dimension mismatch.
  Added chunking: splits audio into 25s chunks with 2s overlap,
  transcribes each chunk, joins results. Whisper unaffected.
- Log panel now renders as fixed overlay at bottom (z-50) instead
  of inline — no longer overlaps transcription result
- ETA: backend explicitly sends audio_duration_sec via progress
  event instead of fragile regex parsing from log messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… bar

Split layout into 3 zones:
- Header (shrink-0): title + buttons, always visible at top
- Content (flex-1 min-h-0 overflow-y-auto): scrollable area with
  drop zone, file queue, transcription result
- Status bar (shrink-0): always visible at bottom, never overlapped

Fixes issue where many files caused content to overflow onto status bar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
min-h-[120px] ensures at least ~4 lines visible when scrolling down.
max-h-[400px] caps growth so it doesn't push other elements off screen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sync version across package.json, tauri.conf.json, Cargo.toml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Version injected via Vite define from package.json at build time.
Displayed as gray "v0.2.0" next to "HandyFiles" in header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Simple i18n system using Zustand store (no heavy deps like i18next)
- English as default, Russian available
- UI language persisted in localStorage
- Language switcher in Settings panel
- All UI strings translated: header, drop zone, file queue, result,
  settings, models, log panel, status bar, warnings
- Version displayed in header from package.json via Vite define

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generated from handyfiles-icon.png:
- 32x32, 128x128, 256x256 PNGs for Tauri
- icon.icns for macOS (16-1024px via iconutil)
- icon.ico for Windows (256px via ImageMagick)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@1vank1n 1vank1n changed the title from "Fix CI: install Tauri CLI, fix platform dependencies" to "v0.2.0: Cross-platform, i18n, native audio decoder, UX improvements" Apr 8, 2026
@1vank1n 1vank1n merged commit 7992946 into main Apr 8, 2026
3 checks passed
@1vank1n 1vank1n deleted the fix/ci-cross-platform branch April 8, 2026 12:38