
Merge dev: typography styling system, audio FX, camera A/V sync, and export fixes #6

Merged
verticalrectangle merged 14 commits into main from dev
Jun 20, 2026


@epsilver

Collaborator

Brings main up to date with dev (177 commits). High-level summary by area.

Typography

  • Real fonts, per-preset styling, easing, 50+ presets (Waves 1–3): per-element kinetic motion, evolved karaoke, gradient fill, letter-spacing.
  • Animated, accurate preset preview cards (real font + sample sentence + live motion).
  • New this session: per-field tweak + "hold" (pin) store — adjust a preset and pin what survives switching presets. Adds a user-facing alignment control (left/center/right) plus vertical position, X/Y offset, fade, and text style as holdable controls.
  • Consolidation: Typography is now the single styling surface for managed lyric/subtitle tracks (track-wide, no per-line desync); the Clip tab stays the per-clip keyframable editor for standalone Text. Preset grid moved into a collapsible "Browse presets" disclosure.
  • Persistence: project format bumped to persist the active preset + full tweak/hold store (older projects load via version guards).

Audio / FX

  • Per-brick dry/wet mix for audio FX; audio FX cards drag to the timeline like other FX.
  • "Hear effects" works while idle and on the Video Record brick; monitor windows track the live playhead and the brick span.
  • Coupling an audio FX brick to a record brick enables Hear Effects.
  • Visible FX-expand chevron on bricks with coupled FX; un-mirrored generated-effect thumbnails; a solo FX brick can now be dragged onto content that already carries the other kind of chain.

Camera / video recording

  • Camera-mic detection + A/V-sync clarity in the Video Record panel.
  • Measure & A/B button for A/V offset (GCC-PHAT); dropped the fixed 100 ms camera-latency constant.
  • Rotation-correct border snapping when resizing a clip in the canvas.

Export & fidelity fixes

  • Stop audio export freezing (input-seeked matroska/AAC NOPTS into amix) and match preview loudness (amix normalize=0).
  • Audio clips with in_point == start no longer all play at t=0; audio_path fallback now respects mute.
  • "Rip audio to new track" works on H.264/AAC video; voice conversions survive a project reload.

Subtitles / text

  • Fix: "Transcribe (subtitles only)" now actually builds the Subtitles track (wired the completion callback to apply_subtitle_pipeline).
  • Subtitle button wiring; forbid text bricks stacked on content.

🤖 Generated with Claude Code

epsilver added 14 commits June 16, 2026 15:38
The monitor approximated the span coarsely. It used a non-windowed chain whose
stages monitor_chain_sync swapped in and out whole on the UI thread as the
playhead crossed a brick edge (abrupt, no crossfade), and while parked it
ignored the span entirely. So "what you monitor" didn't match "what the take
becomes", which is windowed precisely in render.cpp / process_seg.

Now the monitor uses the SAME windowed processor as playback/export. While the
transport rolls, perf_input_block runs the live mic through
audio_fx_chain_process_seg with src_t derived from the master clock (g_read_pos)
minus the host brick's start — so each coupled FX activates over its own span
with the same ~12 ms edge crossfades. The segment windows don't change as the
playhead moves, so the chain is built once (on a brick edit), not per frame;
the audio thread does the per-sample gating. Parked, it falls back to the plain
all-on chain for dial-in (unchanged).

New audio_monitor_chain_set_seg(segs, brick_start) + g_mon_windowed /
g_mon_brick_start; monitor_chain_sync picks the seg path while moving and the
plain path while idle, hashing the window bounds so moving/resizing a brick
rebuilds.
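The per-sample gating described above can be pictured with a small sketch. This is not the code in render.cpp or process_seg: `window_gain`, `gate_sample`, the linear ramp shape, and placing the fade inside the window are all assumptions made for illustration. Only the roughly 12 ms edge crossfade and the idea of deriving src_t from the master clock minus the brick start come from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: gain of one FX window at source time src_t (seconds).
// win0/win1 are the window bounds relative to the host brick's start.
// fade is the edge crossfade length (the commit mentions roughly 12 ms).
inline double window_gain(double src_t, double win0, double win1,
                          double fade = 0.012) {
    assert(win1 >= win0 && fade > 0.0);
    if (src_t <= win0 || src_t >= win1) return 0.0;  // outside the span: dry
    double up   = (src_t - win0) / fade;             // ramp in after win0
    double down = (win1 - src_t) / fade;             // ramp out before win1
    return std::max(0.0, std::min(1.0, std::min(up, down)));
}

// Blend a dry and a processed sample by the window gain (linear crossfade).
inline float gate_sample(float dry, float wet, double gain) {
    return static_cast<float>(dry + (wet - dry) * gain);
}
```

Because the gain depends only on src_t and fixed window bounds, the chain itself never needs rebuilding as the playhead moves, which matches the "built once, gated per sample" design described above.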
…r Effects

Adding/dropping an audio FX brick onto an Audio Record or Video Record brick
now turns "Hear effects" on automatically, so you immediately monitor through
it (and can dial its dry/wet) instead of hunting for the checkbox. Hooked in
timeline_couple_fx_brick — the single chokepoint every coupling path (drag,
toolbox, IPC autocouple) runs through — and scoped to Record/VideoRecord
hosts. Verified: hear_fx flips false→true on couple.
…rked/scrubbing

Previously the monitor only windowed the FX while the transport was rolling
(playing/recording); parked it fell back to an all-on chain, so scrubbing the
playhead didn't move the span — the FX was just on continuously. Two reasons:
the all-on path was a dial-in convenience, and a static master clock made
process_seg's seek detection reset the effect every block.

Now the monitor windows on the LIVE playhead in every state. The window gating
uses src_t (master clock = playhead, updated by audio_seek on every scrub), and
frame_idx is a free-running counter so a parked playhead is never mistaken for a
per-block seek — no resets, no glitches. src_t advances per sample while rolling
and holds while parked.

So the brick span is now authoritative everywhere — monitor, take, export:
park/scrub the playhead inside a brick and you hear the FX, outside it's dry,
and a brick spanning the whole take is "always on" (full-host window → win1=0).
To dial in an effect you park the playhead inside its span. monitor_chain_sync
always feeds the windowed seg chain now; the idle all-on path is gone.
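To see why a static clock could look like a seek on every block, here is a minimal sketch of a continuity check. The real seek detection in process_seg is not shown in this PR; `SeekDetector` and its rule (a block is a seek when it does not start where the previous one ended) are assumptions chosen to illustrate the failure mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical continuity check: a block counts as a seek when its first
// frame index does not follow directly after the previous block.
struct SeekDetector {
    std::int64_t next_expected = -1;  // -1 means no block seen yet

    bool is_seek(std::int64_t frame_idx, std::int64_t block_len) {
        assert(block_len > 0);
        bool seek = (next_expected >= 0) && (frame_idx != next_expected);
        next_expected = frame_idx + block_len;
        return seek;
    }
};
```

Under this rule, feeding the same frame index twice (a parked clock) reports a seek on the second block, while a free-running counter that advances by the block length never does.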
… the other kind of chain

clips_conflict treated ANY two overlapping FX bricks as a conflict (they must
weld or bounce), distinguishing only FX-vs-content. But a video chain and an
audio chain are different kinds that coexist as separate glass bricks on the
same host. So dragging a standalone video FX brick onto content that already
carried an audio chain (or vice-versa) was rejected — the drop bounced back and
never coupled. Panel-card drops worked because they couple directly without the
drag-placement overlap check, which is why it looked like "cards work, timeline
bricks don't".

Only SAME-kind FX bricks conflict now (two audio, or two video — those still
must weld into one chain, not stack). Cross-kind overlap is allowed, so the
solo brick lands on the content's track and couple_pending_tick welds it into a
second chain.
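The new rule can be sketched as a predicate. This is an illustration only: the `Kind` enum, `Span` struct, and `fx_conflict` name are hypothetical and do not reflect the real clips_conflict signature.

```cpp
#include <cassert>

enum class Kind { Content, AudioFx, VideoFx };  // hypothetical classification

struct Span { double start, end; Kind kind; };

inline bool overlaps(const Span& a, const Span& b) {
    return a.start < b.end && b.start < a.end;
}

inline bool is_fx(Kind k) { return k != Kind::Content; }

// Two FX bricks conflict only when they overlap AND are the same kind.
// Cross-kind overlap (audio chain vs video chain) is allowed.
inline bool fx_conflict(const Span& a, const Span& b) {
    assert(is_fx(a.kind) && is_fx(b.kind));
    return overlaps(a, b) && a.kind == b.kind;
}
```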

Verified via a scripted UI drag: a standalone VHS brick dragged onto an
image+audio-chain clip couples into a second (video) chain — the host ends up
with both, and the source track empties.
The FX-library cards draw the preview texture with top-down UVs ({0,0}-{1,1}),
which is correct for the legacy CPU effects (top-down pixel uploads). But the
generated/shader effects come from fx_apply, whose output is a bottom-up GL
FBO — so those thumbnails rendered vertically mirrored, and any directional
effect ran the wrong way (motion going up when it should go down). The
composited canvas was always correct; only the preview was flipped.

Flip V during the blit into the dedicated per-effect preview texture so the
shader previews match the CPU ones (and the upright source). Verified: the
preview source image, previously inverted, now renders right-side-up.
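As a rough picture of the fix, flipping V maps each vertical texture coordinate v to 1 - v, so a bottom-up image reads as top-down. The `UV` struct and `flip_v` helper below are hypothetical; the actual change happens in the blit into the preview texture and is not reproduced here.

```cpp
#include <cassert>

struct UV { float u, v; };

// Flip the vertical texture coordinate: bottom-up <-> top-down.
// The horizontal coordinate is left untouched.
inline UV flip_v(UV uv) {
    assert(uv.v >= 0.0f && uv.v <= 1.0f);
    return UV{uv.u, 1.0f - uv.v};
}
```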
extract_audio_start remuxed the source into a .webm via video_extract_segment,
which (a) copied the VIDEO stream too and (b) used a WebM container. WebM only
holds VP8/VP9/AV1 + Vorbis/Opus, so any ordinary H.264/AAC clip failed at
avformat_write_header — the worker returned an error, extract_done stayed false,
and no audio track was ever added. Looked like the menu item did nothing.

Rip now extracts AUDIO ONLY into a .mka (Matroska audio), which stream-copies
essentially any audio codec with no re-encode or quality loss. Added an
audio_only flag to video_extract_segment (default false, so segment extraction
is unchanged) that drops the video stream, and exposed it on the
extract_clip_segment IPC.

Verified end-to-end through the real libav path: ripping a 30s window of an
H.264/AAC lyric video produces a valid AAC-only .mka (no video stream, probes
clean) instead of erroring.
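The container constraint behind the failure can be written as a small lookup. This is a simplified sketch, not libav's actual muxer logic: `can_stream_copy` is a hypothetical helper, the codec names are informal strings, and treating .mka as accepting every codec is an approximation of "essentially any audio codec".

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical check: can `codec` be stream-copied into `container`?
// WebM is limited to the codecs listed in the commit message; Matroska
// audio (.mka) is approximated here as accepting anything.
inline bool can_stream_copy(const std::string& container,
                            const std::string& codec) {
    assert(!codec.empty());
    static const std::set<std::string> webm_ok = {
        "vp8", "vp9", "av1", "vorbis", "opus"};
    if (container == "webm") return webm_ok.count(codec) > 0;
    if (container == "mka")  return true;  // approximation, see lead-in
    return false;                          // unknown container in this sketch
}
```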
1) "Extract Subtitles" was wired to apply_subtitle_mode (the LYRICS builder,
   copy-pasted from the Extract Lyrics button), so apply_subtitle_pipeline — the
   function that actually creates a Subtitles track of ClipType::Subtitle clips
   from the segments JSON — had zero callers and the subtitle track never
   appeared. Point the button at apply_subtitle_pipeline.

2) Text overlays belong on their own tracks (they render on top regardless of
   track), but IPC add_clip happily stacked a Text/Subtitle/Lyrics clip onto a
   content clip on the same row. add_clip now refuses any text-vs-content overlap
   on a track and returns an error pointing to a separate track. The human drag
   path already enforced this via clips_conflict; this closes the agent/IPC hole.
…nal take

Export mixes a base audio track from state.audio_path whenever no clip "covers"
that exact path. When you rip a video's audio and rebuild it as converted-voice
segments (which reference the ripped .mka, not the source .MOV) and mute the
source video, state.audio_path still points at the .MOV — uncovered — so the
fallback summed the entire original take at full volume UNDER every converted
segment. Hence "all the voices on top of each other," and muting the source did
nothing because the fallback bypassed it.

Now the fallback checks whether any clip bearing state.audio_path is muted (its
clip or its track); if so the user silenced it deliberately, so the fallback is
skipped. Lyric-video and plain audio workflows are unchanged (their audio clip
covers the path, or nothing's muted).
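A minimal sketch of the mute check, assuming simplified `Clip` and `Track` structs that are not the app's real types. It models only the new "is any clip bearing this path muted" test; the existing coverage check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Track { bool muted; };                              // hypothetical
struct Clip  { std::string path; int track; bool muted; }; // hypothetical

// Returns false when any clip bearing audio_path is muted, either directly
// or through its track. In that case the fallback mix should be skipped.
inline bool fallback_allowed(const std::vector<Clip>& clips,
                             const std::vector<Track>& tracks,
                             const std::string& audio_path) {
    for (const Clip& c : clips) {
        if (c.path != audio_path) continue;
        assert(c.track >= 0 &&
               static_cast<std::size_t>(c.track) < tracks.size());
        if (c.muted || tracks[static_cast<std::size_t>(c.track)].muted)
            return false;
    }
    return true;
}
```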
Timeline placement in the GL export used adelay = (clip.start - in_point). But
every audio stream is reset to pts 0 by asetpts=PTS-STARTPTS in the filter graph,
so placement is purely adelay — which must be the FULL clip start, not start
minus in_point (that subtraction was leftover reasoning for a pre-asetpts seek
behaviour). The Record-take path already used adelay = clip.start correctly.

For sequential slices of one source — e.g. converted-voice segments where each
clip's in_point equals its timeline start — start - in_point collapsed to 0, so
every segment got adelay=0 and they all fired simultaneously at the beginning
instead of playing in sequence. Use adelay = clip.start.
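A worked comparison of the two formulas makes the collapse visible. The `AudioClip` struct and function names are hypothetical, and the clamp at zero in the old formula follows the later commit's description of it clamping to roughly 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct AudioClip { double start; double in_point; };  // seconds, hypothetical

// Old placement: start minus in_point, clamped at zero.
inline long adelay_ms_old(const AudioClip& c) {
    return std::lround(std::max(0.0, c.start - c.in_point) * 1000.0);
}

// New placement: the full timeline start, since asetpts=PTS-STARTPTS has
// already reset every stream to pts 0.
inline long adelay_ms_new(const AudioClip& c) {
    assert(c.start >= 0.0);
    return std::lround(c.start * 1000.0);
}
```

For a segment with start = in_point = 12 s, the old formula yields 0 ms and the new one 12000 ms, which is the difference between every slice firing at t=0 and the slices playing in sequence.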
The Clip's voice-conversion RESULT (vc_status / vc_out_path / vc_model_used) was
never serialized — only the FX settings were. So on reload vc_status reset to
Idle and vc_out_path went empty, and since playback/export only substitute the
converted audio when vc_status == Ready, every converted clip silently reverted
to its original take. Persist the result (project v52); a Processing status
settles to Ready if the output file still exists, else Idle so it can re-run.
Progress/error stay transient.
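The load-time settling rule could look roughly like the sketch below. The status names come from the commit message, but the `VcStatus` enum, both function signatures, and the use of `std::ifstream` as the existence test are assumptions for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>

enum class VcStatus { Idle, Processing, Ready };  // hypothetical enum

// On project load: a saved Processing status becomes Ready if the output
// file can still be opened, otherwise Idle so the conversion can re-run.
inline VcStatus settle_on_load(VcStatus saved, const std::string& out_path) {
    if (saved != VcStatus::Processing) return saved;
    bool exists = !out_path.empty() && std::ifstream(out_path).good();
    return exists ? VcStatus::Ready : VcStatus::Idle;
}

// Playback/export substitute the converted audio only when Ready.
inline const std::string& audio_source(VcStatus s, const std::string& original,
                                       const std::string& converted) {
    if (s == VcStatus::Ready) {
        assert(!converted.empty());
        return converted;
    }
    return original;
}
```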
Clicking an effect inside a welded Audio Multi-FX brick showed only a condensed
inline slider editor — and for voice-convert it showed nothing at all (the
chain-entry editor had no AudioVoiceConvert case), so you couldn't pick a model,
re-convert, or transpose without decoupling the brick back to standalone.

Extracted the standalone brick's full settings (params + presets + the
voice-convert model picker / HF search / conversion progress / transpose +
dry/wet) into a shared audio_fx_settings_ui(fx_type, afx, ti, fx_start, fx_end).
The standalone panel and the welded-chain selected-entry editor now both call it,
so a welded effect gets the exact same page as a standalone one. The VC host is
the audio clip the chain rides on (its track + the brick span). The condensed
audio_chain_entry_params_ui stays for the bus chain's inline editor.
Two distinct bugs in the export audio filtergraph, both surfaced by a
project mixing the source .mka audio with delayed converted-voice clips.

1. Export froze at ~3% (frame 46). Input-seeked matroska/AAC sources have
   encoder priming that leaves a NOPTS packet at end-of-stream. A lone aac
   encoder tolerates it, but once such a stream passes through adelay and
   into amix the NOPTS propagates to the mix output; the mp4 muxer then
   sees a non-monotonic dts (AV_NOPTS_VALUE) and aborts. ffmpeg dies, the
   render pipe breaks (EPIPE), and the GL export loop stalls.
   This was newly triggered by 68625a6: that commit correctly switched
   adelay from (start - in_point), which clamped to ~0 for sequential
   slices, to the full clip.start — so these streams get a real adelay for
   the first time, exposing the NOPTS path.
   Fix: append asetpts=N/SR/TB to each processed stream chain, regenerating
   pts from the sample count (pts = sample_index / sample_rate) so the chain
   is strictly monotonic with no NOPTS flush packet.

2. Exports came out ~10 dB quieter than the project preview. The preview
   mixer (audio.cpp mix_master) sums clips additively and hard-clamps, but
   amix defaults to normalize=1, which divides the mix by the input count —
   and by the *active* count, so the attenuation drifts as clips start/end.
   Fix: amix ...:normalize=0 on both export paths (GL/VAAPI and the libx264
   filter-script path) so the export sums like the preview and the encoder
   clamps at 0 dBFS the same way.

Both verified against the exact ffmpeg command the app builds for the
affected project: it died at frame 5 before, exports a clean 44.17s file
(valid h264 + aac) after, at matching loudness (peak -8.3 dB vs -17.9 dB).
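Both fixes show up in the shape of the filtergraph string. The builder below is a sketch, not the app's actual code: `build_audio_graph`, the input/output label names, and the use of `adelay=delays=...:all=1` are assumptions. The `asetpts=PTS-STARTPTS`, trailing `asetpts=N/SR/TB`, and `amix ... normalize=0` pieces are the ones named in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build an export audio filtergraph for one delayed stream per input.
// Each chain ends in asetpts=N/SR/TB so pts is regenerated from the sample
// count, and the mix uses normalize=0 so it sums like the preview.
inline std::string build_audio_graph(const std::vector<long>& delays_ms) {
    assert(!delays_ms.empty());
    std::string graph, labels;
    for (std::size_t i = 0; i < delays_ms.size(); ++i) {
        const std::string n = std::to_string(i);
        graph += "[" + n + ":a]asetpts=PTS-STARTPTS,adelay=delays=" +
                 std::to_string(delays_ms[i]) +
                 ":all=1,asetpts=N/SR/TB[a" + n + "];";
        labels += "[a" + n + "]";
    }
    graph += labels + "amix=inputs=" + std::to_string(delays_ms.size()) +
             ":normalize=0[aout]";
    return graph;
}
```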
…g into the Typography tab

Reworks text styling so the Typography tab is the single surface for managed
lyric/subtitle tracks, while the Clip tab stays the per-clip (keyframable)
editor for standalone Text bricks. Adds a per-field "hold" (pin) so adjusting a
preset and switching to another keeps the pinned tweaks.

Tweak/hold store (app.h TypoTweaks):
- Generalises the three ad-hoc globals (typo_font_size/typo_color/typo_case)
  into a per-field value + active + held bitset. Editing a control marks the
  field active; the Hold pin marks it held; switching presets keeps only held
  fields (keep_held). apply_typo_style reads each field as held/tweaked ? tweak
  : preset value.
- Covers font size, color, letter case, alignment (NEW — the renderer already
  honored sub_anchor_h but nothing exposed it), tracking, wrap, vertical
  position, X/Y offset, fade in/out, and text style (shadow/stroke/glow/box).
  Fade is applied only when set so existing per-clip fades aren't wiped.
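The value + active + held layout can be sketched with a pair of bitsets. This is a reduced illustration, not the TypoTweaks struct in app.h: the three-field enum, the float-only values, and the rule that a field must be edited before it can be pinned are all assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

enum Field : std::size_t { FontSize, Color, Align, FieldCount };  // hypothetical subset

struct Tweaks {
    float value[FieldCount] = {};
    std::bitset<FieldCount> active, held;

    // Editing a control stores the value and marks the field active.
    void edit(Field f, float v) { value[f] = v; active.set(f); }
    // The Hold pin marks an edited field as held.
    void hold(Field f) { assert(active.test(f)); held.set(f); }
    // Preset switch: drop every tweak that is not pinned.
    void keep_held() { active &= held; }
    // Resolve a field: the tweak if held or active, else the preset value.
    float resolve(Field f, float preset) const {
        return (active | held).test(f) ? value[f] : preset;
    }
};
```

With this layout, switching presets is a single mask operation, and each field resolves independently against whatever preset is active.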

Typography panel:
- Tune/Advanced controls now sit on top; the preset grid moved into a
  collapsed-by-default "Browse presets · <active>" disclosure.
- section_fade / section_text_style refactored to take the field by reference
  and return whether they were edited, then shared via panel_clip.h so the Clip
  tab (Text) and the Typography tab (subtitle/lyric, track-wide) render the
  exact same controls.

Track-level styling:
- typo_restyle_live restyles every clip sharing the selected clip's type and
  source, so a tweak lands on the whole lyric/subtitle track at once. Only a
  standalone Text brick styles in isolation.
- The Clip tab hides Position/Color/Fade/Text Style for Lyrics/Subtitle (with a
  pointer to the Typography tab) so per-line edits can't desync the track; it
  keeps the keyframable sections for standalone Text.

Persistence: project format v53/v54 persists the active preset + the full
tweak/hold store (older projects load unchanged via version guards).
…s only)"

The clip context-menu item kicked the transcription pipeline but never set
state.pipeline_on_done, so when transcription finished nothing populated the
timeline — the process "ran" but no Subtitles track appeared. Wire it to
apply_subtitle_pipeline (ClipType::Subtitle on a "Subtitles" track), matching
the Extract Subtitles button, which had already been fixed the same way. "Make
lyric video" already set pipeline_on_done = generate_typography, which is why
that path worked and this one didn't.
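The bug pattern is a completion hook that is never assigned. The sketch below reuses the names pipeline_on_done and apply_subtitle_pipeline from the commit message, but the `State` struct, the function signatures, and `start_subtitles_only` / `finish_transcription` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct State {
    std::function<void(State&)> pipeline_on_done;  // completion hook
    std::vector<std::string> tracks;
};

// Stand-in for the real builder: adds a "Subtitles" track.
inline void apply_subtitle_pipeline(State& s) { s.tracks.push_back("Subtitles"); }

// The fix: the menu item sets the hook before kicking the pipeline.
inline void start_subtitles_only(State& s) {
    s.pipeline_on_done = apply_subtitle_pipeline;
    assert(s.pipeline_on_done);  // guard against the unset-hook bug
}

// Called when transcription finishes; with no hook set, nothing happens.
inline void finish_transcription(State& s) {
    if (s.pipeline_on_done) s.pipeline_on_done(s);
}
```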
@verticalrectangle verticalrectangle merged commit 59d9ddd into main Jun 20, 2026
2 checks passed

2 participants