Merge dev: typography styling system, audio FX, camera A/V sync, and export fixes by epsilver · Pull Request #6 · verticalrectangle/pop-maker-studio

epsilver · 2026-06-19T20:02:39Z

Brings main up to date with dev (177 commits). High-level summary by area.

Typography

Real fonts, per-preset styling, easing, 50+ presets (Waves 1–3): per-element kinetic motion, evolved karaoke, gradient fill, letter-spacing.
Animated, accurate preset preview cards (real font + sample sentence + live motion).
New this session: per-field tweak + "hold" (pin) store — adjust a preset and pin what survives switching presets. Adds a user-facing alignment control (left/center/right) plus vertical position, X/Y offset, fade, and text style as holdable controls.
Consolidation: Typography is now the single styling surface for managed lyric/subtitle tracks (track-wide, no per-line desync); the Clip tab stays the per-clip keyframable editor for standalone Text. Preset grid moved into a collapsible "Browse presets" disclosure.
Persistence: project format bumped to persist the active preset + full tweak/hold store (older projects load via version guards).

Audio / FX

Per-brick dry/wet mix for audio FX; audio FX cards drag to the timeline like other FX.
"Hear effects" works while idle and on the Video Record brick; monitor windows track the live playhead and the brick span.
Coupling an audio FX brick to a record brick enables Hear Effects.
Visible FX-expand chevron on bricks with coupled FX; un-mirrored generated-effect thumbnails; a solo FX brick can now be dragged onto content that already carries the other kind of chain (audio vs. video).

Camera / video recording

Camera-mic detection + A/V-sync clarity in the Video Record panel.
Measure & A/B button for A/V offset (GCC-PHAT); dropped the fixed 100 ms camera-latency constant.
Rotation-correct border snapping when resizing a clip in the canvas.

Export & fidelity fixes

Stop audio export freezing (input-seeked matroska/AAC NOPTS into amix) and match preview loudness (amix normalize=0).
Audio clips with in_point == start no longer all play at t=0; audio_path fallback now respects mute.
"Rip audio to new track" works on H.264/AAC video; voice conversions survive a project reload.

Subtitles / text

Fix: "Transcribe (subtitles only)" now actually builds the Subtitles track (wired the completion callback to apply_subtitle_pipeline).
Subtitle button wiring; forbid text bricks stacked on content.

🤖 Generated with Claude Code

The monitor approximated the span coarsely: it ran a non-windowed chain, and monitor_chain_sync swapped whole stages in and out on the UI thread as the playhead crossed a brick edge (abrupt, no crossfade). While parked it ignored the span entirely. So "what you monitor" didn't match "what the take becomes", which is windowed precisely in render.cpp / process_seg.

Now the monitor uses the SAME windowed processor as playback/export. While the transport rolls, perf_input_block runs the live mic through audio_fx_chain_process_seg with src_t derived from the master clock (g_read_pos) minus the host brick's start, so each coupled FX activates over its own span with the same ~12 ms edge crossfades. The segment windows don't change as the playhead moves, so the chain is built once (on a brick edit), not per frame; the audio thread does the per-sample gating. Parked, it falls back to the plain all-on chain for dial-in (unchanged).

New audio_monitor_chain_set_seg(segs, brick_start) + g_mon_windowed / g_mon_brick_start; monitor_chain_sync picks the seg path while moving and the plain path while idle, hashing the window bounds so moving/resizing a brick rebuilds.

…r Effects
Adding/dropping an audio FX brick onto an Audio Record or Video Record brick now turns "Hear effects" on automatically, so you immediately monitor through it (and can dial its dry/wet) instead of hunting for the checkbox. Hooked in timeline_couple_fx_brick, the single chokepoint every coupling path (drag, toolbox, IPC autocouple) runs through, and scoped to Record/VideoRecord hosts. Verified: hear_fx flips false→true on couple.

…rked/scrubbing
Previously the monitor only windowed the FX while the transport was rolling (playing/recording); parked, it fell back to an all-on chain, so scrubbing the playhead didn't move the span and the FX was just on continuously. Two reasons: the all-on path was a dial-in convenience, and a static master clock made process_seg's seek detection reset the effect every block.

Now the monitor windows on the LIVE playhead in every state. The window gating uses src_t (master clock = playhead, updated by audio_seek on every scrub), and frame_idx is a free-running counter so a parked playhead is never mistaken for a per-block seek: no resets, no glitches. src_t advances per sample while rolling and holds while parked.

So the brick span is now authoritative everywhere (monitor, take, export): park/scrub the playhead inside a brick and you hear the FX, outside it's dry, and a brick spanning the whole take is "always on" (full-host window → win1=0). To dial in an effect you park the playhead inside its span. monitor_chain_sync always feeds the windowed seg chain now; the idle all-on path is gone.

… the other kind of chain
clips_conflict treated ANY two overlapping FX bricks as a conflict (they must weld or bounce), distinguishing only FX-vs-content. But a video chain and an audio chain are different kinds that coexist as separate glass bricks on the same host. So dragging a standalone video FX brick onto content that already carried an audio chain (or vice-versa) was rejected: the drop bounced back and never coupled. Panel-card drops worked because they couple directly without the drag-placement overlap check, which is why it looked like "cards work, timeline bricks don't".

Only SAME-kind FX bricks conflict now (two audio, or two video; those still must weld into one chain, not stack). Cross-kind overlap is allowed, so the solo brick lands on the content's track and couple_pending_tick welds it into a second chain. Verified via a scripted UI drag: a standalone VHS brick dragged onto an image+audio-chain clip couples into a second (video) chain; the host ends up with both, and the source track empties.
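A minimal sketch of the same-kind rule, assuming a simplified brick model. The types and the function name fx_bricks_conflict are hypothetical stand-ins, not the real clips_conflict signature, and the FX-vs-content case is deliberately left out.

```cpp
// Hypothetical, simplified model of an FX brick on a track.
enum class FxKind { Video, Audio };

struct FxBrick {
    double start;  // timeline start, seconds
    double end;    // timeline end, seconds
    FxKind kind;
};

// Half-open interval overlap test.
inline bool overlaps(const FxBrick& a, const FxBrick& b) {
    return a.start < b.end && b.start < a.end;
}

// Two overlapping FX bricks conflict only when they are the same kind
// (they must weld into one chain). A video chain and an audio chain may
// overlap and coexist on the same host.
inline bool fx_bricks_conflict(const FxBrick& a, const FxBrick& b) {
    return overlaps(a, b) && a.kind == b.kind;
}
```

Under this rule a video FX brick dropped over a span that already has an audio chain is not rejected at placement time, which leaves the later coupling step free to weld it into a second chain.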

The FX-library cards draw the preview texture with top-down UVs ({0,0}-{1,1}), which is correct for the legacy CPU effects (top-down pixel uploads). But the generated/shader effects come from fx_apply, whose output is a bottom-up GL FBO — so those thumbnails rendered vertically mirrored, and any directional effect ran the wrong way (motion going up when it should go down). The composited canvas was always correct; only the preview was flipped. Flip V during the blit into the dedicated per-effect preview texture so the shader previews match the CPU ones (and the upright source). Verified: the preview source image, previously inverted, now renders right-side-up.
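The fix itself happens in a GL blit, which is not shown in this description. As a CPU-side illustration of what "flip V" means for a bottom-up image, here is a small sketch that reverses the row order of a tightly packed RGBA8 buffer; flip_rows and its layout assumptions are hypothetical and not part of the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of a tightly packed RGBA8 image with
// its rows in reverse order (bottom-up becomes top-down and vice versa).
// Returns an empty vector if the buffer size does not match width * height.
inline std::vector<std::uint8_t> flip_rows(const std::vector<std::uint8_t>& src,
                                           std::size_t width,
                                           std::size_t height) {
    const std::size_t stride = width * 4;  // 4 bytes per RGBA8 pixel
    if (src.size() != stride * height) return {};
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t y = 0; y < height; ++y) {
        // Row y of the source lands on row (height - 1 - y) of the output.
        std::copy_n(src.data() + y * stride, stride,
                    dst.data() + (height - 1 - y) * stride);
    }
    return dst;
}
```

In a GL blit the equivalent is to sample with v' = 1 - v (or swap the destination Y bounds) rather than copying rows on the CPU.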

extract_audio_start remuxed the source into a .webm via video_extract_segment, which (a) copied the VIDEO stream too and (b) used a WebM container. WebM only holds VP8/VP9/AV1 + Vorbis/Opus, so any ordinary H.264/AAC clip failed at avformat_write_header — the worker returned an error, extract_done stayed false, and no audio track was ever added. Looked like the menu item did nothing. Rip now extracts AUDIO ONLY into a .mka (Matroska audio), which stream-copies essentially any audio codec with no re-encode or quality loss. Added an audio_only flag to video_extract_segment (default false, so segment extraction is unchanged) that drops the video stream, and exposed it on the extract_clip_segment IPC. Verified end-to-end through the real libav path: ripping a 30s window of an H.264/AAC lyric video produces a valid AAC-only .mka (no video stream, probes clean) instead of erroring.

1) "Extract Subtitles" was wired to apply_subtitle_mode (the LYRICS builder, copy-pasted from the Extract Lyrics button), so apply_subtitle_pipeline, the function that actually creates a Subtitles track of ClipType::Subtitle clips from the segments JSON, had zero callers and the subtitle track never appeared. Point the button at apply_subtitle_pipeline.

2) Text overlays belong on their own tracks (they render on top regardless of track), but IPC add_clip happily stacked a Text/Subtitle/Lyrics clip onto a content clip on the same row. add_clip now refuses any text-vs-content overlap on a track and returns an error pointing to a separate track. The human drag path already enforced this via clips_conflict; this closes the agent/IPC hole.

…nal take
Export mixes a base audio track from state.audio_path whenever no clip "covers" that exact path. When you rip a video's audio and rebuild it as converted-voice segments (which reference the ripped .mka, not the source .MOV) and mute the source video, state.audio_path still points at the .MOV, uncovered, so the fallback summed the entire original take at full volume UNDER every converted segment. Hence "all the voices on top of each other," and muting the source did nothing because the fallback bypassed it.

Now the fallback checks whether any clip bearing state.audio_path is muted (its clip or its track); if so the user silenced it deliberately, so the fallback is skipped. Lyric-video and plain audio workflows are unchanged (their audio clip covers the path, or nothing's muted).
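The fallback decision can be sketched as below. The data model is hypothetical: in particular, this sketch assumes that "covers" means an audio clip referencing the same path, which the description does not spell out, and should_mix_fallback is an illustrative name rather than a function in the project.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified project model.
struct Track { bool muted; };
struct Clip {
    std::string path;   // media file the clip references
    std::size_t track;  // index into the track list
    bool muted;         // clip-level mute
    bool is_audio;      // assumption: audio clips are what "cover" a path
};

// A clip counts as silenced if it or its track is muted.
inline bool is_silenced(const Clip& c, const std::vector<Track>& tracks) {
    return c.muted || (c.track < tracks.size() && tracks[c.track].muted);
}

// Mix the base audio from audio_path only if no clip covers that path AND
// no clip bearing that path has been muted by the user.
inline bool should_mix_fallback(const std::string& audio_path,
                                const std::vector<Clip>& clips,
                                const std::vector<Track>& tracks) {
    if (audio_path.empty()) return false;
    bool covered = false;
    bool silenced = false;
    for (const Clip& c : clips) {
        if (c.path != audio_path) continue;
        if (c.is_audio) covered = true;
        if (is_silenced(c, tracks)) silenced = true;
    }
    return !covered && !silenced;
}
```

In the scenario above, the muted source video still references the .MOV, so the silenced flag trips and the original take is no longer summed under the converted segments.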

Timeline placement in the GL export used adelay = (clip.start - in_point). But every audio stream is reset to pts 0 by asetpts=PTS-STARTPTS in the filter graph, so placement is purely adelay — which must be the FULL clip start, not start minus in_point (that subtraction was leftover reasoning for a pre-asetpts seek behaviour). The Record-take path already used adelay = clip.start correctly. For sequential slices of one source — e.g. converted-voice segments where each clip's in_point equals its timeline start — start - in_point collapsed to 0, so every segment got adelay=0 and they all fired simultaneously at the beginning instead of playing in sequence. Use adelay = clip.start.
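A small worked example of why the subtraction collapses, assuming a hypothetical AudioClip struct with times in seconds and delays expressed in milliseconds (the adelay filter's default unit). The helper names are illustrative only.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical clip model: timeline start and source in-point, in seconds.
struct AudioClip {
    double start;
    double in_point;
};

// Old placement: start minus in_point, clamped at zero, in milliseconds.
inline long long adelay_ms_old(const AudioClip& c) {
    return std::max(0LL, std::llround((c.start - c.in_point) * 1000.0));
}

// Fixed placement: every stream is reset to pts 0 by asetpts=PTS-STARTPTS,
// so the delay has to be the full timeline start.
inline long long adelay_ms_fixed(const AudioClip& c) {
    return std::llround(c.start * 1000.0);
}
```

For sequential slices such as {0, 0}, {4, 4}, {8, 8}, the old formula yields a delay of 0 ms for all three, while the fixed one yields 0, 4000 and 8000 ms, so the segments play in order.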

The Clip's voice-conversion RESULT (vc_status / vc_out_path / vc_model_used) was never serialized — only the FX settings were. So on reload vc_status reset to Idle and vc_out_path went empty, and since playback/export only substitute the converted audio when vc_status == Ready, every converted clip silently reverted to its original take. Persist the result (project v52); a Processing status settles to Ready if the output file still exists, else Idle so it can re-run. Progress/error stay transient.
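The load-time settling rule can be sketched like this. The enum values mirror the statuses named above, but settle_vc_status and the decision to leave other statuses untouched are assumptions for illustration.

```cpp
// Statuses named in the description; the real enum may have more values.
enum class VcStatus { Idle, Processing, Ready };

// Hypothetical helper applied when a project is loaded. A conversion that
// was mid-flight at save time cannot still be running, so it settles to
// Ready if its output file exists and to Idle (re-runnable) otherwise.
// Other statuses are returned unchanged in this sketch.
inline VcStatus settle_vc_status(VcStatus saved, bool output_file_exists) {
    if (saved == VcStatus::Processing) {
        return output_file_exists ? VcStatus::Ready : VcStatus::Idle;
    }
    return saved;
}
```

The file-existence check is passed in as a flag here to keep the sketch free of filesystem calls; a caller would typically compute it from vc_out_path.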

Clicking an effect inside a welded Audio Multi-FX brick showed only a condensed inline slider editor — and for voice-convert it showed nothing at all (the chain-entry editor had no AudioVoiceConvert case), so you couldn't pick a model, re-convert, or transpose without decoupling the brick back to standalone. Extracted the standalone brick's full settings (params + presets + the voice-convert model picker / HF search / conversion progress / transpose + dry/wet) into a shared audio_fx_settings_ui(fx_type, afx, ti, fx_start, fx_end). The standalone panel and the welded-chain selected-entry editor now both call it, so a welded effect gets the exact same page as a standalone one. The VC host is the audio clip the chain rides on (its track + the brick span). The condensed audio_chain_entry_params_ui stays for the bus chain's inline editor.

Two distinct bugs in the export audio filtergraph, both surfaced by a project mixing the source .mka audio with delayed converted-voice clips.

1. Export froze at ~3% (frame 46). Input-seeked matroska/AAC sources have encoder priming that leaves a NOPTS packet at end-of-stream. A lone aac encoder tolerates it, but once such a stream passes through adelay and into amix the NOPTS propagates to the mix output; the mp4 muxer then sees a non-monotonic dts (AV_NOPTS_VALUE) and aborts. ffmpeg dies, the render pipe breaks (EPIPE), and the GL export loop stalls. This was newly triggered by 68625a6: that commit correctly switched adelay from (start - in_point), which clamped to ~0 for sequential slices, to the full clip.start, so these streams get a real adelay for the first time, exposing the NOPTS path. Fix: append asetpts=N/SR/TB to each processed stream chain, regenerating pts from the sample count (pts = sample_index / sample_rate) so the chain is strictly monotonic with no NOPTS flush packet.

2. Exports came out ~10 dB quieter than the project preview. The preview mixer (audio.cpp mix_master) sums clips additively and hard-clamps, but amix defaults to normalize=1, which divides the mix by the input count, and specifically by the *active* count, so the attenuation drifts as clips start/end. Fix: amix ...:normalize=0 on both export paths (GL/VAAPI and the libx264 filter-script path) so the export sums like the preview and the encoder clamps at 0 dBFS the same way.

Both verified against the exact ffmpeg command the app builds for the affected project: it died at frame 5 before, exports a clean 44.17s file (valid h264 + aac) after, at matching loudness (peak -8.3 dB vs -17.9 dB).
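The two fixes reduce to simple arithmetic, sketched here with hypothetical helper names. The first function takes the description's statement that normalization divides by the active input count at face value; the second is the pts = sample_index / sample_rate relation behind asetpts=N/SR/TB.

```cpp
#include <cmath>

// Attenuation, in dB, from dividing a mix by the number of active inputs
// (the normalize=1 behaviour as described above).
// Assumes active_inputs >= 1.
inline double normalize_attenuation_db(int active_inputs) {
    return 20.0 * std::log10(1.0 / static_cast<double>(active_inputs));
}

// Presentation time in seconds regenerated purely from the sample count,
// which is strictly increasing and never undefined.
inline double pts_seconds(long long sample_index, int sample_rate) {
    return static_cast<double>(sample_index) / static_cast<double>(sample_rate);
}
```

With three active inputs the attenuation is about -9.5 dB, which is close to the measured gap between -8.3 dB and -17.9 dB (9.6 dB). The number of simultaneously active inputs in the affected project is not stated, so this is only a plausibility check, and with one or two active inputs the figure would be 0 dB or about -6 dB, which is the drift described above.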

…g into the Typography tab
Reworks text styling so the Typography tab is the single surface for managed lyric/subtitle tracks, while the Clip tab stays the per-clip (keyframable) editor for standalone Text bricks. Adds a per-field "hold" (pin) so adjusting a preset and switching to another keeps the pinned tweaks.

Tweak/hold store (app.h TypoTweaks):
- Generalises the three ad-hoc globals (typo_font_size/typo_color/typo_case) into a per-field value + active + held bitset. Editing a control marks the field active; the Hold pin marks it held; switching presets keeps only held fields (keep_held). apply_typo_style reads each field as held/tweaked ? tweak : preset value.
- Covers font size, color, letter case, alignment (NEW: the renderer already honored sub_anchor_h but nothing exposed it), tracking, wrap, vertical position, X/Y offset, fade in/out, and text style (shadow/stroke/glow/box). Fade is applied only when set so existing per-clip fades aren't wiped.

Typography panel:
- Tune/Advanced controls now sit on top; the preset grid moved into a collapsed-by-default "Browse presets · <active>" disclosure.
- section_fade / section_text_style refactored to take the field by reference and return whether they were edited, then shared via panel_clip.h so the Clip tab (Text) and the Typography tab (subtitle/lyric, track-wide) render the exact same controls.

Track-level styling:
- typo_restyle_live restyles every clip sharing the selected clip's type and source, so a tweak lands on the whole lyric/subtitle track at once. Only a standalone Text brick styles in isolation.
- The Clip tab hides Position/Color/Fade/Text Style for Lyrics/Subtitle (with a pointer to the Typography tab) so per-line edits can't desync the track; it keeps the keyframable sections for standalone Text.

Persistence: project format v53/v54 persists the active preset + the full tweak/hold store (older projects load unchanged via version guards).
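A minimal sketch of a per-field value + active + held store, to make the preset-switch behaviour concrete. This is not the TypoTweaks struct from app.h: the field list is shortened, all values are floats, and the names Field, Tweaks, edit, hold and resolve are hypothetical. In this sketch the held bit only decides what survives a switch, and resolve reads the active bit.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Hypothetical, shortened field list.
enum Field : std::size_t { FontSize, Tracking, VerticalPos, FieldCount };

struct Tweaks {
    std::array<float, FieldCount> value{};  // user-edited values
    std::bitset<FieldCount> active;         // field has been edited
    std::bitset<FieldCount> held;           // field is pinned across presets

    // Editing a control stores the value and marks the field active.
    void edit(Field f, float v) { value[f] = v; active.set(f); }

    // The Hold pin toggles whether the field survives a preset switch.
    void hold(Field f, bool on = true) { held.set(f, on); }

    // Called on preset switch: drop every tweak that is not held.
    void keep_held() { active &= held; }

    // Tweaked fields use the stored value, everything else the preset's.
    float resolve(Field f, float preset_value) const {
        return active[f] ? value[f] : preset_value;
    }
};
```

Usage under these assumptions: edit FontSize and Tracking, pin only FontSize, then call keep_held() when the preset changes. FontSize keeps the tweaked value while Tracking falls back to the new preset's value.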

…s only)"
The clip context-menu item kicked the transcription pipeline but never set state.pipeline_on_done, so when transcription finished nothing populated the timeline: the process "ran" but no Subtitles track appeared. Wire it to apply_subtitle_pipeline (ClipType::Subtitle on a "Subtitles" track), matching the Extract Subtitles button, which had already been fixed the same way. "Make lyric video" already set pipeline_on_done = generate_typography, which is why that path worked and this one didn't.

epsilver added 14 commits June 16, 2026 15:38

verticalrectangle merged commit 59d9ddd into main Jun 20, 2026
2 checks passed

verticalrectangle merged 14 commits into main from dev
