Fix audio hash leaks, threading, and resource bugs#49
Merged
Conversation
added 2 commits
May 25, 2026 11:03
Addresses audit findings #7, #8, #9, #13, #14, #19, #20, #21, #29. Correctness: - #7/#8 Removed the entire HAVE_PTHREAD block (ph_audio_thread, ph_audio_hashes). It referenced types `slice` and `DP` and a function `ph_num_threads()` that were never declared anywhere in the public headers (so the block could not actually be compiled), the worker function had no return statement (UB on the void* return), and the slicing math walked one element past the input (count=10, threads=3 gave 5+3+3=11). Callers should drive ph_readaudio / ph_audiohash from their own thread pool. - #9 ph_readaudio2 now reads src_data.output_frames_gen (the count libsamplerate actually produced) instead of output_frames (the caller-supplied capacity), eliminating the downstream over-read. - ph_audiohash: corrected the nb_frames formula. The old expression could go negative for short inputs and was then passed to malloc. Memory and safety: - #13 readaudio_mp3: every error path now closes/deletes the mpg123_handle and calls mpg123_exit. Adds NULL checks on the malloc'd output, validates channels and samplerate (channels==0 would have produced an infinite loop), clamps the per-encoding inner loop so a decode that returns fewer bytes than `channels` cannot over-read the decode buffer. - #14 readaudio_snd: validates channels/samplerate/frames, checks every malloc, and frees inbuf on read failure. ph_audiohash's malloc is now checked and frame_length-sized buffers are std::vectors (no more VLAs). - #20/#21 Replaced double window[4096], double frame[4096], double magnF[2048], freqs/binbarks VLAs, and the per-filter new[] grid with std::vector. ph_audio_distance_ber preallocates `dist` once outside the per-lag loop instead of realloc'ing on every iteration; the worst-case M is just N1/block_size. - ph_compare_blocks now returns 1.0 (worst distance) rather than dividing by zero when block_size <= 0. - ph_audio_distance_ber and ph_audiohash zero their out parameters on every failure path; the previous code left Nc / nb_frames uninitialised when allocation failed. Performance: - #19 ph_bitcount uses __builtin_popcount, matching how ph_hamming_distance does its bit counting. Header hygiene (#29): - ph_fft.h includes <complex> (C++) instead of <complex.h> (C99) and drops the `using namespace std;` that was being injected into every translation unit that included it. fft() takes std::complex<double>*. - fft() now validates that N is a power of 2 (the radix-2 recursion silently misbehaves otherwise).
Audit \xc2\xa74: the bit derivation, filterbank shape, and confidence score in ph_audiohash already matched Haitsma-Kalker 2002, but several parameters did not: - frame_length was hard-coded to 4096 samples regardless of sample rate. At the example's sr=8000 that is 512 ms, far from the paper's 0.37 s frame. Now derived as the power of 2 closest to sr * 0.37. - frame advance was frame_length / 32 (97% overlap) giving ~62 frames/s at sr=8000. The paper specifies 31.25 frames/s (~32 ms advance). Now derived as round(sr / 31.25). - maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range extended the upper band by 1 kHz beyond Haitsma-Kalker. Now 2000 Hz. nfft_half is now frame_length / 2 (was hard-coded 2048). Bit derivation and filter weights are unchanged. Note: this changes the temporal density of the fingerprint. Callers passing block_size to ph_audio_distance_ber will likely want a smaller value (e.g. 64 instead of the example's 256) because the absolute frame count for the same audio drops by ~2x. Hashes produced by this code are not compatible with hashes produced by the old parameters.
3 tasks
Align audio hash params with Haitsma-Kalker reference (breaking)
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the cluster of audio-path bugs identified in the audit: findings #7, #8, #9, #13, #14, #19, #20, #21, #29.
Correctness
HAVE_PTHREADblock (ph_audio_thread,ph_audio_hashes). It referenced typessliceandDPand a functionph_num_threads()that were never declared anywhere in the public headers (so the block could not actually be compiled), the worker function had no return statement (UB on thevoid*return), and the slicing math walked one element past the input (count=10, threads=3gave5+3+3=11). Callers should driveph_readaudio/ph_audiohashfrom their own thread pool.ph_readaudio2now readssrc_data.output_frames_gen(the count libsamplerate actually produced) instead ofoutput_frames(the caller-supplied capacity), eliminating the downstream over-read.ph_audiohash: corrected thenb_framesformula. The old expression could go negative for short inputs and was then passed tomalloc.Memory and safety
readaudio_mp3: every error path now closes/deletes thempg123_handleand callsmpg123_exit. Adds NULL checks on the malloc'd output, validateschannelsandsamplerate(channels==0would have produced an infinite loop), clamps the per-encoding inner loop so a decode that returns fewer bytes thanchannelscannot over-read the decode buffer.readaudio_snd: validates channels/samplerate/frames, checks everymalloc, and freesinbufon read failure.ph_audiohash's malloc is now checked and frame-length-sized buffers arestd::vectors (no more VLAs).double window[4096],double frame[4096],double magnF[2048],freqs/binbarksVLAs, and the per-filternew[]grid withstd::vector.ph_audio_distance_berpreallocatesdistonce outside the per-lag loop instead ofrealloc'ing on every iteration; the worst-caseMis justN1/block_size.ph_compare_blocksnow returns1.0(worst distance) rather than dividing by zero whenblock_size <= 0.ph_audio_distance_berandph_audiohashzero their out parameters on every failure path; the previous code leftNc/nb_framesuninitialised when allocation failed.Performance
ph_bitcountuses__builtin_popcount, matching howph_hamming_distancedoes its bit counting.Header hygiene (#29)
ph_fft.hincludes<complex>(C++) instead of<complex.h>(C99) and drops theusing namespace std;that was being injected into every translation unit that included it.fft()takesstd::complex<double>*.fft()now validates thatNis a power of 2 (the radix-2 recursion silently misbehaves otherwise).Test plan
-Wall) withHAVE_AUDIO_HASH+HAVE_LIBMPG123fft({1,4,3,2})andfft({3,5,9,2})produce the hand-computed DFT outputs to 1e-7;fft(N=3)is now rejectedtest_audiophashon identical 5-second sines: confidence1.000000test_audiophashon 440 Hz vs 441 Hz: confidence0.249(low, as expected)