Skip to content

Fix audio hash leaks, threading, and resource bugs#49

Merged
aetilius merged 3 commits into
masterfrom
fix/audio-hash
May 25, 2026
Merged

Fix audio hash leaks, threading, and resource bugs#49
aetilius merged 3 commits into
masterfrom
fix/audio-hash

Conversation

@eklinger

Copy link
Copy Markdown
Collaborator

Fixes the cluster of audio-path bugs identified in the audit: findings #7, #8, #9, #13, #14, #19, #20, #21, #29.

Correctness

  • example for video test_dctvideohash.cpp not compiling #7 / Fixed video hashing and applied consistent formatting #8 Removed the entire HAVE_PTHREAD block (ph_audio_thread, ph_audio_hashes). It referenced types slice and DP and a function ph_num_threads() that were never declared anywhere in the public headers (so the block could not actually be compiled), the worker function had no return statement (UB on the void* return), and the slicing math walked one element past the input (count=10, threads=3 gave 5+3+3=11). Callers should drive ph_readaudio / ph_audiohash from their own thread pool.
  • Build instructions need updating #9 ph_readaudio2 now reads src_data.output_frames_gen (the count libsamplerate actually produced) instead of output_frames (the caller-supplied capacity), eliminating the downstream over-read.
  • ph_audiohash: corrected the nb_frames formula. The old expression could go negative for short inputs and was then passed to malloc.

Memory and safety

  • Running make fails when building examples/test_dctvideohash.cpp #13 readaudio_mp3: every error path now closes/deletes the mpg123_handle and calls mpg123_exit. Adds NULL checks on the malloc'd output, validates channels and samplerate (channels==0 would have produced an infinite loop), clamps the per-encoding inner loop so a decode that returns fewer bytes than channels cannot over-read the decode buffer.
  • Using pHash data from an older version of pHash w/the current version #14 readaudio_snd: validates channels/samplerate/frames, checks every malloc, and frees inbuf on read failure. ph_audiohash's malloc is now checked and frame-length-sized buffers are std::vectors (no more VLAs).
  • Extend pHash interface with ph_mh_imagehash_from_buffer (#1) #20 / bindings / PHP build #21 Replaced double window[4096], double frame[4096], double magnF[2048], freqs/binbarks VLAs, and the per-filter new[] grid with std::vector. ph_audio_distance_ber preallocates dist once outside the per-lag loop instead of realloc'ing on every iteration; the worst-case M is just N1/block_size.
  • ph_compare_blocks now returns 1.0 (worst distance) rather than dividing by zero when block_size <= 0.
  • ph_audio_distance_ber and ph_audiohash zero their out parameters on every failure path; the previous code left Nc / nb_frames uninitialised when allocation failed.

Performance

Header hygiene (#29)

  • ph_fft.h includes <complex> (C++) instead of <complex.h> (C99) and drops the using namespace std; that was being injected into every translation unit that included it. fft() takes std::complex<double>*.
  • fft() now validates that N is a power of 2 (the radix-2 recursion silently misbehaves otherwise).

Test plan

  • Compiles clean (-Wall) with HAVE_AUDIO_HASH + HAVE_LIBMPG123
  • FFT unit test: fft({1,4,3,2}) and fft({3,5,9,2}) produce the hand-computed DFT outputs to 1e-7; fft(N=3) is now rejected
  • test_audiophash on identical 5-second sines: confidence 1.000000
  • test_audiophash on 440 Hz vs 441 Hz: confidence 0.249 (low, as expected)

pHash Audit added 2 commits May 25, 2026 11:03
Addresses audit findings #7, #8, #9, #13, #14, #19, #20, #21, #29.

Correctness:
- #7/#8 Removed the entire HAVE_PTHREAD block (ph_audio_thread,
  ph_audio_hashes). It referenced types `slice` and `DP` and a
  function `ph_num_threads()` that were never declared anywhere in
  the public headers (so the block could not actually be compiled),
  the worker function had no return statement (UB on the void*
  return), and the slicing math walked one element past the input
  (count=10, threads=3 gave 5+3+3=11). Callers should drive
  ph_readaudio / ph_audiohash from their own thread pool.
- #9 ph_readaudio2 now reads src_data.output_frames_gen (the count
  libsamplerate actually produced) instead of output_frames (the
  caller-supplied capacity), eliminating the downstream over-read.
- ph_audiohash: corrected the nb_frames formula. The old expression
  could go negative for short inputs and was then passed to malloc.

Memory and safety:
- #13 readaudio_mp3: every error path now closes/deletes the
  mpg123_handle and calls mpg123_exit. Adds NULL checks on the
  malloc'd output, validates channels and samplerate (channels==0
  would have produced an infinite loop), clamps the per-encoding
  inner loop so a decode that returns fewer bytes than `channels`
  cannot over-read the decode buffer.
- #14 readaudio_snd: validates channels/samplerate/frames, checks
  every malloc, and frees inbuf on read failure. ph_audiohash's
  malloc is now checked and frame_length-sized buffers are
  std::vectors (no more VLAs).
- #20/#21 Replaced double window[4096], double frame[4096], double
  magnF[2048], freqs/binbarks VLAs, and the per-filter new[] grid
  with std::vector. ph_audio_distance_ber preallocates `dist` once
  outside the per-lag loop instead of realloc'ing on every
  iteration; the worst-case M is just N1/block_size.
- ph_compare_blocks now returns 1.0 (worst distance) rather than
  dividing by zero when block_size <= 0.
- ph_audio_distance_ber and ph_audiohash zero their out parameters
  on every failure path; the previous code left Nc / nb_frames
  uninitialised when allocation failed.

Performance:
- #19 ph_bitcount uses __builtin_popcount, matching how
  ph_hamming_distance does its bit counting.

Header hygiene (#29):
- ph_fft.h includes <complex> (C++) instead of <complex.h> (C99)
  and drops the `using namespace std;` that was being injected
  into every translation unit that included it. fft() takes
  std::complex<double>*.
- fft() now validates that N is a power of 2 (the radix-2
  recursion silently misbehaves otherwise).
Audit \xc2\xa74: the bit derivation, filterbank shape, and confidence
score in ph_audiohash already matched Haitsma-Kalker 2002, but several
parameters did not:

- frame_length was hard-coded to 4096 samples regardless of sample
  rate. At the example's sr=8000 that is 512 ms, far from the paper's
  0.37 s frame. Now derived as the power of 2 closest to sr * 0.37.

- frame advance was frame_length / 32 (97% overlap) giving ~62 frames/s
  at sr=8000. The paper specifies 31.25 frames/s (~32 ms advance).
  Now derived as round(sr / 31.25).

- maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range
  extended the upper band by 1 kHz beyond Haitsma-Kalker. Now 2000 Hz.

nfft_half is now frame_length / 2 (was hard-coded 2048). Bit derivation
and filter weights are unchanged.

Note: this changes the temporal density of the fingerprint. Callers
passing block_size to ph_audio_distance_ber will likely want a smaller
value (e.g. 64 instead of the example's 256) because the absolute
frame count for the same audio drops by ~2x. Hashes produced by this
code are not compatible with hashes produced by the old parameters.
Align audio hash params with Haitsma-Kalker reference (breaking)
@aetilius aetilius merged commit 86c5655 into master May 25, 2026
@aetilius aetilius deleted the fix/audio-hash branch May 25, 2026 21:44
@aetilius aetilius mentioned this pull request May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants