Align audio hash params with Haitsma-Kalker reference (breaking) #54

Merged
aetilius merged 1 commit into fix/audio-hash from algo/audio-haitsma-kalker on May 25, 2026

@eklinger

Collaborator

Aligns the time-domain and frequency-domain parameters of ph_audiohash with Haitsma-Kalker 2002 ("A Highly Robust Audio Fingerprinting System"). Addresses §4 of the algorithmic-correctness assessment.

Stacked on top of #49 (fix/audio-hash). Merge #49 first, or re-target this PR at master after #49 lands.

What was wrong

The bit derivation, filterbank shape, and confidence score in ph_audiohash already matched Haitsma-Kalker, but several parameters did not:

  • frame_length was hard-coded to 4096 samples regardless of sample rate. At the example's sr=8000 that is 512 ms, far from the paper's 0.37 s frame.
  • Frame advance was frame_length / 32 (~97% overlap), giving ~62 frames/sec at sr=8000. The paper specifies 31.25 frames/sec (~32 ms advance).
  • maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range extended the upper band by 1 kHz beyond Haitsma-Kalker.

What this PR does

  • frame_length is now derived as the power of 2 closest to sr * 0.37 (the radix-2 FFT requires a power of 2).
  • Advance is now round(sr / 31.25).
  • maxfreq is now 2000 Hz.
  • nfft_half is now frame_length / 2 (was hard-coded 2048).
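The derivation above can be sketched as follows. This is a minimal illustration, not the code from the PR: `hk_params`, `nearest_pow2`, and `derive_hk_params` are hypothetical names, and "closest" is assumed to mean linear distance in samples. Under that assumption sr=8000 gives frame_length = 2048; rounding on a log2 scale instead would pick 4096 at the same rate, so the choice matters.

```c
#include <assert.h>

/* Hypothetical container for the derived parameters; not a pHash type. */
typedef struct {
    int frame_length; /* samples per frame, always a power of 2 */
    int advance;      /* samples between consecutive frame starts */
    int nfft_half;    /* frame_length / 2 */
} hk_params;

/* Power of 2 nearest to target, measured as linear distance in samples.
 * Assumption: the PR text does not say how "closest" is measured. */
static int nearest_pow2(double target)
{
    assert(target >= 1.0);
    int lo = 1;
    while ((double)lo * 2.0 <= target)
        lo *= 2;                 /* largest power of 2 <= target */
    int hi = lo * 2;             /* smallest power of 2 > target */
    return (target - lo <= hi - target) ? lo : hi;
}

/* Derive the frame parameters from the sample rate sr (in Hz). */
static hk_params derive_hk_params(int sr)
{
    assert(sr > 0);
    hk_params p;
    p.frame_length = nearest_pow2(sr * 0.37);
    p.advance = (int)(sr / 31.25 + 0.5); /* round() for positive values */
    p.nfft_half = p.frame_length / 2;
    return p;
}
```

For sr=8000 this yields a 2048-sample frame (256 ms), a 256-sample advance (32 ms), and nfft_half = 1024.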

The bit derivation and filter weights are unchanged.
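For readers unfamiliar with the bit derivation that stays unchanged, here is a minimal sketch of the Haitsma-Kalker rule: each bit is the sign of the energy difference between adjacent bands, differenced again against the previous frame. The function name `hk_subfingerprint`, the band count of 33, and the bit order (band 0 in the most significant bit) are assumptions for illustration and may not match the layout used by ph_audiohash.

```c
#include <assert.h>
#include <stdint.h>

#define HK_BANDS 33 /* 33 band energies yield 32 bits per frame */

/* prev and cur hold the band energies of frames n-1 and n.
 * Bit m is set when E(n,m) - E(n,m+1) - (E(n-1,m) - E(n-1,m+1)) > 0.
 * Placing band 0 in the most significant bit is an assumption. */
static uint32_t hk_subfingerprint(const double *prev, const double *cur)
{
    assert(prev != 0 && cur != 0);
    uint32_t bits = 0;
    for (int m = 0; m < HK_BANDS - 1; m++) {
        double d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1]);
        bits = (bits << 1) | (d > 0.0 ? 1u : 0u);
    }
    return bits;
}
```

Because the rule only compares neighbouring bands and neighbouring frames, changing frame_length, the advance, or maxfreq alters which energies feed the rule but not the rule itself.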

Compatibility

Binary-incompatible hash change. Hashes produced by this code are not compatible with hashes produced by the old parameters.

This also changes the temporal density of the fingerprint. Callers passing block_size to ph_audio_distance_ber will likely want a smaller value (e.g. 64 instead of the example's 256) because the absolute frame count for the same audio drops by ~2×. On short signals the example value of 256 yields M=0 blocks and a neutral confidence of cs = 0.5.
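The block-count arithmetic can be checked with a short sketch. The helpers `frame_count` and `block_count` are hypothetical, and both formulas are assumptions: frames are counted without padding the tail, and M is taken to be the number of complete blocks of block_size frames. The exact counting inside ph_audiohash and ph_audio_distance_ber may differ by a frame or so.

```c
#include <assert.h>

/* Number of full frames in n_samples, assuming the tail is not padded. */
static int frame_count(int n_samples, int frame_length, int advance)
{
    assert(frame_length > 0 && advance > 0);
    if (n_samples < frame_length)
        return 0;
    return (n_samples - frame_length) / advance + 1;
}

/* Number of complete blocks of block_size frames (assumed to be M). */
static int block_count(int n_frames, int block_size)
{
    assert(block_size > 0);
    return n_frames / block_size;
}
```

For 5 s of audio at sr=8000 (40000 samples), the old parameters (frame_length 4096, advance 128) give 281 frames, which is one block at block_size 256. With an advance of 256 and an assumed frame_length of 2048, the same audio gives 149 frames: zero blocks at block_size 256, two blocks at block_size 64.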

Verification

  • Identical 5 s 440 Hz sines, block_size = 64: cs = 1.000000
  • 440 Hz vs 441 Hz, block_size = 64: cs = 0.251 (low similarity, as expected)

Test plan

  • Compiles clean (-Wall) with HAVE_AUDIO_HASH + HAVE_LIBMPG123
  • test_audiophash -f a.wav -g a.wav -b 64: confidence 1.0
  • test_audiophash -f 440.wav -g 441.wav -b 64: confidence ~0.25

Audit §4: the bit derivation, filterbank shape, and confidence
score in ph_audiohash already matched Haitsma-Kalker 2002, but several
parameters did not:

- frame_length was hard-coded to 4096 samples regardless of sample
  rate. At the example's sr=8000 that is 512 ms, far from the paper's
  0.37 s frame. Now derived as the power of 2 closest to sr * 0.37.

- frame advance was frame_length / 32 (97% overlap) giving ~62 frames/s
  at sr=8000. The paper specifies 31.25 frames/s (~32 ms advance).
  Now derived as round(sr / 31.25).

- maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range
  extended the upper band by 1 kHz beyond Haitsma-Kalker. Now 2000 Hz.

nfft_half is now frame_length / 2 (was hard-coded 2048). Bit derivation
and filter weights are unchanged.

Note: this changes the temporal density of the fingerprint. Callers
passing block_size to ph_audio_distance_ber will likely want a smaller
value (e.g. 64 instead of the example's 256) because the absolute
frame count for the same audio drops by ~2x. Hashes produced by this
code are not compatible with hashes produced by the old parameters.
@aetilius aetilius merged commit 1151960 into fix/audio-hash May 25, 2026
@aetilius aetilius deleted the algo/audio-haitsma-kalker branch May 25, 2026 21:43