Align audio hash params with Haitsma-Kalker reference (breaking) #54
Merged
Audit §4: the bit derivation, filterbank shape, and confidence score in ph_audiohash already matched Haitsma-Kalker 2002, but several parameters did not:
- frame_length was hard-coded to 4096 samples regardless of sample rate. At the example's sr=8000 that is 512 ms, far from the paper's 0.37 s frame. Now derived as the power of 2 closest to sr * 0.37.
- frame advance was frame_length / 32 (97% overlap), giving ~62 frames/s at sr=8000. The paper specifies 31.25 frames/s (~32 ms advance). Now derived as round(sr / 31.25).
- maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range extended the upper band by 1 kHz beyond Haitsma-Kalker. Now 2000 Hz.
nfft_half is now frame_length / 2 (was hard-coded 2048). Bit derivation and filter weights are unchanged.
Note: this changes the temporal density of the fingerprint. Callers passing block_size to ph_audio_distance_ber will likely want a smaller value (e.g. 64 instead of the example's 256) because the absolute frame count for the same audio drops by ~2x. Hashes produced by this code are not compatible with hashes produced by the old parameters.
This was referenced May 26, 2026
Aligns the time-domain and frequency-domain parameters of ph_audiohash with Haitsma-Kalker 2002 ("A Highly Robust Audio Fingerprinting System"). Addresses §4 of the algorithmic-correctness assessment.
What was wrong
The bit derivation, filterbank shape, and confidence score in ph_audiohash already matched Haitsma-Kalker, but several parameters did not:
- frame_length was hard-coded to 4096 samples regardless of sample rate. At the example's sr=8000 that is 512 ms, far from the paper's 0.37 s frame.
- Frame advance was frame_length / 32 (~97% overlap), giving ~62 frames/sec at sr=8000. The paper specifies 31.25 frames/sec (~32 ms advance).
- maxfreq was 3000 Hz; the paper specifies 2000 Hz. The previous range extended the upper band by 1 kHz beyond Haitsma-Kalker.
What this PR does
- frame_length is now derived as the power of 2 closest to sr * 0.37 (the radix-2 FFT requires a power of 2).
- Frame advance is now derived as round(sr / 31.25).
- maxfreq is now 2000 Hz.
- nfft_half is now frame_length / 2 (was hard-coded 2048).
The bit derivation and filter weights are unchanged.
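For reviewers who want to check the arithmetic, the derivation described above can be sketched roughly as follows. This is not the code from the diff: the hk_params struct and the nearest_pow2 and hk_derive_params helpers are hypothetical names, and "closest" is assumed to mean closest by linear distance, which the description does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the derived values; the real
 * ph_audiohash may simply keep these as local variables. */
typedef struct {
    size_t frame_length; /* samples per FFT frame, always a power of 2 */
    size_t advance;      /* samples between successive frame starts */
    size_t nfft_half;    /* frame_length / 2 */
    double maxfreq;      /* upper band edge in Hz */
} hk_params;

/* Power of 2 closest to x, measured by linear distance.
 * Assumption: the PR text does not say whether "closest" is
 * linear or logarithmic; the two can disagree. */
size_t nearest_pow2(double x)
{
    assert(x >= 1.0);
    size_t lo = 1;
    while ((double)(lo * 2) <= x)
        lo *= 2;                 /* largest power of 2 not above x */
    size_t hi = lo * 2;          /* smallest power of 2 above x */
    return (x - (double)lo <= (double)hi - x) ? lo : hi;
}

/* Derive the parameters from the sample rate, using the constants
 * quoted in the PR description (0.37 s, 31.25 frames/s, 2000 Hz). */
hk_params hk_derive_params(int sr)
{
    assert(sr > 0);
    hk_params p;
    p.frame_length = nearest_pow2(sr * 0.37);
    p.advance = (size_t)(sr / 31.25 + 0.5); /* round to nearest sample */
    p.nfft_half = p.frame_length / 2;
    p.maxfreq = 2000.0;
    return p;
}
```

Under the linear-distance assumption, sr = 8000 gives frame_length = 2048, advance = 256 and nfft_half = 1024. Rounding on a log scale would instead pick 4096 for the same input, so it is worth confirming which rule the implementation uses.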
Compatibility
Binary-incompatible hash change. Hashes produced by this code are not compatible with hashes produced by the old parameters.
This also changes the temporal density of the fingerprint. Callers passing block_size to ph_audio_distance_ber will likely want a smaller value (e.g. 64 instead of the example's 256) because the absolute frame count for the same audio drops by ~2×. On short signals, the example value of 256 will produce M=0 blocks and a neutral confidence of cs = 0.5.
Verification
- block_size = 64: cs = 1.000000
- block_size = 64: cs = 0.251 (low similarity, as expected)
Test plan
- -Wall) with HAVE_AUDIO_HASH + HAVE_LIBMPG123
- test_audiophash -f a.wav -g a.wav -b 64: confidence 1.0
- test_audiophash -f 440.wav -g 441.wav -b 64: confidence ~0.25
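As a rough check of the block_size guidance under Compatibility, the sketch below estimates how many frames and comparison blocks a clip yields. Both helpers are hypothetical and rest on two assumptions not confirmed by this PR: only complete frames are hashed, and the block count M is the frame count divided by block_size, rounded down.

```c
#include <assert.h>
#include <stddef.h>

/* Number of hash frames produced from n_samples.
 * Assumption: only complete frames are hashed (no padding). */
size_t frame_count(size_t n_samples, size_t frame_length, size_t advance)
{
    assert(advance > 0);
    if (n_samples < frame_length)
        return 0;
    return (n_samples - frame_length) / advance + 1;
}

/* Number of whole comparison blocks (the M in the description).
 * Assumption: M = floor(n_frames / block_size). */
size_t block_count(size_t n_frames, size_t block_size)
{
    assert(block_size > 0);
    return n_frames / block_size;
}
```

For a 5 s clip at sr = 8000 (40000 samples), the old parameters (frame_length 4096, advance 128) give 281 frames by this estimate, which is one block at block_size = 256. With advance 256 and a frame_length of 2048 (itself an assumption about how "closest power of 2" is resolved), the same clip gives 149 frames: zero blocks at 256, two at 64.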