feat: add BloomFilter data structure #88

Open
chenmiaoming wants to merge 1 commit into jsrivaya:main from chenmiaoming:main

@chenmiaoming

Description

This PR implements issue #7 by adding a BloomFilter data structure for space-efficient probabilistic membership checks.

Changes included

  • Added BloomFilter with configurable number of hash functions.
  • Added support for custom primary and secondary hash algorithms.
  • Added with_capacity(expected_items, target_fpr) factory for parameter tuning.
  • Added insert, contains, clear, false_positive_rate, hash_functions, size, and bit_count APIs.
  • Added BloomFilter unit tests and wired them into the existing CMake test target.
  • Updated documentation navigation and user-facing docs.
  • Added a dedicated Bloom Filter documentation page.
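For reviewers less familiar with the structure, the following is a minimal sketch of how the listed operations usually fit together. It is not the code in this PR: the class name `SketchBloomFilter`, the constructor signature, and the bit-mixing step are assumptions made for illustration. Deriving the k probe positions from two base hashes (double hashing) is one common reading of "primary and secondary hash algorithms", not a confirmed detail of the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>   // provides std::hash<std::string> for string keys
#include <vector>

// Illustrative sketch only; NOT the BloomFilter class added in this PR.
template <typename T>
class SketchBloomFilter {
public:
    SketchBloomFilter(std::size_t bit_count, std::size_t hash_count)
        : bits_(bit_count, false), hash_count_(hash_count) {
        assert(bit_count > 0 && hash_count > 0);
    }

    // Set the k bits selected by the item's probe sequence.
    void insert(const T& item) {
        const std::uint64_t h1 = primary(item);
        const std::uint64_t h2 = secondary(h1);
        for (std::size_t i = 0; i < hash_count_; ++i) {
            bits_[index(h1, h2, i)] = true;
        }
        ++size_;
    }

    // false: the item was definitely never inserted.
    // true:  the item was inserted, or this is a false positive.
    bool contains(const T& item) const {
        const std::uint64_t h1 = primary(item);
        const std::uint64_t h2 = secondary(h1);
        for (std::size_t i = 0; i < hash_count_; ++i) {
            if (!bits_[index(h1, h2, i)]) {
                return false;
            }
        }
        return true;
    }

    void clear() {
        bits_.assign(bits_.size(), false);
        size_ = 0;
    }

    std::size_t size() const { return size_; }  // number of insert() calls
    std::size_t bit_count() const { return bits_.size(); }
    std::size_t hash_functions() const { return hash_count_; }

private:
    // Assumed primary hash: the standard library hash for T.
    static std::uint64_t primary(const T& item) {
        return static_cast<std::uint64_t>(std::hash<T>{}(item));
    }

    // Assumed secondary hash: a bit-mixing step over the primary hash,
    // forced odd so the probe stride is never zero.
    static std::uint64_t secondary(std::uint64_t h) {
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;
        h ^= h >> 33;
        return h | 1ULL;
    }

    // Double hashing: the i-th probe position is (h1 + i * h2) mod m.
    std::size_t index(std::uint64_t h1, std::uint64_t h2, std::size_t i) const {
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t hash_count_;
    std::size_t size_ = 0;
};
```

With this sketch, `SketchBloomFilter<std::string> f(1024, 3);` followed by `f.insert("apple");` makes `f.contains("apple")` return true. A lookup for a key that was never inserted usually returns false but may return true, which is the false positive case.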

Validation performed

cmake -S . -B build -DLOON_BUILD_TESTS=ON
cmake --build build
ctest --test-dir build --output-on-failure

Result

  • 42 tests passed, 0 failed.

Related Issue

Closes #7

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Checklist

  • My code follows the project's coding standards
  • I have run make check-format and fixed any issues
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass (make build)
  • I have updated documentation if needed
  • My changes generate no new warnings

Performance Impact

N/A (no dedicated benchmark or performance regression measurements were added in this PR).

Additional Notes

  • Bloom filters guarantee no false negatives and allow false positives by design.
  • Existing warnings observed during the build come from test/test_ring_buffer.cpp and were not introduced by this PR.
  • The false positive rate is estimated from the inserted item count, the bit count, and the hash function count, using the standard approximation:

$$ FPR \approx (1 - e^{-kn/m})^k $$

where:

  • $n$ is the number of inserted items,
  • $m$ is the number of bits,
  • $k$ is the number of hash functions.
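As a rough illustration, the approximation above can be evaluated directly, and the same model gives the textbook sizing rules $m = \lceil -n \ln p / (\ln 2)^2 \rceil$ and $k \approx (m/n) \ln 2$ for a target rate $p$. The helper names `estimate_fpr` and `suggest_params` and the `BloomParams` struct are hypothetical. Whether `with_capacity` in this PR uses exactly these rounding rules is an assumption, not something the description states.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: evaluates (1 - e^{-kn/m})^k for n items, m bits, k hashes.
inline double estimate_fpr(std::size_t n, std::size_t m, std::size_t k) {
    assert(m > 0 && k > 0);
    const double exponent =
        -static_cast<double>(k) * static_cast<double>(n) / static_cast<double>(m);
    return std::pow(1.0 - std::exp(exponent), static_cast<double>(k));
}

// Hypothetical result type for the sizing helper below.
struct BloomParams {
    std::size_t bit_count;
    std::size_t hash_count;
};

// Textbook sizing under the same approximation (assumed, not taken from the PR):
//   m = ceil(-n * ln(p) / (ln 2)^2),  k = round((m / n) * ln 2), with k >= 1.
inline BloomParams suggest_params(std::size_t expected_items, double target_fpr) {
    assert(expected_items > 0 && target_fpr > 0.0 && target_fpr < 1.0);
    const double n = static_cast<double>(expected_items);
    const double ln2 = std::log(2.0);
    const double m = std::ceil(-n * std::log(target_fpr) / (ln2 * ln2));
    const double k = std::max(1.0, std::round(m / n * ln2));
    return BloomParams{static_cast<std::size_t>(m), static_cast<std::size_t>(k)};
}
```

For 1000 expected items and a 1% target, these rules give roughly 9.6 bits per item and 7 hash functions. Because k is rounded to an integer, the estimate at full load lands close to the target rather than exactly on it.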

Implement issue jsrivaya#7 by adding a fixed-size BloomFilter with configurable hash count and custom hash algorithms.

Add with_capacity(expected_items, target_fpr), false positive rate estimation, tests, and docs updates.