Skip to content

karnstack/byox-bloom-filter-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

karnstack/byox-bloom-filter-python

Starter template (Python) for the Bloom Filter primitive on karnstack.

Six stages. Paper-backed tests. You implement the interface; karnstack tells you what to read at each stage.

Prerequisites

mise is the only thing you need installed globally. It pins Python 3.14 for this repo, creates a virtual environment, and runs the stage tasks. If you do not want to install mise, the equivalent commands are documented under Without mise below.

Install mise:

curl https://mise.run | sh

Quick start

mise trust              # allow this repo's .mise.toml (one time)
mise install            # installs Python 3.14 + creates .venv
mise run setup          # installs the bloom package + pytest into .venv
mise run stage 1        # runs the tests for stage 1 (they fail until you implement)

Open stage 1 on karnstack. Implement bloom/__init__.py until mise run stage 1 passes. Then move on:

mise run stage 2

mise run all runs every stage at once.

Layout

.
├── .mise.toml                          # toolchain + tasks
├── pyproject.toml
├── bloom/
│   └── __init__.py                     # you implement here
└── tests/
    ├── test_stage01_bit_array.py
    ├── test_stage02_multi_hash.py
    ├── test_stage03_sizing.py
    ├── test_stage04_blocked.py
    ├── test_stage05_concurrent.py
    └── test_stage06_serialize.py

Stages

  1. Bit array and single hash
  2. Multiple hashes (Kirsch-Mitzenmacher)
  3. Optimal sizing math
  4. Cache-line-blocked layout
  5. Concurrent-safe Add
  6. Serialize and saturation

Each stage is described on karnstack. Read first, then implement.

What you are building

A constant-size data structure that says "definitely not in the set" or "maybe in the set" in O(k) time, with a tunable false-positive rate. The structure inside every production LSM-tree (RocksDB, LevelDB, Cassandra) used to skip disk reads on missing keys.

Papers cited

Without mise

If you do not want to install mise, ensure you have Python 3.14+ installed and run:

python -m venv .venv
source .venv/bin/activate         # or .venv\Scripts\activate on Windows
pip install -e '.[test]'

# Stage 1
pytest -v -k stage01 tests/

# Stage N (replace 01 with the zero-padded stage number)
pytest -v -k stageNN tests/

# All stages
pytest -v tests/

License

MIT. See LICENSE. Your fork is yours.

About

byox bloom filter (python) - karnstack template. fork, implement each stage, push, get a verified credential.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors