ProbStructs is an easy-to-use C++ library of probabilistic data structures.
Full documentation is available at http://probstructs.readthedocs.io/en/latest/
- CountMinSketch - frequency table of events in a stream
- ExponentialHistorgram - frequency of a specific event in the last N elements of a stream
- ExponentialCountMinSketch - frequency table of events in the last N elements of a stream
- Hash - hashing function
using namespace probstructs;
ExponentialCountMinSketch<int> sketch(100, 4, 8);
uint32_t ts = 0;
sketch.inc("aaa", ts, 1);
sketch.inc(std::string("bbb"), ts, 4);
sketch.inc("ccc", ts, 8);
std::cerr << sketch.get(std::string("aaa"), 4, ts) << std::endl;
// 1
std::cerr << sketch.get("bbb", 4, ts) << std::endl;
// 4
std::cerr << sketch.get("ccc", 4, ts) << std::endl;
// 8
std::cerr << sketch.get("ddd", 4, ts) << std::endl;
// 0
ts = 4;
std::cerr << sketch.get("aaa", 2, ts) << std::endl;
// 0
std::cerr << sketch.get("bbb", 2, ts) << std::endl;
// 0
std::cerr << sketch.get(std::string("ccc"), 2, ts) << std::endl;
// 0
std::cerr << sketch.get("ddd", 2, ts) << std::endl;
// 0
std::cerr << sketch.get("aaa", 8, ts) << std::endl;
// 1
std::cerr << sketch.get("bbb", 8, ts) << std::endl;
// 4
std::cerr << sketch.get("ccc", 8, ts) << std::endl;
// 8
std::cerr << sketch.get("ddd", 8, ts) << std::endl;
// 0

Prerequisites for building the documentation: CMake 3.11+, Doxygen, Graphviz, and the Python
packages listed in docs/requirements.txt (breathe and
sphinx-rtd-theme).
macOS:
brew install cmake doxygen graphviz
pip install -r docs/requirements.txt

Ubuntu/Debian:
sudo apt-get install cmake doxygen graphviz
pip install -r docs/requirements.txt

Then build:
make docs-build

The generated HTML lands in _docs/docs/sphinx/.
Build and run the benchmark suite (requires CMake 3.11+ and a C++17 compiler;
install on macOS with brew install cmake):
make bench-build # fetches Google Benchmark, compiles
make bench-run # runs and saves results to benchmark_results/local/<timestamp>.json
make bench-compare # compares the two most-recent local result files

Results are stored in benchmark_results/local/ (gitignored, per-machine)
so local runs never pollute the repo. CI results are committed to
benchmark_results/ci/ and used as the regression baseline for pull requests.
See the full benchmark documentation for details on filtering, repeating runs, and comparing specific result files.