Static function data structures associate a static set of keys with values while allowing arbitrary output values for queries involving keys outside the set. This enables them to use significantly less memory. Several techniques are known, with compressed static functions approaching the zero-order empirical entropy of the value sequence. Learned static functions use machine learning to capture correlations between keys and values. For each key, a model predicts a probability distribution over the values, from which we derive a key-specific prefix code to compactly encode the true value. The resulting codeword is stored in a classic static function data structure. This design allows learned static functions to break the zero-order entropy barrier while still supporting point queries.
In this repository, we give the first implementation of a learned static function based on a modified version of BuRR that we also provide. You can find the original here: Code, paper.
trainfolder: Code for training the ML model using TensorFlow- BuRR-VLR repository: Implementation of our variable-length BuRR adaption
includefolder: Implementation of our learned data structurecsffolder: Benchmark code for the GOV competitor
For convenience, we provide a Docker image that can be used to reproduce our experiments. To run it, clone our repo and run Docker (as superuser).
git clone --recursive https://github.com/gvinciguerra/LearnedStaticFunction.git
docker build --pull --rm -t lsf .
docker run -it -v $(pwd)/lrdata:/lrdata -v $(pwd)/data_sux4j:/data_sux4j -v $(pwd)/out:/out lsfIt will run the training and the benchmarks for all competitors.
It will automatically generate the paper based on those results.
The paper can be found in /out/main.pdf together with other raw benchmark outputs.
Note that the runtime is several hours.
This code is licensed under the GPLv3.