expireover: Add bloom filter for fast history existence checks #339
Closed
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| /* | ||
| ** Bloom filter for fast set membership testing. | ||
| ** | ||
| ** A space-efficient probabilistic data structure that can test whether | ||
| ** an element is a member of a set. False positive matches are possible, | ||
| ** but false negatives are not: a query returns either "possibly in set" | ||
| ** or "definitely not in set." | ||
| ** | ||
| ** Uses enhanced double hashing (Kirsch & Mitzenmacher 2006) to derive | ||
| ** multiple hash positions from a single HASH value. | ||
| */ | ||
| | ||
| #ifndef INN_BLOOM_H | ||
| #define INN_BLOOM_H | ||
| | ||
| #include "inn/libinn.h" | ||
| #include "inn/portable-macros.h" | ||
| #include "inn/portable-stdbool.h" | ||
| | ||
| #include <stddef.h> | ||
| | ||
| BEGIN_DECLS | ||
| | ||
| /* The layout of this struct is entirely internal to the implementation. */ | ||
| struct bloom_filter; | ||
| | ||
| /* | ||
| ** Create a new bloom filter sized for the given number of estimated entries | ||
| ** and false positive rate expressed as a reciprocal (e.g., 10000 means | ||
| ** 1-in-10,000 or 0.01% false positive rate). Uses xmalloc internally, | ||
| ** so dies on allocation failure. | ||
| */ | ||
| struct bloom_filter *bloom_create(size_t estimated_entries, unsigned long fp_inv); | ||
| | ||
| /* | ||
| ** Add a HASH to the bloom filter. | ||
| */ | ||
| void bloom_add(struct bloom_filter *bf, const HASH *hash); | ||
| | ||
| /* | ||
| ** Check whether a HASH is possibly in the bloom filter. Returns true if | ||
| ** the element is probably in the set (with false positive rate as configured), | ||
| ** or false if the element is definitely not in the set. | ||
| */ | ||
| bool bloom_check(const struct bloom_filter *bf, const HASH *hash); | ||
| | ||
| /* | ||
| ** Free a bloom filter and all associated memory. Safe to call with NULL. | ||
| */ | ||
| void bloom_free(struct bloom_filter *bf); | ||
| | ||
| /* | ||
| ** Return the number of entries that have been added to the bloom filter. | ||
| */ | ||
| size_t bloom_count(const struct bloom_filter *bf); | ||
| | ||
| /* | ||
| ** Return the number of hash functions (k) used by the bloom filter. | ||
| */ | ||
| unsigned int bloom_nhash(const struct bloom_filter *bf); | ||
| | ||
| /* | ||
| ** Return the total number of bits (m) in the bloom filter. | ||
| */ | ||
| size_t bloom_bits(const struct bloom_filter *bf); | ||
| | ||
| END_DECLS | ||
| | ||
| #endif /* INN_BLOOM_H */ |
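
The header only declares the interface, so the following is a minimal, self-contained sketch of how such a filter could work, not the implementation from this pull request. Everything named `sketch_*` is hypothetical. It assumes the hash is 16 opaque bytes (a stand-in for INN's `HASH`), sizes the filter with integer arithmetic only (k = ceil(log2(fp_inv)) and m = n * k / ln 2, which slightly oversizes compared with the exact optimum), and uses one common formulation of enhanced double hashing. Unlike the declared `bloom_create`, it returns NULL on allocation failure instead of dying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for INN's HASH: assumed here to be 16 opaque bytes. */
struct sketch_hash { unsigned char bytes[16]; };

struct sketch_bloom {
    uint64_t nbits;      /* m: total number of bits */
    unsigned int nhash;  /* k: probes per element */
    size_t count;        /* number of sketch_add() calls */
    unsigned char *bits; /* bit array of (nbits + 7) / 8 bytes */
};

/* Smallest k such that 2^k >= v (0 when v <= 1). */
static unsigned int ceil_log2(uint64_t v)
{
    unsigned int k = 0;
    uint64_t p = 1;
    while (p < v && k < 63) { p <<= 1; k++; }
    return k;
}

/* Returns NULL on allocation failure (the real API is documented to die). */
static struct sketch_bloom *sketch_create(size_t entries, unsigned long fp_inv)
{
    struct sketch_bloom *bf = malloc(sizeof(*bf));
    if (bf == NULL)
        return NULL;
    if (entries == 0)
        entries = 1;
    bf->nhash = ceil_log2(fp_inv);
    if (bf->nhash == 0)
        bf->nhash = 1;
    /* m = n * k / ln 2, with 1 / ln 2 approximated as 14427 / 10000. */
    bf->nbits = ((uint64_t) entries * bf->nhash * 14427u + 9999u) / 10000u;
    bf->count = 0;
    bf->bits = calloc((size_t) ((bf->nbits + 7) / 8), 1);
    if (bf->bits == NULL) { free(bf); return NULL; }
    return bf;
}

/* Visit the k positions for a hash; set them, or test them when !set. */
static bool sketch_probe(const struct sketch_bloom *bf,
                         const struct sketch_hash *h, bool set)
{
    uint64_t h1, h2, x, y;
    unsigned int i;

    memcpy(&h1, h->bytes, 8);     /* first half of the hash */
    memcpy(&h2, h->bytes + 8, 8); /* second half of the hash */
    x = h1 % bf->nbits;
    y = h2 % bf->nbits;
    for (i = 0; i < bf->nhash; i++) {
        unsigned char mask = (unsigned char) (1u << (x & 7));
        if (set)
            bf->bits[x >> 3] |= mask;
        else if ((bf->bits[x >> 3] & mask) == 0)
            return false; /* one clear bit: definitely not in the set */
        x = (x + y) % bf->nbits;     /* double hashing step */
        y = (y + i + 1) % bf->nbits; /* the "enhanced" increment */
    }
    return true;
}

static void sketch_add(struct sketch_bloom *bf, const struct sketch_hash *h)
{
    sketch_probe(bf, h, true);
    bf->count++;
}

static bool sketch_check(const struct sketch_bloom *bf,
                         const struct sketch_hash *h)
{
    return sketch_probe(bf, h, false);
}

static void sketch_free(struct sketch_bloom *bf)
{
    if (bf != NULL) { free(bf->bits); free(bf); }
}

/* Test helper: derive 16 well-mixed bytes from an integer (splitmix64). */
static uint64_t mix64(uint64_t z)
{
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static struct sketch_hash sketch_hash_from_int(uint64_t n)
{
    struct sketch_hash h;
    uint64_t a = mix64(n), b = mix64(a);
    memcpy(h.bytes, &a, 8);
    memcpy(h.bytes + 8, &b, 8);
    return h;
}
```

A caller would typically consult `sketch_check` first and fall back to the authoritative (slower) lookup only when it returns true, since a true result can be a false positive while a false result never is.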
@Julien-Elie I wonder whether we should add another chicken bit here and only do it for tradspool/timehash etc.?
But the times with the default fp look reasonable, so maybe we need more data points to see whether, e.g., CNFS should opt out completely.
The times are similar, so maybe it's not worth adding extra complexity at this point to probe whether articles are in a self-expiring storage method and the `-N` flag is not used.
FWIW, without the `-N` flag, the run time of `expireover` is still similar on my news spool (about 46 seconds for 416 000 articles).