A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It's "probabilistic" because it can tell you one of two things:
This means false positives are possible (it might say an element is present when it isn't), but false negatives are impossible (if it says an element is absent, it is truly absent).
Bloom filters shine when:
Common Use Cases: Checking if a user ID exists before hitting a database, checking if a URL has been seen by a web crawler, identifying previously seen items in caching systems.
m
, initially all set to 0. Choose k
different hash functions.k
hash functions.hash(item) % m
).k
calculated indices in the bit array to 1.k
hash functions.k
calculated indices in the bit array.A false positive occurs when checking for an item that was never added, but all the bits corresponding to its hash values happen to be 1 because they were set by other added items. The probability of false positives depends on the bit array size (m
), the number of hash functions (k
), and the number of items added (n
). Larger m
and optimal k
reduce false positives.
Note: Standard Bloom filters do not support deletion easily because clearing a bit might affect other items that hashed to the same bit.
Configure the filter, add items, and check for their presence to observe the behavior and false positives.