Bloom Filter

What is a Bloom Filter?

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It's "probabilistic" because it can tell you one of two things:

The element is definitely NOT in the set.
The element MIGHT BE in the set.

This means false positives are possible (it might say an element is present when it isn't), but false negatives are impossible (if it says an element is absent, it is truly absent).

Why Use It?

Bloom filters shine when:

You have a very large set of items.
Membership queries are frequent.
The cost of a false positive is acceptable and less than the cost of checking the actual data source unnecessarily.
You want to use significantly less memory than storing the entire set.

Common Use Cases: Checking if a user ID exists before hitting a database, checking if a URL has been seen by a web crawler, identifying previously seen items in caching systems.

How it Works

Initialization: Create a bit array (or bit vector) of size m, initially all set to 0. Choose k different hash functions.
Adding an Item ("Insertion"):
- Take the item you want to add.
- Hash it using each of the k hash functions.
- Each hash function produces an index (usually hash(item) % m).
- Set the bits at all these k calculated indices in the bit array to 1.
Checking for an Item ("Lookup"):
- Take the item you want to check.
- Hash it using the same k hash functions.
- Check the bits at all k calculated indices in the bit array.
- If ANY of these bits is 0, the item is definitely NOT in the set.
- If ALL of these bits are 1, the item MIGHT BE in the set (could be a true positive or a false positive).

False Positives Explained

A false positive occurs when checking for an item that was never added, but all the bits corresponding to its hash values happen to be 1 because they were set by other added items. The probability of false positives depends on the bit array size (m), the number of hash functions (k), and the number of items added (n). Larger m and optimal k reduce false positives.

Note: Standard Bloom filters do not support deletion easily because clearing a bit might affect other items that hashed to the same bit.

Bloom Filter

What is a Bloom Filter?

Why Use It?

How it Works

False Positives Explained

Visualize the Bloom Filter