Bloom Filter

What is a Bloom Filter?

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It's "probabilistic" because it can tell you one of two things:

This means false positives are possible (it might say an element is present when it isn't), but false negatives are impossible (if it says an element is absent, it is truly absent).

Why Use It?

Bloom filters shine when:

Common Use Cases: Checking if a user ID exists before hitting a database, checking if a URL has been seen by a web crawler, identifying previously seen items in caching systems.

How it Works

  1. Initialization: Create a bit array (or bit vector) of size m, initially all set to 0. Choose k different hash functions.
  2. Adding an Item ("Insertion"):
    • Take the item you want to add.
    • Hash it using each of the k hash functions.
    • Each hash function produces an index (usually hash(item) % m).
    • Set the bits at all these k calculated indices in the bit array to 1.
  3. Checking for an Item ("Lookup"):
    • Take the item you want to check.
    • Hash it using the same k hash functions.
    • Check the bits at all k calculated indices in the bit array.
    • If ANY of these bits is 0, the item is definitely NOT in the set.
    • If ALL of these bits are 1, the item MIGHT BE in the set (could be a true positive or a false positive).

False Positives Explained

A false positive occurs when checking for an item that was never added, but all the bits corresponding to its hash values happen to be 1 because they were set by other added items. The probability of false positives depends on the bit array size (m), the number of hash functions (k), and the number of items added (n). Larger m and optimal k reduce false positives.

Note: Standard Bloom filters do not support deletion easily because clearing a bit might affect other items that hashed to the same bit.

Visualize the Bloom Filter

Configure the filter, add items, and check for their presence to observe the behavior and false positives.

Bit Array (m = 30)
Enter an item to Add or Check.
Items Added (n = 0)
Hash Details
(Hashes and indices will appear here)
Log messages will appear here...