Redis Bloom Filters: How I scaled a Memory-Efficient Uniqueness Check?

7 min readSep 19, 2023

Problem Statement:

Many applications require ensuring the uniqueness of user-provided fields, such as usernames. This is a common requirement for web applications, where usernames should be unique for each registered user. While on a small scale, you can achieve this by querying the database and enforcing unique constraints, this approach becomes challenging as the registration rate increases. In this article, we will explore how to design a system that can efficiently handle uniqueness verification at scale using Redis Bloom filters.

Database-Backed Uniqueness Check:

On a small scale, you can perform a database query to check if a username is already in use. However, as the registration rate increases, this approach may not scale efficiently. To address this, you can introduce a Redis cache. By maintaining a Redis set containing all usernames, you can significantly speed up queries. When a request comes in to verify a username, you only need to check the cache, eliminating the need to hit the database. However, this method requires substantial memory, particularly if you have a large number of records.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UsernameUniquenessChecker {
    private static final String DB_URL = "jdbc:mysql://localhost:3306/your_database";
    private static final String DB_USER = "your_username";
    private static final String DB_PASSWORD = "your_password";

    public static boolean isUsernameUnique(String username) {
        try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASSWORD)) {
            String sql = "SELECT COUNT(*) FROM users WHERE username = ?";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, username);
                try (ResultSet rs = stmt.executeQuery()) {
                    if (rs.next()) {
                        int count = rs.getInt(1);
                        return count == 0; // If count is 0, username is unique
                    }
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return false; // In case of an error, consider the username as non-unique
    }

    public static void main(String[] args) {
        String desiredUsername = "new_user";
        boolean isUnique = isUsernameUnique(desiredUsername);
        if (isUnique) {
            System.out.println("Username '" + desiredUsername + "' is unique. Proceed with registration.");
        } else {
            System.out.println("Username '" + desiredUsername + "' is already in use. Choose a different one.");
        }
    }
}

Performance Issues:

Database Load: As the registration rate increases, the database experiences a higher load due to the frequent execution of SELECT queries to check username uniqueness. Each query requires database resources, including CPU and I/O operations, to search for matching usernames. This increased load can lead to performance degradation and slower response times for other database operations.
Latency: Database queries involve network communication between the application server and the database server. The time it takes to establish connections, send queries, and receive responses contributes to latency. With a high registration rate and many concurrent requests, this latency can accumulate and impact user experience.
Scalability Challenges: Databases have limits on concurrent connections and resources. If the registration rate continues to grow, the database server may struggle to handle the increased number of incoming requests. Scaling a database vertically (adding more resources to a single server) can be costly and may have limitations.

Caching as a Partial Solution:

To address the performance issue of database calls for username uniqueness checks, an efficient Redis cache is introduced. This cache is initialized with a Redis key dedicated to storing usernames.

When a user registers or logs in, the application first checks this cache. If the username is found in the cache, it’s considered valid without the need for a database query.
If the username is not cached, a database query is executed for validation, and the result is then stored in the cache with a set expiration time.

This approach significantly reduces the load on the database, enhances response times, and scales efficiently as the user base grows. It also ensures data consistency by handling cache invalidation after database updates and manages memory usage effectively.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class UsernameCache {

    private static final String REDIS_HOST = "localhost"; // Redis server host
    private static final int REDIS_PORT = 6379; // Redis server port
    private static final int CACHE_EXPIRATION_SECONDS = 3600; // Cache expiration time in seconds

    private static JedisPool jedisPool;

    // Initialize the Redis connection pool
    static {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        jedisPool = new JedisPool(poolConfig, REDIS_HOST, REDIS_PORT);
    }

    // Method to check if a username is unique using the Redis cache
    public static boolean isUsernameUnique(String username) {
        try (Jedis jedis = jedisPool.getResource()) {
            // Check if the username exists in the Redis cache
            if (jedis.sismember("usernames", username)) {
                return false; // Username is not unique
            }
        } catch (Exception e) {
            e.printStackTrace();
            // Handle exceptions or fallback to database query if Redis is unavailable
        }
        return true; // Username is unique (not found in cache)
    }

    // Method to add a username to the Redis cache
    public static void addToCache(String username) {
        try (Jedis jedis = jedisPool.getResource()) {
            jedis.sadd("usernames", username); // Add the username to the cache set
            jedis.expire("usernames", CACHE_EXPIRATION_SECONDS); // Set expiration time for the cache
        } catch (Exception e) {
            e.printStackTrace();
            // Handle exceptions if Redis cache update fails
        }
    }

    // Cleanup and close the Redis connection pool
    public static void close() {
        jedisPool.close();
    }

    public static void main(String[] args) {
        // Example usage:
        String newUsername = "Dip_Mazumder";
        if (isUsernameUnique(newUsername)) {
            System.out.println("Username is unique. Proceed with registration.");
            addToCache(newUsername); // Add the username to the Redis cache
        } else {
            System.out.println("Username is already in use. Please choose another.");
        }

        // Don't forget to close the connection pool when your application exits
        close();
    }
}

Memory Usage per Record: Each username requires approximately 20 bytes of memory. This estimate includes the storage for the username string itself and any additional metadata associated with it.

You want to store 1 billion usernames.

Total Memory = Memory Usage per Record * Number of Records = 20 bytes/record * 1,000,000,000 records = 20,000,000,000 bytes = 20,000,000 KB = 20,000 MB = 20 GB

So, you would need approximately 20 gigabytes of memory to store 1 billion usernames with each username consuming 20 bytes of memory.

Uniqueness Test with Bloom Filters:

To mitigate the memory requirements of the cache-based approach, you can consider using Bloom filters. Bloom filters are probabilistic data structures that allow for a controlled level of false positives but guarantee no false negatives. If the filter reports an item is not present, you can be sure it was never stored in the filter. However, if the filter reports that it has seen an item, it might or might not have seen that item. This error rate can be controlled, making Bloom filters an efficient choice.

Unlike storing full items, Bloom filters use minimal memory, and memory requirements depend on the required error rate (false positive probability).

For example, to store 1 billion records with a 0.001 false positive probability, you only need 1.67 GB of memory.

Using Redis Bloom Filters with Java:

To implement Redis Bloom filters with Java, you first need to install a version of Redis built with the BloomFilter module. You can download a pre-compiled version or use a Docker image, such as redislabs/rebloom:latest.

Unfortunately, the popular Redis SDK for Java, Jedis, does not support the Redis Bloom filter module out of the box. You’ll need to add the JReBloom dependency to your project. Here’s how to do it in Maven:

<dependencies>
    <dependency>
        <groupId>com.redislabs</groupId>
        <artifactId>jrebloom</artifactId>
        <version>2.0.0</version>
    </dependency>
</dependencies>

The library provides a Client class, which offers methods to interact with the Bloom filter. You'll need to configure a JedisPool bean and create a Client object using it:

import com.redislabs.client.bloom.BloomFilter;
import com.redislabs.client.bloom.Client;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class UsernameUniquenessChecker {
    // Define your Redis server details
    private static final String REDIS_HOST = "localhost";
    private static final int REDIS_PORT = 6379;
    private static final String REDIS_PASSWORD = null; // If Redis has no password

    // Define the Bloom filter key
    private static final String BF_KEY = "usernames";

    public static void main(String[] args) {
        // Configure the JedisPool for Redis connection
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        JedisPool jedisPool = new JedisPool(poolConfig, REDIS_HOST, REDIS_PORT, 0, REDIS_PASSWORD);

        // Create a Client for interacting with Redis Bloom filter
        Client client = new Client(jedisPool);

        // Initialize or create a Bloom filter with a desired error rate and capacity
        BloomFilter bf = BloomFilter.getNamed(BF_KEY, client);

        // Add usernames to the Bloom filter (simulate registration)
        String username1 = "Dip";
        String username2 = "Mazumder";
        String username3 = "Dip_Mazumder";

        bf.add(username1);
        bf.add(username2);
        bf.add(username3);

        // Check if a username is unique
        String newUsername = "user123"; // Simulate a new registration attempt

        if (bf.contains(newUsername)) {
            System.out.println("Username '" + newUsername + "' is possibly not unique. Further verification required.");
            // You can perform additional checks or query the database to confirm uniqueness
        } else {
            System.out.println("Username '" + newUsername + "' is likely unique. Proceed with registration.");
            // You can safely proceed with user registration
        }

        // Close the JedisPool when done
        jedisPool.close();
    }
}

Performance :

Redis Bloom filters offer a memory-efficient solution for large-scale uniqueness verification. They strike a balance between memory consumption and error rates, making them suitable for applications where memory constraints are a concern. By implementing Redis Bloom filters, you can efficiently handle uniqueness checks, even as your system scales up, without compromising on performance.