Solving the Uneven Load Distribution Problem in High-Traffic Kafka Systems
In high-throughput systems, managing even load distribution is critical for ensuring timely data processing and optimal resource utilization, especially during peak periods like promotional events or holidays. For applications handling millions of transactions daily, maintaining real-time processing is essential to meet strict SLAs. However, challenges such as consumer lag, uneven processing capacities, and the need for efficient resource allocation often arise in Kafka-based architectures.
Scale and Setup:
- Transaction Volume: 5–10 million transactions daily, peaking at 50,000 transactions per minute during high-traffic periods.
- Kafka Partitions: 30 partitions per topic.
- Consumers: 15–20 consumer instances distributed across multiple AWS ECS instances for distributed data processing.
Challenges We Faced:
1. Uneven Load Distribution: Initially, our fraud detection system used round-robin partitioning for distributing Kafka messages (financial transactions) across partitions. Certain consumers experienced lag, delaying fraud detection and leading to failed real-time detections for high-risk transactions.
For example, during a Black Friday promotion, some transactions requiring enhanced verification (e.g., multi-factor authentication or third-party API lookups) slowed down particular workers. These overloaded workers caused latency spikes while others remained underutilized, delaying fraud detection by up to 20 seconds in some cases, which is unacceptable in a financial system where timely detection is critical.
2. Hardware and Consumer Processing Capacity: While scaling horizontally by adding more consumer instances on AWS ECS, we encountered hardware discrepancies. Some ECS instances had higher throughput, capable of processing 20,000 transactions per minute, while others processed only 12,000 transactions per minute due to underlying hardware limitations. This created an imbalance, where faster consumers were not fully utilized, and slower consumers were struggling to keep up, causing Kafka consumer lag and requiring us to over-provision ECS resources to maintain SLAs.
This mismatch in processing speed led to over-provisioning: we had to allocate more hardware resources than necessary to ensure that even the slowest consumer could keep up with the processing demand.
For example, if 60 messages per second are distributed equally across 5 consumers, but one consumer can only process 10 messages per second due to hardware limitations, then the sustainable throughput of the whole group is capped by that bottlenecked consumer, and latency builds up in message processing.
The total capacity is reduced, and extra resources must be provisioned to ensure SLA compliance, leading to 66.7% overprovisioning in some cases. Moreover, during peak times, we saw P99 latencies rising to 20 seconds, particularly in fraud detection cases involving complex risk models.
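The bottleneck arithmetic above can be sketched as follows (the rates are the illustrative numbers from the example, not measured values):

```python
# Under round-robin, each consumer receives an equal share of the incoming
# rate, so the slowest consumer caps what the whole group can sustain.

def sustainable_rate(num_consumers: int, slowest_rate: float) -> float:
    """Max input rate (msgs/sec) the group sustains with equal sharing.

    Each consumer gets rate / num_consumers; the slowest must keep up, so
    rate / num_consumers <= slowest_rate  =>  rate <= num_consumers * slowest_rate.
    """
    return num_consumers * slowest_rate

incoming = 60   # msgs/sec being produced
consumers = 5
slowest = 10    # msgs/sec the weakest consumer can process

cap = sustainable_rate(consumers, slowest)  # 5 * 10 = 50 msgs/sec
backlog_growth = incoming - cap             # 10 msgs/sec of lag accumulates
print(cap, backlog_growth)                  # 50 10
```

Any input rate above the cap turns into a steadily growing backlog, which is exactly the consumer lag observed during peak traffic.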
Solutions
1. Lag-Aware Producers
Concept: A lag-aware producer is an advanced Kafka producer that monitors the consumer lag (the difference between the messages produced and consumed for a partition) before publishing new messages. By incorporating lag awareness, the producer adjusts the rate of message production based on the lag in each partition, ensuring that overloaded consumers aren’t overwhelmed while faster consumers can continue processing efficiently.
How it Works:
- The producer periodically fetches lag information from Kafka brokers, checking the lag for each partition (e.g., number of unprocessed messages per partition).
- The producer maintains an internal lag cache to reduce the frequency of fetching lag information from Kafka (since querying Kafka for every message can be inefficient).
- Based on the lag data, the producer decides whether to send more messages to partitions with lower lag and fewer messages to partitions with higher lag.
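The steps above can be sketched as a partition selector that the producer consults before each send. This is a minimal illustration, not the article's actual implementation: `fetch_lag_fn` is a hypothetical stand-in for whatever broker query supplies per-partition lag, and the inverse-lag weighting is one possible policy among many.

```python
import random
import time

class LagAwareSelector:
    """Pick a partition for the next message, favoring low-lag partitions.

    `fetch_lag_fn` is a hypothetical callable returning {partition: lag};
    results are cached for `cache_ttl` seconds so the producer does not
    query the broker on every message.
    """

    def __init__(self, partitions, fetch_lag_fn, cache_ttl=5.0):
        self.partitions = partitions
        self.fetch_lag_fn = fetch_lag_fn
        self.cache_ttl = cache_ttl
        self._lag = {p: 0 for p in partitions}   # internal lag cache
        self._fetched_at = float("-inf")

    def _refresh(self):
        now = time.monotonic()
        if now - self._fetched_at >= self.cache_ttl:
            self._lag = self.fetch_lag_fn(self.partitions)
            self._fetched_at = now

    def next_partition(self):
        self._refresh()
        # Weight each partition inversely to its lag: partitions with a
        # small backlog receive proportionally more new messages.
        weights = [1.0 / (1 + self._lag[p]) for p in self.partitions]
        return random.choices(self.partitions, weights=weights, k=1)[0]

# Example: partition 1 is far behind, so it receives very few new messages.
lags = {0: 0, 1: 100, 2: 0}
selector = LagAwareSelector([0, 1, 2], lambda ps: lags)
```

In a real producer this selector would replace the default round-robin choice, with the lag feed backed by broker offset metadata rather than a static dict.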
Impact:
Lag-aware producers prevent overwhelmed consumers from further lagging, ensuring more evenly distributed processing. This approach is especially effective when some transactions require additional processing, as producers will allocate those tasks to less burdened partitions, preventing system bottlenecks.
Why it’s Important:
- Without lag-aware producers, the system risks overwhelming already overburdened partitions, leading to increased message processing delays.
- For fraud detection, this delay could result in failed real-time detection, where a suspicious transaction might be processed too late to stop.
2. Lag-Aware Consumers
Concept: A lag-aware consumer is a Kafka consumer that monitors its own performance and the lag of the partitions it consumes from. If a consumer detects that it is falling behind (i.e., accumulating too many unprocessed messages in its partitions), it can dynamically rebalance partition assignments, potentially handing off some partitions to other, underutilized consumers.
How it Works:
- Each consumer monitors its own processing lag.
- When lag in a partition exceeds a predefined threshold, the consumer triggers a rebalance. This causes some of its partitions to be reassigned to other consumers that have more processing capacity.
- The rebalancing is handled dynamically, ensuring that no single consumer is overwhelmed.
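The threshold check at the heart of this loop can be sketched as below. The names and the threshold value are illustrative assumptions; in practice the lag figures would come from comparing each partition's end offset with the consumer's committed offset, and "releasing" a partition would mean leaving or rejoining the group so Kafka reassigns it.

```python
# Hypothetical lag threshold: unprocessed messages allowed per partition
# before the consumer hands the partition off to a less loaded peer.
LAG_THRESHOLD = 10_000

def partitions_to_release(lag_by_partition: dict[int, int],
                          threshold: int = LAG_THRESHOLD) -> list[int]:
    """Return the partitions whose backlog exceeds the threshold."""
    return [p for p, lag in lag_by_partition.items() if lag > threshold]

# Example: partition 7 has fallen far behind, so it is flagged for handoff.
lags = {3: 120, 7: 25_000, 11: 4_800}
print(partitions_to_release(lags))  # [7]
```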
Impact:
Lag-aware consumers ensure that no consumer becomes a bottleneck, maintaining efficient and timely processing across all partitions. This self-adjusting mechanism reduces latency, especially during high-traffic periods, by redistributing tasks based on real-time performance.
3. Same-Queue Length Algorithm
Concept: The Same-Queue Length Algorithm is designed to ensure that each Kafka partition has a similar backlog (queue length), or lag, at any given time. By adjusting how many messages are sent to each partition, this algorithm aims to keep all consumers equally busy, minimizing the risk that one consumer becomes overwhelmed while others remain idle.
How it Works:
- Kafka producers continuously monitor the queue length (or lag) of each partition.
- The producer sends more messages to partitions with shorter queues and fewer messages to those with longer queues, thereby equalizing the queue lengths across all partitions.
- Over time, this ensures that all partitions have even queue lengths, and no partition has an excessively large backlog compared to others.
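A minimal sketch of the same-queue-length rule: route each new message to the partition with the shortest current backlog, which pulls all queues toward equal length. The queue lengths here are tracked locally for illustration; in a real deployment they would come from broker lag metrics.

```python
def pick_shortest(queue_lengths: dict[int, int]) -> int:
    """Return the partition with the smallest backlog (ties -> lowest id)."""
    return min(sorted(queue_lengths), key=lambda p: queue_lengths[p])

# Start with deliberately uneven backlogs, then route 10 new messages.
queues = {0: 40, 1: 5, 2: 22}
for _ in range(10):
    p = pick_shortest(queues)
    queues[p] += 1   # all 10 land on partition 1, narrowing the gap
print(queues)        # {0: 40, 1: 15, 2: 22}
```

Over a long enough stream this greedy rule equalizes the backlogs, since the shortest queue always absorbs the next message.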
Why it’s Important:
- By ensuring that no partition’s queue grows disproportionately larger than others, the system can process transactions evenly, reducing the risk of consumer lag and ensuring timely fraud detection.
- This approach is particularly important for complex, high-risk transactions, where overloading one partition can lead to delays in detecting fraudulent activity.
4. Weighted Load Balancing
Concept: In cases where consumer processing capacity is known (i.e., some ECS instances are faster than others), weighted load balancing can be used to assign more traffic to faster consumers and less traffic to slower ones, ensuring that the system uses hardware resources as efficiently as possible.
How it Works:
- Each consumer is assigned a weight based on its processing capacity (e.g., Consumer 1 can process 20,000 transactions/min, while Consumer 3 can process only 10,000 transactions/min).
- Kafka partitions are assigned to consumers based on these weights, ensuring that faster consumers handle more traffic.
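The assignment step can be sketched as below. The consumer names and capacities are illustrative, and the largest-remainder split is one reasonable way to turn capacity weights into whole-partition assignments; the article does not specify the exact algorithm used.

```python
def weighted_assignment(partitions: list[int],
                        capacity: dict[str, int]) -> dict[str, list[int]]:
    """Split partitions across consumers in proportion to their capacity."""
    total = sum(capacity.values())
    assignment = {c: [] for c in capacity}
    # Each consumer's ideal (fractional) share of the partitions.
    targets = {c: len(partitions) * cap / total for c, cap in capacity.items()}
    for p in partitions:
        # Give the next partition to the consumer furthest below its target.
        c = max(targets, key=lambda name: targets[name] - len(assignment[name]))
        assignment[c].append(p)
    return assignment

# Illustrative capacities in transactions/min; 9 partitions split 4/3/2.
caps = {"consumer-1": 20_000, "consumer-2": 15_000, "consumer-3": 10_000}
result = weighted_assignment(list(range(9)), caps)
print({c: len(ps) for c, ps in result.items()})
# {'consumer-1': 4, 'consumer-2': 3, 'consumer-3': 2}
```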
Impact:
In a system with varied hardware, weighted load balancing reduces underutilization of high-capacity instances and prevents overloading slower ones. By distributing load according to consumer capacity, this approach minimizes lag and maintains a steady processing rate across the system.
Summary
Through lag-aware producers, lag-aware consumers, queue-length balancing, and weighted load balancing, Kafka-based systems can dynamically adjust to maintain even load distribution, reduce latency, and ensure efficient resource utilization. These strategies optimize both processing speed and resource allocation, enabling high-throughput systems to handle peak loads while preventing bottlenecks.