Distributed Task Queue
A high-throughput distributed task queue built for processing millions of jobs daily with guaranteed delivery and fault tolerance.
Problem
Existing job processing systems could not handle the required throughput of 2M+ jobs per day while maintaining ordering guarantees and exactly-once semantics.
Solution
Designed a distributed task queue using Redis Streams for message brokering with PostgreSQL as a durable backing store. Implemented consumer groups with automatic rebalancing and dead-letter handling.
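The dead-letter policy can be sketched in a few lines. This is an illustrative model, not the production code: `Message`, `deliver`, and `MAX_ATTEMPTS` are hypothetical names, and a plain list stands in for the dead-letter stream.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry budget; the real threshold may differ

@dataclass
class Message:
    id: str
    payload: dict
    attempts: int = 0

def deliver(msg, handler, dead_letters):
    """Attempt to process msg once; dead-letter it after MAX_ATTEMPTS failures."""
    msg.attempts += 1
    try:
        handler(msg.payload)
        return True                      # success: the broker can ACK
    except Exception:
        if msg.attempts >= MAX_ATTEMPTS:
            dead_letters.append(msg)     # parked for the reconciliation service
        return False                     # otherwise the broker redelivers
```

Failed messages stop cycling through consumers after the attempt budget is exhausted and become the reconciliation service's responsibility instead.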
Architecture
The system follows a producer-consumer pattern with a coordinator service managing partition assignment. Each worker node processes messages from assigned partitions, with checkpointing to PostgreSQL for recovery. A separate reconciliation service handles failed deliveries.
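The recovery path rests on per-partition checkpoints. A minimal sketch of the idea, with an in-memory dict standing in for the PostgreSQL checkpoint table and all names assumed for illustration:

```python
# (worker_id, partition) -> last processed offset; persisted to PostgreSQL
# in the real system, an in-memory dict here for illustration.
checkpoints = {}

def process_partition(worker_id, partition, entries, handle):
    """Process entries past the last checkpoint, advancing it as we go."""
    start = checkpoints.get((worker_id, partition), -1) + 1
    for offset in range(start, len(entries)):
        handle(entries[offset])
        # In production this write would be transactional with the
        # handler's side effects, so a crash never loses or repeats work.
        checkpoints[(worker_id, partition)] = offset
```

A worker that restarts after a crash resumes from the stored offset rather than replaying the partition from the beginning.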
Key Decisions
- Chose Redis Streams over Kafka for lower operational complexity at the required scale
- Implemented idempotency keys at the consumer level, turning at-least-once delivery into effectively exactly-once processing
- Used gRPC for inter-service communication to minimize serialization overhead
- Designed a custom backpressure mechanism to prevent worker overload
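The idempotency-key decision above can be sketched as follows. This is a hedged illustration: a Python set stands in for what would be a unique index in PostgreSQL, and `consume`/`apply_effect` are hypothetical names.

```python
# Keys of jobs whose effects have been applied; in the real system this
# is a unique index in PostgreSQL, checked and written in the same
# transaction as the job's side effects.
processed_keys = set()

def consume(job_key, payload, apply_effect):
    """Apply the job's effect once; silently drop duplicate deliveries."""
    if job_key in processed_keys:
        return False                 # redelivered duplicate, already applied
    apply_effect(payload)
    processed_keys.add(job_key)      # committed atomically with the effect
    return True
```

Because Redis Streams consumer groups guarantee at-least-once delivery, a redelivered job simply hits the key check and is skipped, which is what makes the processing effectively exactly-once.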