
Building Scalable Task Queues with Redis and Node.js

Lessons learned from building a distributed task processing system handling 100K+ daily jobs.

Node.js · Redis · Architecture

When you need to process tasks asynchronously at scale, a robust queue system becomes essential. Here's what I learned building one that handles 100K+ jobs daily.

Why Not Just Use SQS?

Amazon SQS is great, but for our use case we needed:

  • Sub-second job pickup latency
  • Complex retry strategies per job type
  • Real-time job progress tracking
  • Priority queues

Redis with Bull gave us all of this with lower operational overhead.
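Conceptually, priority pickup just means lower-numbered jobs are served first (Bull's convention: priority 1 beats priority 10). Here's a toy in-memory sketch of that ordering guarantee — not Bull's actual Redis-backed implementation, just the behavior it gives you:

```javascript
// Toy priority queue illustrating pickup order only.
// The real system uses Bull on Redis; names here are illustrative.
class PriorityQueue {
  constructor() {
    this.jobs = [];
  }
  add(name, priority) {
    this.jobs.push({ name, priority });
    // Lower number = higher priority; Array.sort is stable in Node >= 11,
    // so equal-priority jobs keep FIFO order.
    this.jobs.sort((a, b) => a.priority - b.priority);
  }
  next() {
    return this.jobs.shift();
  }
}

const q = new PriorityQueue();
q.add("bulk-export", 10);
q.add("password-reset", 1);
console.log(q.next().name); // "password-reset" — picked before bulk work
```

With Bull, the equivalent is passing a priority when enqueueing, e.g. `queue.add(payload, { priority: 1 })`.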

Architecture Overview

The system has three main components:

  • Producers - API servers that enqueue jobs
  • Redis - The queue storage and pub/sub backbone
  • Workers - Horizontally scalable job processors

Key Design Decisions

1. Job Persistence

While Redis is fast, we needed durability. Every job is:

  • Written to Redis for processing
  • Logged to PostgreSQL for an audit trail
  • Persisted with its result after completion
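The dual-write flow above can be sketched with in-memory stand-ins for Redis (the `queue` array) and PostgreSQL (the `auditLog` map). The function names are illustrative, not from our codebase:

```javascript
// Sketch of the dual-write pattern: record an audit row when a job is
// enqueued, then update it when the job settles.
const queue = [];              // stand-in for the Redis-backed Bull queue
const auditLog = new Map();    // stand-in for a PostgreSQL audit table

function enqueueWithAudit(id, payload) {
  // Audit row first, so a crash after this point still leaves a trace.
  auditLog.set(id, { status: "queued", payload, result: null });
  queue.push({ id, payload }); // in production: emailQueue.add(payload)
}

function complete(id, result) {
  auditLog.set(id, { ...auditLog.get(id), status: "completed", result });
}

enqueueWithAudit("job-1", { to: "user@example.com" });
complete("job-1", { delivered: true });
```

The ordering matters: writing the audit row before enqueueing means every job in Redis has a corresponding trail entry, even if the worker or producer dies mid-flight.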

2. Retry Strategy

Different jobs need different retry behaviors:

const emailQueue = new Bull("email", {
  defaultJobOptions: {
    attempts: 5,
    backoff: {
      type: "exponential",
      delay: 2000, // 2s, 4s, 8s, 16s, 32s
    },
  },
});
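The delays in that comment follow `delay * 2^(attempt - 1)`. A small helper makes the schedule explicit — this is a sketch of the arithmetic for tuning, not Bull's internal backoff code:

```javascript
// Compute the retry delay schedule for exponential backoff:
// the delay doubles on each attempt, starting from baseDelayMs.
function backoffSchedule(attempts, baseDelayMs) {
  return Array.from({ length: attempts }, (_, i) => baseDelayMs * 2 ** i);
}

console.log(backoffSchedule(5, 2000));
// [2000, 4000, 8000, 16000, 32000] — matches the 2s…32s comment above
```

Plotting the schedule like this before picking `attempts` is worth the minute it takes: five attempts at a 2s base means a job can sit in retry for over a minute total, which may or may not suit a given job type.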

3. Worker Scaling

Workers scale based on queue depth using Kubernetes HPA:

metrics:
  - type: External
    external:
      metric:
        name: redis_queue_depth
      target:
        type: AverageValue
        averageValue: "100"

Lessons Learned

  1. Always set job timeouts - Stuck jobs will block workers
  2. Use separate queues for different priorities - Don't let bulk jobs block critical ones
  3. Monitor everything - Queue depth, processing time, failure rates
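On lesson 1: Bull accepts a per-job `timeout` option, and the same guard can be hand-rolled as a `Promise.race` inside a processor. A generic sketch (not our production code):

```javascript
// Reject a promise that doesn't settle within ms, so a stuck job
// fails fast instead of pinning a worker indefinitely.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`job timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a fast task resolves normally, well within its budget.
withTimeout(
  new Promise((resolve) => setTimeout(resolve, 10, "done")),
  1000
).then((v) => console.log(v)); // "done"
```

A timed-out job then flows into the normal failure path, where the retry strategy above decides whether to try again.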

The system has been running in production for 2 years with 99.9% job completion rate.