Back to blog
April 10, 2025

Building a Fault-Tolerant Job Queue with Redis and Node.js

How I designed the job queue architecture behind VTU AutoPilot — handling concurrency, session expiry retries, and real-time progress streaming via SSE.

node.jsredisarchitecturebackend

The Problem

VTU's online course platform requires students to sit through 100+ video lectures and mark each one complete. Doing this manually for every subject, every semester, is genuinely tedious. VTU AutoPilot was built to automate exactly that — but at scale, with multiple users running jobs simultaneously, the architecture had to be thoughtful.

Why a Job Queue?

The naive approach is to process each lecture sequentially in a single HTTP request. That breaks immediately:

A proper job queue solves all of this by separating the submission of work from execution.

The Architecture

Client → REST API → Redis Queue → Worker → SSE Stream → Client

The REST API accepts job submissions and enqueues them. A persistent worker process pulls jobs off the queue and executes them. Progress is pushed back to the client via Server-Sent Events (SSE) — a lightweight alternative to WebSockets for one-directional streaming.

Redis as the Backbone

Redis was chosen for three reasons:

  1. Persistence — jobs survive server restarts (appendonly yes in config)
  2. Atomic operationsLPUSH / BRPOP for queue operations are atomic, preventing double-processing
  3. TTL support — completed job state automatically expires, keeping memory clean

The queue structure looks like this:

// Enqueue
await redis.lpush("job:queue", JSON.stringify({ jobId, userId, courseId }));
 
// Worker dequeue (blocks until item available)
const [, raw] = await redis.brpop("job:queue", 0);
const job = JSON.parse(raw);

Handling Session Expiry

VTU's backend returns 401, 419, or 403 when a session expires mid-job. The worker detects these and automatically re-authenticates before retrying the failed lecture — transparently, without any user action:

async function processLecture(lecture, credentials) {
  try {
    await markComplete(lecture, credentials.sessionToken);
  } catch (err) {
    if ([401, 419, 403].includes(err.status)) {
      // Re-authenticate and retry once
      credentials.sessionToken = await refreshSession(credentials);
      await markComplete(lecture, credentials.sessionToken);
    } else {
      throw err;
    }
  }
}

Real-time Progress via SSE

Instead of polling an endpoint, the client opens a persistent SSE connection. The server pushes updates as lectures complete:

// Server
app.get("/progress/:jobId", (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
 
  const send = (data) => res.write(`data: ${JSON.stringify(data)}\n\n`);
 
  const interval = setInterval(async () => {
    const progress = await redis.hgetall(`job:${req.params.jobId}`);
    send(progress);
    if (progress.status === "done") {
      clearInterval(interval);
      res.end();
    }
  }, 500);
 
  req.on("close", () => clearInterval(interval));
});

Deduplication and Concurrency Control

Without deduplication, a user could submit the same course twice and waste resources (and potentially hit rate limits). Each job gets a deterministic ID based on userId + courseId:

const jobId = crypto
  .createHash("sha256")
  .update(`${userId}:${courseId}`)
  .digest("hex")
  .slice(0, 16);

Concurrency is capped with a semaphore pattern — only N jobs run simultaneously across the entire server, preventing the VTU backend from rate-limiting.

Key Takeaways

The full source is on GitHub.