A queue is the simplest form of asynchronous messaging. A producer drops a job onto the queue and walks away; one of N workers picks it up, does the work, acknowledges. If the worker dies before acking, the broker re-delivers. Add more workers to go faster. The plumbing behind every "send the email later," every image-processing pipeline, every long-running job your web request shouldn't wait for.
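That delivery contract can be sketched with a toy in-memory broker. `SimpleBroker` and its methods are invented for illustration and mirror no real broker's API; the point is the shape: deliver, ack, and re-deliver anything that wasn't acked.

```python
import queue

class SimpleBroker:
    """Toy in-memory broker illustrating the queue contract:
    deliver a message, and re-deliver it if the worker never acks."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get_nowait()  # raises queue.Empty when drained

    def nack(self, msg):
        self._q.put(msg)  # unacked work goes back on the queue

broker = SimpleBroker()
broker.send({"job": "welcome_email", "user_id": 42})

msg = broker.receive()
broker.nack(msg)                  # worker "crashed" before acking
assert broker.receive() == msg    # another worker picks the same job up
```

Real brokers do this with acknowledgements and visibility timeouts rather than an explicit `nack` call, but the guarantee is the same: work that was never acked comes back.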
"Sign up" should return in 100ms. Sending the welcome email, generating the avatar, kicking off the analytics event, and warming the recommendations cache should not block that response. Enqueue and return; workers do the slow stuff. The user gets a fast page; you keep the work.
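Enqueue-and-return looks roughly like this sketch, with the stdlib `queue.Queue` standing in for a real broker; `create_user` and the job names are hypothetical placeholders:

```python
import queue

jobs = queue.Queue()  # stand-in for a real broker (SQS, RabbitMQ, ...)

def create_user(email: str) -> int:
    """Placeholder for the real DB insert."""
    return abs(hash(email)) % 10_000

def signup(email: str) -> dict:
    """Handle the request fast; park everything slow on the queue."""
    user_id = create_user(email)  # the only work on the hot path
    for job in ("send_welcome_email", "generate_avatar", "track_signup"):
        jobs.put({"job": job, "user_id": user_id})  # enqueue and move on
    return {"status": "ok", "user_id": user_id}     # respond immediately

resp = signup("ada@example.com")
assert resp["status"] == "ok"
assert jobs.qsize() == 3  # the slow work is waiting for the workers
```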
Black Friday traffic is 10× normal for two hours. Your DB and web tier can't auto-scale that fast. A queue is a buffer — requests pile up, workers drain at their normal rate, the system degrades into latency rather than collapsing into errors. You trade "instant" for "alive."
Worker died mid-job? Broker re-delivers. Email service was down? Worker raises an exception, doesn't ack, the message comes back. Combined with idempotent handlers (see below), you get free resilience.
Producer doesn't know how many workers exist or how long they take. Workers can be deployed and scaled independently. The team that owns the producer can ship without coordinating with the team that owns the worker.
Queues with acknowledgements give you at-least-once delivery. Sometimes the worker finishes the job, then crashes before acking — the message comes back, and a second worker runs it again. Make every handler idempotent: store the job's unique ID and skip if already processed, or design the operation so running it twice is harmless (`UPDATE orders SET status='shipped' WHERE id=42` is naturally idempotent).
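The store-the-ID scheme can be sketched like this, with an in-memory set standing in for the durable store of processed job IDs (in production that would be a DB table or Redis set):

```python
processed = set()  # stand-in for a durable store of handled job IDs

sent = []  # records side effects so duplicates are observable
def send_email(to: str) -> None:
    sent.append(to)

def handle(msg: dict) -> bool:
    """Run the job once; treat re-deliveries of the same ID as no-ops."""
    if msg["id"] in processed:
        return False           # duplicate delivery: ack and drop
    send_email(msg["to"])      # the side effect we must not repeat
    processed.add(msg["id"])
    return True

msg = {"id": "job-123", "to": "ada@example.com"}
assert handle(msg) is True       # first delivery does the work
assert handle(msg) is False      # re-delivery is harmless
assert sent == ["ada@example.com"]
```

Note the remaining window: a crash after `send_email` but before recording the ID still duplicates the send. Closing it requires recording and side effect in one transaction, or a side effect that is itself safe to repeat.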
A message that fails repeatedly — bug in the handler, malformed payload, missing dependency — will block the queue forever if you keep retrying. After N failures, route it to a dead-letter queue for human inspection. Wire alerts on DLQ depth.
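The retry-then-quarantine flow might look like this sketch; `MAX_ATTEMPTS` and the list-backed DLQ are illustrative, and a real setup would use the broker's redrive policy instead:

```python
MAX_ATTEMPTS = 3
dead_letter = []  # stand-in for a real DLQ with alerting on depth

def deliver(msg: dict, handler) -> None:
    """Try the handler up to MAX_ATTEMPTS times, then quarantine."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            handler(msg)
            return                     # success: ack and we're done
        except Exception as err:
            last_error = err           # remember why it failed
    dead_letter.append({"msg": msg, "error": str(last_error)})

def broken_handler(msg):
    raise ValueError("malformed payload")

deliver({"id": "job-9"}, broken_handler)
assert len(dead_letter) == 1                    # poison message quarantined,
assert dead_letter[0]["msg"]["id"] == "job-9"   # not retried forever
```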
Set the visibility timeout (how long the broker waits for an ack before re-delivering) longer than your worst-case job duration plus a margin. Too short, and a slow job triggers a duplicate while it's still running. Too long, and a crashed worker holds messages hostage for ages. Many brokers let workers extend the timeout dynamically as work progresses.
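The sizing rule is just arithmetic; a sketch with illustrative numbers:

```python
def visibility_timeout(worst_case_s: float, margin: float = 1.5) -> float:
    """Slowest job you've ever observed, times a safety margin."""
    return worst_case_s * margin

def will_duplicate(job_s: float, timeout_s: float) -> bool:
    """A job that outlives the timeout gets re-delivered mid-run."""
    return job_s > timeout_s

timeout = visibility_timeout(90.0)        # slowest job ever seen: 90s
assert timeout == 135.0
assert not will_duplicate(90.0, timeout)  # sized correctly: no duplicate
assert will_duplicate(90.0, 60.0)         # too short: duplicate mid-run
```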
If a job fails because a downstream service is down, retrying immediately doesn't help. Use exponential backoff between retries. After a retry budget, send to DLQ. If 1 message in 10,000 always fails — a malformed customer record from 2009 — let it land in DLQ and don't pretend it's transient.
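Exponential backoff with full jitter, as a sketch (base, cap, and the retry budget are illustrative knobs):

```python
import random

def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   attempts: int = 5) -> list[float]:
    """Delay before each retry. The ceiling doubles each attempt
    (capped), and full jitter spreads retries so every worker doesn't
    hammer a recovering downstream service at the same instant."""
    return [random.uniform(0, min(cap, base * 2 ** n))
            for n in range(attempts)]

delays = backoff_delays()
assert len(delays) == 5
# each delay stays under its growing ceiling: 1, 2, 4, 8, 16 seconds
assert all(d <= min(60.0, 2 ** n) for n, d in enumerate(delays))
```

Once the attempts are exhausted, the message goes to the DLQ instead of looping forever.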
It's tempting — SELECT FOR UPDATE SKIP LOCKED on a jobs table works, kind of. It's fine at low scale. At higher throughput it bottlenecks the DB, locks become contention hotspots, and you reinvent every queue feature poorly. Use a real broker once you're past "few jobs per second."
The exception: the outbox pattern, where an outbox table inside the same DB transaction as a state change feeds a real broker. That's a queue-as-a-table done right.
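A sketch of the outbox pattern using SQLite as the stand-in DB; the table names and the polling relay are illustrative, and a real relay would publish each row to the broker before marking it:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def ship_order(order_id: int) -> None:
    """State change and outbox row commit atomically: either both
    exist or neither, so no event is lost or phantom-published."""
    with db:  # one transaction
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'shipped')",
                   (order_id,))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("order.shipped", str(order_id)))

def relay() -> list:
    """A separate process polls the outbox and forwards rows to the
    real broker, then marks them published."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    db.executemany("UPDATE outbox SET published = 1 WHERE id = ?",
                   [(r[0],) for r in rows])
    return rows  # in production: publish each to the broker here

ship_order(42)
events = relay()
assert len(events) == 1 and events[0][1] == "order.shipped"
assert relay() == []  # already forwarded; nothing left to relay
```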
| Broker | Style | Where It Shines |
|---|---|---|
| RabbitMQ | AMQP — queues + flexible routing exchanges | Task queues, work distribution, complex routing rules. |
| Amazon SQS | Managed FIFO or standard queue | "Just give me a queue" on AWS. Pairs naturally with Lambda. |
| Azure Service Bus | Managed queue + topics | The default on Azure; sessions, transactions, scheduled delivery. |
| Google Cloud Tasks | HTTP-target task queue | Async invocations of HTTP endpoints; fits Cloud Run / App Engine. |
| ActiveMQ Artemis | Self-hosted broker | Enterprise on-prem; supports multiple protocols. |
| Beanstalkd, Sidekiq, Resque | Lightweight task queues (often Redis-backed) | Web app background jobs in Ruby/Python/Node. |
| Redis Streams / Lists | In-memory, lightweight | Quick wins alongside an existing Redis. Plan for persistence carefully. |
Pick a queue when one producer hands off work to one consumer (or one of N workers competing for it), and you don't need fan-out, replay, or ordering across the whole stream. Background jobs, image processing, email, integrations, retries against external APIs — all classic queue territory.
Pick Pub/Sub when one event needs to reach many independent consumers. Pick event streaming (Kafka) when you need replay, long retention, and an ordered append-only log as the source of truth.