Notification Delivery Pipeline Design (Queue, Throttling, Retry)

Implementation principles for a queue-first delivery backbone that remains stable under burst traffic

Notification Delivery, Job Queue, Throttling, Retry, Dead Letter Queue

Why Queue-First Delivery

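Back in Stock traffic is bursty.
When a popular product comes back in stock, synchronous delivery quickly hits timeouts and provider rate limits.

The load scales with the subscriber list: a single restock event fans out to one send per waiting subscriber.

A queue-first pipeline decouples the restock event from delivery: the event only enqueues jobs, and workers drain them at a controlled rate.
Failures then land in a known state that can be retried or inspected, which makes failure handling predictable.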

Define explicit state transitions for each delivery job:

  1. pending
  2. processing
  3. sent / failed-temporary / failed-permanent
  4. dead-letter

Only failed-temporary should be retried automatically.
Every other state should change only through an explicit rule, such as a manual replay out of dead-letter.
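
The states above can be sketched as a small transition table. This is a minimal illustration, not a prescribed implementation: the names State, ALLOWED, and transition are hypothetical, and the exact edges (for example, treating sent and failed-permanent as terminal) are assumptions.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SENT = "sent"
    FAILED_TEMPORARY = "failed-temporary"
    FAILED_PERMANENT = "failed-permanent"
    DEAD_LETTER = "dead-letter"


# Allowed transitions. The exact edges are an assumption for illustration.
ALLOWED = {
    State.PENDING: {State.PROCESSING},
    State.PROCESSING: {State.SENT, State.FAILED_TEMPORARY, State.FAILED_PERMANENT},
    # Only failed-temporary re-enters the queue automatically, or moves to
    # dead-letter once its retry budget is spent.
    State.FAILED_TEMPORARY: {State.PENDING, State.DEAD_LETTER},
    State.FAILED_PERMANENT: set(),
    State.SENT: set(),
    # Manual replay from the admin UI puts a job back in the queue.
    State.DEAD_LETTER: {State.PENDING},
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Routing every state change through one function like this makes an unexpected transition fail loudly instead of silently corrupting a job's history.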

Throughput Controls

Use multiple limits simultaneously:

  • global messages-per-second
  • per-domain caps
  • tenant/workspace caps for multi-store setups

Queue ordering can be FIFO by subscription time to preserve fairness.
When stock is limited, this means the subscribers who signed up first are also notified first.
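
One way to apply several limits at once is to give each limit its own token bucket and allow a send only when every bucket has capacity. The sketch below assumes in-process buckets; TokenBucket and try_acquire are hypothetical names, and a multi-worker deployment would need shared state instead.

```python
import time
from typing import List, Optional


class TokenBucket:
    """Allows roughly `rate` sends per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def available(self, now: float) -> bool:
        # Refill lazily based on elapsed time, then check for one token.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        return self.tokens >= 1.0

    def take(self) -> None:
        self.tokens -= 1.0


def try_acquire(buckets: List[TokenBucket], now: Optional[float] = None) -> bool:
    """Consume one token from every bucket, or from none of them."""
    now = time.monotonic() if now is None else now
    checks = [bucket.available(now) for bucket in buckets]
    if not all(checks):
        return False
    for bucket in buckets:
        bucket.take()
    return True
```

A worker would call try_acquire with the global, per-domain, and per-tenant buckets for a job, and leave the job in the queue when it returns False. Taking tokens only after all checks pass avoids draining the global budget for sends that a narrower cap would block anyway.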

Retry Strategy

Classify failures and apply different behavior:

  • 429, timeout: exponential backoff
  • transient provider errors: retry with jitter
  • invalid recipient: mark failed-permanent immediately

When the retry budget is exhausted, move the job to dead-letter and allow manual replay from the admin UI.
This keeps the primary queue healthy while preserving recoverability.
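
The classification and retry budget might look like the following sketch. The constants, the treatment of 5xx responses as transient, and the function names are all assumptions for illustration. For brevity it applies one policy, exponential backoff with full jitter, to every temporary failure rather than separating the 429/timeout case from other transient errors.

```python
import random
from typing import Optional, Tuple

MAX_ATTEMPTS = 5      # hypothetical retry budget
BASE_DELAY_S = 2.0    # hypothetical delay ceiling for the first retry
MAX_DELAY_S = 300.0   # hypothetical upper bound on any delay


def classify(status: Optional[int], timed_out: bool = False,
             invalid_recipient: bool = False) -> str:
    """Map a provider response to a delivery state."""
    if invalid_recipient:
        return "failed-permanent"
    if timed_out or status == 429:
        return "failed-temporary"
    if status is not None and 500 <= status < 600:
        return "failed-temporary"  # assumption: 5xx means a transient provider error
    if status is not None and 200 <= status < 300:
        return "sent"
    return "failed-permanent"      # assumption: anything else is not retryable


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Full jitter: uniform in [0, min(MAX_DELAY_S, BASE_DELAY_S * 2**(attempt - 1))]."""
    source = rng if rng is not None else random
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return source.uniform(0.0, ceiling)


def next_step(outcome: str, attempt: int) -> Tuple[str, Optional[float]]:
    """Return (new_state, delay_seconds) after the 1-based attempt number."""
    if outcome != "failed-temporary":
        return outcome, None
    if attempt >= MAX_ATTEMPTS:
        return "dead-letter", None
    return "pending", backoff_delay(attempt)
```

Jitter matters here because a burst of jobs that fail together would otherwise retry together and hit the provider limit again at the same instant.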

Idempotency for Send Safety

Prevent duplicate sends with a deterministic key:

  • notificationKey = variantId + subscriberId + restockWindow
  • acquire the key atomically before sending, so two workers cannot both send
  • if already sent, mark as duplicate-suppressed
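
As a minimal sketch of the key and the atomic claim, the example below uses a SQLite primary key as the uniqueness guard. The table name sent_keys, the ":" delimiter, and the function names are assumptions; any store with an atomic insert-if-absent operation could play the same role.

```python
import sqlite3
from typing import Callable


def notification_key(variant_id: str, subscriber_id: str, restock_window: str) -> str:
    # A delimiter avoids collisions such as ("ab", "c") versus ("a", "bc").
    # Assumes the IDs themselves never contain ":".
    return f"{variant_id}:{subscriber_id}:{restock_window}"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS sent_keys (key TEXT PRIMARY KEY)")
    return conn


def acquire(conn: sqlite3.Connection, key: str) -> bool:
    """Claim the key. Returns True only for the first caller."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO sent_keys (key) VALUES (?)", (key,)
        )
    return cur.rowcount == 1  # 0 means the key already existed


def send_once(conn: sqlite3.Connection, key: str, send: Callable[[], None]) -> str:
    if not acquire(conn, key):
        return "duplicate-suppressed"
    send()
    return "sent"
```

Note the trade-off in this sketch: the claim is atomic, but the send itself is not part of the transaction. Claiming before sending gives at-most-once behavior, so a crash between the two steps needs a separate recovery rule.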

Template Version Traceability

Store template metadata in send logs:

  • template ID
  • template version
  • resolved variables at send time

This makes post-incident investigation and compliance audits much easier.
These details are rarely needed at send time, but they are hard to reconstruct later if the template has changed in the meantime.
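
A send log entry carrying this metadata could be as simple as the record below. The field names and the JSON-lines output are assumptions for illustration, not a required schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class SendLogEntry:
    notification_key: str               # hypothetical link back to the delivery job
    template_id: str
    template_version: int
    resolved_variables: Dict[str, str]  # values exactly as rendered at send time


def to_log_line(entry: SendLogEntry) -> str:
    """Serialize one entry as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Storing the resolved variables, rather than references to live product data, means the log still shows what the subscriber actually received after the product or template is edited.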

Pipeline Observability

Track these core metrics:

  • queue latency (enqueue to sent)
  • temporary vs permanent failure rate
  • retry recovery rate
  • dead-letter backlog

Set alert thresholds for dead-letter growth and queue latency spikes.
Without alerts, problems stay invisible until users complain.
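
The metrics and alert checks can be sketched as plain counters plus threshold comparisons. The counter names and the default thresholds below are hypothetical, and the sketch alerts on backlog size as a simple stand-in for dead-letter growth; a real deployment would usually emit these values to its existing metrics system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineCounters:
    sent: int = 0
    failed_temporary: int = 0
    failed_permanent: int = 0
    retried_jobs: int = 0            # jobs retried at least once
    recovered_after_retry: int = 0   # retried jobs that were eventually sent
    dead_letter_backlog: int = 0


def retry_recovery_rate(c: PipelineCounters) -> float:
    """Share of retried jobs that eventually reached sent."""
    return c.recovered_after_retry / c.retried_jobs if c.retried_jobs else 0.0


def alerts(c: PipelineCounters, p95_latency_s: float,
           max_latency_s: float = 60.0, max_dead_letter: int = 100) -> List[str]:
    """Return the names of the alerts that should fire (thresholds are placeholders)."""
    fired = []
    if p95_latency_s > max_latency_s:
        fired.append("queue-latency")
    if c.dead_letter_backlog > max_dead_letter:
        fired.append("dead-letter-backlog")
    return fired
```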

Summary

Back in Stock delivery should be treated as a resilient asynchronous system.
A queue-first model with clear state transitions, throttling, retries, and idempotency is the practical baseline for production.