System Resilient to Retries

Duplicate prevention and auto-retry design for easy recovery from operational errors

Duplicate Prevention, Retry, Idempotency, Error Recovery, Exponential Backoff

About This Topic

The most troublesome aspect of business systems is recovering from operational errors. Problems like "two identical invoices were created" or "an error occurred midway and I don't know how much was processed" cause workplace confusion.

This project was designed to handle various "oops" moments gracefully.

Sending the Same Invoice Twice is OK

A mechanism called "idempotency" prevents duplicate processing.

Duplicate Prevention Mechanism

  1. First Request: Send with key="invoice:001"
  2. Cache Check: Not registered → Execute as new process
  3. Save Result: key="invoice:001" → Success, ID: 12345

Second Request (Same Key)

  1. Second Request: Send again with key="invoice:001"
  2. Cache Check: Already registered → Check previous result
  3. Return Previous Result: Skip processing, return ID: 12345
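
The two flows above can be sketched as a small "look up before executing" wrapper. This is a minimal sketch, assuming an in-memory dictionary as the cache; `run_once`, `create_invoice`, and the shape of the result are hypothetical names chosen for illustration, not the project's actual code.

```python
from typing import Callable, Dict

# Hypothetical in-memory cache: idempotency key -> saved result.
_results: Dict[str, dict] = {}


def run_once(key: str, action: Callable[[], dict]) -> dict:
    """Run `action` only if `key` has no saved result; otherwise return the saved one."""
    if key in _results:        # Cache check: already registered
        return _results[key]   # Skip processing, return the previous result
    result = action()          # Not registered: execute as a new process
    _results[key] = result     # Save the result under the key
    return result


calls = []


def create_invoice() -> dict:
    # Stand-in for the real invoice API call (assumption); records each execution.
    calls.append("created")
    return {"status": "Success", "id": 12345}


first = run_once("invoice:001", create_invoice)
second = run_once("invoice:001", create_invoice)  # same key: nothing is executed
```

In this sketch a result is saved only when `action` returns normally, so a call that raises leaves the key unregistered and can be sent again.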

Idempotency Key Design

Each request is assigned a unique key. If a second submission uses the same key, it's treated as "already created" and processing is skipped.

Key                      Components                            Purpose
comp1:invoice:001        Company ID + doc type + doc number    Invoice duplicate prevention
comp1:delivery:2024-001  Company ID + doc type + doc number    Delivery slip duplicate prevention
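
As an illustration of the key layout above, a key could be assembled like this. The function name `build_idempotency_key` and the validation rule are assumptions made for this sketch; only the "company ID + doc type + doc number" layout and the two example keys come from the design.

```python
def build_idempotency_key(company_id: str, doc_type: str, doc_number: str) -> str:
    """Join company ID, document type, and document number with ':'."""
    parts = [company_id, doc_type, doc_number]
    for part in parts:
        # Assumed rule: reject empty parts and parts containing the separator,
        # because they could make two different documents share one key.
        if not part or ":" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return ":".join(parts)


invoice_key = build_idempotency_key("comp1", "invoice", "001")
delivery_key = build_idempotency_key("comp1", "delivery", "2024-001")
```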

7-Day Memory

Results are cached for 7 days. Within a week, you can verify "has this invoice been created or not."
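
One way to picture the 7-day memory is a cache whose entries carry an expiry time. This is a rough sketch only: the class name `ExpiringResultCache`, the injectable clock, and the choice to treat an expired key as "not created" are assumptions, not details confirmed by the design.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


class ExpiringResultCache:
    """Keeps results for a fixed time-to-live, then forgets them."""

    def __init__(self, ttl_seconds: float = SEVEN_DAYS_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be checked without waiting
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def save(self, key: str, result: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, result)

    def lookup(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: assumed to count as "not created"
            return None
        return result


# Usage with a fake clock (seconds since an arbitrary start).
now = [0.0]
cache = ExpiringResultCache(clock=lambda: now[0])
cache.save("comp1:invoice:001", {"id": 12345})
within_week = cache.lookup("comp1:invoice:001")
now[0] = SEVEN_DAYS_SECONDS + 1
after_week = cache.lookup("comp1:invoice:001")
```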

Auto-Retry on Communication Errors

For temporary network issues or server congestion, the system automatically waits and retries.

Exponential Backoff with Jitter

"Exponential backoff" gradually lengthens retry intervals. "Jitter" adds random delay to prevent multiple clients from retrying simultaneously and causing congestion again.
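
The wait time before each retry can be sketched as a doubling base delay plus a small random amount. The sketch uses the 300 ms starting delay and 0-199 ms jitter shown in the retry flow in this section; applying the same jitter range to every retry, and the name `backoff_delay_ms`, are assumptions.

```python
import random

BASE_DELAY_MS = 300   # first retry waits about 300 ms
JITTER_MAX_MS = 199   # random extra delay of 0-199 ms (assumed for every retry)


def backoff_delay_ms(retry_number: int, rng: random.Random) -> int:
    """Delay before retry `retry_number` (1-based): the base doubles each time, plus jitter."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    exponential = BASE_DELAY_MS * 2 ** (retry_number - 1)  # 300, 600, 1200, ...
    jitter = rng.randint(0, JITTER_MAX_MS)                  # inclusive on both ends
    return exponential + jitter


rng = random.Random()
delays = [backoff_delay_ms(n, rng) for n in range(1, 6)]
```

Because each client draws its own jitter, clients that failed at the same moment are unlikely to retry at exactly the same time.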

Retry Flow

  1. Send Request: Execute API call
  2. Error Occurs: 429 (rate limit) or 5xx (server error)
  3. Retry 1: Wait 300ms + random (0-199ms)
  4. Retry 2: Wait 600ms + random
  5. Retry 3: Wait 1200ms + random
  6. Up to 5 Retries: Return error if limit reached

Retryable Errors

Not all errors trigger retries. Only temporary errors are retried.

Status   Meaning                          Retry
429      Too many requests (rate limit)   Yes
500-599  Temporary server error           Yes
400      Invalid request                  No (needs fixing)
401      Authentication error             No (check token)
404      Resource not found               No
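
Putting the retry flow and the status rules together, a retry loop might look like the sketch below. `call_with_retry`, the `send` callable returning a `(status, body)` pair, and the injectable `sleep` are hypothetical; treating every status other than 429 and 500-599 as non-retryable is also an assumption, since the design lists only 400, 401, and 404 explicitly.

```python
import random
import time
from typing import Callable, Tuple

MAX_RETRIES = 5       # up to 5 retries after the first attempt
BASE_DELAY_MS = 300
JITTER_MAX_MS = 199   # assumed to apply to every retry


def is_retryable(status: int) -> bool:
    """Only 429 and 500-599 are treated as temporary errors."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send`; on a temporary error, wait with backoff and jitter, then retry."""
    status, body = send()                       # initial request
    for retry_number in range(1, MAX_RETRIES + 1):
        if not is_retryable(status):
            break                               # success or permanent error: stop
        wait_ms = (BASE_DELAY_MS * 2 ** (retry_number - 1)
                   + random.randint(0, JITTER_MAX_MS))
        sleep(wait_ms / 1000)
        status, body = send()
    return status, body                         # if the limit was reached, this is the last error


# Usage: two temporary failures, then success. Waits are recorded instead of slept.
responses = iter([(503, "busy"), (429, "slow down"), (200, "ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in production the default `time.sleep` would be used.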

Detailed Logs for Problem Identification

The logs record when each operation ran, who ran it, what was executed, and the result. Even when errors occur, it is clear which row had which problem.

Structured Log Example

{
  "event": "row_processed_error",
  "docType": "invoice",
  "rowNumber": 5,
  "idempotencyKey": "comp1:invoice:001",
  "httpStatus": 422,
  "error": "partner_code not found",
  "timestamp": "2025-01-24T10:30:00.000Z"
}
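
A log entry like the one above could be produced by a small helper that fills in the fields and serializes them as one JSON line. This is only a sketch: the function name `build_row_error_log` and its parameters are assumptions; the field names and example values are taken from the log example.

```python
import json
from datetime import datetime, timezone


def build_row_error_log(doc_type: str, row_number: int, idempotency_key: str,
                        http_status: int, error: str, now: datetime) -> str:
    """Serialize one row-level error as a single JSON line."""
    # ISO 8601 in UTC with millisecond precision and a trailing "Z".
    timestamp = (now.astimezone(timezone.utc)
                 .isoformat(timespec="milliseconds")
                 .replace("+00:00", "Z"))
    record = {
        "event": "row_processed_error",
        "docType": doc_type,
        "rowNumber": row_number,
        "idempotencyKey": idempotency_key,
        "httpStatus": http_status,
        "error": error,
        "timestamp": timestamp,
    }
    return json.dumps(record)


line = build_row_error_log(
    "invoice", 5, "comp1:invoice:001", 422, "partner_code not found",
    datetime(2025, 1, 24, 10, 30, tzinfo=timezone.utc),
)
```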

What Logs Reveal

Field       Content                   Use
event       What happened             Identify event type
rowNumber   Which row it occurred on  Find row in spreadsheet
httpStatus  API response code         Understand error type
error       Error message             Identify specific cause
timestamp   When it occurred          Chronological tracking
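
To show how these fields might be used together, the sketch below filters error events out of a set of log lines and lists the affected rows in chronological order. The helper `failed_rows`, the `row_processed_ok` event name, and the sample lines (apart from the row 5 entry) are made up for illustration.

```python
import json
from typing import Iterable, List


def failed_rows(log_lines: Iterable[str]) -> List[dict]:
    """Return error records sorted from oldest to newest."""
    records = [json.loads(line) for line in log_lines if line.strip()]
    errors = [r for r in records if r.get("event") == "row_processed_error"]
    # UTC timestamps in one fixed ISO 8601 format sort correctly as plain strings.
    return sorted(errors, key=lambda r: r["timestamp"])


# Hypothetical sample lines; only the row 5 entry mirrors the log example.
sample = [
    '{"event": "row_processed_error", "rowNumber": 8, "httpStatus": 400, '
    '"error": "invalid date", "timestamp": "2025-01-24T10:31:00.000Z"}',
    '{"event": "row_processed_ok", "rowNumber": 6, "httpStatus": 200, '
    '"timestamp": "2025-01-24T10:30:30.000Z"}',
    '{"event": "row_processed_error", "rowNumber": 5, "httpStatus": 422, '
    '"error": "partner_code not found", "timestamp": "2025-01-24T10:30:00.000Z"}',
]
rows_to_fix = [(r["rowNumber"], r["error"]) for r in failed_rows(sample)]
```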

Error Handling Flow

Error Response Flow

  1. Error Occurs: Status column in spreadsheet shows "Error"
  2. Identify Cause: Check error content in logs. Find row using rowNumber
  3. Fix Data: Correct problematic data in spreadsheet
  4. Re-execute: Select fixed row and resend. New idempotency key means new processing

What This Design Achieves

For Operations

  • Prevent duplicate invoices: Same invoice is never created twice
  • Resilience to temporary failures: Auto-retry handles recovery
  • Easy cause identification: Structured logs pinpoint problem areas instantly

For Field Staff

  • Operate with confidence: Peace of mind knowing mistakes are OK
  • Immediate status awareness: Result confirmation via status display
  • Easy re-execution: Just fix and resend
