---
name: idempotency-design
description: "Knowledge contract for designing idempotent operations, idempotency keys, TTL, storage, duplicate behavior, and collision handling. Referenced by design-architecture when designing idempotency."
---

This is a knowledge contract, not a workflow skill. It is referenced by `design-architecture` when the architect is designing idempotent operations.

## Core Principle

Idempotency must be driven by PRD requirements. Do not add idempotency to operations that do not need it. Do not skip idempotency on operations that the PRD explicitly requires to be idempotent.

Common PRD requirements that imply idempotency:

- "The system must not create duplicates when the same request is submitted twice"
- "Users should be able to retry failed submissions safely"
- "Payment processing must be exactly-once"
- "Webhook deliveries may be retried"

## Identifying Idempotent Operations

An operation needs idempotency when:

- The client may retry due to network timeout or failure
- The operation has side effects that must not be duplicated (creating resources, charging money, sending notifications)
- The PRD explicitly requires safe retry behavior
- The operation is triggered by an unreliable delivery mechanism (webhooks, message queues)

An operation is naturally idempotent when:

- It is a read operation (GET, HEAD, OPTIONS)
- It is a delete operation where deleting a non-existent resource returns 404 or 204
- It is a PUT that fully replaces a resource (set state to X)
- It is an operation whose repeated execution leaves the system in the same state as a single execution

## Idempotency Key Strategy

### Key Source

- Client-generated: the client provides a unique key (e.g., UUID, order reference). Preferred for API operations.
- Deterministic: derived from request content (e.g., hash of user_id + action + parameters). Preferred when the client cannot provide a key.
- System-generated: the server assigns a key. Only for internal operations where the client does not participate.

### Key Format

- Define the key format explicitly (e.g., `UUID v7`, `{prefix}-{unique-identifier}`, `sha256(payload)`)
- Keys must be unique across the entire scope of the operation
- Keys must be reproducible: the same logical request must produce the same key
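
A minimal sketch of a deterministic `{prefix}-{sha256(payload)}` key built from user_id + action + parameters, assuming the parameters are JSON-serializable. The helper name `derive_idempotency_key` and the example field values are hypothetical. Canonical serialization (sorted keys, fixed separators) is what makes the key reproducible: the same logical request yields the same key regardless of field order, while any change in a parameter value yields a different key.

```python
import hashlib
import json


def derive_idempotency_key(user_id: str, action: str, params: dict) -> str:
    """Hypothetical helper: derive a deterministic key from user_id + action + parameters."""
    # Canonical JSON: keys sorted at every nesting level and fixed separators,
    # so logically identical requests serialize to identical bytes.
    canonical = json.dumps(
        {"user_id": user_id, "action": action, "params": params},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # "{prefix}-{unique-identifier}" format, with the action as the prefix
    return f"{action}-{digest}"


# Field order does not matter; parameter values do.
k1 = derive_idempotency_key("u_123", "create_order", {"sku": "A1", "qty": 2})
k2 = derive_idempotency_key("u_123", "create_order", {"qty": 2, "sku": "A1"})
k3 = derive_idempotency_key("u_123", "create_order", {"sku": "A1", "qty": 3})
assert k1 == k2
assert k1 != k3
```

List order inside `params` is still significant in this sketch, so two requests that differ only in list ordering would get different keys.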

### Key Scope

- Per-user: key is unique within the user's context
- Per-resource-type: key is unique within the resource type (e.g., across all payment creations)
- Global: key is unique across the entire system

Define the scope based on the PRD requirement. Tighter scope is preferred when possible.
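
One way to make the chosen scope concrete is to encode it in the storage key. The sketch below extends the Redis key format `idempotency:{operation_type}:{key}`; treating `operation_type` as the per-resource-type namespace, the `user:{user_id}` segment, and the `global` namespace are assumptions for illustration, not a prescribed layout.

```python
from typing import Optional


def scoped_storage_key(
    scope: str, operation_type: str, key: str, user_id: Optional[str] = None
) -> str:
    """Hypothetical helper: build a storage key whose uniqueness matches the scope."""
    if scope == "per-user":
        if user_id is None:
            raise ValueError("per-user scope requires user_id")
        # The same client key from two users maps to two different entries
        return f"idempotency:{operation_type}:user:{user_id}:{key}"
    if scope == "per-resource-type":
        return f"idempotency:{operation_type}:{key}"
    if scope == "global":
        # Assumed namespace: one key space for the whole system
        return f"idempotency:global:{key}"
    raise ValueError(f"unknown scope: {scope!r}")


alice = scoped_storage_key("per-user", "payment.create", "k-1", user_id="alice")
bob = scoped_storage_key("per-user", "payment.create", "k-1", user_id="bob")
assert alice != bob
```

With per-user scope, two users who happen to send the same client key do not block each other; with per-resource-type or global scope they would share one entry.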

## Idempotency Key Storage

### Where to Store

- Database table (preferred for persistent idempotency)
  - Table: `idempotency_keys`
  - Columns: `key`, `operation_type`, `request_hash`, `response_hash`, `status`, `created_at`, `expires_at`
  - Index: unique index on `(key, operation_type)`
- Redis (preferred for ephemeral idempotency with TTL)
  - Key: `idempotency:{operation_type}:{key}`
  - Value: serialized response or status reference
  - TTL: set to expire after the idempotency window

### Storage Decision Framework

- Use database when: idempotency must survive restarts, keys must be queryable, audit trail is required
- Use Redis when: idempotency is time-bounded, fast lookup is critical, keys can expire, persistence loss is acceptable
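
A minimal sketch of the database option using Python's built-in `sqlite3`, assuming the `idempotency_keys` columns and unique index listed under Where to Store. The `try_claim` helper, the `'processing'` status value, and storing timestamps as ISO-8601 text are hypothetical choices. Inserting first and letting the unique index reject the duplicate avoids the race that a separate "check, then insert" sequence would have between two concurrent retries.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS idempotency_keys (
    key            TEXT NOT NULL,
    operation_type TEXT NOT NULL,
    request_hash   TEXT NOT NULL,
    response_hash  TEXT,
    status         TEXT NOT NULL,
    created_at     TEXT NOT NULL,
    expires_at     TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_idempotency_keys_key_op
    ON idempotency_keys (key, operation_type);
"""


def try_claim(conn: sqlite3.Connection, key: str, operation_type: str,
              request_hash: str, ttl: timedelta) -> bool:
    """Return True if this request claimed the key, False if it already exists."""
    now = datetime.now(timezone.utc)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO idempotency_keys "
                "(key, operation_type, request_hash, status, created_at, expires_at) "
                "VALUES (?, ?, ?, 'processing', ?, ?)",
                (key, operation_type, request_hash,
                 now.isoformat(), (now + ttl).isoformat()),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index on (key, operation_type) rejected the insert: duplicate
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
assert try_claim(conn, "k-1", "payment.create", "h1", timedelta(days=30))
assert not try_claim(conn, "k-1", "payment.create", "h1", timedelta(days=30))
# Same key, different operation_type: a separate entry under this index
assert try_claim(conn, "k-1", "order.create", "h1", timedelta(hours=24))
```

This sketch does not purge or ignore expired rows; a real implementation would also check `expires_at` (or delete expired rows) before treating an existing row as a duplicate.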

## TTL (Time-to-Live)

Define for each idempotent operation:

- TTL duration: how long duplicate detection stays active
- TTL basis: when the clock starts (key creation time or last access time)
- TTL scope: whether the key expires or is kept permanently

### TTL Duration Guidelines

- API operations: typically 24 hours (allows client retries within a day)
- Payment operations: typically 30 days (matches settlement windows)
- Webhook processing: typically 7 days (matches delivery retry windows)
- Internal operations: match the operation's natural retry window

### TTL Behavior

- After the TTL expires, the key no longer counts for duplicate detection, and a new request with the same key is processed as a new operation
- Define whether TTL is strictly enforced (hard delete) or softly enforced (soft delete, kept for audit)
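
A minimal sketch of how TTL duration, basis, and scope combine into a single "is duplicate detection still active?" check. The `KeyRecord` type, the basis names `"creation"` and `"last_access"`, and modelling a permanent key as `ttl=None` are assumptions; the default durations simply restate the guidelines above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-category defaults, taken from the TTL guidelines
DEFAULT_TTL = {
    "api": timedelta(hours=24),
    "payment": timedelta(days=30),
    "webhook": timedelta(days=7),
}


@dataclass
class KeyRecord:
    created_at: datetime
    last_access_at: datetime


def is_active(record: KeyRecord, ttl: Optional[timedelta], basis: str, now: datetime) -> bool:
    """Return True while duplicate detection applies to this key.

    ttl=None models a permanent key. Once this returns False, a request
    reusing the key is processed as a new operation.
    """
    if ttl is None:
        return True
    if basis == "creation":
        start = record.created_at
    elif basis == "last_access":
        start = record.last_access_at  # sliding window: each access extends the TTL
    else:
        raise ValueError(f"unknown TTL basis: {basis!r}")
    return now < start + ttl


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = KeyRecord(created_at=t0, last_access_at=t0 + timedelta(hours=20))
now = t0 + timedelta(hours=25)

# 24-hour API TTL: expired when counted from creation, still active from last access
assert not is_active(record, DEFAULT_TTL["api"], "creation", now)
assert is_active(record, DEFAULT_TTL["api"], "last_access", now)
```

The choice of basis matters for long retry chains: a last-access basis keeps a frequently retried key alive indefinitely, while a creation basis bounds how long any single key can block reprocessing.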

## Duplicate Request Behavior

When a duplicate request is detected (key already exists):

### During Processing

- The original request is still being processed
- Return `202 Accepted` with a status URL (for async operations)
- Or return `409 Conflict` if the client should not retry yet

### After Successful Processing

- Return the original successful response (stored or reconstructable)
- Must return the same response body as the original, and the same status code unless the operation's contract specifies otherwise (e.g., Create Operations below returns `200 OK` on a replayed create)
- This is the most common and recommended behavior

### After Failed Processing

- If the original processing permanently failed, allow retry with the same key
- If the original processing was interrupted (timeout, crash), allow retry with the same key
- Define whether the client must generate a new key or can reuse the original

Define for each idempotent operation:

- What the client receives when submitting a duplicate during processing
- What the client receives when submitting a duplicate after success
- What the client receives when submitting a duplicate after failure
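
A minimal, single-threaded sketch of the three duplicate cases, assuming a synchronous API (so an in-flight duplicate gets `409 Conflict` rather than `202 Accepted`) and that a failed attempt may be retried with the same key. `IdempotencyStore`, `Status`, and the `process` callable returning `(status_code, body)` are hypothetical names; the in-memory dict stands in for the `idempotency_keys` table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]  # (HTTP status code, response body)


class Status(Enum):
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Entry:
    status: Status
    response: Optional[Response] = None  # original response, kept for replay


class IdempotencyStore:
    """In-memory stand-in for idempotency storage (illustration only).

    A real store must claim keys atomically (e.g., via a unique index)
    so that concurrent duplicates cannot both start processing.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def handle(self, key: str, process: Callable[[], Response]) -> Response:
        entry = self._entries.get(key)
        if entry is not None:
            if entry.status is Status.PROCESSING:
                # Original request still in flight: tell the client not to retry yet
                return 409, {"error": "a request with this key is still being processed"}
            if entry.status is Status.SUCCEEDED and entry.response is not None:
                # Replay the original status code and body unchanged
                return entry.response
            # FAILED: fall through and let the retry reuse the same key
        self._entries[key] = Entry(Status.PROCESSING)
        try:
            response = process()
        except Exception:
            # Timeout, crash, or other failure: record it so a retry is allowed
            self._entries[key] = Entry(Status.FAILED)
            raise
        self._entries[key] = Entry(Status.SUCCEEDED, response)
        return response


store = IdempotencyStore()

# After success: the side effect runs once and the response is replayed
charges = []
def create_order() -> Response:
    charges.append("charge")
    return 201, {"id": "ord_1"}

first = store.handle("key-1", create_order)
replay = store.handle("key-1", create_order)

# During processing: a duplicate arriving mid-flight is rejected
in_flight = {}
def slow_order() -> Response:
    in_flight["dup"] = store.handle("key-2", slow_order)
    return 201, {"id": "ord_2"}

store.handle("key-2", slow_order)

# After failure: the retry with the same key is processed
attempts = []
def flaky_order() -> Response:
    attempts.append(1)
    if len(attempts) == 1:
        raise TimeoutError("downstream timeout")
    return 201, {"id": "ord_3"}

try:
    store.handle("key-3", flaky_order)
except TimeoutError:
    pass
retried = store.handle("key-3", flaky_order)
```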

## Collision Handling

A key collision occurs when two different logical requests produce the same idempotency key.

### Prevention

- Use UUID v7 or similar globally unique identifiers for client-generated keys
- Use a collision-resistant cryptographic hash (e.g., SHA-256) for content-derived keys
- Include enough context in content-derived keys (user_id + action + parameters)

### Detection

- Compare the request hash of the new request with the stored request hash
- If the hashes match, this is a true duplicate: return the stored response
- If the hashes differ, this is a collision: different logical requests produced the same key

### Resolution

- Reject the new request with `409 Conflict` and ask the client to use a new key
- This is the safest and most common approach
- Never overwrite the original request's result with a different request's result

## Idempotency for Different Operation Types

### Create Operations

- Most common use case for idempotency
- Key: client-generated UUID or deterministic hash
- Behavior: return original created resource on duplicate
- Status codes: `201 Created` on first request, `200 OK` with original resource on duplicate

### Update Operations

- PUT operations that fully replace state are naturally idempotent
- PATCH operations that set state to a specific value are idempotent
- PATCH operations that increment or append are NOT naturally idempotent
- Key: for updates that are not naturally idempotent, use a client-generated key, or derive one from resource ID + operation type + a per-request identifier (resource ID + operation type alone would treat every later increment on the same resource as a duplicate)
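
A minimal sketch contrasting a set-style update (naturally idempotent) with an increment (not idempotent without a key). The `balances` resource, `applied_keys` set, and per-request keys `req-1`/`req-2` are hypothetical; a real implementation would record the key in the same transaction as the update.

```python
from typing import Dict, Set

# Hypothetical resource state and record of applied idempotency keys
balances: Dict[str, int] = {"acct_1": 0}
applied_keys: Set[str] = set()


def set_balance(account: str, value: int) -> None:
    """PATCH that sets a value: naturally idempotent."""
    balances[account] = value


def increment_balance(account: str, amount: int, idempotency_key: str) -> None:
    """PATCH that increments: needs an idempotency key to be safe to retry."""
    if idempotency_key in applied_keys:
        return  # duplicate: already applied
    balances[account] += amount
    # Should be recorded atomically with the update in real storage
    applied_keys.add(idempotency_key)


set_balance("acct_1", 50)
set_balance("acct_1", 50)                 # repeating a set changes nothing
assert balances["acct_1"] == 50

increment_balance("acct_1", 10, "req-1")
increment_balance("acct_1", 10, "req-1")  # retry of the same request is skipped
increment_balance("acct_1", 10, "req-2")  # a new request is applied
assert balances["acct_1"] == 70
```

Note the per-request keys: a key built only from resource ID + operation type (something like `acct_1:increment`) would make the legitimate `req-2` increment look like a duplicate and skip it.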

### Delete Operations

- Naturally idempotent: deleting an already-deleted resource returns `204 No Content` or `404 Not Found`
- Define which behavior the API contract specifies and stick with it consistently

### Payment Operations

- Must be idempotent (regulatory and financial requirement)
- Key: payment reference or client-generated UUID
- TTL: match settlement window (typically 30 days)
- Behavior: return original payment result on duplicate; never double-charge

### Webhook Processing

- Must be idempotent (delivery services may retry)
- Key: webhook event ID, which stays the same across redeliveries (a per-attempt delivery ID changes on every retry, so it cannot detect duplicates)
- TTL: match delivery retry window (typically 7 days)
- Behavior: skip processing on duplicate, return success
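
A minimal sketch of webhook deduplication by event ID: a redelivered event is acknowledged with success but its side effects are skipped. The `processed_event_ids` set, the `handle_webhook` function, and the `"id"` field name are assumptions; the real field depends on the provider's payload, and production storage would be durable with a TTL matching the retry window.

```python
from typing import Callable, Set, Tuple

# Hypothetical record of processed event IDs (durable, TTL-bounded in production)
processed_event_ids: Set[str] = set()


def handle_webhook(event: dict, apply: Callable[[dict], None]) -> Tuple[int, str]:
    event_id = event["id"]  # assumed field name for the provider's event ID
    if event_id in processed_event_ids:
        # Duplicate delivery: skip side effects but still acknowledge,
        # so the sender stops retrying
        return 200, "duplicate ignored"
    apply(event)
    # Recorded only after apply() succeeds: a crash in between means the
    # event is processed again on redelivery (at-least-once)
    processed_event_ids.add(event_id)
    return 200, "processed"


effects = []
event = {"id": "evt_123", "type": "invoice.paid"}
assert handle_webhook(event, effects.append) == (200, "processed")
assert handle_webhook(event, effects.append) == (200, "duplicate ignored")
assert effects == [event]
```

For effectively exactly-once handling, the side effect and the event-ID record would need to be written in the same transaction.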

## Anti-Patterns

- Adding idempotency to naturally idempotent operations (wastes resources)
- Not adding idempotency to operations the PRD requires to be safe for retry
- Storing idempotency keys with no TTL, causing unbounded table growth
- Deriving content-based keys from too little context (e.g., omitting user_id or parameters), causing collisions
- Overwriting stored results on key collision instead of rejecting the new request
- Implementing idempotency at the wrong layer (e.g., only at the database level without API-level coordination)
- Not documenting which operations are idempotent and which are not