---
name: consistency-transaction-design
description: "Knowledge contract for consistency and transaction design. Provides principles and patterns for strong vs eventual consistency, idempotency, deduplication, retry, outbox pattern, saga, and compensation. Referenced by design-architecture when defining consistency model. Subsumes idempotency-design."
---

This is a knowledge contract, not a workflow skill. It provides theoretical guidance that the Architect references when designing consistency and transaction models. It does not produce artifacts directly.

This knowledge contract subsumes the previous `idempotency-design` contract. All idempotency concepts are included here alongside broader consistency and transaction patterns.

## Core Principles

### CAP Theorem

- **Consistency**: Every read receives the most recent write or an error
- **Availability**: Every request receives a (non-error) response, without guarantee that it contains the most recent write
- **Partition tolerance**: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network
- In a distributed system, partition tolerance is not optional; during a partition you must choose between consistency and availability. Choose based on business requirements.

### Consistency Spectrum

- **Strong consistency**: Reads always return the latest write. Simplest mental model, but limits availability and scalability.
- **Causal consistency**: Reads respect causal ordering. Good for collaborative systems.
- **Eventual consistency**: Reads may return stale data, but replicas converge over time. Highest availability and scalability.
- **Session consistency**: Reads within a session see that session's own writes. Good compromise for user-facing systems.
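Session consistency in the spectrum above can be made concrete with a small sketch. This is an illustration only (the router class, pin window, and replica names are hypothetical, not part of this contract): a common implementation pins a session's reads to the primary for a short window after that session writes, so the session reads its own writes while other sessions tolerate replica staleness.

```python
import time

class SessionConsistentRouter:
    """Route reads to a replica unless the session wrote recently,
    in which case read from the primary (read-your-writes)."""

    def __init__(self, pin_seconds: float = 5.0):
        self.pin_seconds = pin_seconds           # assumed bound on replication lag
        self._last_write: dict[str, float] = {}  # session_id -> last write time

    def record_write(self, session_id: str) -> None:
        self._last_write[session_id] = time.monotonic()

    def choose_target(self, session_id: str) -> str:
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and time.monotonic() - wrote_at < self.pin_seconds:
            return "primary"   # session must see its own writes
        return "replica"       # stale reads acceptable for this session

router = SessionConsistentRouter(pin_seconds=5.0)
router.record_write("session-a")
print(router.choose_target("session-a"))  # primary: wrote within the pin window
print(router.choose_target("session-b"))  # replica: no recent writes
```

Note the design consideration this encodes: the pin window is an explicit statement of expected replication lag, which the guidance below asks you to document per data domain.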
## Consistency Model Selection

### When to Use Strong Consistency

- Financial transactions (balances must be accurate)
- Inventory management (overselling is unacceptable)
- Unique constraint enforcement (duplicate records are unacceptable)
- Configuration data (wrong config causes system errors)

### When to Use Eventual Consistency

- Read-heavy workloads with high availability requirements
- Derived data (counts, aggregates, projections)
- Notification delivery (delay is acceptable)
- Analytics data (trend accuracy is sufficient)
- Search indexes (slight staleness is acceptable)

### Design Considerations

- Define the consistency model per data domain, not per system
- Document the expected replication lag and its business impact
- Define conflict resolution strategy for eventual consistency (last-write-wins, merge, manual)
- Define staleness tolerance per read pattern (how stale is acceptable?)

## Idempotency Design

### What is Idempotency?

An operation is idempotent if executing it once has the same effect as executing it multiple times.

### When Idempotency is Required

- Any operation triggered by user action (network retries, browser refresh)
- Any operation triggered by webhook (delivery may be duplicated)
- Any operation processed from a queue (at-least-once delivery)
- Any operation that modifies state (creates, updates, deletes)

### Idempotency Key Strategy

- **Source**: Where does the key come from? (client-generated, server-assigned, composite)
- **Format**: UUID, hash of request content, or composite key (user_id + action + timestamp)
- **TTL**: How long is the key stored? Must be long enough to catch retries, short enough to avoid storage bloat
- **Storage**: Where are idempotency keys stored?
  (database, Redis, in-memory)

### Idempotency Response Behavior

- **First request**: Process normally, return success response
- **Duplicate request**: Return the original response (stored alongside the idempotency key)
- **Concurrent request**: Return 409 Conflict or 425 Too Early (if the original request is still processing)

### Idempotency Collision Handling

- Different requests with the same key must be detected and rejected
- Keys must be unique per operation type and per client/tenant scope

## Deduplication

### Patterns

- **Idempotency key**: For request-level deduplication
- **Content hash**: For message-level deduplication (hash the message content)
- **Sequence number**: For ordered message deduplication (track last processed sequence)
- **Tombstone**: Mark processed messages to prevent reprocessing

### Design Considerations

- Define deduplication window (how long to track processed messages)
- Define deduplication scope (per-producer, per-consumer, per-queue)
- Define storage for deduplication state (Redis with TTL, database table)
- Define cleanup strategy for deduplication state

## Retry

### Retry Patterns

- **Fixed interval**: Retry at fixed intervals (simple, but may overload a recovering service)
- **Exponential backoff**: Increase delay between retries (recommended default)
- **Exponential backoff with jitter**: Add randomness to prevent thundering herd
- **Circuit breaker**: Stop retrying after consecutive failures, try again after cooldown

### Design Considerations

- Define maximum retry count per operation
- Define backoff strategy (base, max, multiplier)
- Define retryable vs non-retryable errors
  - Retryable: network timeout, 503, 429
  - Non-retryable: 400, 401, 403, 404, 409
- Define retry budget (max retries per time window to prevent runaway retries)
- Define what to do after max retries (DLQ, alert, manual intervention)

## Outbox Pattern

### When to Use

- When you need to atomically write to a database and publish a message
- When you cannot use
  a distributed transaction across database and message broker
- When you need at-least-once message delivery guarantee

### How It Works

1. Write business data and the outbox message in the same database transaction
2. A separate process reads the outbox table and publishes messages to the broker
3. Mark outbox messages as published after successful delivery
4. Failed deliveries are retried by the outbox reader

### Design Considerations

- Outbox table must be in the same database as business data
- Outbox reader must handle duplicate delivery (consumer must be idempotent)
- Outbox reader polling interval affects delivery latency
- Define outbox message TTL and cleanup strategy

## Saga Pattern

### When to Use

- When a business operation spans multiple services and requires distributed transaction semantics
- When you need to roll back if any step fails

### Choreography-Based Saga

- Each service publishes events that trigger the next step
- No central coordinator
- Services must listen for events and decide what to do
- Compensation: each service publishes a compensation event if a step fails

### Orchestration-Based Saga

- A central orchestrator calls each service in sequence
- Orchestrator maintains saga state and decides which step to execute next
- Compensation: orchestrator calls compensation operations in reverse order
- More visible and debuggable, but adds a single point of failure

### Design Considerations

- Define saga steps and order
- Define compensation for each step (what to do if this step or a later step fails)
- Define saga timeout and expiration
- Define how to handle partial failures (which steps completed, which need compensation)
- Consider whether choreography or orchestration is more appropriate
  - Choreography: simpler, more decoupled, harder to debug
  - Orchestration: more visible, easier to debug, more coupled

## Anti-Patterns

- **Assuming strong consistency when using eventually consistent storage**: Be explicit about consistency guarantees
- **Missing idempotency for queue consumers**: Queue delivery is at-least-once, consumers must be idempotent
- **Infinite retries without backoff**: Always use exponential backoff with a maximum
- **Distributed transactions across services**: Use saga pattern instead of trying to enforce ACID across services
- **Outbox without deduplication**: Outbox pattern guarantees at-least-once delivery, consumers must handle duplicates
- **Saga without compensation**: Every saga step must have a defined compensation action
- **Missing conflict resolution for eventually consistent data**: Define how conflicts are resolved when they inevitably occur
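The idempotency guidance above (first/duplicate/concurrent response behavior, storing the response alongside the key) can be tied together in a minimal sketch. All names here are hypothetical, and the in-memory dictionary stands in for what would be Redis or a database table with a TTL in a real system:

```python
import threading

PROCESSING = object()  # sentinel: a request with this key is still in flight

class IdempotencyStore:
    """In-memory stand-in for a keyed response cache (Redis/DB in practice)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries: dict[str, object] = {}  # key -> PROCESSING or stored response

    def handle(self, key: str, operation):
        with self._lock:
            entry = self._entries.get(key)
            if entry is PROCESSING:
                # Concurrent request: original is still processing
                return (409, "request with this key is still processing")
            if entry is not None:
                return entry                # duplicate: replay the original response
            self._entries[key] = PROCESSING
        try:
            response = (200, operation())   # first request: process normally
        except Exception:
            with self._lock:
                del self._entries[key]      # allow a clean retry after a failure
            raise
        with self._lock:
            self._entries[key] = response   # store response alongside the key
        return response

store = IdempotencyStore()
calls = []
result1 = store.handle("key-1", lambda: calls.append("charged") or "ok")
result2 = store.handle("key-1", lambda: calls.append("charged") or "ok")
# The operation ran once; the duplicate request got the stored response.
```

A production version would also enforce the collision rule above (reject a different request body reusing the same key, e.g. by storing a request hash next to the response) and expire entries per the TTL strategy.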
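Similarly, the "infinite retries without backoff" anti-pattern has a standard remedy sketched below: capped exponential backoff with full jitter, plus a retryable/non-retryable split. The function and parameter names are illustrative (not part of this contract), and a real client would add a retry budget and timeout handling:

```python
import random
import time

RETRYABLE = {429, 503}  # plus network timeouts in practice

def call_with_retry(operation, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Retry `operation` with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        status, body = operation()
        if status < 400:
            return status, body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body          # non-retryable, or retries exhausted
        delay = min(cap, base * (2 ** attempt))
        sleep(random.uniform(0, delay))  # full jitter prevents thundering herd
    raise AssertionError("unreachable")

# A flaky operation that succeeds on the third call:
attempts = []
def flaky():
    attempts.append(1)
    return (503, "unavailable") if len(attempts) < 3 else (200, "ok")

status, body = call_with_retry(flaky, sleep=lambda _: None)  # skip real sleeps in the demo
# Three attempts were made; the final status is 200.
```

Note that a 400 returns immediately with no retries, matching the non-retryable list in the Retry section.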