---
name: consistency-transaction-design
description: "Knowledge contract for consistency and transaction design. Provides principles and patterns for strong vs eventual consistency, idempotency, deduplication, retry, outbox pattern, saga, and compensation. Referenced by design-architecture when defining consistency model. Subsumes idempotency-design."
---
This is a knowledge contract, not a workflow skill. It provides theoretical guidance that the Architect references when designing consistency and transaction models. It does not produce artifacts directly.
This knowledge contract subsumes the previous `idempotency-design` contract. All idempotency concepts are included here alongside broader consistency and transaction patterns.
## Core Principles
### CAP Theorem
- **Consistency**: Every read receives the most recent write or an error
- **Availability**: Every request receives a (non-error) response, without guarantee that it contains the most recent write
- **Partition tolerance**: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network
- You cannot have all three simultaneously: during a network partition, a system must sacrifice either consistency or availability. Since partitions are unavoidable in distributed systems, choose between C and A based on business requirements.
### Consistency Spectrum
- **Strong consistency**: Read always returns the latest write. Simplest mental model, but limits availability and scalability.
- **Causal consistency**: Reads respect causal ordering. Good for collaborative systems.
- **Eventual consistency**: Reads may return stale data, but converge over time. Highest availability and scalability.
- **Session consistency**: Reads within a session see their own writes. Good compromise for user-facing systems.
## Consistency Model Selection
### When to Use Strong Consistency
- Financial transactions (balances must be accurate)
- Inventory management (overselling is unacceptable)
- Unique constraint enforcement (duplicate records are unacceptable)
- Configuration data (wrong config causes system errors)
### When to Use Eventual Consistency
- Read-heavy workloads with high availability requirements
- Derived data (counts, aggregates, projections)
- Notification delivery (delay is acceptable)
- Analytics data (trend accuracy is sufficient)
- Search indexes (slight staleness is acceptable)
### Design Considerations
- Define the consistency model per data domain, not per system
- Document the expected replication lag and its business impact
- Define conflict resolution strategy for eventual consistency (last-write-wins, merge, manual)
- Define staleness tolerance per read pattern (how stale is acceptable?)
## Idempotency Design
### What is Idempotency?
An operation is idempotent if executing it once has the same effect as executing it multiple times.
### When Idempotency is Required
- Any operation triggered by user action (network retries, browser refresh)
- Any operation triggered by webhook (delivery may be duplicated)
- Any operation processed from a queue (at-least-once delivery)
- Any operation that modifies state (creates, updates, deletes)
### Idempotency Key Strategy
- **Source**: Where does the key come from? (client-generated, server-assigned, composite)
- **Format**: UUID, hash of request content, or composite key (user_id + action + timestamp)
- **TTL**: How long is the key stored? Must be long enough to catch retries, short enough to avoid storage bloat
- **Storage**: Where are idempotency keys stored? (database, Redis, in-memory)
### Idempotency Response Behavior
- **First request**: Process normally, return success response
- **Duplicate request**: Return the original response (stored alongside the idempotency key)
- **Concurrent request**: If the original request with the same key is still processing, return 409 Conflict (or block until the original completes and replay its response)
### Idempotency Collision Handling
- Different requests with the same key must be detected and rejected
- Keys must be unique per operation type and per client/tenant scope
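The response and collision behavior above can be sketched as follows. This is a minimal illustration, not a production implementation: an in-memory dict stands in for a database or Redis (no TTL), and all names (`handle`, `IdempotencyConflict`, `StillProcessing`) are made up for this example.

```python
import hashlib
import json
import threading

# In-memory stand-in for a durable idempotency-key store.
_store = {}
_lock = threading.Lock()

class IdempotencyConflict(Exception):
    """Same key reused with a different request body."""

class StillProcessing(Exception):
    """Original request with this key has not finished yet (map to 409)."""

def handle(idempotency_key, request_body, process):
    body_hash = hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()
    with _lock:
        entry = _store.get(idempotency_key)
        if entry is None:
            # First request: claim the key before processing.
            _store[idempotency_key] = {"hash": body_hash, "response": None}
        elif entry["hash"] != body_hash:
            # Collision: same key, different request -> reject, do not process.
            raise IdempotencyConflict(idempotency_key)
        elif entry["response"] is None:
            raise StillProcessing(idempotency_key)
        else:
            # Duplicate: replay the stored original response.
            return entry["response"]
    # A real system would also release or mark the key if processing fails.
    response = process(request_body)
    with _lock:
        _store[idempotency_key]["response"] = response
    return response
```

Note that the key is claimed *before* processing starts, which is what lets concurrent duplicates be detected rather than processed twice.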
## Deduplication
### Patterns
- **Idempotency key**: For request-level deduplication
- **Content hash**: For message-level deduplication (hash the message content)
- **Sequence number**: For ordered message deduplication (track last processed sequence)
- **Tombstone**: Mark processed messages to prevent reprocessing
### Design Considerations
- Define deduplication window (how long to track processed messages)
- Define deduplication scope (per-producer, per-consumer, per-queue)
- Define storage for deduplication state (Redis with TTL, database table)
- Define cleanup strategy for deduplication state
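A sketch of content-hash deduplication with a time window, tying the pattern and the design considerations together. Assumptions: the seen-set here is an in-memory dict, where production would typically use Redis keys with TTL or a database table; the class name is illustrative.

```python
import hashlib
import time

class Deduplicator:
    """Message-level deduplication via content hash, within a time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._seen = {}  # content hash -> first-seen timestamp

    def is_duplicate(self, payload, now=None):
        now = time.monotonic() if now is None else now
        # Cleanup: evict entries older than the deduplication window.
        self._seen = {h: t for h, t in self._seen.items()
                      if now - t < self.window}
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return True
        self._seen[digest] = now
        return False
```

The window is a trade-off: long enough to catch redelivery, short enough to bound storage, exactly as the considerations above describe.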
## Retry
### Retry Patterns
- **Fixed interval**: Retry at fixed intervals (simple, but may overload recovering service)
- **Exponential backoff**: Increase delay between retries (recommended default)
- **Exponential backoff with jitter**: Add randomness to prevent thundering herd
- **Circuit breaker**: Stop retrying after consecutive failures, try again after cooldown
### Design Considerations
- Define maximum retry count per operation
- Define backoff strategy (base, max, multiplier)
- Define retryable vs non-retryable errors
  - Retryable: network timeout, 503, 429
  - Non-retryable: 400, 401, 403, 404, 409
- Define retry budget (max retries per time window to prevent runaway retries)
- Define what to do after max retries (DLQ, alert, manual intervention)
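The considerations above can be sketched as a retry loop with full-jitter exponential backoff. The `(status, body)` return shape of `op` and the default base/cap values are illustrative assumptions, not a prescribed API.

```python
import random
import time

# Status codes treated as retryable / non-retryable, mirroring the lists above.
NON_RETRYABLE_STATUS = {400, 401, 403, 404, 409}

def backoff_delay(attempt, base=0.5, cap=30.0, multiplier=2.0):
    """Exponential backoff with full jitter:
    a delay drawn from [0, min(cap, base * multiplier**attempt)]."""
    upper = min(cap, base * (multiplier ** attempt))
    return random.uniform(0, upper)

def call_with_retries(op, max_retries=5, sleep=time.sleep):
    """Call op() until success, a non-retryable error, or max_retries."""
    for attempt in range(max_retries + 1):
        status, body = op()
        if status < 400:
            return body
        if status in NON_RETRYABLE_STATUS or attempt == max_retries:
            # After max retries a real system would route to a DLQ or alert.
            raise RuntimeError(f"giving up at attempt {attempt}: status {status}")
        sleep(backoff_delay(attempt))
```

The jitter matters: without it, many clients retrying a recovering service all wake at the same instant (the thundering herd noted above).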
## Outbox Pattern
### When to Use
- When you need to atomically write to a database and publish a message
- When you cannot use a distributed transaction across database and message broker
- When you need at-least-once message delivery guarantee
### How It Works
1. Write business data and outbox message to the same database transaction
2. A separate process reads the outbox table and publishes messages to the broker
3. Mark outbox messages as published after successful delivery
4. Failed deliveries are retried by the outbox reader
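The steps above can be sketched with SQLite standing in for the business database. Table and column names (`orders`, `outbox`, `published`) are illustrative, and the relay here is a single poll pass rather than a long-running process.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def place_order(item):
    # Step 1: business row and outbox message in ONE database transaction.
    with conn:
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "order_placed", "order_id": cur.lastrowid}),),
        )

def relay(publish):
    # Steps 2-4: a separate poller reads unpublished rows, publishes them,
    # and marks them published. A crash between publish and mark means
    # redelivery on the next poll -- hence at-least-once.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # may deliver duplicates; consumers must be idempotent
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
```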
### Design Considerations
- Outbox table must be in the same database as business data
- Outbox reader must handle duplicate delivery (consumer must be idempotent)
- Outbox reader polling interval affects delivery latency
- Define outbox message TTL and cleanup strategy
## Saga Pattern
### When to Use
- When a business operation spans multiple services and requires distributed transaction semantics
- When you need to rollback if any step fails
### Choreography-Based Saga
- Each service publishes events that trigger the next step
- No central coordinator
- Services must listen for events and decide what to do
- Compensation: each service publishes a compensation event if a step fails
### Orchestration-Based Saga
- A central orchestrator calls each service in sequence
- Orchestrator maintains saga state and decides which step to execute next
- Compensation: orchestrator calls compensation operations in reverse order
- More visible and debuggable, but adds a single point of failure
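The core of an orchestration-based saga fits in a few lines: call each step in order, and on failure run the compensations for completed steps in reverse. This is a minimal sketch; a real orchestrator would also persist saga state and enforce timeouts. The `(name, action, compensate)` tuple shape is an assumption made for this example.

```python
def run_saga(steps):
    """steps: list of (name, action, compensate) tuples.
    Runs actions in order; on failure, compensates in reverse order."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            # Undo every completed step, newest first, then surface the failure.
            for _, comp in reversed(completed):
                comp()
            raise
```

Note the failing step itself is not compensated: only steps that actually completed need undoing, which is why compensation must be defined per step.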
### Design Considerations
- Define saga steps and order
- Define compensation for each step (what to do if this step or a later step fails)
- Define saga timeout and expiration
- Define how to handle partial failures (which steps completed, which need compensation)
- Consider whether choreography or orchestration is more appropriate
  - Choreography: simpler, more decoupled, harder to debug
  - Orchestration: more visible, easier to debug, more coupled
## Anti-Patterns
- **Assuming strong consistency when using eventually consistent storage**: Be explicit about consistency guarantees
- **Missing idempotency for queue consumers**: Queue delivery is at-least-once, consumers must be idempotent
- **Infinite retries without backoff**: Always use exponential backoff with a maximum
- **Distributed transactions across services**: Use saga pattern instead of trying to enforce ACID across services
- **Outbox without deduplication**: Outbox pattern guarantees at-least-once delivery, consumers must handle duplicates
- **Saga without compensation**: Every saga step must have a defined compensation action
- **Missing conflict resolution for eventually consistent data**: Define how conflicts are resolved when they inevitably occur