---
name: design-architecture
description: "Design system architecture based on PRD requirements. The Architect pipeline's core step, producing the single strict output file with all deliverables: Architecture Doc, Mermaid Diagrams, API Contract, DB Schema, ADR, NFR, Security Boundaries, Integration Boundaries, Observability, Consistency Model."
---

This skill produces the complete architecture document for a feature, including all required deliverables.

**Announce at start:** "I'm using the design-architecture skill to design the system architecture."

## Primary Input

- `docs/prd/{feature}.md` (required)

## Primary Output (STRICT PATH)

- `docs/architecture/{feature}.md`

This is the **only** file artifact produced by the Architect pipeline. No intermediate files (research, analysis) are written to disk. All deliverables — diagrams, schemas, specs, ADRs — must be embedded within this single document.
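
The path convention above can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function name `architect_paths` and the slug rule (lowercase letters, digits, hyphens) are assumptions introduced here.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed slug rule: the skill does not say how {feature} is formatted.
_SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def architect_paths(feature: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (required PRD input, single architecture output) for a feature."""
    if not _SLUG.match(feature):
        raise ValueError(f"unexpected feature name: {feature!r}")
    prd = PurePosixPath("docs/prd") / f"{feature}.md"
    architecture = PurePosixPath("docs/architecture") / f"{feature}.md"
    return prd, architecture
```

Rejecting names that do not match the slug rule keeps the output inside `docs/architecture/`, which is one way to honor the strict-path requirement.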

## Hard Gate

Do NOT start this skill if the PRD has unresolved ambiguities that block architectural decisions. Resolve them with the PM first.

## Process

You MUST complete these steps in order:

1. **Read the PRD** at `docs/prd/{feature}.md` end-to-end to understand all requirements
2. **Apply internal analysis** from the `analyze-prd` step (if performed) to understand which knowledge domains are relevant
3. **Design each architecture section** based on PRD requirements and relevant knowledge domains
4. **Apply knowledge contracts** as needed:
   - `system-decomposition` when designing service boundaries
   - `api-contract-design` when defining API contracts
   - `data-modeling` when designing database schema
   - `distributed-system-basics` when dealing with distributed concerns
   - `architecture-patterns` when selecting architectural patterns
   - `storage-knowledge` when making storage technology decisions
   - `async-queue-design` when designing asynchronous workflows
   - `error-model-design` when defining error handling
   - `security-boundary-design` when defining auth, authorization, tenant isolation
   - `consistency-transaction-design` when defining consistency model, idempotency, saga
   - `integration-boundary-design` when defining external API integration patterns
   - `observability-design` when defining logs, metrics, traces, alerts, SLOs
   - `migration-rollout-design` when defining rollout strategy, feature flags, rollback
5. **Apply deliverable skills** to produce concrete artifacts:
   - `generate_mermaid_diagram` when producing diagrams
   - `design_database_schema` when producing database schema
   - `generate_openapi_spec` when producing API specifications
   - `write_adr` when documenting architectural decisions
   - `evaluate_tech_stack` when evaluating technology choices
6. **Ensure traceability** — every architectural decision must trace back to at least one PRD requirement
7. **Run the completeness check** — verify all 18 required sections are present and substantive
8. **Write the architecture document** to `docs/architecture/{feature}.md`
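
The traceability rule in step 6 can be illustrated with a short sketch. It assumes that PRD requirements carry identifiers and that each decision lists the identifiers it cites. Both the data shape and the function names are hypothetical, since the skill does not prescribe a format.

```python
from __future__ import annotations


def untraced_decisions(decisions: dict[str, list[str]], prd_ids: set[str]) -> list[str]:
    """Decisions that cite no known PRD requirement (violates step 6)."""
    return sorted(
        name
        for name, cited in decisions.items()
        if not any(req in prd_ids for req in cited)
    )


def uncovered_requirements(decisions: dict[str, list[str]], prd_ids: set[str]) -> list[str]:
    """PRD requirements that no decision traces back to."""
    covered = {req for cited in decisions.values() for req in cited}
    return sorted(prd_ids - covered)
```

For example, with requirements `{"FR-1", "FR-2"}` and decisions `{"Queue exports": ["FR-2"], "Add read replica": []}`, the first function reports `"Add read replica"` and the second reports `"FR-1"`.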

## Architect Behavior Principles

Apply these principles in priority order when making design decisions:
1. **High Availability** — Design for fault tolerance and resilience over perfect consistency
2. **Scalability** — Design for horizontal scaling over vertical scaling
3. **Stateless First** — Prefer stateless services; externalize state to databases or caches
4. **API First** — Define contracts before implementation; APIs are the primary interface
5. **Event Driven First** — Prefer event-driven communication for cross-service coordination
6. **Async First** — Prefer asynchronous processing for non-realtime operations

## Architecture Document Template

```markdown
# Architecture: {Feature Name}

## Overview

High-level description of the system architecture. Map every major PRD requirement to an architectural component. Summarize the system's purpose, key design decisions, and architectural style.

### Requirement Traceability

| PRD Requirement | Architectural Component |
|----------------|------------------------|
| ... | ... |

## System Architecture

Describe the complete system architecture including all services, databases, message queues, caches, and external integrations. Show how components are organized, what technology stack each uses, and how they communicate.

### Technology Stack

| Layer | Technology | Justification |
|-------|-----------|---------------|
| Language | ... | ... |
| Framework | ... | ... |
| Database | ... | ... |
| Queue | ... | ... |
| Cache | ... | ... |
| Infrastructure | ... | ... |

If the feature has no backend component, write `N/A` with a brief reason.

### Component Architecture

Describe each major component, its responsibility, and how it fits into the overall system.

## Service Boundaries

Define service boundaries with clear responsibilities and communication patterns.

For each service or module:
- Name and single responsibility
- Owned data
- Communication patterns with other services (sync, async, event-driven)
- Potential coupling points and mitigation

### Communication Matrix

| From | To | Pattern | Protocol | Purpose |
|------|----|---------|----------|---------|
| ... | ... | ... | ... | ... |

## Data Flow

Describe how data moves through the system end-to-end. Include:
- Request lifecycle from entry point to response
- Background job processing flow
- Event propagation flow
- Data transformation and enrichment steps

## Database Schema

Define all database tables, columns, indexes, partition keys, constraints, and relationships. If the feature requires no database changes, write `N/A` with a brief reason.

### Table Definitions

For each table:
- Table name and purpose
- Column definitions (name, type, constraints, defaults)
- Indexes with justification based on query patterns
- Partition keys (where applicable)
- Foreign key relationships

### Entity Relationships

Describe relationships between tables.

### Denormalization Strategy

If denormalization is applied, document which fields are denormalized, why, and the consistency implications.

### Migration Strategy

Notes on migration approach if schema changes affect existing data.

## API Contract

Define all API endpoints with full specifications. Use OpenAPI-style definitions for REST APIs. For gRPC APIs, define the service and method specifications.

### Endpoint Catalog

| Method | Path | Description | PRD Requirement |
|--------|------|-------------|-----------------|
| ... | ... | ... | ... |

### Endpoint Details

For each endpoint:
- Method and path
- Request schema (headers, path params, query params, body)
- Response schema (success and error responses)
- Status codes
- Authentication requirements
- Idempotency requirements (when applicable)
- Rate limiting expectations (when applicable)
- Pagination and filtering (when applicable)
- PRD functional requirement it satisfies

### Error Codes

Define consistent error codes and error response format.

## Async / Queue Design

Define asynchronous operations and their behavior. If the feature has no asynchronous requirements, write `N/A` with a brief reason.

### Async Operations

For each async operation:
- Operation name and trigger
- Queue or event topic
- Producer and consumer
- Retry policy (max retries, backoff, DLQ)
- Ordering guarantees
- Timeout and cancellation behavior

## Consistency Model

Define the consistency guarantees of the system.

### Consistency Strategy
- Strong vs eventual consistency per data domain
- When eventual consistency is acceptable and why
- Conflict resolution strategies

### Idempotency Design

For each idempotent operation:
- Operation name
- Idempotency key source and format
- Key TTL and storage location
- Duplicate request behavior
- Collision handling

### Deduplication & Retry
- Deduplication strategy for messages and events
- Retry policies and backoff strategies
- Outbox pattern usage (when applicable)
- Saga / compensation patterns (when applicable)

If the feature has no consistency or idempotency requirements, write `N/A` with a brief reason.

## Error Model

Define error handling strategy across the system.

### Error Categories
- Client errors (4xx)
- Server errors (5xx)
- Business rule violations
- Timeout errors
- Cascading failure modes

### Error Propagation Strategy
- Fail-fast vs graceful degradation vs circuit breaker
- Fallback behavior

### Error Response Format

Consistent error response schema across the system.

### PRD Edge Case Mapping

| Error Category | PRD Edge Case | Handling Strategy |
|---------------|---------------|-------------------|
| ... | ... | ... |

## Security Boundaries

Define security architecture for the system.

- Authentication mechanism
- Authorization model (RBAC, ABAC, etc.)
- Service identity and service-to-service auth
- Token propagation strategy
- Tenant isolation (multi-tenancy model)
- Secret management approach
- Audit logging requirements

If the feature has no security implications, write `N/A` with a brief reason.

## Integration Boundaries

Define all integrations with external systems.

For each external system integration:
- External system name and purpose
- Integration pattern (API call, webhook, polling, event subscription)
- Rate limits and quotas
- Failure modes and fallback behavior
- Retry strategy
- Data contract (request/response schemas)
- Authentication mechanism

If the feature has no external integrations, write `N/A` with a brief reason.

## Observability

Define observability strategy for the system.

### Logs
- Log levels and what to log
- Structured logging format
- Log aggregation strategy

### Metrics
- Key business metrics
- Key system metrics
- Metric naming conventions

### Traces
- Distributed tracing strategy
- Correlation ID propagation
- Span boundaries

### Alerts
- Alert conditions and thresholds
- Alert routing and escalation

### SLOs
- Availability SLOs
- Latency SLOs
- Error budget

## Scaling Strategy

Define how the system scales based on NFRs.

- Horizontal scaling approach (which components scale independently)
- Vertical scaling considerations
- Database scaling strategy (read replicas, sharding, partitioning)
- Cache scaling strategy
- Queue scaling strategy
- Auto-scaling policies (when applicable)
- Bottleneck analysis

## Non-Functional Requirements

Document all NFRs from the PRD and how the architecture addresses each one.

| NFR | Requirement | Architectural Decision | Verification Method |
|-----|-------------|----------------------|---------------------|
| Performance | ... | ... | ... |
| Availability | ... | ... | ... |
| Scalability | ... | ... | ... |
| Security | ... | ... | ... |
| Compliance | ... | ... | ... |

## Mermaid Diagrams

Produce at minimum the following diagrams embedded in the document.

### System Architecture Diagram

```mermaid
graph TD
    A[Component A] --> B[Component B]
    B --> C[Database]
    B --> D[Queue]
```

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Service
    participant DB
    Client->>Service: Request
    Service->>DB: Query
    DB-->>Service: Result
    Service-->>Client: Response
```

### Data Flow Diagram

```mermaid
graph LR
    A[Source] --> B[Processing]
    B --> C[Storage]
    B --> D[Output]
```

Additional diagrams as needed (event flow, state machine, etc.).

## ADR

Document significant architectural decisions.

### ADR-001: {Decision Title}

- **Context**: Why this decision was needed, including which PRD requirements drove it
- **Decision**: What was decided
- **Consequences**: What trade-offs or implications result
- **Alternatives**: What other options were considered

(Add additional ADRs as needed for each significant decision.)

## Risks

Identify and document architectural risks:

| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |

## Open Questions

List any unresolved questions that need PM or Engineering input:

1. ...
2. ...
```

## Completeness Check

Before finalizing the architecture document, verify:

1. All 18 required sections are present (or explicitly marked N/A with reason)
2. Every PRD functional requirement is traced to at least one architectural component
3. Every PRD NFR is traced to at least one architectural decision
4. Every architecture section that is not N/A has substantive content
5. All API endpoints map to PRD functional requirements
6. All DB tables map to data requirements from functional requirements or NFRs
7. All async flows map to PRD requirements
8. All error handling strategies map to PRD edge cases
9. ADRs exist for all significant decisions (minimum 1)
10. At least 3 Mermaid diagrams are present (system, sequence, data flow)
11. Service boundaries are aligned with domain responsibilities
12. Security boundaries are defined
13. Integration boundaries are defined for all external systems
14. Observability strategy covers logs, metrics, traces, alerts, and SLOs
15. Consistency model is explicit about strong vs eventual guarantees
16. No architectural element exists without traceability to a PRD requirement
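
The structural items in this list (1, 9, and 10) can be checked mechanically. The sketch below is one possible approach. The function name `check_document` is an assumption, and the heading match is a simple heuristic that does not skip headings inside code blocks. The remaining items need human or agent judgment.

```python
from __future__ import annotations

import re

# Section titles taken from the template, matched as level-2 headings.
REQUIRED_SECTIONS = [
    "Overview", "System Architecture", "Service Boundaries", "Data Flow",
    "Database Schema", "API Contract", "Async / Queue Design",
    "Consistency Model", "Error Model", "Security Boundaries",
    "Integration Boundaries", "Observability", "Scaling Strategy",
    "Non-Functional Requirements", "Mermaid Diagrams", "ADR", "Risks",
    "Open Questions",
]

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def check_document(markdown: str) -> list[str]:
    """Return structural problems; an empty list means none were found."""
    problems = []
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^## (.+)$", markdown, re.MULTILINE)
    }
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")

    # Count opening fences of Mermaid blocks.
    diagram_pattern = "^" + re.escape(FENCE) + r"mermaid[ \t]*$"
    diagrams = len(re.findall(diagram_pattern, markdown, re.MULTILINE))
    if diagrams < 3:
        problems.append(f"only {diagrams} Mermaid diagram(s), need at least 3")

    # At least one ADR entry, following the ADR-NNN heading style.
    if not re.search(r"^### ADR-\d+", markdown, re.MULTILINE):
        problems.append("no ADR entry found")
    return problems
```

A section marked `N/A` still passes this check as long as its heading is present. Whether the stated reason is adequate is left to review.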

## Guardrails

This is a pure Architecture skill.

Do:
- Design system structure and boundaries
- Define API contracts and data models
- Define error handling, retry, and consistency strategies
- Define security boundaries and integration patterns
- Produce Mermaid diagrams, DB schemas, API specs, and ADRs
- Make architectural decisions with clear rationale and alternatives
- Ensure traceability to PRD requirements

Do not:
- Change PRD requirements or scope
- Create task breakdowns, milestones, or other project-planning deliverables
- Write test cases or test plans
- Write implementation code or pseudocode
- Choose specific libraries or frameworks at the implementation level
- Prescribe code patterns, class structures, or function-level logic
- Produce any file artifact other than `docs/architecture/{feature}.md`

The Architect defines HOW the system is structured.
Engineering defines HOW the code is written.

## Transition

After completing the architecture document, invoke `challenge-architecture` to validate and stress-test the architecture.