---
name: design-architecture
description: "Design system architecture based on PRD requirements. The Architect pipeline's core step, producing the single strict output file with all deliverables: Architecture Doc, Mermaid Diagrams, API Contract, DB Schema, ADR, NFR, Security Boundaries, Integration Boundaries, Observability, Consistency Model."
---

This skill produces the complete architecture document for a feature, including all required deliverables.

**Announce at start:** "I'm using the design-architecture skill to design the system architecture."

## Primary Input

- `docs/prd/{feature}.md` (required)

## Primary Output (STRICT PATH)

- `docs/architecture/{feature}.md`

This is the **only** file artifact produced by the Architect pipeline. No intermediate files (research, analysis) are written to disk. All deliverables (diagrams, schemas, specs, ADRs) must be embedded within this single document.

## Hard Gate

Do NOT start this skill if the PRD has unresolved ambiguities that block architectural decisions. Resolve them with the PM first.

## Process

You MUST complete these steps in order:

1. **Read the PRD** at `docs/prd/{feature}.md` end-to-end to understand all requirements
2. **Apply internal analysis** from the `analyze-prd` step (if performed) to understand which knowledge domains are relevant
3. **Design each architecture section** based on PRD requirements and relevant knowledge domains
4. **Apply knowledge contracts** as needed:
   - `system-decomposition` when designing service boundaries
   - `api-contract-design` when defining API contracts
   - `data-modeling` when designing database schema
   - `distributed-system-basics` when dealing with distributed concerns
   - `architecture-patterns` when selecting architectural patterns
   - `storage-knowledge` when making storage technology decisions
   - `async-queue-design` when designing asynchronous workflows
   - `error-model-design` when defining error handling
   - `security-boundary-design` when defining auth, authorization, tenant isolation
   - `consistency-transaction-design` when defining consistency model, idempotency, saga
   - `integration-boundary-design` when defining external API integration patterns
   - `observability-design` when defining logs, metrics, traces, alerts, SLOs
   - `migration-rollout-design` when defining rollout strategy, feature flags, rollback
5. **Apply deliverable skills** to produce concrete artifacts:
   - `generate_mermaid_diagram` when producing diagrams
   - `design_database_schema` when producing database schema
   - `generate_openapi_spec` when producing API specifications
   - `write_adr` when documenting architectural decisions
   - `evaluate_tech_stack` when evaluating technology choices
6. **Ensure traceability**: every architectural decision must trace back to at least one PRD requirement
7. **Run the completeness check**: verify that all 18 required sections are present and substantive
8. **Write the architecture document** to `docs/architecture/{feature}.md`

## Architect Behavior Principles

Apply these principles in priority order when making design decisions:

1. **High Availability**: Design for fault tolerance and resilience over perfect consistency
2. **Scalability**: Design for horizontal scaling over vertical scaling
3. **Stateless First**: Prefer stateless services; externalize state to databases or caches
4. **API First**: Define contracts before implementation; APIs are the primary interface
5. **Event Driven First**: Prefer event-driven communication for cross-service coordination
6. **Async First**: Prefer asynchronous processing for non-realtime operations

## Architecture Document Template

```markdown
# Architecture: {Feature Name}

## Overview

High-level description of the system architecture. Map every major PRD requirement to an architectural component. Summarize the system's purpose, key design decisions, and architectural style.

### Requirement Traceability

| PRD Requirement | Architectural Component |
|----------------|------------------------|
| ... | ... |

## System Architecture

Describe the complete system architecture including all services, databases, message queues, caches, and external integrations. Show how components are organized, what technology stack each uses, and how they communicate.

### Technology Stack

| Layer | Technology | Justification |
|-------|-----------|---------------|
| Language | ... | ... |
| Framework | ... | ... |
| Database | ... | ... |
| Queue | ... | ... |
| Cache | ... | ... |
| Infrastructure | ... | ... |

If the feature has no backend component, write `N/A` with a brief reason.

### Component Architecture

Describe each major component, its responsibility, and how it fits into the overall system.

## Service Boundaries

Define service boundaries with clear responsibilities and communication patterns.

For each service or module:
- Name and single responsibility
- Owned data
- Communication patterns with other services (sync, async, event-driven)
- Potential coupling points and mitigation

### Communication Matrix

| From | To | Pattern | Protocol | Purpose |
|------|----|---------|----------|---------|
| ... | ... | ... | ... | ... |

## Data Flow

Describe how data moves through the system end-to-end.
Include:
- Request lifecycle from entry point to response
- Background job processing flow
- Event propagation flow
- Data transformation and enrichment steps

## Database Schema

Define all database tables, columns, indexes, partition keys, constraints, and relationships.

If the feature requires no database changes, write `N/A` with a brief reason.

### Table Definitions

For each table:
- Table name and purpose
- Column definitions (name, type, constraints, defaults)
- Indexes with justification based on query patterns
- Partition keys (where applicable)
- Foreign key relationships

### Entity Relationships

Describe relationships between tables.

### Denormalization Strategy

If denormalization is applied, document which fields are denormalized, why, and the consistency implications.

### Migration Strategy

Notes on migration approach if schema changes affect existing data.

## API Contract

Define all API endpoints with full specifications. Use OpenAPI-style definitions for REST APIs. For gRPC APIs, define the service and method specifications.

### Endpoint Catalog

| Method | Path | Description | PRD Requirement |
|--------|------|-------------|-----------------|
| ... | ... | ... | ... |

### Endpoint Details

For each endpoint:
- Method and path
- Request schema (headers, path params, query params, body)
- Response schema (success and error responses)
- Status codes
- Authentication requirements
- Idempotency requirements (when applicable)
- Rate limiting expectations (when applicable)
- Pagination and filtering (when applicable)
- PRD functional requirement it satisfies

### Error Codes

Define consistent error codes and error response format.

## Async / Queue Design

Define asynchronous operations and their behavior.

If the feature has no asynchronous requirements, write `N/A` with a brief reason.

### Async Operations

For each async operation:
- Operation name and trigger
- Queue or event topic
- Producer and consumer
- Retry policy (max retries, backoff, DLQ)
- Ordering guarantees
- Timeout and cancellation behavior

## Consistency Model

Define the consistency guarantees of the system.

### Consistency Strategy

- Strong vs eventual consistency per data domain
- When eventual consistency is acceptable and why
- Conflict resolution strategies

### Idempotency Design

For each idempotent operation:
- Operation name
- Idempotency key source and format
- Key TTL and storage location
- Duplicate request behavior
- Collision handling

### Deduplication & Retry

- Deduplication strategy for messages and events
- Retry policies and backoff strategies
- Outbox pattern usage (when applicable)
- Saga / compensation patterns (when applicable)

If the feature has no consistency or idempotency requirements, write `N/A` with a brief reason.

## Error Model

Define error handling strategy across the system.

### Error Categories

- Client errors (4xx)
- Server errors (5xx)
- Business rule violations
- Timeout errors
- Cascading failure modes

### Error Propagation Strategy

- Fail-fast vs graceful degradation vs circuit breaker
- Fallback behavior

### Error Response Format

Consistent error response schema across the system.

### PRD Edge Case Mapping

| Error Category | PRD Edge Case | Handling Strategy |
|---------------|---------------|-------------------|
| ... | ... | ... |

## Security Boundaries

Define security architecture for the system.

- Authentication mechanism
- Authorization model (RBAC, ABAC, etc.)
- Service identity and service-to-service auth
- Token propagation strategy
- Tenant isolation (multi-tenancy model)
- Secret management approach
- Audit logging requirements

If the feature has no security implications, write `N/A` with a brief reason.

## Integration Boundaries

Define all integrations with external systems.
For each external system integration:
- External system name and purpose
- Integration pattern (API call, webhook, polling, event subscription)
- Rate limits and quotas
- Failure modes and fallback behavior
- Retry strategy
- Data contract (request/response schemas)
- Authentication mechanism

If the feature has no external integrations, write `N/A` with a brief reason.

## Observability

Define observability strategy for the system.

### Logs

- Log levels and what to log
- Structured logging format
- Log aggregation strategy

### Metrics

- Key business metrics
- Key system metrics
- Metric naming conventions

### Traces

- Distributed tracing strategy
- Correlation ID propagation
- Span boundaries

### Alerts

- Alert conditions and thresholds
- Alert routing and escalation

### SLOs

- Availability SLOs
- Latency SLOs
- Error budget

## Scaling Strategy

Define how the system scales based on NFRs.

- Horizontal scaling approach (which components scale independently)
- Vertical scaling considerations
- Database scaling strategy (read replicas, sharding, partitioning)
- Cache scaling strategy
- Queue scaling strategy
- Auto-scaling policies (when applicable)
- Bottleneck analysis

## Non-Functional Requirements

Document all NFRs from the PRD and how the architecture addresses each one.

| NFR | Requirement | Architectural Decision | Verification Method |
|-----|-------------|----------------------|---------------------|
| Performance | ... | ... | ... |
| Availability | ... | ... | ... |
| Scalability | ... | ... | ... |
| Security | ... | ... | ... |
| Compliance | ... | ... | ... |

## Mermaid Diagrams

Produce at minimum the following diagrams embedded in the document.

### System Architecture Diagram

```mermaid
graph TD
    A[Component A] --> B[Component B]
    B --> C[Database]
    B --> D[Queue]
```

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Service
    participant DB
    Client->>Service: Request
    Service->>DB: Query
    DB-->>Service: Result
    Service-->>Client: Response
```

### Data Flow Diagram

```mermaid
graph LR
    A[Source] --> B[Processing]
    B --> C[Storage]
    B --> D[Output]
```

Additional diagrams as needed (event flow, state machine, etc.).

## ADR

Document significant architectural decisions.

### ADR-001: {Decision Title}

- **Context**: Why this decision was needed, including which PRD requirements drove it
- **Decision**: What was decided
- **Consequences**: What trade-offs or implications result
- **Alternatives**: What other options were considered

(Add additional ADRs as needed for each significant decision.)

## Risks

Identify and document architectural risks:

| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |

## Open Questions

List any unresolved questions that need PM or Engineering input:

1. ...
2. ...
```

## Completeness Check

Before finalizing the architecture document, verify:

1. All 18 required sections are present (or explicitly marked N/A with reason)
2. Every PRD functional requirement is traced to at least one architectural component
3. Every PRD NFR is traced to at least one architectural decision
4. Every architecture section that is not N/A has substantive content
5. All API endpoints map to PRD functional requirements
6. All DB tables map to data requirements from functional requirements or NFRs
7. All async flows map to PRD requirements
8. All error handling strategies map to PRD edge cases
9. ADRs exist for all significant decisions (minimum 1)
10. At least 3 Mermaid diagrams are present (system, sequence, data flow)
11. Service boundaries are aligned with domain responsibilities
12. Security boundaries are defined
13. Integration boundaries are defined for all external systems
14. Observability strategy covers logs, metrics, and traces
15. Consistency model is explicit about strong vs eventual guarantees
16. No architectural element exists without traceability to a PRD requirement

## Guardrails

This is a pure Architecture skill.

Do:
- Design system structure and boundaries
- Define API contracts and data models
- Define error handling, retry, and consistency strategies
- Define security boundaries and integration patterns
- Produce Mermaid diagrams, DB schemas, API specs, and ADRs
- Make architectural decisions with clear rationale and alternatives
- Ensure traceability to PRD requirements

Do not:
- Change PRD requirements or scope
- Create task breakdowns, milestones, or deliverables
- Write test cases or test plans
- Write implementation code or pseudocode
- Choose specific libraries or frameworks at the implementation level
- Prescribe code patterns, class structures, or function-level logic
- Produce any file artifact other than `docs/architecture/{feature}.md`

The Architect defines HOW the system is structured. Engineering defines HOW the code is written.

## Transition

After completing the architecture document, invoke `challenge-architecture` to validate and stress-test the architecture.
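
The purely structural items in the Completeness Check above (all 18 sections present, at least 3 Mermaid diagrams, at least one ADR) can be checked mechanically before the document is handed over. The following is a minimal sketch in Python, not part of the skill itself: the function name `check_structure` is hypothetical, and the heading names are assumed to match the Architecture Document Template exactly. It does not judge whether a section is substantive or traceable to the PRD; those items still need review.

```python
# Hypothetical helper: structural completeness check for an architecture document.
# Section names are copied from the "## " headings of the template above.
REQUIRED_SECTIONS = [
    "Overview", "System Architecture", "Service Boundaries", "Data Flow",
    "Database Schema", "API Contract", "Async / Queue Design",
    "Consistency Model", "Error Model", "Security Boundaries",
    "Integration Boundaries", "Observability", "Scaling Strategy",
    "Non-Functional Requirements", "Mermaid Diagrams", "ADR", "Risks",
    "Open Questions",
]


def check_structure(doc: str) -> list[str]:
    """Return a list of problems; an empty list means the structural checks pass."""
    headings = set()
    mermaid_blocks = 0
    adr_entries = 0
    in_fence = False

    for line in doc.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            # Count a diagram only when a fence opens with the "mermaid" info string.
            if not in_fence and stripped[3:].strip() == "mermaid":
                mermaid_blocks += 1
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # ignore anything that merely looks like a heading inside a fence
        if line.startswith("## "):
            headings.add(line[3:].strip())
        elif line.startswith("### ADR-"):
            adr_entries += 1

    problems = [f"missing section: {name}"
                for name in REQUIRED_SECTIONS if name not in headings]
    if mermaid_blocks < 3:
        problems.append(f"expected at least 3 Mermaid diagrams, found {mermaid_blocks}")
    if adr_entries < 1:
        problems.append("expected at least 1 ADR entry ('### ADR-NNN: ...')")
    return problems
```

A typical use would be to read `docs/architecture/{feature}.md`, pass its text to `check_structure`, and resolve every reported problem before invoking `challenge-architecture`. A section marked `N/A` still passes, because only the presence of its heading is checked.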