---
name: design-architecture
description: "Design system architecture based on PRD requirements. The Architect pipeline's core step, producing the single strict output file with all deliverables: Architecture Doc, Mermaid Diagrams, API Contract, DB Schema, ADR, NFR, Security Boundaries, Integration Boundaries, Observability, Consistency Model."
---

This skill produces the complete architecture document for a feature, including all required deliverables.

**Announce at start:** "I'm using the design-architecture skill to design the system architecture."

## Primary Input

- `docs/prd/{feature}.md` (required)

## Primary Output (STRICT PATH)

- `docs/architecture/{feature}.md`

This is the **only** file artifact produced by the Architect pipeline. No intermediate files (research, analysis) are written to disk. All deliverables (diagrams, schemas, specs, ADRs) must be embedded within this single document.

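The strict-path rule can be checked mechanically. The sketch below is an illustration only: `unexpected_artifacts` is a hypothetical helper, and how the list of written paths is collected (for example from version-control status) is an assumption left to the caller.

```python
from pathlib import PurePosixPath


def unexpected_artifacts(feature: str, written_paths: list[str]) -> list[str]:
    """Return every written path other than docs/architecture/{feature}.md."""
    allowed = PurePosixPath("docs/architecture") / f"{feature}.md"
    return [p for p in written_paths if PurePosixPath(p) != allowed]


# Hypothetical feature name and paths, for illustration only.
paths = [
    "docs/architecture/file-sharing.md",
    "docs/architecture/file-sharing-research.md",
]
print(unexpected_artifacts("file-sharing", paths))
```

An empty result means the run stayed within the single permitted artifact.
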
## Hard Gate

Do NOT start this skill if the PRD has unresolved ambiguities that block architectural decisions. Resolve them with the PM first.

## Process

You MUST complete these steps in order:

1. **Read the PRD** at `docs/prd/{feature}.md` end-to-end to understand all requirements
2. **Apply internal analysis** from the `analyze-prd` step (if performed) to understand which knowledge domains are relevant
3. **Design each architecture section** based on PRD requirements and relevant knowledge domains
4. **Apply knowledge contracts** as needed:
   - `system-decomposition` when designing service boundaries
   - `api-contract-design` when defining API contracts
   - `data-modeling` when designing database schema
   - `distributed-system-basics` when dealing with distributed concerns
   - `architecture-patterns` when selecting architectural patterns
   - `storage-knowledge` when making storage technology decisions
   - `async-queue-design` when designing asynchronous workflows
   - `error-model-design` when defining error handling
   - `security-boundary-design` when defining auth, authorization, tenant isolation
   - `consistency-transaction-design` when defining consistency model, idempotency, saga
   - `integration-boundary-design` when defining external API integration patterns
   - `observability-design` when defining logs, metrics, traces, alerts, SLOs
   - `migration-rollout-design` when defining rollout strategy, feature flags, rollback
5. **Apply deliverable skills** to produce concrete artifacts:
   - `generate_mermaid_diagram` when producing diagrams
   - `design_database_schema` when producing database schema
   - `generate_openapi_spec` when producing API specifications
   - `write_adr` when documenting architectural decisions
   - `evaluate_tech_stack` when evaluating technology choices
6. **Ensure traceability**: every architectural decision must trace back to at least one PRD requirement
7. **Run the completeness check**: verify all 18 required sections are present and substantive
8. **Write the architecture document** to `docs/architecture/{feature}.md`

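The traceability step lends itself to a quick mechanical spot check. The sketch below assumes PRD requirements carry IDs such as `FR-1` or `NFR-2`; that ID convention is hypothetical, since this skill does not prescribe one. It reports IDs that appear in the PRD but are never mentioned in the architecture document.

```python
import re

# Hypothetical ID convention: FR-<n> for functional, NFR-<n> for non-functional.
REQ_ID = re.compile(r"\b(?:FR|NFR)-\d+\b")


def untraced_requirements(prd_text: str, arch_text: str) -> list[str]:
    """Return requirement IDs found in the PRD but absent from the architecture doc."""
    prd_ids = set(REQ_ID.findall(prd_text))
    arch_ids = set(REQ_ID.findall(arch_text))

    def sort_key(req_id: str) -> tuple[str, int]:
        prefix, number = req_id.split("-")
        return prefix, int(number)

    return sorted(prd_ids - arch_ids, key=sort_key)


# Illustrative inputs, not taken from a real PRD.
prd = "FR-1 upload a file. FR-2 share a link. NFR-1 p95 latency under 200 ms."
arch = "| FR-1 | Upload Service |\n| NFR-1 | CDN edge cache |"
print(untraced_requirements(prd, arch))
```

A text match only shows that an ID is mentioned somewhere; whether the mapped component actually satisfies the requirement still needs review.
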
## Architect Behavior Principles

Apply these principles in priority order when making design decisions:

1. **High Availability**: Design for fault tolerance and resilience over perfect consistency
2. **Scalability**: Design for horizontal scaling over vertical scaling
3. **Stateless First**: Prefer stateless services; externalize state to databases or caches
4. **API First**: Define contracts before implementation; APIs are the primary interface
5. **Event Driven First**: Prefer event-driven communication for cross-service coordination
6. **Async First**: Prefer asynchronous processing for non-realtime operations

## Architecture Document Template

```markdown
# Architecture: {Feature Name}

## Overview

High-level description of the system architecture. Map every major PRD requirement to an architectural component. Summarize the system's purpose, key design decisions, and architectural style.

### Requirement Traceability

| PRD Requirement | Architectural Component |
|----------------|------------------------|
| ... | ... |

## System Architecture

Describe the complete system architecture including all services, databases, message queues, caches, and external integrations. Show how components are organized, what technology stack each uses, and how they communicate.

### Technology Stack

| Layer | Technology | Justification |
|-------|-----------|---------------|
| Language | ... | ... |
| Framework | ... | ... |
| Database | ... | ... |
| Queue | ... | ... |
| Cache | ... | ... |
| Infrastructure | ... | ... |

If the feature has no backend component, write `N/A` with a brief reason.

### Component Architecture

Describe each major component, its responsibility, and how it fits into the overall system.

## Service Boundaries

Define service boundaries with clear responsibilities and communication patterns.

For each service or module:
- Name and single responsibility
- Owned data
- Communication patterns with other services (sync, async, event-driven)
- Potential coupling points and mitigation

### Communication Matrix

| From | To | Pattern | Protocol | Purpose |
|------|----|---------|----------|---------|
| ... | ... | ... | ... | ... |

## Data Flow

Describe how data moves through the system end-to-end. Include:
- Request lifecycle from entry point to response
- Background job processing flow
- Event propagation flow
- Data transformation and enrichment steps

## Database Schema

Define all database tables, columns, indexes, partition keys, constraints, and relationships. If the feature requires no database changes, write `N/A` with a brief reason.

### Table Definitions

For each table:
- Table name and purpose
- Column definitions (name, type, constraints, defaults)
- Indexes with justification based on query patterns
- Partition keys (where applicable)
- Foreign key relationships

### Entity Relationships

Describe relationships between tables.

### Denormalization Strategy

If denormalization is applied, document which fields are denormalized, why, and the consistency implications.

### Migration Strategy

Notes on migration approach if schema changes affect existing data.

## API Contract

Define all API endpoints with full specifications. Use OpenAPI-style definitions for REST APIs. For gRPC APIs, define the service and method specifications.

### Endpoint Catalog

| Method | Path | Description | PRD Requirement |
|--------|------|-------------|-----------------|
| ... | ... | ... | ... |

### Endpoint Details

For each endpoint:
- Method and path
- Request schema (headers, path params, query params, body)
- Response schema (success and error responses)
- Status codes
- Authentication requirements
- Idempotency requirements (when applicable)
- Rate limiting expectations (when applicable)
- Pagination and filtering (when applicable)
- PRD functional requirement it satisfies

### Error Codes

Define consistent error codes and error response format.

## Async / Queue Design

Define asynchronous operations and their behavior. If the feature has no asynchronous requirements, write `N/A` with a brief reason.

### Async Operations

For each async operation:
- Operation name and trigger
- Queue or event topic
- Producer and consumer
- Retry policy (max retries, backoff, DLQ)
- Ordering guarantees
- Timeout and cancellation behavior

## Consistency Model

Define the consistency guarantees of the system.

### Consistency Strategy
- Strong vs eventual consistency per data domain
- When eventual consistency is acceptable and why
- Conflict resolution strategies

### Idempotency Design

For each idempotent operation:
- Operation name
- Idempotency key source and format
- Key TTL and storage location
- Duplicate request behavior
- Collision handling

### Deduplication & Retry
- Deduplication strategy for messages and events
- Retry policies and backoff strategies
- Outbox pattern usage (when applicable)
- Saga / compensation patterns (when applicable)

If the feature has no consistency or idempotency requirements, write `N/A` with a brief reason.

## Error Model

Define error handling strategy across the system.

### Error Categories
- Client errors (4xx)
- Server errors (5xx)
- Business rule violations
- Timeout errors
- Cascading failure modes

### Error Propagation Strategy
- Fail-fast vs graceful degradation vs circuit breaker
- Fallback behavior

### Error Response Format

Consistent error response schema across the system.

### PRD Edge Case Mapping

| Error Category | PRD Edge Case | Handling Strategy |
|---------------|---------------|-------------------|
| ... | ... | ... |

## Security Boundaries

Define security architecture for the system.

- Authentication mechanism
- Authorization model (RBAC, ABAC, etc.)
- Service identity and service-to-service auth
- Token propagation strategy
- Tenant isolation (multi-tenancy model)
- Secret management approach
- Audit logging requirements

If the feature has no security implications, write `N/A` with a brief reason.

## Integration Boundaries

Define all integrations with external systems.

For each external system integration:
- External system name and purpose
- Integration pattern (API call, webhook, polling, event subscription)
- Rate limits and quotas
- Failure modes and fallback behavior
- Retry strategy
- Data contract (request/response schemas)
- Authentication mechanism

If the feature has no external integrations, write `N/A` with a brief reason.

## Observability

Define observability strategy for the system.

### Logs
- Log levels and what to log
- Structured logging format
- Log aggregation strategy

### Metrics
- Key business metrics
- Key system metrics
- Metric naming conventions

### Traces
- Distributed tracing strategy
- Correlation ID propagation
- Span boundaries

### Alerts
- Alert conditions and thresholds
- Alert routing and escalation

### SLOs
- Availability SLOs
- Latency SLOs
- Error budget

## Scaling Strategy

Define how the system scales based on NFRs.

- Horizontal scaling approach (which components scale independently)
- Vertical scaling considerations
- Database scaling strategy (read replicas, sharding, partitioning)
- Cache scaling strategy
- Queue scaling strategy
- Auto-scaling policies (when applicable)
- Bottleneck analysis

## Non-Functional Requirements

Document all NFRs from the PRD and how the architecture addresses each one.

| NFR | Requirement | Architectural Decision | Verification Method |
|-----|-------------|----------------------|---------------------|
| Performance | ... | ... | ... |
| Availability | ... | ... | ... |
| Scalability | ... | ... | ... |
| Security | ... | ... | ... |
| Compliance | ... | ... | ... |

## Mermaid Diagrams

Produce at minimum the following diagrams embedded in the document.

### System Architecture Diagram

```mermaid
graph TD
    A[Component A] --> B[Component B]
    B --> C[Database]
    B --> D[Queue]
```

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Service
    participant DB
    Client->>Service: Request
    Service->>DB: Query
    DB-->>Service: Result
    Service-->>Client: Response
```

### Data Flow Diagram

```mermaid
graph LR
    A[Source] --> B[Processing]
    B --> C[Storage]
    B --> D[Output]
```

Additional diagrams as needed (event flow, state machine, etc.).

## ADR

Document significant architectural decisions.

### ADR-001: {Decision Title}

- **Context**: Why this decision was needed, including which PRD requirements drove it
- **Decision**: What was decided
- **Consequences**: What trade-offs or implications result
- **Alternatives**: What other options were considered

(Add additional ADRs as needed for each significant decision.)

## Risks

Identify and document architectural risks:

| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |

## Open Questions

List any unresolved questions that need PM or Engineering input:

1. ...
2. ...
```

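Several template sections may be marked `N/A`, but only with a brief reason. A rough way to catch a bare `N/A` is sketched below. The helper names are hypothetical, and the sketch assumes the filled-in document keeps the template's level-2 (`## `) headings.

```python
import re


def split_sections(doc: str) -> dict[str, str]:
    """Map each level-2 heading to its body text, ignoring headings inside code fences."""
    sections: dict[str, list[str]] = {}
    current = None
    in_fence = False
    for line in doc.splitlines():
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence  # toggle on every opening or closing fence line
        if not in_fence and line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def na_without_reason(doc: str) -> list[str]:
    """Return sections whose whole body is a bare N/A marker with no reason attached."""
    bare = re.compile(r"^`?N/A`?[.:]?$", re.IGNORECASE)
    return [name for name, body in split_sections(doc).items() if bare.match(body)]


# Illustrative document fragment, not a real architecture doc.
doc = """# Architecture: Demo

## Async / Queue Design

N/A

## Security Boundaries

N/A: read-only public data, no auth surface.
"""
print(na_without_reason(doc))
```

Here the first section is flagged because it gives no reason, while the second passes.
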
## Completeness Check

Before finalizing the architecture document, verify:

1. All 18 required sections are present (or explicitly marked N/A with reason)
2. Every PRD functional requirement is traced to at least one architectural component
3. Every PRD NFR is traced to at least one architectural decision
4. Every architecture section that is not N/A has substantive content
5. All API endpoints map to PRD functional requirements
6. All DB tables map to data requirements from functional requirements or NFRs
7. All async flows map to PRD requirements
8. All error handling strategies map to PRD edge cases
9. ADRs exist for all significant decisions (minimum 1)
10. At least 3 Mermaid diagrams are present (system, sequence, data flow)
11. Service boundaries are aligned with domain responsibilities
12. Security boundaries are defined
13. Integration boundaries are defined for all external systems
14. Observability strategy covers logs, metrics, and traces
15. Consistency model is explicit about strong vs eventual guarantees
16. No architectural element exists without traceability to a PRD requirement

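Items 1 and 10 of this checklist are structural and can be verified with a short script. The sketch below is one possible approach, not part of the skill itself. The section names are the 18 level-2 headings of the template, and `completeness_report` is a hypothetical helper. The remaining items need human or model judgment.

```python
import re

# The 18 level-2 headings of the Architecture Document Template.
REQUIRED_SECTIONS = [
    "Overview", "System Architecture", "Service Boundaries", "Data Flow",
    "Database Schema", "API Contract", "Async / Queue Design", "Consistency Model",
    "Error Model", "Security Boundaries", "Integration Boundaries", "Observability",
    "Scaling Strategy", "Non-Functional Requirements", "Mermaid Diagrams", "ADR",
    "Risks", "Open Questions",
]

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def completeness_report(doc: str) -> dict[str, object]:
    """Check required headings (item 1) and the Mermaid block count (item 10)."""
    headings = {m.group(1).strip() for m in re.finditer(r"^## (.+)$", doc, re.MULTILINE)}
    missing = [name for name in REQUIRED_SECTIONS if name not in headings]
    pattern = r"^" + re.escape(FENCE) + r"mermaid\s*$"
    mermaid_blocks = len(re.findall(pattern, doc, re.MULTILINE))
    return {
        "missing_sections": missing,
        "mermaid_blocks": mermaid_blocks,
        "ok": not missing and mermaid_blocks >= 3,
    }


# Tiny illustrative document: two sections and one diagram.
sample = (
    "## Overview\n\ntext\n\n## Risks\n\n"
    + FENCE + "mermaid\ngraph TD\n    A --> B\n" + FENCE + "\n"
)
report = completeness_report(sample)
print(len(report["missing_sections"]), report["mermaid_blocks"], report["ok"])
```

For the sample, the report lists 16 missing sections and a single Mermaid block, so `ok` is `False`.
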
## Guardrails

This is a pure Architecture skill.

Do:
- Design system structure and boundaries
- Define API contracts and data models
- Define error handling, retry, and consistency strategies
- Define security boundaries and integration patterns
- Produce Mermaid diagrams, DB schemas, API specs, and ADRs
- Make architectural decisions with clear rationale and alternatives
- Ensure traceability to PRD requirements

Do not:
- Change PRD requirements or scope
- Create task breakdowns, milestones, or deliverables
- Write test cases or test plans
- Write implementation code or pseudocode
- Choose specific libraries or frameworks at the implementation level
- Prescribe code patterns, class structures, or function-level logic
- Produce any file artifact other than `docs/architecture/{feature}.md`

The Architect defines HOW the system is structured.
Engineering defines HOW the code is written.

## Transition

After completing the architecture document, invoke `challenge-architecture` to validate and stress-test the architecture.