| name | description |
|---|---|
| challenge-architecture | Stress-test architecture decisions, check PRD traceability, detect over-engineering, validate scalability, consistency, security, integration, and observability. Updates the single architecture file in place. |
Interview the architect relentlessly about every aspect of this architecture until it passes quality gates. Walk down each branch of the architecture decision tree, validating traceability, necessity, and soundness one-by-one.
Focus on system design validation, not implementation details. If a question drifts into code-level patterns, library choices, or implementation specifics, redirect it back to architecture-level concerns.
Announce at start: "I'm using the challenge-architecture skill to validate and stress-test the architecture."
Ask the questions one at a time.
## Primary Input

- docs/architecture/{feature}.md
- docs/prd/{feature}.md

## Primary Output (STRICT PATH)

- Updated docs/architecture/{feature}.md

This is the only file artifact in the Architect pipeline. Challenge results are applied directly to this file. No intermediate files are written.
## Process

### Phase 1: Traceability Audit
For every architectural element, verify it traces back to at least one PRD requirement:
- Does every API endpoint serve a PRD functional requirement?
- Does every DB table serve a data requirement from functional requirements or NFRs?
- Does every service boundary serve a domain responsibility from the PRD scope?
- Does every async flow serve a PRD requirement?
- Does every error handling strategy serve a PRD edge case or NFR?
- Does every consistency decision serve a PRD requirement?
- Does every security boundary serve a security or compliance requirement?
- Does every integration boundary serve an external system requirement?
- Does every observability decision serve an NFR?
Flag any architectural element that exists without PRD traceability as potential over-engineering.
### Phase 2: Requirement Coverage Audit
For every PRD requirement, verify it is covered by the architecture:
- Does every functional requirement have at least one architectural component serving it?
- Does every NFR have at least one architectural decision addressing it?
- Does every edge case have an error handling strategy?
- Does every acceptance criterion have architectural support?
- Are there PRD requirements that the architecture does not address?
Flag any uncovered PRD requirement as a gap.
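The two audits above are inverse checks over the same element-to-requirement mapping. A minimal sketch of how both directions can be verified, assuming the links have already been extracted from the documents (the element names and requirement IDs here are hypothetical):

```python
# Cross-check a traceability mapping in both directions.
# Keys: architectural elements; values: PRD requirement IDs they serve.
trace = {
    "POST /orders": {"FR-1"},
    "orders table": {"FR-1", "NFR-3"},
    "audit-log service": set(),  # traces to nothing: over-engineering candidate
}
prd_requirements = {"FR-1", "FR-2", "NFR-3"}

# Phase 1: elements with no PRD justification
untraceable = [element for element, reqs in trace.items() if not reqs]

# Phase 2: requirements no architectural element serves
covered = set().union(*trace.values())
uncovered = prd_requirements - covered

print(untraceable)        # ['audit-log service']
print(sorted(uncovered))  # ['FR-2']
```

The same pass that flags over-engineering (Phase 1) also surfaces coverage gaps (Phase 2), which is why the two audits are best run together.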
### Phase 3: Architecture Decision Validation
For each Architectural Decision Record, challenge:
- Is the decision necessary, or could a simpler approach work?
- Are the alternatives fairly evaluated, or is there a strawman?
- Is the rationale specific to this use case, or generic boilerplate?
- Are the consequences honestly assessed?
- Does the decision optimize for maintainability, scalability, reliability, clarity, and bounded responsibilities?
- Does the decision avoid over-engineering, premature microservices, unnecessary abstractions, and implementation leakage?
### Phase 4: Scalability Validation
- Can each service scale independently?
- Are there single points of failure?
- Are there bottlenecks that prevent horizontal scaling?
- Is database scaling addressed (read replicas, sharding, partitioning)?
- Is cache scaling addressed?
- Are there unbounded data growth scenarios?
- Are there operations that degrade under load?
### Phase 5: Consistency Validation
- Is the consistency model explicit for each data domain?
- Are eventual consistency windows acceptable for the use case?
- Are race conditions identified and mitigated?
- Is idempotency designed for operations that require it?
- Are distributed transaction boundaries clear?
- Is the deduplication strategy sound?
- Are retry semantics defined for all async operations?
- Is the outbox pattern used where needed?
- Are saga/compensation patterns defined for multi-step operations?
### Phase 6: Security Validation
- Are authentication boundaries clearly defined?
- Is authorization modeled correctly (RBAC, ABAC)?
- Is service-to-service authentication specified?
- Is token propagation defined?
- Is tenant isolation clearly defined (for multi-tenant systems)?
- Is secret management addressed?
- Are there data exposure risks in API responses?
- Is audit logging specified for sensitive operations?
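The authorization question above is about whether the model is explicit, not about code. Still, a deny-by-default RBAC table is a useful mental model for what "modeled correctly" means (the roles and actions here are hypothetical):

```python
# Minimal RBAC check: roles map to permitted actions; deny by default.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "viewer": {"read"},
}

def is_allowed(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("viewer", "read")
assert not is_allowed("viewer", "delete")  # authorization boundary holds
assert not is_allowed("unknown", "read")   # undefined roles get nothing
```

An architecture that only lists roles without stating the default (deny vs. allow) for unmapped roles and actions leaves the boundary undefined.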
### Phase 7: Integration Validation
- Are all external system integrations identified?
- Is the integration pattern appropriate (API, webhook, polling, event)?
- Are rate limits and quotas addressed for external APIs?
- Are failure modes defined for each integration (timeout, circuit breaker, fallback)?
- Are retry strategies defined for transient failures?
- Is data transformation between systems addressed?
- Are there hidden coupling points with external systems?
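To make "retry strategies for transient failures" concrete, here is a sketch of exponential backoff around an external call; the failure class, attempt count, and delays are assumptions the architecture would need to pin down per integration:

```python
import time

# Retry with exponential backoff for transient integration failures.
def call_with_retry(fn, attempts=3, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:                       # transient failure class
            if attempt == attempts - 1:
                raise                              # exhausted: surface the failure
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, ...

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream slow")
    return "ok"

print(call_with_retry(flaky))  # ok (after two transient failures)
```

The architecture-level decision is which exceptions count as transient and what happens when retries are exhausted (circuit breaker, fallback, dead-letter), not the loop itself.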
### Phase 8: Observability Validation
- Are logs, metrics, and traces all specified?
- Is correlation ID propagation defined across services?
- Are SLOs defined for critical operations?
- Are alert conditions and thresholds specified?
- Can the system be debugged end-to-end from logs and traces?
- Are there blind spots where failures would be invisible?
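Correlation ID propagation is easiest to validate when the architecture states where the ID is minted and how it travels implicitly. A minimal single-process sketch, assuming context-local propagation (in a distributed system the ID would instead travel in request headers or message metadata):

```python
import contextvars
import uuid

# Correlation ID propagated implicitly so every log line can be joined.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

def log(message: str) -> str:
    return f"[{correlation_id.get()}] {message}"

def handle_request():
    correlation_id.set(str(uuid.uuid4()))  # minted once at the entry point
    return log("order created"), log("payment queued")

a, b = handle_request()
assert a.split("]")[0] == b.split("]")[0]  # same ID across both lines
```

If the architecture cannot answer "which component mints the ID, and which boundary drops it," end-to-end debugging from logs and traces is not actually possible.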
### Phase 9: Data Integrity Validation
- Are there scenarios where data could be lost?
- Are transaction boundaries appropriate?
- Are there scenarios where data could become inconsistent?
- Is data ownership clear (each data item owned by exactly one service)?
- Are cascading deletes or updates handled correctly?
- Are there data migration risks?
### Phase 10: Over-Engineering Detection
Check for common over-engineering patterns:
- Services that could be modules
- Patterns applied "just in case" without PRD justification
- Storage choices that exceed what the requirements demand
- Async processing where sync would suffice
- Abstraction layers that add complexity without solving a real problem
- Consistency guarantees stronger than what the requirements demand
- Security boundaries more complex than the threat model requires
- Observability granularity beyond operational need
### Phase 11: Under-Engineering Detection
Check for common under-engineering patterns:
- Missing error handling for edge cases identified in the PRD
- Missing idempotency for operations the PRD marks as requiring it
- Missing NFR accommodations (scaling, latency, availability)
- Missing async processing for operations that the PRD requires to be non-blocking
- Missing security boundaries or authentication where the PRD requires it
- Missing observability for critical operations
- Missing consistency model specification
- Missing integration failure handling
- Missing retry strategies for external dependencies
## Validation Checklist
After challenging, verify the architecture satisfies:
- Every architectural element traces to at least one PRD requirement
- Every PRD requirement is covered by at least one architectural element
- Every ADR is necessary, well-reasoned, and honestly assessed
- No over-engineering without PRD justification
- No under-engineering for PRD-identified requirements
- All 18 architecture sections are present and substantive (or explicitly N/A with reason)
- Service boundaries are aligned with domain responsibilities
- API contracts are complete and consistent
- Data model is justified by query and write patterns
- Storage selections are the simplest option that meets requirements
- Async processing is justified by PRD requirements
- Error model covers all PRD edge cases
- Consistency model is explicit (strong vs eventual per domain)
- Security boundaries are defined
- Integration boundaries are defined with failure modes
- Observability covers logs, metrics, traces, and alerts
- Scaling strategy addresses NFRs
- At least 3 Mermaid diagrams are present
- At least 1 ADR is present
- Risks are documented
- Open questions are documented
## Architecture Review Output
At the end of the challenge, produce a structured review section to be appended or updated in the architecture document:
## Architecture Review
### Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |
### Missing Parts
- [ ] ...
### Over-Engineering
- ... (specific items identified as over-engineered)
### Recommendations
- ... (specific improvements recommended)
### Gate Decision
- [ ] PASS — Architecture is ready for Planner handoff
- [ ] CONDITIONAL PASS — Architecture needs minor adjustments (listed above)
- [ ] FAIL — Architecture needs significant revision (listed above)
When the gate decision is PASS or CONDITIONAL PASS (after adjustments), the architecture is ready for the next step: finalize-architecture.
## Outcomes
For each issue found:
- Document the issue
- Propose a fix
- Apply the fix directly to docs/architecture/{feature}.md
- Re-verify the fix against the PRD
After all issues are resolved, proceed to finalize-architecture.
## Guardrails
This is a pure validation skill.
Do:
- Challenge architectural decisions with evidence
- Validate traceability to PRD requirements
- Detect over-engineering and under-engineering
- Validate scalability, consistency, security, integration, observability
- Propose specific fixes for identified issues
- Apply fixes directly to docs/architecture/{feature}.md
Do not:
- Change PRD requirements or scope
- Design architecture from scratch
- Make implementation-level decisions
- Break down tasks or create milestones
- Write test cases
- Produce any file artifact other than docs/architecture/{feature}.md
## Transition
After challenge is complete and issues are resolved, invoke finalize-architecture for final completeness check and format validation.