---
name: challenge-architecture
description: "Stress-test architecture decisions, check PRD traceability, detect over-engineering, validate scalability, consistency, security, integration, and observability. Updates the single architecture file in place."
---
Interview the architect relentlessly about every aspect of this architecture until it passes quality gates. Walk down each branch of the architecture decision tree, validating traceability, necessity, and soundness one by one.

Focus on system design validation, not implementation details. If a question drifts into code-level patterns, library choices, or implementation specifics, redirect it back to architecture-level concerns.

**Announce at start:** "I'm using the challenge-architecture skill to validate and stress-test the architecture."

Ask the questions one at a time.
## Primary Input

- `docs/architecture/{feature}.md`
- `docs/prd/{feature}.md`
## Primary Output (STRICT PATH)

- Updated `docs/architecture/{feature}.md`

This is the **only** file artifact in the Architect pipeline. Challenge results are applied directly to this file. No intermediate files are written.
## Process

### Phase 1: Traceability Audit

For every architectural element, verify it traces back to at least one PRD requirement:

- Does every API endpoint serve a PRD functional requirement?
- Does every DB table serve a data requirement from functional requirements or NFRs?
- Does every service boundary serve a domain responsibility from the PRD scope?
- Does every async flow serve a PRD requirement?
- Does every error handling strategy serve a PRD edge case or NFR?
- Does every consistency decision serve a PRD requirement?
- Does every security boundary serve a security or compliance requirement?
- Does every integration boundary serve an external system requirement?
- Does every observability decision serve an NFR?

Flag any architectural element that exists without PRD traceability as **potential over-engineering**.
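Recording the audit as a traceability matrix keeps the element-by-element walk honest. A minimal sketch (column names, requirement IDs, and example rows are illustrative, not mandated by the pipeline):

```markdown
| Architectural Element | Type         | PRD Requirement(s) | Verdict                    |
|-----------------------|--------------|--------------------|----------------------------|
| `POST /orders`        | API endpoint | FR-3 (place order) | Traced                     |
| `order_events` table  | DB table     | (none found)       | Potential over-engineering |
```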
### Phase 2: Requirement Coverage Audit

For every PRD requirement, verify it is covered by the architecture:

- Does every functional requirement have at least one architectural component serving it?
- Does every NFR have at least one architectural decision addressing it?
- Does every edge case have an error handling strategy?
- Does every acceptance criterion have architectural support?
- Are there PRD requirements that the architecture does not address?

Flag any uncovered PRD requirement as a **gap**.
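The reverse direction can be tracked the same way. A hypothetical coverage table (the requirement IDs and components are invented for illustration):

```markdown
| PRD Requirement              | Covering Component / Decision | Status  |
|------------------------------|-------------------------------|---------|
| NFR-2 (p99 latency < 200 ms) | Read-through cache (ADR-4)    | Covered |
| Edge case: duplicate webhook | (none found)                  | Gap     |
```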
### Phase 3: Architecture Decision Validation

For each Architectural Decision Record, challenge:

- Is the decision necessary, or could a simpler approach work?
- Are the alternatives fairly evaluated, or is there a strawman?
- Is the rationale specific to this use case, or generic boilerplate?
- Are the consequences honestly assessed?
- Does the decision optimize for maintainability, scalability, reliability, clarity, and bounded responsibilities?
- Does the decision avoid over-engineering, premature microservices, unnecessary abstractions, and implementation leakage?
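One way to surface a strawman is to restate the rejected alternative in its strongest form and ask whether the rationale still holds. A hypothetical challenge note (the ADR and requirement IDs are invented):

```markdown
**ADR-2: Event-driven order processing**
- Challenge: the rejected alternative "synchronous calls" is evaluated only at peak load.
  At the PRD's stated volume (FR-1: ~10 req/s), would synchronous processing actually fail?
- If no PRD requirement demands decoupling, record the decision as potential over-engineering.
```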
### Phase 4: Scalability Validation

- Can each service scale independently?
- Are there single points of failure?
- Are there bottlenecks that prevent horizontal scaling?
- Is database scaling addressed (read replicas, sharding, partitioning)?
- Is cache scaling addressed?
- Are there unbounded data growth scenarios?
- Are there operations that degrade under load?
### Phase 5: Consistency Validation

- Is the consistency model explicit for each data domain?
- Are eventual consistency windows acceptable for the use case?
- Are race conditions identified and mitigated?
- Is idempotency designed for operations that require it?
- Are distributed transaction boundaries clear?
- Is the deduplication strategy sound?
- Are retry semantics defined for all async operations?
- Is the outbox pattern used where needed?
- Are saga/compensation patterns defined for multi-step operations?
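When probing these questions, push for an explicit per-domain consistency statement rather than a blanket answer. A hypothetical example of the level of specificity to demand (domains and values are illustrative):

```markdown
- Orders: strong consistency (single writer, transactional)
- Inventory counts: eventual, window ≤ 5 s, reconciled via outbox-driven events
- Payment webhooks: idempotent by `payment_id`; retries and duplicates are safe
```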
### Phase 6: Security Validation

- Are authentication boundaries clearly defined?
- Is authorization modeled correctly (RBAC, ABAC)?
- Is service-to-service authentication specified?
- Is token propagation defined?
- Is tenant isolation clearly defined (for multi-tenant systems)?
- Is secret management addressed?
- Are there data exposure risks in API responses?
- Is audit logging specified for sensitive operations?
### Phase 7: Integration Validation

- Are all external system integrations identified?
- Is the integration pattern appropriate (API, webhook, polling, event)?
- Are rate limits and quotas addressed for external APIs?
- Are failure modes defined for each integration (timeout, circuit breaker, fallback)?
- Are retry strategies defined for transient failures?
- Is data transformation between systems addressed?
- Are there hidden coupling points with external systems?
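An integration inventory makes the failure-mode questions answerable one row at a time. A sketch with hypothetical systems and values:

```markdown
| External System  | Pattern       | Timeout | Retry                       | Fallback         |
|------------------|---------------|---------|-----------------------------|------------------|
| Payment provider | API + webhook | 3 s     | 3x exponential backoff      | Queue and retry  |
| Email service    | Event         | 10 s    | Dead-letter after 5 attempts | Degrade silently |
```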
### Phase 8: Observability Validation

- Are logs, metrics, and traces all specified?
- Is correlation ID propagation defined across services?
- Are SLOs defined for critical operations?
- Are alert conditions and thresholds specified?
- Can the system be debugged end-to-end from logs and traces?
- Are there blind spots where failures would be invisible?
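Push for concrete SLOs and alert thresholds rather than "we will monitor it". A hypothetical level of specificity to aim for (all values are purely illustrative):

```markdown
- SLO: checkout p99 latency < 500 ms over a rolling 30 days
- Alert: error rate > 1% for 5 minutes pages on-call
- Trace: correlation ID propagated from gateway through all async consumers
```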
### Phase 9: Data Integrity Validation

- Are there scenarios where data could be lost?
- Are transaction boundaries appropriate?
- Are there scenarios where data could become inconsistent?
- Is data ownership clear (each data item owned by exactly one service)?
- Are cascading deletes or updates handled correctly?
- Are there data migration risks?
### Phase 10: Over-Engineering Detection

Check for common over-engineering patterns:

- Services that could be modules
- Patterns applied "just in case" without PRD justification
- Storage choices that exceed what the requirements demand
- Async processing where sync would suffice
- Abstraction layers that add complexity without solving a real problem
- Consistency guarantees stronger than what the requirements demand
- Security boundaries more complex than the threat model requires
- Observability granularity beyond operational need
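When flagging one of these patterns, tie the finding to the absent requirement explicitly so the fix is actionable. A hypothetical finding (service name and recommendation are illustrative):

```markdown
- **Finding:** A separate `notification-service` is deployed for a single email template.
  No PRD requirement demands independent scaling or deployment.
- **Recommendation:** Fold it into the core service as a module; revisit only if
  notification volume requirements appear in the PRD.
```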
### Phase 11: Under-Engineering Detection

Check for common under-engineering patterns:

- Missing error handling for edge cases identified in the PRD
- Missing idempotency for operations the PRD marks as requiring it
- Missing NFR accommodations (scaling, latency, availability)
- Missing async processing for operations that the PRD requires to be non-blocking
- Missing security boundaries or authentication where the PRD requires it
- Missing observability for critical operations
- Missing consistency model specification
- Missing integration failure handling
- Missing retry strategies for external dependencies
## Validation Checklist

After challenging, verify the architecture satisfies:

1. Every architectural element traces to at least one PRD requirement
2. Every PRD requirement is covered by at least one architectural element
3. Every ADR is necessary, well-reasoned, and honestly assessed
4. No over-engineering without PRD justification
5. No under-engineering for PRD-identified requirements
6. All 18 architecture sections are present and substantive (or explicitly N/A with reason)
7. Service boundaries are aligned with domain responsibilities
8. API contracts are complete and consistent
9. Data model is justified by query and write patterns
10. Storage selections are the simplest option that meets requirements
11. Async processing is justified by PRD requirements
12. Error model covers all PRD edge cases
13. Consistency model is explicit (strong vs eventual per domain)
14. Security boundaries are defined
15. Integration boundaries are defined with failure modes
16. Observability covers logs, metrics, traces, and alerts
17. Scaling strategy addresses NFRs
18. At least 3 Mermaid diagrams are present
19. At least 1 ADR is present
20. Risks are documented
21. Open questions are documented
## Architecture Review Output

At the end of the challenge, produce a structured review section to be appended to, or updated in, the architecture document:

```markdown
## Architecture Review

### Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |

### Missing Parts
- [ ] ...

### Over-Engineering
- ... (specific items identified as over-engineered)

### Recommendations
- ... (specific improvements recommended)

### Gate Decision
- [ ] PASS — Architecture is ready for Planner handoff
- [ ] CONDITIONAL PASS — Architecture needs minor adjustments (listed above)
- [ ] FAIL — Architecture needs significant revision (listed above)
```

When the gate decision is PASS or CONDITIONAL PASS (after adjustments), the architecture is ready for the next step: `finalize-architecture`.
## Outcomes

For each issue found:

1. Document the issue
2. Propose a fix
3. Apply the fix directly to `docs/architecture/{feature}.md`
4. Re-verify the fix against the PRD

After all issues are resolved, proceed to `finalize-architecture`.
## Guardrails

This is a pure validation skill.

Do:

- Challenge architectural decisions with evidence
- Validate traceability to PRD requirements
- Detect over-engineering and under-engineering
- Validate scalability, consistency, security, integration, observability
- Propose specific fixes for identified issues
- Apply fixes directly to `docs/architecture/{feature}.md`

Do not:

- Change PRD requirements or scope
- Design architecture from scratch
- Make implementation-level decisions
- Break down tasks or create milestones
- Write test cases
- Produce any file artifact other than `docs/architecture/{feature}.md`
## Transition

After the challenge is complete and issues are resolved, invoke `finalize-architecture` for the final completeness check and format validation.