---
name: challenge-architecture
description: "Stress-test architecture decisions, check PRD traceability, detect over-engineering, validate scalability, consistency, security, integration, and observability. Updates the single architecture file in place."
---
Interview the architect relentlessly about every aspect of this architecture until it passes quality gates. Walk down each branch of the architecture decision tree, validating traceability, necessity, and soundness one by one.

Focus on system design validation, not implementation details. If a question drifts into code-level patterns, library choices, or implementation specifics, redirect it back to architecture-level concerns.

**Announce at start:** "I'm using the challenge-architecture skill to validate and stress-test the architecture."

Ask the questions one at a time.
## Primary Input

- `docs/architecture/{feature}.md`
- `docs/prd/{feature}.md`
## Primary Output (STRICT PATH)

- Updated `docs/architecture/{feature}.md`

This is the **only** file artifact in the Architect pipeline. Challenge results are applied directly to this file. No intermediate files are written.
## Process

### Phase 1: Traceability Audit

For every architectural element, verify it traces back to at least one PRD requirement:

- Does every API endpoint serve a PRD functional requirement?
- Does every DB table serve a data requirement from functional requirements or NFRs?
- Does every service boundary serve a domain responsibility from the PRD scope?
- Does every async flow serve a PRD requirement?
- Does every error handling strategy serve a PRD edge case or NFR?
- Does every consistency decision serve a PRD requirement?
- Does every security boundary serve a security or compliance requirement?
- Does every integration boundary serve an external system requirement?
- Does every observability decision serve an NFR?

Flag any architectural element that exists without PRD traceability as **potential over-engineering**.
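Recording the audit as a traceability matrix keeps the element-by-element walk honest. A minimal sketch (column names, requirement IDs, and example rows are illustrative, not mandated by the pipeline):

```markdown
| Architectural Element | Type         | PRD Requirement(s) | Verdict                    |
|-----------------------|--------------|--------------------|----------------------------|
| `POST /orders`        | API endpoint | FR-3 (place order) | Traced                     |
| `order_events` table  | DB table     | (none found)       | Potential over-engineering |
```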
### Phase 2: Requirement Coverage Audit

For every PRD requirement, verify it is covered by the architecture:

- Does every functional requirement have at least one architectural component serving it?
- Does every NFR have at least one architectural decision addressing it?
- Does every edge case have an error handling strategy?
- Does every acceptance criterion have architectural support?
- Are there PRD requirements that the architecture does not address?

Flag any uncovered PRD requirement as a **gap**.
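The reverse direction can be tracked the same way. A hypothetical coverage table (the requirement IDs and components are invented for illustration):

```markdown
| PRD Requirement              | Covering Component / Decision | Status  |
|------------------------------|-------------------------------|---------|
| NFR-2 (p99 latency < 200 ms) | Read-through cache (ADR-4)    | Covered |
| Edge case: duplicate webhook | (none found)                  | Gap     |
```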
### Phase 3: Architecture Decision Validation

For each Architectural Decision Record, challenge:

- Is the decision necessary, or could a simpler approach work?
- Are the alternatives fairly evaluated, or is there a strawman?
- Is the rationale specific to this use case, or generic boilerplate?
- Are the consequences honestly assessed?
- Does the decision optimize for maintainability, scalability, reliability, clarity, and bounded responsibilities?
- Does the decision avoid over-engineering, premature microservices, unnecessary abstractions, and implementation leakage?
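One way to surface a strawman is to restate the rejected alternative in its strongest form and ask whether the rationale still holds. A hypothetical challenge note (the ADR and requirement IDs are invented):

```markdown
**ADR-2: Event-driven order processing**
- Challenge: the rejected alternative "synchronous calls" is evaluated only at peak load.
  At the PRD's stated volume (FR-1: ~10 req/s), would synchronous processing actually fail?
- If no PRD requirement demands decoupling, record the decision as potential over-engineering.
```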
### Phase 4: Scalability Validation

- Can each service scale independently?
- Are there single points of failure?
- Are there bottlenecks that prevent horizontal scaling?
- Is database scaling addressed (read replicas, sharding, partitioning)?
- Is cache scaling addressed?
- Are there unbounded data growth scenarios?
- Are there operations that degrade under load?
### Phase 5: Consistency Validation

- Is the consistency model explicit for each data domain?
- Are eventual consistency windows acceptable for the use case?
- Are race conditions identified and mitigated?
- Is idempotency designed for operations that require it?
- Are distributed transaction boundaries clear?
- Is the deduplication strategy sound?
- Are retry semantics defined for all async operations?
- Is the outbox pattern used where needed?
- Are saga/compensation patterns defined for multi-step operations?
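When probing these questions, push for an explicit per-domain consistency statement rather than a blanket answer. A hypothetical example of the level of specificity to demand (domains and values are illustrative):

```markdown
- Orders: strong consistency (single writer, transactional)
- Inventory counts: eventual, window ≤ 5 s, reconciled via outbox-driven events
- Payment webhooks: idempotent by `payment_id`; retries and duplicates are safe
```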
### Phase 6: Security Validation

- Are authentication boundaries clearly defined?
- Is authorization modeled correctly (RBAC, ABAC)?
- Is service-to-service authentication specified?
- Is token propagation defined?
- Is tenant isolation clearly defined (for multi-tenant systems)?
- Is secret management addressed?
- Are there data exposure risks in API responses?
- Is audit logging specified for sensitive operations?
### Phase 7: Integration Validation

- Are all external system integrations identified?
- Is the integration pattern appropriate (API, webhook, polling, event)?
- Are rate limits and quotas addressed for external APIs?
- Are failure modes defined for each integration (timeout, circuit breaker, fallback)?
- Are retry strategies defined for transient failures?
- Is data transformation between systems addressed?
- Are there hidden coupling points with external systems?
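An integration inventory makes the failure-mode questions answerable one row at a time. A sketch with hypothetical systems and values:

```markdown
| External System  | Pattern       | Timeout | Retry                       | Fallback         |
|------------------|---------------|---------|-----------------------------|------------------|
| Payment provider | API + webhook | 3 s     | 3x exponential backoff      | Queue and retry  |
| Email service    | Event         | 10 s    | Dead-letter after 5 attempts | Degrade silently |
```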
### Phase 8: Observability Validation

- Are logs, metrics, and traces all specified?
- Is correlation ID propagation defined across services?
- Are SLOs defined for critical operations?
- Are alert conditions and thresholds specified?
- Can the system be debugged end-to-end from logs and traces?
- Are there blind spots where failures would be invisible?
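Push for concrete SLOs and alert thresholds rather than "we will monitor it". A hypothetical level of specificity to aim for (all values are purely illustrative):

```markdown
- SLO: checkout p99 latency < 500 ms over a rolling 30 days
- Alert: error rate > 1% for 5 minutes pages on-call
- Trace: correlation ID propagated from gateway through all async consumers
```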
### Phase 9: Data Integrity Validation

- Are there scenarios where data could be lost?
- Are transaction boundaries appropriate?
- Are there scenarios where data could become inconsistent?
- Is data ownership clear (each data item owned by exactly one service)?
- Are cascading deletes or updates handled correctly?
- Are there data migration risks?
### Phase 10: Over-Engineering Detection

Check for common over-engineering patterns:

- Services that could be modules
- Patterns applied "just in case" without PRD justification
- Storage choices that exceed what the requirements demand
- Async processing where sync would suffice
- Abstraction layers that add complexity without solving a real problem
- Consistency guarantees stronger than what the requirements demand
- Security boundaries more complex than the threat model requires
- Observability granularity beyond operational need
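When flagging one of these patterns, tie the finding to the absent requirement explicitly so the fix is actionable. A hypothetical finding (service name and recommendation are illustrative):

```markdown
- **Finding:** A separate `notification-service` is deployed for a single email template.
  No PRD requirement demands independent scaling or deployment.
- **Recommendation:** Fold it into the core service as a module; revisit only if
  notification volume requirements appear in the PRD.
```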
### Phase 11: Under-Engineering Detection

Check for common under-engineering patterns:

- Missing error handling for edge cases identified in the PRD
- Missing idempotency for operations the PRD marks as requiring it
- Missing NFR accommodations (scaling, latency, availability)
- Missing async processing for operations that the PRD requires to be non-blocking
- Missing security boundaries or authentication where the PRD requires it
- Missing observability for critical operations
- Missing consistency model specification
- Missing integration failure handling
- Missing retry strategies for external dependencies
## Validation Checklist

After challenging, verify the architecture satisfies:

1. Every architectural element traces to at least one PRD requirement
2. Every PRD requirement is covered by at least one architectural element
3. Every ADR is necessary, well-reasoned, and honestly assessed
4. No over-engineering without PRD justification
5. No under-engineering for PRD-identified requirements
6. All 18 architecture sections are present and substantive (or explicitly N/A with reason)
7. Service boundaries are aligned with domain responsibilities
8. API contracts are complete and consistent
9. Data model is justified by query and write patterns
10. Storage selections are the simplest option that meets requirements
11. Async processing is justified by PRD requirements
12. Error model covers all PRD edge cases
13. Consistency model is explicit (strong vs eventual per domain)
14. Security boundaries are defined
15. Integration boundaries are defined with failure modes
16. Observability covers logs, metrics, traces, and alerts
17. Scaling strategy addresses NFRs
18. At least 3 Mermaid diagrams are present
19. At least 1 ADR is present
20. Risks are documented
21. Open questions are documented
## Architecture Review Output

At the end of the challenge, produce a structured review section to be appended to, or updated in, the architecture document:

```markdown
## Architecture Review

### Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |

### Missing Parts
- [ ] ...

### Over-Engineering
- ... (specific items identified as over-engineered)

### Recommendations
- ... (specific improvements recommended)

### Gate Decision
- [ ] PASS — Architecture is ready for Planner handoff
- [ ] CONDITIONAL PASS — Architecture needs minor adjustments (listed above)
- [ ] FAIL — Architecture needs significant revision (listed above)
```

When the gate decision is PASS or CONDITIONAL PASS (after adjustments), the architecture is ready for the next step: `finalize-architecture`.
## Outcomes

For each issue found:

1. Document the issue
2. Propose a fix
3. Apply the fix directly to `docs/architecture/{feature}.md`
4. Re-verify the fix against the PRD

After all issues are resolved, proceed to `finalize-architecture`.
## Guardrails

This is a pure validation skill.

Do:

- Challenge architectural decisions with evidence
- Validate traceability to PRD requirements
- Detect over-engineering and under-engineering
- Validate scalability, consistency, security, integration, observability
- Propose specific fixes for identified issues
- Apply fixes directly to `docs/architecture/{feature}.md`

Do not:

- Change PRD requirements or scope
- Design architecture from scratch
- Make implementation-level decisions
- Break down tasks or create milestones
- Write test cases
- Produce any file artifact other than `docs/architecture/{feature}.md`
## Transition

After the challenge is complete and issues are resolved, invoke `finalize-architecture` for the final completeness check and format validation.