| name | description |
|---|---|
| challenge-architecture | Stress-test architecture decisions, check PRD traceability, detect over-engineering, validate scalability, consistency, security, integration, and observability. Updates the single architecture file in place. |
Interview the architect relentlessly about every aspect of this architecture until it passes quality gates. Walk down each branch of the architecture decision tree, validating traceability, necessity, and soundness one-by-one.
Focus on system design validation, not implementation details. If a question drifts into code-level patterns, library choices, or implementation specifics, redirect it back to architecture-level concerns.
Announce at start: "I'm using the challenge-architecture skill to validate and stress-test the architecture."
Ask the questions one at a time.
## Primary Input

- docs/architecture/{feature}.md
- docs/prd/{feature}.md

## Primary Output (STRICT PATH)

- Updated docs/architecture/{feature}.md

This is the only file artifact in the Architect pipeline. Challenge results are applied directly to this file. No intermediate files are written.
## Process

### Phase 1: Traceability Audit
For every architectural element, verify it traces back to at least one PRD requirement:
- Does every API endpoint serve a PRD functional requirement?
- Does every DB table serve a data requirement from functional requirements or NFRs?
- Does every service boundary serve a domain responsibility from the PRD scope?
- Does every async flow serve a PRD requirement?
- Does every error handling strategy serve a PRD edge case or NFR?
- Does every consistency decision serve a PRD requirement?
- Does every security boundary serve a security or compliance requirement?
- Does every integration boundary serve an external system requirement?
- Does every observability decision serve an NFR?
Flag any architectural element that exists without PRD traceability as potential over-engineering.
### Phase 2: Requirement Coverage Audit
For every PRD requirement, verify it is covered by the architecture:
- Does every functional requirement have at least one architectural component serving it?
- Does every NFR have at least one architectural decision addressing it?
- Does every edge case have an error handling strategy?
- Does every acceptance criterion have architectural support?
- Are there PRD requirements that the architecture does not address?
Flag any uncovered PRD requirement as a gap.
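The two audits above are inverse checks over the same element-to-requirement mapping. A minimal sketch of how both directions can be verified, assuming the links have already been extracted from the documents (the element names and requirement IDs here are hypothetical):

```python
# Cross-check a traceability mapping in both directions.
# Keys: architectural elements; values: PRD requirement IDs they serve.
trace = {
    "POST /orders": {"FR-1"},
    "orders table": {"FR-1", "NFR-3"},
    "audit-log service": set(),  # traces to nothing: over-engineering candidate
}
prd_requirements = {"FR-1", "FR-2", "NFR-3"}

# Phase 1: elements with no PRD justification
untraceable = [element for element, reqs in trace.items() if not reqs]

# Phase 2: requirements no architectural element serves
covered = set().union(*trace.values())
uncovered = prd_requirements - covered

print(untraceable)        # ['audit-log service']
print(sorted(uncovered))  # ['FR-2']
```

The same pass that flags over-engineering (Phase 1) also surfaces coverage gaps (Phase 2), which is why the two audits are best run together.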
### Phase 3: Architecture Decision Validation
For each Architectural Decision Record, challenge:
- Is the decision necessary, or could a simpler approach work?
- Are the alternatives fairly evaluated, or is there a strawman?
- Is the rationale specific to this use case, or generic boilerplate?
- Are the consequences honestly assessed?
- Does the decision optimize for maintainability, scalability, reliability, clarity, and bounded responsibilities?
- Does the decision avoid over-engineering, premature microservices, unnecessary abstractions, and implementation leakage?
### Phase 4: Scalability Validation
- Can each service scale independently?
- Are there single points of failure?
- Are there bottlenecks that prevent horizontal scaling?
- Is database scaling addressed (read replicas, sharding, partitioning)?
- Is cache scaling addressed?
- Are there unbounded data growth scenarios?
- Are there operations that degrade under load?
### Phase 5: Consistency Validation
- Is the consistency model explicit for each data domain?
- Are eventual consistency windows acceptable for the use case?
- Are race conditions identified and mitigated?
- Is idempotency designed for operations that require it?
- Are distributed transaction boundaries clear?
- Is the deduplication strategy sound?
- Are retry semantics defined for all async operations?
- Is the outbox pattern used where needed?
- Are saga/compensation patterns defined for multi-step operations?
### Phase 6: Security Validation
- Are authentication boundaries clearly defined?
- Is authorization modeled correctly (RBAC, ABAC)?
- Is service-to-service authentication specified?
- Is token propagation defined?
- Is tenant isolation clearly defined (for multi-tenant systems)?
- Is secret management addressed?
- Are there data exposure risks in API responses?
- Is audit logging specified for sensitive operations?
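The authorization question above is about whether the model is explicit, not about code. Still, a deny-by-default RBAC table is a useful mental model for what "modeled correctly" means (the roles and actions here are hypothetical):

```python
# Minimal RBAC check: roles map to permitted actions; deny by default.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "viewer": {"read"},
}

def is_allowed(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("viewer", "read")
assert not is_allowed("viewer", "delete")  # authorization boundary holds
assert not is_allowed("unknown", "read")   # undefined roles get nothing
```

An architecture that only lists roles without stating the default (deny vs. allow) for unmapped roles and actions leaves the boundary undefined.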
### Phase 7: Integration Validation
- Are all external system integrations identified?
- Is the integration pattern appropriate (API, webhook, polling, event)?
- Are rate limits and quotas addressed for external APIs?
- Are failure modes defined for each integration (timeout, circuit breaker, fallback)?
- Are retry strategies defined for transient failures?
- Is data transformation between systems addressed?
- Are there hidden coupling points with external systems?
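To make "retry strategies for transient failures" concrete, here is a sketch of exponential backoff around an external call; the failure class, attempt count, and delays are assumptions the architecture would need to pin down per integration:

```python
import time

# Retry with exponential backoff for transient integration failures.
def call_with_retry(fn, attempts=3, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:                       # transient failure class
            if attempt == attempts - 1:
                raise                              # exhausted: surface the failure
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, ...

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream slow")
    return "ok"

print(call_with_retry(flaky))  # ok (after two transient failures)
```

The architecture-level decision is which exceptions count as transient and what happens when retries are exhausted (circuit breaker, fallback, dead-letter), not the loop itself.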
### Phase 8: Observability Validation
- Are logs, metrics, and traces all specified?
- Is correlation ID propagation defined across services?
- Are SLOs defined for critical operations?
- Are alert conditions and thresholds specified?
- Can the system be debugged end-to-end from logs and traces?
- Are there blind spots where failures would be invisible?
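Correlation ID propagation is easiest to validate when the architecture states where the ID is minted and how it travels implicitly. A minimal single-process sketch, assuming context-local propagation (in a distributed system the ID would instead travel in request headers or message metadata):

```python
import contextvars
import uuid

# Correlation ID propagated implicitly so every log line can be joined.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

def log(message: str) -> str:
    return f"[{correlation_id.get()}] {message}"

def handle_request():
    correlation_id.set(str(uuid.uuid4()))  # minted once at the entry point
    return log("order created"), log("payment queued")

a, b = handle_request()
assert a.split("]")[0] == b.split("]")[0]  # same ID across both lines
```

If the architecture cannot answer "which component mints the ID, and which boundary drops it," end-to-end debugging from logs and traces is not actually possible.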
### Phase 9: Data Integrity Validation
- Are there scenarios where data could be lost?
- Are transaction boundaries appropriate?
- Are there scenarios where data could become inconsistent?
- Is data ownership clear (each data item owned by exactly one service)?
- Are cascading deletes or updates handled correctly?
- Are there data migration risks?
### Phase 10: Over-Engineering Detection
Check for common over-engineering patterns:
- Services that could be modules
- Patterns applied "just in case" without PRD justification
- Storage choices that exceed what the requirements demand
- Async processing where sync would suffice
- Abstraction layers that add complexity without solving a real problem
- Consistency guarantees stronger than what the requirements demand
- Security boundaries more complex than the threat model requires
- Observability granularity beyond operational need
### Phase 11: Under-Engineering Detection
Check for common under-engineering patterns:
- Missing error handling for edge cases identified in the PRD
- Missing idempotency for operations the PRD marks as requiring it
- Missing NFR accommodations (scaling, latency, availability)
- Missing async processing for operations that the PRD requires to be non-blocking
- Missing security boundaries or authentication where the PRD requires it
- Missing observability for critical operations
- Missing consistency model specification
- Missing integration failure handling
- Missing retry strategies for external dependencies
## Validation Checklist
After challenging, verify the architecture satisfies:
- Every architectural element traces to at least one PRD requirement
- Every PRD requirement is covered by at least one architectural element
- Every ADR is necessary, well-reasoned, and honestly assessed
- No over-engineering without PRD justification
- No under-engineering for PRD-identified requirements
- All 18 architecture sections are present and substantive (or explicitly N/A with reason)
- Service boundaries are aligned with domain responsibilities
- API contracts are complete and consistent
- Data model is justified by query and write patterns
- Storage selections are the simplest option that meets requirements
- Async processing is justified by PRD requirements
- Error model covers all PRD edge cases
- Consistency model is explicit (strong vs eventual per domain)
- Security boundaries are defined
- Integration boundaries are defined with failure modes
- Observability covers logs, metrics, traces, and alerts
- Scaling strategy addresses NFRs
- At least 3 Mermaid diagrams are present
- At least 1 ADR is present
- Risks are documented
- Open questions are documented
## Architecture Review Output
At the end of the challenge, produce a structured review section to be appended or updated in the architecture document:
## Architecture Review
### Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|------------|
| ... | High/Medium/Low | High/Medium/Low | ... |
### Missing Parts
- [ ] ...
### Over-Engineering
- ... (specific items identified as over-engineered)
### Recommendations
- ... (specific improvements recommended)
### Gate Decision
- [ ] PASS — Architecture is ready for Planner handoff
- [ ] CONDITIONAL PASS — Architecture needs minor adjustments (listed above)
- [ ] FAIL — Architecture needs significant revision (listed above)
When the gate decision is PASS or CONDITIONAL PASS (after adjustments), the architecture is ready for the next step: finalize-architecture.
## Outcomes
For each issue found:
- Document the issue
- Propose a fix
- Apply the fix directly to docs/architecture/{feature}.md
- Re-verify the fix against the PRD
After all issues are resolved, proceed to finalize-architecture.
## Guardrails
This is a pure validation skill.
Do:
- Challenge architectural decisions with evidence
- Validate traceability to PRD requirements
- Detect over-engineering and under-engineering
- Validate scalability, consistency, security, integration, observability
- Propose specific fixes for identified issues
- Apply fixes directly to docs/architecture/{feature}.md
Do not:
- Change PRD requirements or scope
- Design architecture from scratch
- Make implementation-level decisions
- Break down tasks or create milestones
- Write test cases
- Produce any file artifact other than docs/architecture/{feature}.md
## Transition
After challenge is complete and issues are resolved, invoke finalize-architecture for final completeness check and format validation.