opencode-code-agent/SKILL.md at 0daffeb8c2c3a283ad24ec6e5592cc65fefa5a83


name: qa-only
description: "Report-only QA testing. Systematically tests a web application and produces a structured report with health score, screenshots, and repro steps — but never fixes anything. Use when asked to "just re"

Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:

Project-scoped test plans: Check ~/.gstack/projects/ for recent *-test-plan-*.md files for this repo

eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)"
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1

Conversation context: Check if a prior /plan-eng-review or /plan-ceo-review produced test plan output in this conversation
Use whichever source is richer. Fall back to git diff analysis only if neither is available.
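The file-based half of this lookup can be scripted. A minimal sketch, assuming the plan directory is passed in (in the skill it would be ~/.gstack/projects/$SLUG). `latest_test_plan` is a hypothetical helper name, and the conversation-context source still has to be checked by reading the conversation itself.

```shell
# latest_test_plan DIR: print the newest *-test-plan-*.md in DIR, or a marker
# meaning "fall back to git diff analysis" when none exists.
latest_test_plan() {
  local plan_dir="$1" newest
  # ls -t lists newest first; errors (no matching files) are discarded.
  newest=$(ls -t "$plan_dir"/*-test-plan-*.md 2>/dev/null | head -1)
  if [ -n "$newest" ]; then
    echo "$newest"
  else
    echo "FALLBACK:git-diff"
  fi
}
```

Usage: `latest_test_plan ~/.gstack/projects/"$SLUG"` after the `eval` line above has set `SLUG`.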

Modes

Diff-aware (automatic when on a feature branch with no URL)

This is the primary mode for developers verifying their work. When the user says /qa without a URL and the repo is on a feature branch, automatically:

Analyze the branch diff to understand what changed:

git diff main...HEAD --name-only
git log main..HEAD --oneline

Identify affected pages/routes from the changed files:
- Controller/route files → which URL paths they serve
- View/template/component files → which pages render them
- Model/service files → which pages use those models (check controllers that reference them)
- CSS/style files → which pages include those stylesheets
- API endpoints → test them directly with ${GSTACK_BROWSE} js "await fetch('/api/...')"
- Static pages (markdown, HTML) → navigate to them directly
If no obvious pages/routes are identified from the diff: Do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode — navigate to the homepage, follow the top 5 navigation targets, check console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior — always verify the app still works.

Detect the running app — check common local dev ports:

for port in 3000 4000 8080; do
  if ${GSTACK_BROWSE} goto "http://localhost:$port" 2>/dev/null; then
    echo "Found app on :$port"
    break
  fi
done

If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.

Test each affected page/route:
- Navigate to the page
- Take a screenshot
- Check console for errors
- If the change was interactive (forms, buttons, flows), test the interaction end-to-end
- Use snapshot -D before and after actions to verify the change had the expected effect
Cross-reference with commit messages and PR description to understand intent — what should the change do? Verify it actually does that.
Check TODOS.md (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
Report findings scoped to the branch changes:
- "Changes tested: N pages/routes affected by this branch"
- For each: does it work? Screenshot evidence.
- Any regressions on adjacent pages?

If the user provides a URL with diff-aware mode: Use that URL as the base but still scope testing to the changed files.
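The file-to-page mapping in the diff-aware steps can be approximated with path patterns. A rough sketch: `classify_path` and `classify_branch_diff` are hypothetical helpers, and the directory names they match (controllers/, views/, components/, and so on) are common conventions assumed here, not rules from this skill. Anything classified as `other` still gets browser testing through the Quick-mode fallback described above.

```shell
# classify_path PATH: map a changed file to the kind of page check it needs.
# The directory conventions below are assumptions; adjust per project.
classify_path() {
  case "$1" in
    api/*|*/api/*)                         echo "api" ;;    # hit directly via fetch
    *controllers/*|*routes/*|*/routes.*)   echo "route" ;;  # find the URL paths served
    *.css|*.scss|*.sass|*.less)            echo "style" ;;  # pages including the stylesheet
    *views/*|*templates/*|*components/*|pages/*|*/pages/*) echo "view" ;;
    *models/*|*services/*)                 echo "model" ;;  # pages whose controllers use it
    *.md|*.html)                           echo "static" ;; # navigate directly
    *)                                     echo "other" ;;  # fall back to Quick-mode checks
  esac
}

# classify_branch_diff [BASE]: classify every file changed on this branch.
classify_branch_diff() {
  git diff "${1:-main}...HEAD" --name-only | while IFS= read -r f; do
    printf '%s\t%s\n' "$(classify_path "$f")" "$f"
  done | sort
}
```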

Full (default when URL is provided)

Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.

Quick (`--quick`)

30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.

Regression (`--regression <baseline>`)

Run full mode, then load baseline.json from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.
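How the modes relate can be summarized as a small decision function. A sketch under stated assumptions: `select_mode` is hypothetical, the current branch is passed as the first argument, `main` and `master` are treated as non-feature branches, and giving `--quick` precedence over `--regression` is a choice made here. The skill does not define the case of no URL on the main branch, so the sketch returns `ask-for-url` for it.

```shell
# select_mode BRANCH [args...]: pick a QA mode from the invocation arguments.
select_mode() {
  local branch="$1"; shift
  local url="" quick=0 regression=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --quick) quick=1 ;;
      --regression)
        # the next argument is the baseline file
        if [ $# -gt 1 ]; then regression="$2"; shift; fi
        ;;
      http://*|https://*) url="$1" ;;
    esac
    shift
  done
  if [ "$quick" -eq 1 ]; then echo "quick"
  elif [ -n "$regression" ]; then echo "regression:$regression"   # full run + diff
  elif [ -n "$url" ]; then echo "full"
  elif [ "$branch" != "main" ] && [ "$branch" != "master" ]; then echo "diff-aware"
  else echo "ask-for-url"
  fi
}
```

Usage: `select_mode "$(git rev-parse --abbrev-ref HEAD)" "$@"`.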

Workflow

Phase 1: Initialize

Find browse binary (see Setup above)
Create output directories
Copy report template from qa/templates/qa-report-template.md to output dir
Start timer for duration tracking
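A minimal sketch of these setup steps, assuming the template path is relative to the current directory (`QA_TEMPLATE` is a hypothetical override) and that duration is reported as minutes and seconds. `format_duration` is a hypothetical helper for the duration field filled in during Phase 6.

```shell
# Output location matches the documented local report directory.
REPORT_DIR="${REPORT_DIR:-.gstack/qa-reports}"
QA_TEMPLATE="${QA_TEMPLATE:-qa/templates/qa-report-template.md}"

mkdir -p "$REPORT_DIR/screenshots"
if [ -f "$QA_TEMPLATE" ]; then
  cp "$QA_TEMPLATE" "$REPORT_DIR/"
else
  echo "warning: report template not found at $QA_TEMPLATE" >&2
fi

# Start timer (epoch seconds) for the duration field in Phase 6.
QA_START=$(date +%s)

# format_duration START END: elapsed time as "12m 05s".
format_duration() {
  local elapsed=$(($2 - $1))
  printf '%dm %02ds\n' $((elapsed / 60)) $((elapsed % 60))
}
```

At wrap-up: `format_duration "$QA_START" "$(date +%s)"`.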

Phase 2: Authenticate (if needed)

If the user specified auth credentials:

${GSTACK_BROWSE} goto <login-url>
${GSTACK_BROWSE} snapshot -i                    # find the login form
${GSTACK_BROWSE} fill @e3 "user@example.com"
${GSTACK_BROWSE} fill @e4 "[REDACTED]"         # NEVER include real passwords in report
${GSTACK_BROWSE} click @e5                      # submit
${GSTACK_BROWSE} snapshot -D                    # verify login succeeded

If the user provided a cookie file:

${GSTACK_BROWSE} cookie-import cookies.json
${GSTACK_BROWSE} goto <target-url>

If 2FA/OTP is required: Ask the user for the code and wait.

If CAPTCHA blocks you: Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

Phase 3: Orient

Get a map of the application:

${GSTACK_BROWSE} goto <target-url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
${GSTACK_BROWSE} links                          # map navigation structure
${GSTACK_BROWSE} console --errors               # any errors on landing?

Detect framework (note in report metadata):

__next in HTML or _next/data requests → Next.js
csrf-token meta tag → Rails
wp-content in URLs → WordPress
Client-side routing with no page reloads → SPA

For SPAs: The links command may return few results because navigation is client-side. Use snapshot -i to find nav elements (buttons, menu items) instead.
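The HTML checks above can be run as plain text matching. A sketch, assuming the page source is fetched with `curl -s "$TARGET_URL"` (`TARGET_URL` is hypothetical) and needs no login. SPA detection depends on runtime behavior, so the function reports `unknown` in that case. The `csrf-token` check is only a heuristic: other frameworks, Laravel among them, emit the same meta tag.

```shell
# detect_framework: read HTML on stdin, print a best-guess framework name.
detect_framework() {
  local html
  html=$(cat)
  if grep -qE '__next|_next/data' <<<"$html"; then
    echo "Next.js"
  elif grep -q 'name="csrf-token"' <<<"$html"; then
    echo "Rails"
  elif grep -q 'wp-content' <<<"$html"; then
    echo "WordPress"
  else
    echo "unknown"   # SPA routing has to be observed in the browser
  fi
}
```

Usage: `curl -s "$TARGET_URL" | detect_framework`.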

Phase 4: Explore

Visit pages systematically. At each page:

${GSTACK_BROWSE} goto <page-url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
${GSTACK_BROWSE} console --errors

Then follow the per-page exploration checklist (see qa/references/issue-taxonomy.md):

Visual scan — Look at the annotated screenshot for layout issues
Interactive elements — Click buttons, links, controls. Do they work?
Forms — Fill and submit. Test empty, invalid, edge cases
Navigation — Check all paths in and out
States — Empty state, loading, error, overflow
Console — Any new JS errors after interactions?

Responsiveness — Check mobile viewport if relevant:

${GSTACK_BROWSE} viewport 375x812
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/page-mobile.png"
${GSTACK_BROWSE} viewport 1280x720

Depth judgment: Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

Quick mode: Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?
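Screenshot names such as page-name.png need a filesystem-safe slug per page. A sketch: `page_slug` and `explore_pages` are hypothetical helpers, mapping the root path to `home` is an assumption, and `explore_pages` expects `REPORT_DIR` to be set. In Quick mode, pass only the homepage plus the top 5 navigation targets.

```shell
# page_slug URL-or-PATH: filesystem-safe screenshot name,
# e.g. /users/42/edit -> users-42-edit; the root path becomes "home".
page_slug() {
  local slug
  slug=$(printf '%s' "$1" \
    | sed -E 's#^[a-zA-Z]+://[^/]+##; s/[?#].*$//' \
    | tr -cs 'A-Za-z0-9' '-' \
    | sed -E 's/^-+//; s/-+$//' \
    | tr '[:upper:]' '[:lower:]')
  echo "${slug:-home}"
}

# explore_pages URL...: the per-page capture loop from this phase.
explore_pages() {
  local url name
  for url in "$@"; do
    name=$(page_slug "$url")
    ${GSTACK_BROWSE} goto "$url"
    ${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/$name.png"
    ${GSTACK_BROWSE} console --errors
  done
}
```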

Phase 5: Document

Document each issue immediately when found — don't batch them.

Two evidence tiers:

Interactive bugs (broken flows, dead buttons, form failures):

Take a screenshot before the action
Perform the action
Take a screenshot showing the result
Use snapshot -D to show what changed
Write repro steps referencing screenshots

${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
${GSTACK_BROWSE} click @e5
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
${GSTACK_BROWSE} snapshot -D

Static bugs (typos, layout issues, missing images):

Take a single annotated screenshot showing the problem
Describe what's wrong

${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"

Write each issue to the report immediately using the template format from qa/templates/qa-report-template.md.
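Because every issue has at least one screenshot named issue-NNN-..., the next free issue number can be derived from the screenshot directory. A sketch under that assumption; `next_issue_id` is a hypothetical helper.

```shell
# next_issue_id DIR: scan issue-NNN*.png screenshots in DIR and print the next
# ID, e.g. ISSUE-003 when issue-001-*.png and issue-002.png already exist.
next_issue_id() {
  local dir="$1" max=0 f n
  for f in "$dir"/issue-[0-9][0-9][0-9]*.png; do
    [ -e "$f" ] || continue          # glob matched nothing
    n=${f##*/issue-}                 # "002-step-1.png"
    n=${n%%[!0-9]*}                  # "002"
    n=$((10#$n))                     # force base 10 despite leading zeros
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  printf 'ISSUE-%03d\n' $((max + 1))
}
```

Usage: `id=$(next_issue_id "$REPORT_DIR/screenshots")`, then name files `issue-${id#ISSUE-}-step-1.png`.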

Phase 6: Wrap Up

Compute health score using the rubric below
Write "Top 3 Things to Fix" — the 3 highest-severity issues
Write console health summary — aggregate all console errors seen across pages
Update severity counts in the summary table
Fill in report metadata — date, duration, pages visited, screenshot count, framework

Save baseline — write baseline.json with:

{
  "date": "YYYY-MM-DD",
  "url": "<target>",
  "healthScore": N,
  "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
  "categoryScores": { "console": N, "links": N, ... }
}
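A minimal sketch of writing that file from the shell, assuming issues are kept as tab-separated lines (id, title, severity, category) and category scores are passed as `name=value` arguments. `write_baseline` and `json_escape` are hypothetical helpers. The escaping only handles quotes and backslashes, and empty fields are not supported because tabs count as IFS whitespace.

```shell
# json_escape STR: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# write_baseline FILE DATE URL SCORE [category=score...] < issues.tsv
write_baseline() {
  local file="$1" date="$2" url="$3" score="$4"
  local id title severity category sep="" scores="" kv
  for kv in "${@:5}"; do
    scores+="${scores:+, }\"${kv%%=*}\": ${kv#*=}"
  done
  {
    printf '{\n  "date": "%s",\n  "url": "%s",\n  "healthScore": %d,\n  "issues": [' \
      "$date" "$(json_escape "$url")" "$score"
    while IFS=$'\t' read -r id title severity category; do
      printf '%s\n    { "id": "%s", "title": "%s", "severity": "%s", "category": "%s" }' \
        "$sep" "$id" "$(json_escape "$title")" "$severity" "$category"
      sep=","
    done
    printf '\n  ],\n  "categoryScores": { %s }\n}\n' "$scores"
  } > "$file"
}
```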

Regression mode: After writing the report, load the baseline file. Compare:

Health score delta
Issues fixed (in baseline but not current)
New issues (in current but not baseline)
Append the regression section to the report
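The comparison can be sketched with `comm` on sorted issue titles. Titles are used as the matching key because issue IDs restart at ISSUE-001 on each run; that choice, and the assumption that titles contain no escaped quotes, are not specified by the skill. `compare_runs` is a hypothetical helper. If both runs write baseline.json to the same path, run the comparison before the new file overwrites the old one.

```shell
# issue_titles FILE: sorted, de-duplicated issue titles from a baseline file.
issue_titles() {
  grep -o '"title": *"[^"]*"' "$1" | sed -E 's/^"title": *"//; s/"$//' | sort -u
}

# health_score FILE: the healthScore number from a baseline file.
health_score() {
  sed -n -E 's/.*"healthScore": *([0-9]+).*/\1/p' "$1" | head -1
}

# compare_runs BASELINE CURRENT: score delta plus fixed and new issues.
compare_runs() {
  local base="$1" cur="$2"
  printf 'Score delta: %+d\n' $(( $(health_score "$cur") - $(health_score "$base") ))
  echo "Fixed:"   # in baseline but not current
  comm -23 <(issue_titles "$base") <(issue_titles "$cur") | sed 's/^/  - /'
  echo "New:"     # in current but not baseline
  comm -13 <(issue_titles "$base") <(issue_titles "$cur") | sed 's/^/  - /'
}
```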

Health Score Rubric

Compute each category score (0-100), then take the weighted average.

Console (weight: 15%)

0 errors → 100
1-3 errors → 70
4-10 errors → 40
More than 10 errors → 10

Links (weight: 10%)

0 broken → 100
Each broken link → -15 (minimum 0)
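The two fixed rubrics translate directly into integer functions. A sketch; the helper names are hypothetical, and exactly 10 errors is treated as part of the 4-10 band.

```shell
# console_score ERRORS: banded score for console errors.
console_score() {
  local errors="$1"
  if   [ "$errors" -eq 0 ];  then echo 100
  elif [ "$errors" -le 3 ];  then echo 70
  elif [ "$errors" -le 10 ]; then echo 40
  else echo 10
  fi
}

# links_score BROKEN: 100 minus 15 per broken link, floored at 0.
links_score() {
  local score=$((100 - 15 * $1))
  if [ "$score" -lt 0 ]; then score=0; fi
  echo "$score"
}
```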

Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)

Each category starts at 100. Deduct per finding:

Critical issue → -25
High issue → -15
Medium issue → -8
Low issue → -3

Minimum 0 per category.
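A sketch of the per-category deduction, taking counts of critical, high, medium, and low findings as arguments (missing arguments count as zero). `category_score` is a hypothetical helper.

```shell
# category_score CRITICAL HIGH MEDIUM LOW: 100 minus weighted deductions, floored at 0.
category_score() {
  local score=$((100 - 25 * ${1:-0} - 15 * ${2:-0} - 8 * ${3:-0} - 3 * ${4:-0}))
  if [ "$score" -lt 0 ]; then score=0; fi
  echo "$score"
}
```

For example, one finding at each severity gives 100 - 25 - 15 - 8 - 3 = 49.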

Weights

Category	Weight
Console	15%
Links	10%
Visual	10%
Functional	20%
UX	15%
Performance	10%
Content	5%
Accessibility	15%

Final Score

score = Σ (category_score × weight)
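Read with the weights as fractions (15% = 0.15), the final score stays on the 0-100 scale. For example, Console 70 and Functional 70 with every other category at 100 gives 0.15·70 + 0.20·70 + 0.65·100 = 10.5 + 14 + 65 = 89.5. A shell sketch using integer percentages; rounding half up to a whole number is an assumption (baseline.json stores one number without saying how to round), and `final_score` is a hypothetical helper that takes the eight scores in table order.

```shell
# final_score console links visual functional ux performance content accessibility
final_score() {
  if [ $# -ne 8 ]; then
    echo "usage: final_score console links visual functional ux performance content accessibility" >&2
    return 1
  fi
  local weights=(15 10 10 20 15 10 5 15)   # percent, same order as the arguments
  local sum=0 i=0 s
  for s in "$@"; do
    sum=$((sum + s * weights[i]))
    i=$((i + 1))
  done
  echo $(( (sum + 50) / 100 ))             # weighted average, rounded half up
}
```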

Framework-Specific Guidance

Next.js

Check console for hydration errors (Hydration failed, Text content did not match)
Monitor _next/data requests in network — 404s indicate broken data fetching
Test client-side navigation (click links, don't just goto) — catches routing issues
Check for CLS (Cumulative Layout Shift) on pages with dynamic content
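Hydration errors can be counted from captured console text. A sketch, assuming `${GSTACK_BROWSE} console --errors` prints one message per line (its output format is not specified here). `count_hydration_errors` is a hypothetical helper.

```shell
# count_hydration_errors: read console text on stdin and count lines containing
# the Next.js hydration messages listed above. Prints 0 when there are none.
count_hydration_errors() {
  grep -cE 'Hydration failed|Text content did not match' || true
}
```

Usage: `${GSTACK_BROWSE} console --errors | count_hydration_errors`.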

Rails

Check for N+1 query warnings in console (in development mode)
Verify CSRF token presence in forms
Test Turbo/Stimulus integration — do page transitions work smoothly?
Check for flash messages appearing and dismissing correctly

WordPress

Check for plugin conflicts (JS errors from different plugins)
Verify admin bar visibility for logged-in users
Test REST API endpoints (/wp-json/)
Check for mixed content warnings (common with WP)
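A static complement to the browser's own mixed-content warnings is to list http:// subresources in a page's HTML. A rough sketch, assuming double-quoted attributes and HTML fetched with something like `curl -s`. It only catches `src` attributes and `<link href>` (plain `<a href>` navigation links are not mixed content) and misses anything injected by JavaScript. `mixed_content_urls` is a hypothetical helper.

```shell
# mixed_content_urls: read HTML on stdin, print unique http:// subresource URLs.
mixed_content_urls() {
  grep -oE 'src="http://[^"]+"|<link[^>]+href="http://[^"]+"' \
    | sed -E 's/.*(src|href)="//; s/"$//' \
    | sort -u
}
```

Usage: `curl -s https://example.com/ | mixed_content_urls`; only meaningful when the page itself is served over https.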

General SPA (React, Vue, Angular)

Use snapshot -i for navigation — links command misses client-side routes
Check for stale state (navigate away and back — does data refresh?)
Test browser back/forward — does the app handle history correctly?
Check for memory leaks (monitor console after extended use)

Important Rules

Repro is everything. Every issue needs at least one screenshot. No exceptions.
Verify before documenting. Retry the issue once to confirm it's reproducible, not a fluke.
Never include credentials. Write [REDACTED] for passwords in repro steps.
Write incrementally. Append each issue to the report as you find it. Don't batch.
Never read source code. Test as a user, not a developer.
Check console after every interaction. JS errors that don't surface visually are still bugs.
Test like a user. Use realistic data. Walk through complete workflows end-to-end.
Depth over breadth. 5-10 well-documented issues with evidence > 20 vague descriptions.
Never delete output files. Screenshots and reports accumulate — that's intentional.
Use snapshot -C for tricky UIs. Finds clickable divs that the accessibility tree misses.
Show screenshots to the user. After every ${GSTACK_BROWSE} screenshot, ${GSTACK_BROWSE} snapshot -a -o, or ${GSTACK_BROWSE} responsive command, use the Read tool on the output file(s) so the user can see them inline. For responsive (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
Never refuse to use the browser. When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.

Output

Write the report to both local and project-scoped locations:

Local: .gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md

Project-scoped: Write a test outcome artifact for cross-session context:

eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG

Write to ~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md
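The filename placeholders need concrete values. A sketch with assumed choices: the OS user name for {user}, the current git branch with slashes replaced by dashes for {branch}, and a sortable timestamp for {datetime}. `outcome_path` is a hypothetical helper.

```shell
# outcome_path SLUG USER BRANCH STAMP: fill in the project-scoped filename.
# Slashes in branch names (feature/login) would create subdirectories, so
# they are replaced with dashes.
outcome_path() {
  local slug="$1" user="$2" branch="${3//\//-}" stamp="$4"
  echo "$HOME/.gstack/projects/$slug/${user}-${branch}-test-outcome-${stamp}.md"
}
```

Usage: `outcome_path "$SLUG" "$(whoami)" "$(git rev-parse --abbrev-ref HEAD)" "$(date +%Y%m%d-%H%M%S)"`.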

Output Structure

.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
├── screenshots/
│   ├── initial.png                        # Landing page annotated screenshot
│   ├── issue-001-step-1.png               # Per-issue evidence
│   ├── issue-001-result.png
│   └── ...
└── baseline.json                          # For regression mode

Report filenames use the domain and date: qa-report-myapp-com-2026-03-12.md
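The name can be derived from the target URL. As a sketch: the scheme, path, and port are dropped and dots become dashes, which reproduces the example above. `report_filename` is a hypothetical helper; the date is passed in so the function stays deterministic.

```shell
# report_filename URL DATE: qa-report-{domain}-{YYYY-MM-DD}.md
report_filename() {
  local url="$1" day="$2" host
  host=${url#*://}    # drop scheme
  host=${host%%/*}    # drop path
  host=${host%%:*}    # drop port
  host=${host//./-}   # myapp.com -> myapp-com
  echo "qa-report-${host}-${day}.md"
}
```

Usage: `report_filename "$TARGET_URL" "$(date +%Y-%m-%d)"`.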

Additional Rules (qa-only specific)

Never fix bugs. Find and document only. Do not read source code, edit files, or suggest fixes in the report. Your job is to report what's broken, not to fix it. Use /qa for the test-fix-verify loop.
No test framework detected? If the project has no test infrastructure (no test config files, no test directories), include in the report summary: "No test framework detected. Run /qa to bootstrap one and enable regression test generation."
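The check can be limited to file and directory names, so no source code is read. A sketch; the list of config files and test directories is an assumption covering a few common ecosystems, not an exhaustive one, and `has_test_framework` is a hypothetical helper.

```shell
# has_test_framework [DIR]: succeed if DIR contains a known test config file
# or test directory; fail otherwise.
has_test_framework() {
  local dir="${1:-.}" f
  for f in jest.config.js jest.config.ts vitest.config.js vitest.config.ts \
           playwright.config.ts pytest.ini .rspec phpunit.xml; do
    if [ -e "$dir/$f" ]; then return 0; fi
  done
  for f in test tests spec __tests__; do
    if [ -d "$dir/$f" ]; then return 0; fi
  done
  return 1
}
```

Usage: `has_test_framework . || echo "No test framework detected. Run /qa to bootstrap one and enable regression test generation."`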
