---
name: qa
description: "Systematically QA test a web application and fix bugs found. Runs QA testing, then iteratively fixes bugs in source code, committing each fix atomically and re-verifying. Use when asked to 'qa' or 'QA'."
---

# /qa: Test → Fix → Verify

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

## Setup

**Parse the user's request for these parameters:**

| Parameter | Default | Override example |
|-----------|---------|------------------|
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | Standard | `--quick`, `--exhaustive` |
| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in as user@example.com`, `Import cookies from cookies.json` |

**Tiers determine which issues get fixed:**

- **Quick:** fix critical + high severity only
- **Standard:** also fix medium severity (default)
- **Exhaustive:** also fix low/cosmetic severity

**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

**Check for clean working tree:**

```bash
git status --porcelain
```

If the output is non-empty (working tree is dirty), **STOP** and use question:

"Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit."

- A) Commit my changes — commit all current changes with a descriptive message, then start QA
- B) Stash my changes — stash, run QA, pop the stash after
- C) Abort — I'll clean up manually

RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits.

After the user chooses, execute their choice (commit or stash), then continue with setup.

**Find the browse binary** (run this check BEFORE any browse command):

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/${GSTACK_OPENCODE_DIR}/browse/dist/browse" ] && B="$_ROOT/${GSTACK_OPENCODE_DIR}/browse/dist/browse"
[ -z "$B" ] && B="${GSTACK_OPENCODE_DIR}/browse/dist/browse"
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. If `bun` is not installed, install it first: `curl -fsSL https://bun.sh/install | bash`
3. Run: `cd <SKILL_DIR> && ./setup`

**Check test framework (bootstrap if needed):**

## Test Framework Bootstrap

**Detect existing test framework and project runtime:**

```bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
{ [ -f requirements.txt ] || [ -f pyproject.toml ]; } && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```

**If a test framework is detected** (config files or test directories found):
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
Store the conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**

**If BOOTSTRAP_DECLINED appears:** Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

**If NO runtime is detected** (no config files found): Use question:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If the user picks H → write `.gstack/no-test-bootstrap` and continue without tests.

**If a runtime is detected but no test framework — bootstrap:**

### B2. Research best practices

Use WebSearch to find current best practices for the detected runtime:
- `"[runtime] best test framework 2025 2026"`
- `"[framework A] vs [framework B] comparison"`

If WebSearch is unavailable, use this built-in knowledge table:

| Runtime | Primary recommendation | Alternative |
|---------|------------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |

### B3. Framework selection

Use question:
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
B) [Alternative] — [rationale]. Includes: [packages]
C) Skip — don't set up testing right now
RECOMMENDATION: Choose A because [reason based on project context]"

If the user picks C → write `.gstack/no-test-bootstrap`. Tell the user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.

If multiple runtimes are detected (monorepo) → ask which runtime to set up first, with the option to do both sequentially.

### B4. Install and configure

1. Install the chosen packages (npm/bun/gem/pip/etc.)
2. Create a minimal config file
3. Create the directory structure (test/, spec/, etc.)
4. Create one example test against the project's code to verify the setup works

If package installation fails → debug once. If it still fails → revert with `git checkout -- package.json package-lock.json` (or the equivalent for the runtime). Warn the user and continue without tests.

### B4.5. First real tests

Generate 3-5 real tests for existing code:

1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
2. **Prioritize by risk:** error handlers > business logic with conditionals > API endpoints > pure functions
3. **For each file:** write one test that exercises real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
5. Generate at least 1 test; cap at 5.

Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.

### B5. Verify

```bash
# Run the full test suite to confirm everything works
{detected test command}
```

If tests fail → debug once. If they still fail → revert all bootstrap changes and warn the user.

### B5.5. CI/CD pipeline

```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```

If `.github/` exists (or no CI is detected — default to GitHub Actions), create `.github/workflows/test.yml` with:
- `runs-on: ubuntu-latest`
- the appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
- the same test command verified in B5
- trigger: push + pull_request

If non-GitHub CI is detected → skip CI generation with a note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add the test step to your existing pipeline manually."

### B6. Create TESTING.md

First check: if TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.

Write TESTING.md with:
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
- Framework name and version
- How to run tests (the verified command from B5)
- Test layers: unit tests (what, where, when), integration tests, smoke tests, E2E tests
- Conventions: file naming, assertion style, setup/teardown patterns

### B7. Update CLAUDE.md

First check: if CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.

Append a `## Testing` section:
- Run command and test directory
- Reference to TESTING.md
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing a new function, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail

### B8. Commit

```bash
git status --porcelain
```

Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
`git commit -m "chore: bootstrap test framework ({framework name})"`

---

**Create output directories:**

```bash
mkdir -p .gstack/qa-reports/screenshots
```

---

## Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
   ```bash
   eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)"
   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
   ```
2. **Conversation context:** check whether a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.

---

## Phases 1-6: QA Baseline

## Modes

### Diff-aware (automatic when on a feature branch with no URL)

This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:

1. **Analyze the branch diff** to understand what changed:
   ```bash
   git diff main...HEAD --name-only
   git log main..HEAD --oneline
   ```

2. **Identify affected pages/routes** from the changed files:
   - Controller/route files → which URL paths they serve
   - View/template/component files → which pages render them
   - Model/service files → which pages use those models (check controllers that reference them)
   - CSS/style files → which pages include those stylesheets
   - API endpoints → test them directly with `${GSTACK_BROWSE} js "await fetch('/api/...')"`
   - Static pages (markdown, HTML) → navigate to them directly

   **If no obvious pages/routes are identified from the diff:** Do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode — navigate to the homepage, follow the top 5 navigation targets, check the console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior — always verify the app still works.

3. **Detect the running app** — check common local dev ports:
   ```bash
   ${GSTACK_BROWSE} goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
   ${GSTACK_BROWSE} goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
   ${GSTACK_BROWSE} goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
   ```
   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.

4. **Test each affected page/route:**
   - Navigate to the page
   - Take a screenshot
   - Check the console for errors
   - If the change was interactive (forms, buttons, flows), test the interaction end-to-end
   - Use `snapshot -D` before and after actions to verify the change had the expected effect

5. **Cross-reference with commit messages and the PR description** to understand *intent* — what should the change do? Verify it actually does that.

6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.

7. **Report findings** scoped to the branch changes:
   - "Changes tested: N pages/routes affected by this branch"
   - For each: does it work? Screenshot evidence.
   - Any regressions on adjacent pages?

**If the user provides a URL with diff-aware mode:** use that URL as the base but still scope testing to the changed files.

### Full (default when a URL is provided)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce a health score. Takes 5-15 minutes depending on app size.

### Quick (`--quick`)
A 30-second smoke test. Visit the homepage + top 5 navigation targets. Check: does the page load? Console errors? Broken links? Produce a health score. No detailed issue documentation.

### Regression (`--regression <baseline>`)
Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append a regression section to the report.

---

## Workflow

### Phase 1: Initialize

1. Find the browse binary (see Setup above)
2. Create output directories
3. Copy the report template from `qa/templates/qa-report-template.md` to the output dir
4. Start a timer for duration tracking

### Phase 2: Authenticate (if needed)

**If the user specified auth credentials:**

```bash
${GSTACK_BROWSE} goto <login-url>
${GSTACK_BROWSE} snapshot -i              # find the login form
${GSTACK_BROWSE} fill @e3 "user@example.com"
${GSTACK_BROWSE} fill @e4 "[REDACTED]"    # NEVER include real passwords in the report
${GSTACK_BROWSE} click @e5                # submit
${GSTACK_BROWSE} snapshot -D              # verify login succeeded
```

**If the user provided a cookie file:**

```bash
${GSTACK_BROWSE} cookie-import cookies.json
${GSTACK_BROWSE} goto <target-url>
```

**If 2FA/OTP is required:** ask the user for the code and wait.

**If a CAPTCHA blocks you:** tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

### Phase 3: Orient

Get a map of the application:

```bash
${GSTACK_BROWSE} goto <target-url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
${GSTACK_BROWSE} links            # map navigation structure
${GSTACK_BROWSE} console --errors # any errors on landing?
```

**Detect the framework** (note it in the report metadata):
- `__next` in HTML or `_next/data` requests → Next.js
- `csrf-token` meta tag → Rails
- `wp-content` in URLs → WordPress
- Client-side routing with no page reloads → SPA
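
A hedged sketch of that fingerprinting as a helper function — `detect_framework` and the saved-HTML path are illustrative, not part of the browse CLI:

```shell
# Hypothetical helper: classify a page from markers in its HTML.
# The real checks run against the live page via browse.
detect_framework() {
  case "$1" in
    *__next*)     echo "nextjs" ;;
    *csrf-token*) echo "rails" ;;
    *wp-content*) echo "wordpress" ;;
    *)            echo "spa-or-unknown" ;;
  esac
}
# Illustrative path; in practice feed it the landing page's HTML
detect_framework "$(cat "$REPORT_DIR/landing.html" 2>/dev/null)"
```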

**For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.

### Phase 4: Explore

Visit pages systematically. At each page:

```bash
${GSTACK_BROWSE} goto <page-url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
${GSTACK_BROWSE} console --errors
```

Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`):

1. **Visual scan** — look at the annotated screenshot for layout issues
2. **Interactive elements** — click buttons, links, controls. Do they work?
3. **Forms** — fill and submit. Test empty, invalid, and edge-case input
4. **Navigation** — check all paths in and out
5. **States** — empty state, loading, error, overflow
6. **Console** — any new JS errors after interactions?
7. **Responsiveness** — check the mobile viewport if relevant:
   ```bash
   ${GSTACK_BROWSE} viewport 375x812
   ${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/page-mobile.png"
   ${GSTACK_BROWSE} viewport 1280x720
   ```

**Depth judgment:** spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

**Quick mode:** only visit the homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: does it load? Console errors? Broken links visible?

### Phase 5: Document

Document each issue **immediately when found** — don't batch them.

**Two evidence tiers:**

**Interactive bugs** (broken flows, dead buttons, form failures):
1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use `snapshot -D` to show what changed
5. Write repro steps referencing the screenshots

```bash
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
${GSTACK_BROWSE} click @e5
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
${GSTACK_BROWSE} snapshot -D
```

**Static bugs** (typos, layout issues, missing images):
1. Take a single annotated screenshot showing the problem
2. Describe what's wrong

```bash
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
```

**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.

### Phase 6: Wrap Up

1. **Compute the health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write a console health summary** — aggregate all console errors seen across pages
4. **Update severity counts** in the summary table
5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
6. **Save a baseline** — write `baseline.json` with:

```json
{
  "date": "YYYY-MM-DD",
  "url": "<target>",
  "healthScore": N,
  "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
  "categoryScores": { "console": N, "links": N, ... }
}
```

**Regression mode:** after writing the report, load the baseline file. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report
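
The issue diff can be sketched with standard tools, assuming issue IDs are first extracted one per line (`current.json` is an illustrative name for the new run's baseline-format file):

```shell
# Extract sorted, unique issue IDs from each run
grep -o 'ISSUE-[0-9]*' baseline.json | sort -u > /tmp/base-ids
grep -o 'ISSUE-[0-9]*' current.json  | sort -u > /tmp/cur-ids
# comm needs sorted input: column flags select each side of the diff
comm -23 /tmp/base-ids /tmp/cur-ids   # fixed: in baseline only
comm -13 /tmp/base-ids /tmp/cur-ids   # new: in current only
```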

---

## Health Score Rubric

Compute each category score (0-100), then take the weighted average.

### Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- more than 10 errors → 10

### Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)

### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3
Minimum 0 per category.

### Weights
| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |

### Final Score
`score = Σ (category_score × weight)`
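
With example category scores (all values illustrative: console=85, links=100, visual=90, functional=70, ux=80, performance=90, content=95, accessibility=75), the weighted average can be computed exactly using integer percent weights:

```shell
# Sum score×weight with integer weights (×100), then divide once.
# Weights sum to 100, so the division normalizes the total.
awk 'BEGIN {
  total = 85*15 + 100*10 + 90*10 + 70*20 + 80*15 + 90*10 + 95*5 + 75*15
  printf "%.2f\n", total / 100
}'
# → 82.75
```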

---

## Framework-Specific Guidance

### Next.js
- Check the console for hydration errors (`Hydration failed`, `Text content did not match`)
- Monitor `_next/data` requests in the network tab — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just `goto`) — this catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content

### Rails
- Check for N+1 query warnings in the console (in development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check that flash messages appear and dismiss correctly

### WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (`/wp-json/`)
- Check for mixed content warnings (common with WP)

### General SPA (React, Vue, Angular)
- Use `snapshot -i` for navigation — the `links` command misses client-side routes
- Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor the console after extended use)

---

## Important Rules

1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code while testing.** Test as a user, not a developer. (Source code comes into play only in the fix loop.)
6. **Check the console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence beat 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use `snapshot -C` for tricky UIs.** It finds clickable divs that the accessibility tree misses.
11. **Show screenshots to the user.** After every `${GSTACK_BROWSE} screenshot`, `${GSTACK_BROWSE} snapshot -a -o`, or `${GSTACK_BROWSE} responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.

Record the baseline health score at the end of Phase 6.

---

## Output Structure

```
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md   # Structured report
├── screenshots/
│   ├── initial.png                      # Landing page annotated screenshot
│   ├── issue-001-step-1.png             # Per-issue evidence
│   ├── issue-001-result.png
│   ├── issue-001-before.png             # Before fix (if fixed)
│   ├── issue-001-after.png              # After fix (if fixed)
│   └── ...
└── baseline.json                        # For regression mode
```

Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
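
The filename can be derived mechanically — the dot-to-dash slug rule is inferred from the example above:

```shell
# Build the report filename from the target domain (illustrative value)
domain="myapp.com"
slug=$(echo "$domain" | tr '.' '-')       # myapp.com → myapp-com
echo "qa-report-${slug}-$(date +%F).md"   # %F = YYYY-MM-DD
```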

---

## Phase 7: Triage

Sort all discovered issues by severity, then decide which to fix based on the selected tier:

- **Quick:** fix critical + high only. Mark medium/low as "deferred."
- **Standard:** fix critical + high + medium. Mark low as "deferred."
- **Exhaustive:** fix all, including cosmetic/low severity.

Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.

---

## Phase 8: Fix Loop

For each fixable issue, in severity order:

### 8a. Locate source

```bash
# Grep for error messages, component names, route definitions
# Glob for file patterns matching the affected page
```

- Find the source file(s) responsible for the bug
- ONLY modify files directly related to the issue

### 8b. Fix

- Read the source code and understand the context
- Make the **minimal fix** — the smallest change that resolves the issue
- Do NOT refactor surrounding code, add features, or "improve" unrelated things

### 8c. Commit

```bash
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"
```

- One commit per fix. Never bundle multiple fixes.
- Message format: `fix(qa): ISSUE-NNN — short description`

### 8d. Re-test

- Navigate back to the affected page
- Take a **before/after screenshot pair**
- Check the console for errors
- Use `snapshot -D` to verify the change had the expected effect

```bash
${GSTACK_BROWSE} goto <affected-url>
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
${GSTACK_BROWSE} console --errors
${GSTACK_BROWSE} snapshot -D
```

### 8e. Classify

- **verified**: the re-test confirms the fix works and no new errors were introduced
- **best-effort**: the fix was applied but couldn't be fully verified (e.g., needs auth state, external service)
- **reverted**: a regression was detected → `git revert HEAD` → mark the issue as "deferred"

### 8e.5. Regression Test

Skip if: the classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND the user declined bootstrap.

**1. Study the project's existing test patterns:**

Read the 2-3 test files closest to the fix (same directory, same code type). Match exactly:
- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
The regression test must look like it was written by the same developer.

**2. Trace the bug's codepath, then write a regression test:**

Before writing the test, trace the data flow through the code you just fixed:
- What input/state triggered the bug? (the exact precondition)
- What codepath did it follow? (which branches, which function calls)
- Where did it break? (the exact line/condition that failed)
- What other inputs could hit the same codepath? (edge cases around the fix)

The test MUST:
- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- If you found adjacent edge cases while tracing, test those too (e.g., null input, empty array, boundary value)
- Include a full attribution comment:

```
// Regression: ISSUE-NNN — {what broke}
// Found by /qa on {YYYY-MM-DD}
// Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
```

Test type decision:
- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).

Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files and take the max number + 1.
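
A sketch of that collision check (`name` and `ext` values are illustrative):

```shell
# Find the highest existing regression-test index for this base name,
# then emit the next free filename.
name="cart"; ext="ts"
max=$(ls "${name}".regression-*.test."${ext}" 2>/dev/null \
  | sed -E 's/.*regression-([0-9]+)\.test.*/\1/' | sort -n | tail -1)
echo "${name}.regression-$(( ${max:-0} + 1 )).test.${ext}"
```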

**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix the test once. Still failing → delete the test and defer.
- Taking more than 2 minutes of exploration → skip and defer.

**5. WTF-likelihood exclusion:** test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:

```
WTF-LIKELIHOOD:
  Start at 0%
  Each revert: +15%
  Each fix touching >3 files: +5%
  After fix 15: +1% per additional fix
  All remaining issues Low severity: +10%
  Touching unrelated files: +20%
```

**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.
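
The heuristic can be sketched as plain shell arithmetic (counter values are illustrative):

```shell
# Example run state: 1 revert, 2 fixes touching >3 files, 18 fixes total,
# remaining issues not all Low, no unrelated files touched.
reverts=1; big_fixes=2; fixes=18; all_low=0; unrelated=0
extra=$(( fixes - 15 )); [ "$extra" -lt 0 ] && extra=0   # +1% per fix past 15
wtf=$(( reverts*15 + big_fixes*5 + extra + all_low*10 + unrelated*20 ))
echo "WTF=${wtf}%"                                        # → WTF=28%
if [ "$wtf" -gt 20 ]; then echo "STOP: show progress and ask the user"; fi
```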

---

## Phase 9: Final QA

After all fixes are applied:

1. Re-run QA on all affected pages
2. Compute the final health score
3. **If the final score is WORSE than the baseline:** WARN prominently — something regressed

---

## Phase 10: Report

Write the report to both local and project-scoped locations:

**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`

**Project-scoped:** write a test outcome artifact for cross-session context:
```bash
eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`

**Per-issue additions** (beyond the standard report template):
- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)

**Summary section:**
- Total issues found
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred issues
- Health score delta: baseline → final

**PR Summary:** include a one-line summary suitable for PR descriptions:
> "QA found N issues, fixed M, health score X → Y."

---

## Phase 11: TODOS.md Update

If the repo has a `TODOS.md`:

1. **New deferred bugs** → add them as TODOs with severity, category, and repro steps
2. **Fixed bugs that were in TODOS.md** → annotate them with "Fixed by /qa on {branch}, {date}"

---

## Additional Rules (qa-specific)

13. **Clean working tree required.** If the tree is dirty, use question to offer commit/stash/abort before proceeding.
14. **One commit per fix.** Never bundle multiple fixes into one commit.
15. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
16. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
17. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.