---
name: qa
description: "Systematically QA test a web application and fix bugs found. Runs QA testing, then iteratively fixes bugs in source code, committing each fix atomically and re-verifying. Use when asked to 'qa' or 'QA'."
---

# /qa: Test → Fix → Verify

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

## Setup

**Parse the user's request for these parameters:**

| Parameter | Default | Override example |
|-----------|---------|------------------|
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | Standard | `--quick`, `--exhaustive` |
| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in as user@example.com`, `Import cookies from cookies.json` |

**Tiers determine which issues get fixed:**

- **Quick:** fix critical + high severity only
- **Standard:** also fix medium severity (default)
- **Exhaustive:** also fix low/cosmetic severity

**If no URL is given and you're on a feature branch:** automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

**Check for a clean working tree:**

```bash
git status --porcelain
```

If the output is non-empty (working tree is dirty), **STOP** and use question: "Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit."
- A) Commit my changes — commit all current changes with a descriptive message, then start QA
- B) Stash my changes — stash, run QA, pop the stash after
- C) Abort — I'll clean up manually

RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits.

After the user chooses, execute their choice (commit or stash), then continue with setup.

**Find the browse binary** (run this check BEFORE any browse command):

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/${GSTACK_OPENCODE_DIR}/browse/dist/browse" ] && B="$_ROOT/${GSTACK_OPENCODE_DIR}/browse/dist/browse"
[ -z "$B" ] && B=${GSTACK_OPENCODE_DIR}/browse/dist/browse
if [ -x "$B" ]; then echo "READY: $B"; else echo "NEEDS_SETUP"; fi
```

If `NEEDS_SETUP`:

1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <browse dir> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

**Check test framework (bootstrap if needed):**

## Test Framework Bootstrap

**Detect existing test framework and project runtime:**

```bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
{ [ -f requirements.txt ] || [ -f pyproject.toml ]; } && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"

# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"

# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null

# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```

**If a test framework is detected** (config files or test directories found): print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap." Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). Store the conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**

**If BOOTSTRAP_DECLINED appears:** print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

**If NO runtime detected** (no config files found): use question: "I couldn't detect your project's language. What runtime are you using?" Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests. If the user picks H → write `.gstack/no-test-bootstrap` and continue without tests.

**If a runtime is detected but no test framework — bootstrap:**

### B2. Research best practices

Use WebSearch to find current best practices for the detected runtime:

- `"[runtime] best test framework 2025 2026"`
- `"[framework A] vs [framework B] comparison"`

If WebSearch is unavailable, use this built-in knowledge table:

| Runtime | Primary recommendation | Alternative |
|---------|------------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |

### B3. Framework selection

Use question: "I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:

- A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
- B) [Alternative] — [rationale]. Includes: [packages]
- C) Skip — don't set up testing right now

RECOMMENDATION: Choose A because [reason based on project context]"

If the user picks C → write `.gstack/no-test-bootstrap`. Tell the user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.

If multiple runtimes are detected (monorepo) → ask which runtime to set up first, with the option to do both sequentially.

### B4. Install and configure

1. Install the chosen packages (npm/bun/gem/pip/etc.)
2. Create a minimal config file
3. Create the directory structure (test/, spec/, etc.)
4. Create one example test matching the project's code to verify the setup works

If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or the equivalent for the runtime). Warn the user and continue without tests.

### B4.5. First real tests

Generate 3-5 real tests for existing code:

1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
2. **Prioritize by risk:** error handlers > business logic with conditionals > API endpoints > pure functions
3. **For each file:** write one test that exercises real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
5. Generate at least 1 test, cap at 5.

Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.

### B5. Verify

```bash
# Run the full test suite to confirm everything works
{detected test command}
```

If tests fail → debug once. If still failing → revert all bootstrap changes and warn the user.

### B5.5. CI/CD pipeline

```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```

If `.github/` exists (or no CI is detected — default to GitHub Actions), create `.github/workflows/test.yml` with:

- `runs-on: ubuntu-latest`
- The appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
- The same test command verified in B5
- Trigger: push + pull_request

If non-GitHub CI is detected → skip CI generation with a note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add a test step to your existing pipeline manually."

### B6. Create TESTING.md

First check: if TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.

Write TESTING.md with:

- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
- Framework name and version
- How to run tests (the verified command from B5)
- Test layers: unit tests (what, where, when), integration tests, smoke tests, E2E tests
- Conventions: file naming, assertion style, setup/teardown patterns

### B7. Update CLAUDE.md

First check: if CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.

Append a `## Testing` section:

- Run command and test directory
- Reference to TESTING.md
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing a new function, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail

### B8. Commit

```bash
git status --porcelain
```

Only commit if there are changes.
Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, and .github/workflows/test.yml if created), then:

`git commit -m "chore: bootstrap test framework ({framework name})"`

---

**Create output directories:**

```bash
mkdir -p .gstack/qa-reports/screenshots
```

---

## Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo:

   ```bash
   eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)"
   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
   ```

2. **Conversation context:** check whether a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.

---

## Phases 1-6: QA Baseline

## Modes

### Diff-aware (automatic when on a feature branch with no URL)

This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:

1. **Analyze the branch diff** to understand what changed:

   ```bash
   git diff main...HEAD --name-only
   git log main..HEAD --oneline
   ```

2. **Identify affected pages/routes** from the changed files:
   - Controller/route files → which URL paths they serve
   - View/template/component files → which pages render them
   - Model/service files → which pages use those models (check controllers that reference them)
   - CSS/style files → which pages include those stylesheets
   - API endpoints → test them directly with `${GSTACK_BROWSE} js "await fetch('/api/...')"`
   - Static pages (markdown, HTML) → navigate to them directly

   **If no obvious pages/routes are identified from the diff:** do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode — navigate to the homepage, follow the top 5 navigation targets, check the console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior — always verify the app still works.

3. **Detect the running app** — probe common local dev ports, stopping at the first one that responds:

   ```bash
   for PORT in 3000 4000 8080; do
     if ${GSTACK_BROWSE} goto "http://localhost:$PORT" 2>/dev/null; then
       echo "Found app on :$PORT"
       break
     fi
   done
   ```

   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.

4. **Test each affected page/route:**
   - Navigate to the page
   - Take a screenshot
   - Check the console for errors
   - If the change was interactive (forms, buttons, flows), test the interaction end-to-end
   - Use `snapshot -D` before and after actions to verify the change had the expected effect

5. **Cross-reference with commit messages and the PR description** to understand *intent* — what should the change do? Verify it actually does that.

6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.

7. **Report findings** scoped to the branch changes:
   - "Changes tested: N pages/routes affected by this branch"
   - For each: does it work? Screenshot evidence.
   - Any regressions on adjacent pages?

**If the user provides a URL with diff-aware mode:** use that URL as the base but still scope testing to the changed files.

### Full (default when a URL is provided)

Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce a health score. Takes 5-15 minutes depending on app size.

### Quick (`--quick`)

30-second smoke test.
Visit homepage + top 5 navigation targets. Check: does the page load? Console errors? Broken links? Produce a health score. No detailed issue documentation.

### Regression (`--regression <baseline.json>`)

Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append a regression section to the report.

---

## Workflow

### Phase 1: Initialize

1. Find the browse binary (see Setup above)
2. Create output directories
3. Copy the report template from `qa/templates/qa-report-template.md` to the output dir
4. Start a timer for duration tracking

### Phase 2: Authenticate (if needed)

**If the user specified auth credentials:**

```bash
${GSTACK_BROWSE} goto <url>
${GSTACK_BROWSE} snapshot -i           # find the login form
${GSTACK_BROWSE} fill @e3 "user@example.com"
${GSTACK_BROWSE} fill @e4 "[REDACTED]" # NEVER include real passwords in the report
${GSTACK_BROWSE} click @e5             # submit
${GSTACK_BROWSE} snapshot -D           # verify login succeeded
```

**If the user provided a cookie file:**

```bash
${GSTACK_BROWSE} cookie-import cookies.json
${GSTACK_BROWSE} goto <url>
```

**If 2FA/OTP is required:** ask the user for the code and wait.

**If CAPTCHA blocks you:** tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

### Phase 3: Orient

Get a map of the application:

```bash
${GSTACK_BROWSE} goto <url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
${GSTACK_BROWSE} links             # map navigation structure
${GSTACK_BROWSE} console --errors  # any errors on landing?
```

**Detect the framework** (note it in the report metadata):

- `__next` in HTML or `_next/data` requests → Next.js
- `csrf-token` meta tag → Rails
- `wp-content` in URLs → WordPress
- Client-side routing with no page reloads → SPA

**For SPAs:** the `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.

### Phase 4: Explore

Visit pages systematically.
At each page:

```bash
${GSTACK_BROWSE} goto <url>
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
${GSTACK_BROWSE} console --errors
```

Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`):

1. **Visual scan** — look at the annotated screenshot for layout issues
2. **Interactive elements** — click buttons, links, controls. Do they work?
3. **Forms** — fill and submit. Test empty, invalid, and edge-case inputs
4. **Navigation** — check all paths in and out
5. **States** — empty state, loading, error, overflow
6. **Console** — any new JS errors after interactions?
7. **Responsiveness** — check the mobile viewport if relevant:

   ```bash
   ${GSTACK_BROWSE} viewport 375x812
   ${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/page-mobile.png"
   ${GSTACK_BROWSE} viewport 1280x720
   ```

**Depth judgment:** spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

**Quick mode:** only visit the homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: does it load? Console errors? Broken links visible?

### Phase 5: Document

Document each issue **immediately when found** — don't batch them.

**Two evidence tiers:**

**Interactive bugs** (broken flows, dead buttons, form failures):

1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use `snapshot -D` to show what changed
5. Write repro steps referencing the screenshots

```bash
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
${GSTACK_BROWSE} click @e5
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
${GSTACK_BROWSE} snapshot -D
```

**Static bugs** (typos, layout issues, missing images):

1. Take a single annotated screenshot showing the problem
2. Describe what's wrong

```bash
${GSTACK_BROWSE} snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
```

**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.

### Phase 6: Wrap Up

1. **Compute the health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write a console health summary** — aggregate all console errors seen across pages
4. **Update the severity counts** in the summary table
5. **Fill in the report metadata** — date, duration, pages visited, screenshot count, framework
6. **Save a baseline** — write `baseline.json` with:

```json
{
  "date": "YYYY-MM-DD",
  "url": "<url>",
  "healthScore": N,
  "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
  "categoryScores": { "console": N, "links": N, ... }
}
```

**Regression mode:** after writing the report, load the baseline file. Compare:

- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)

Append the regression section to the report.

---

## Health Score Rubric

Compute each category score (0-100), then take the weighted average.

### Console (weight: 15%)

- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 10+ errors → 10

### Links (weight: 10%)

- 0 broken → 100
- Each broken link → -15 (minimum 0)

### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)

Each category starts at 100. Deduct per finding:

- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3

Minimum 0 per category.
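As a minimal sketch of the deduction rule above (the helper name is illustrative, not part of /qa):

```shell
# Hypothetical helper: category score from counts of
# critical/high/medium/low findings, floored at 0.
category_score() {
  local s=$(( 100 - $1*25 - $2*15 - $3*8 - $4*3 ))
  [ "$s" -lt 0 ] && s=0
  echo "$s"
}

category_score 1 2 0 0   # 100 - 25 - 2*15 = 45
```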
### Weights

| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |

### Final Score

`score = Σ (category_score × weight)`

---

## Framework-Specific Guidance

### Next.js

- Check the console for hydration errors (`Hydration failed`, `Text content did not match`)
- Monitor `_next/data` requests in the network tab — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just `goto`) — this catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content

### Rails

- Check for N+1 query warnings in the console (in development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check that flash messages appear and dismiss correctly

### WordPress

- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (`/wp-json/`)
- Check for mixed content warnings (common with WP)

### General SPA (React, Vue, Angular)

- Use `snapshot -i` for navigation — the `links` command misses client-side routes
- Check for stale state (navigate away and back — does the data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor the console after extended use)

---

## Important Rules

1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code.** Test as a user, not a developer.
6. **Check the console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use `snapshot -C` for tricky UIs.** It finds clickable divs that the accessibility tree misses.
11. **Show screenshots to the user.** After every `${GSTACK_BROWSE} screenshot`, `${GSTACK_BROWSE} snapshot -a -o`, or `${GSTACK_BROWSE} responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.

Record the baseline health score at the end of Phase 6.

---

## Output Structure

```
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md   # Structured report
├── screenshots/
│   ├── initial.png                      # Landing page annotated screenshot
│   ├── issue-001-step-1.png             # Per-issue evidence
│   ├── issue-001-result.png
│   ├── issue-001-before.png             # Before fix (if fixed)
│   ├── issue-001-after.png              # After fix (if fixed)
│   └── ...
└── baseline.json                        # For regression mode
```

Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`

---

## Phase 7: Triage

Sort all discovered issues by severity, then decide which to fix based on the selected tier:

- **Quick:** fix critical + high only. Mark medium/low as "deferred."
- **Standard:** fix critical + high + medium. Mark low as "deferred."
- **Exhaustive:** fix all, including cosmetic/low severity.
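The tier-to-severity mapping above can be sketched as a small helper (the function name is illustrative, not part of the command):

```shell
# Hypothetical helper: which severities get fixed at each tier.
severities_for_tier() {
  case "$1" in
    quick)      echo "critical high" ;;
    standard)   echo "critical high medium" ;;
    exhaustive) echo "critical high medium low" ;;
    *)          return 1 ;;
  esac
}

severities_for_tier standard   # critical high medium
```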
Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.

---

## Phase 8: Fix Loop

For each fixable issue, in severity order:

### 8a. Locate source

```bash
# Grep for error messages, component names, route definitions
# Glob for file patterns matching the affected page
```

- Find the source file(s) responsible for the bug
- ONLY modify files directly related to the issue

### 8b. Fix

- Read the source code and understand the context
- Make the **minimal fix** — the smallest change that resolves the issue
- Do NOT refactor surrounding code, add features, or "improve" unrelated things

### 8c. Commit

```bash
git add <files>
git commit -m "fix(qa): ISSUE-NNN — short description"
```

- One commit per fix. Never bundle multiple fixes.
- Message format: `fix(qa): ISSUE-NNN — short description`

### 8d. Re-test

- Navigate back to the affected page
- Take a **before/after screenshot pair**
- Check the console for errors
- Use `snapshot -D` to verify the change had the expected effect

```bash
${GSTACK_BROWSE} goto <url>
${GSTACK_BROWSE} screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
${GSTACK_BROWSE} console --errors
${GSTACK_BROWSE} snapshot -D
```

### 8e. Classify

- **verified**: the re-test confirms the fix works and no new errors were introduced
- **best-effort**: the fix was applied but couldn't be fully verified (e.g., needs auth state or an external service)
- **reverted**: a regression was detected → `git revert HEAD` → mark the issue as "deferred"

### 8e.5. Regression Test

Skip if: the classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND the user declined bootstrap.

**1. Study the project's existing test patterns:**

Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:

- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns

The regression test must look like it was written by the same developer.

**2. Trace the bug's codepath, then write a regression test:**

Before writing the test, trace the data flow through the code you just fixed:

- What input/state triggered the bug? (the exact precondition)
- What codepath did it follow? (which branches, which function calls)
- Where did it break? (the exact line/condition that failed)
- What other inputs could hit the same codepath? (edge cases around the fix)

The test MUST:

- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- If you found adjacent edge cases while tracing, test those too (e.g., null input, empty array, boundary value)
- Include a full attribution comment:

  ```
  // Regression: ISSUE-NNN — {what broke}
  // Found by /qa on {YYYY-MM-DD}
  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
  ```

Test type decision:

- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system). Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files and take the max number + 1.

**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**

- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix the test once. Still failing → delete the test and defer.
- Taking >2 min of exploration → skip and defer.

**5. WTF-likelihood exclusion:** test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:

```
WTF-LIKELIHOOD:
  Start at 0%
  Each revert: +15%
  Each fix touching >3 files: +5%
  After fix 15: +1% per additional fix
  All remaining issues are Low severity: +10%
  Touching unrelated files: +20%
```

**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.

---

## Phase 9: Final QA

After all fixes are applied:

1. Re-run QA on all affected pages
2. Compute the final health score
3. **If the final score is WORSE than the baseline:** WARN prominently — something regressed

---

## Phase 10: Report

Write the report to both local and project-scoped locations:

**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`

**Project-scoped:** write a test outcome artifact for cross-session context:

```bash
eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
```

Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`

**Per-issue additions** (beyond the standard report template):

- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)

**Summary section:**

- Total issues found
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred issues
- Health score delta: baseline → final

**PR Summary:** include a one-line summary suitable for PR descriptions:

> "QA found N issues, fixed M, health score X → Y."

---

## Phase 11: TODOS.md Update

If the repo has a `TODOS.md`:

1. **New deferred bugs** → add as TODOs with severity, category, and repro steps
2. **Fixed bugs that were in TODOS.md** → annotate with "Fixed by /qa on {branch}, {date}"

---

## Additional Rules (qa-specific)

13. **Clean working tree required.** If the tree is dirty, use question to offer commit/stash/abort before proceeding.
14. **One commit per fix.** Never bundle multiple fixes into one commit.
15. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
16. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
17. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.
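The WTF-likelihood heuristic from Phase 8f can be sketched as a small helper (a hedged illustration; the function name and argument layout are assumptions, not part of the command):

```shell
# Hypothetical sketch of the Phase 8f WTF-likelihood heuristic.
# Args: reverts, fixes touching >3 files, total fixes so far,
#       only-low-remaining (yes/no), touched-unrelated (yes/no).
wtf_likelihood() {
  local reverts="$1" big_fixes="$2" fixes="$3" only_low="$4" unrelated="$5"
  local wtf=$(( reverts * 15 + big_fixes * 5 ))
  [ "$fixes" -gt 15 ] && wtf=$(( wtf + fixes - 15 ))
  [ "$only_low" = "yes" ] && wtf=$(( wtf + 10 ))
  [ "$unrelated" = "yes" ] && wtf=$(( wtf + 20 ))
  echo "$wtf"
}

wtf_likelihood 1 1 5 no no   # 15 + 5 = 20, at the stop threshold
```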