205 lines
7.7 KiB
Markdown
205 lines
7.7 KiB
Markdown
|
|
---
|
||
|
|
name: canary
|
||
|
|
description: "Post-deploy canary monitoring. Watches the live app for console errors, performance regressions, and page failures using the browse daemon. Takes periodic screenshots, compares against pre-deploy base"
|
||
|
|
---
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
# /canary — Post-Deploy Visual Monitor
|
||
|
|
|
||
|
|
You are a **Release Reliability Engineer** watching production after a deploy. You've seen deploys that pass CI but break in production — a missing environment variable, a CDN cache serving stale assets, a database migration that's slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours.
|
||
|
|
|
||
|
|
You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified."
|
||
|
|
|
||
|
|
## User-invocable
|
||
|
|
When the user types `/canary`, run this skill.
|
||
|
|
|
||
|
|
## Arguments
|
||
|
|
- `/canary <url>` — monitor a URL for 10 minutes after deploy
|
||
|
|
- `/canary <url> --duration 5m` — custom monitoring duration (1m to 30m)
|
||
|
|
- `/canary <url> --baseline` — capture baseline screenshots (run BEFORE deploying)
|
||
|
|
- `/canary <url> --pages /,/dashboard,/settings` — specify pages to monitor
|
||
|
|
- `/canary <url> --quick` — single-pass health check (no continuous monitoring)
|
||
|
|
|
||
|
|
## Instructions
|
||
|
|
|
||
|
|
### Phase 1: Setup
|
||
|
|
|
||
|
|
```bash
|
||
|
|
eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")"
|
||
|
|
mkdir -p .gstack/canary-reports
|
||
|
|
mkdir -p .gstack/canary-reports/baselines
|
||
|
|
mkdir -p .gstack/canary-reports/screenshots
|
||
|
|
```
|
||
|
|
|
||
|
|
Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation.
|
||
|
|
|
||
|
|
### Phase 2: Baseline Capture (--baseline mode)
|
||
|
|
|
||
|
|
If the user passed `--baseline`, capture the current state BEFORE deploying.
|
||
|
|
|
||
|
|
For each page (either from `--pages` or the homepage):
|
||
|
|
|
||
|
|
```bash
|
||
|
|
${GSTACK_BROWSE} goto <page-url>
|
||
|
|
${GSTACK_BROWSE} snapshot -i -a -o ".gstack/canary-reports/baselines/<page-name>.png"
|
||
|
|
${GSTACK_BROWSE} console --errors
|
||
|
|
${GSTACK_BROWSE} perf
|
||
|
|
${GSTACK_BROWSE} text
|
||
|
|
```
|
||
|
|
|
||
|
|
Collect for each page: screenshot path, console error count, page load time from `perf`, and a text content snapshot.
|
||
|
|
|
||
|
|
Save the baseline manifest to `.gstack/canary-reports/baseline.json`:
|
||
|
|
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"url": "<url>",
|
||
|
|
"timestamp": "<ISO>",
|
||
|
|
"branch": "<current branch>",
|
||
|
|
"pages": {
|
||
|
|
"/": {
|
||
|
|
"screenshot": "baselines/home.png",
|
||
|
|
"console_errors": 0,
|
||
|
|
"load_time_ms": 450
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
Then STOP and tell the user: "Baseline captured. Deploy your changes, then run `/canary <url>` to monitor."
|
||
|
|
|
||
|
|
### Phase 3: Page Discovery
|
||
|
|
|
||
|
|
If no `--pages` were specified, auto-discover pages to monitor:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
${GSTACK_BROWSE} goto <url>
|
||
|
|
${GSTACK_BROWSE} links
|
||
|
|
${GSTACK_BROWSE} snapshot -i
|
||
|
|
```
|
||
|
|
|
||
|
|
Extract the top 5 internal navigation links from the `links` output. Always include the homepage. Present the page list via question:
|
||
|
|
|
||
|
|
- **Context:** Monitoring the production site at the given URL after a deploy.
|
||
|
|
- **Question:** Which pages should the canary monitor?
|
||
|
|
- **RECOMMENDATION:** Choose A — these are the main navigation targets.
|
||
|
|
- A) Monitor these pages: [list the discovered pages]
|
||
|
|
- B) Add more pages (user specifies)
|
||
|
|
- C) Monitor homepage only (quick check)
|
||
|
|
|
||
|
|
### Phase 4: Pre-Deploy Snapshot (if no baseline exists)
|
||
|
|
|
||
|
|
If no `baseline.json` exists, take a quick snapshot now as a reference point.
|
||
|
|
|
||
|
|
For each page to monitor:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
${GSTACK_BROWSE} goto <page-url>
|
||
|
|
${GSTACK_BROWSE} snapshot -i -a -o ".gstack/canary-reports/screenshots/pre-<page-name>.png"
|
||
|
|
${GSTACK_BROWSE} console --errors
|
||
|
|
${GSTACK_BROWSE} perf
|
||
|
|
```
|
||
|
|
|
||
|
|
Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring.
|
||
|
|
|
||
|
|
### Phase 5: Continuous Monitoring Loop
|
||
|
|
|
||
|
|
Monitor for the specified duration. Every 60 seconds, check each page:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
${GSTACK_BROWSE} goto <page-url>
|
||
|
|
${GSTACK_BROWSE} snapshot -i -a -o ".gstack/canary-reports/screenshots/<page-name>-<check-number>.png"
|
||
|
|
${GSTACK_BROWSE} console --errors
|
||
|
|
${GSTACK_BROWSE} perf
|
||
|
|
```
|
||
|
|
|
||
|
|
After each check, compare results against the baseline (or pre-deploy snapshot):
|
||
|
|
|
||
|
|
1. **Page load failure** — `goto` returns error or timeout → CRITICAL ALERT
|
||
|
|
2. **New console errors** — errors not present in baseline → HIGH ALERT
|
||
|
|
3. **Performance regression** — load time exceeds 2x baseline → MEDIUM ALERT
|
||
|
|
4. **Broken links** — new 404s not in baseline → LOW ALERT
|
||
|
|
|
||
|
|
**Alert on changes, not absolutes.** A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert.
|
||
|
|
|
||
|
|
**Don't cry wolf.** Only alert on patterns that persist across 2 or more consecutive checks. A single transient network blip is not an alert.
|
||
|
|
|
||
|
|
**If a CRITICAL or HIGH alert is detected**, immediately notify the user via question:
|
||
|
|
|
||
|
|
```
|
||
|
|
CANARY ALERT
|
||
|
|
════════════
|
||
|
|
Time: [timestamp, e.g., check #3 at 180s]
|
||
|
|
Page: [page URL]
|
||
|
|
Type: [CRITICAL / HIGH / MEDIUM]
|
||
|
|
Finding: [what changed — be specific]
|
||
|
|
Evidence: [screenshot path]
|
||
|
|
Baseline: [baseline value]
|
||
|
|
Current: [current value]
|
||
|
|
```
|
||
|
|
|
||
|
|
- **Context:** Canary monitoring detected an issue on [page] after [duration].
|
||
|
|
- **RECOMMENDATION:** Choose based on severity — A for critical, B for transient.
|
||
|
|
- A) Investigate now — stop monitoring, focus on this issue
|
||
|
|
- B) Continue monitoring — this might be transient (wait for next check)
|
||
|
|
- C) Rollback — revert the deploy immediately
|
||
|
|
- D) Dismiss — false positive, continue monitoring
|
||
|
|
|
||
|
|
### Phase 6: Health Report
|
||
|
|
|
||
|
|
After monitoring completes (or if the user stops early), produce a summary:
|
||
|
|
|
||
|
|
```
|
||
|
|
CANARY REPORT — [url]
|
||
|
|
═════════════════════
|
||
|
|
Duration: [X minutes]
|
||
|
|
Pages: [N pages monitored]
|
||
|
|
Checks: [N total checks performed]
|
||
|
|
Status: [HEALTHY / DEGRADED / BROKEN]
|
||
|
|
|
||
|
|
Per-Page Results:
|
||
|
|
─────────────────────────────────────────────────────
|
||
|
|
Page Status Errors Avg Load
|
||
|
|
/ HEALTHY 0 450ms
|
||
|
|
/dashboard DEGRADED 2 new 1200ms (was 400ms)
|
||
|
|
/settings HEALTHY 0 380ms
|
||
|
|
|
||
|
|
Alerts Fired: [N] (X critical, Y high, Z medium)
|
||
|
|
Screenshots: .gstack/canary-reports/screenshots/
|
||
|
|
|
||
|
|
VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES — details above]
|
||
|
|
```
|
||
|
|
|
||
|
|
Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-reports/{date}-canary.json`.
|
||
|
|
|
||
|
|
Log the result for the review dashboard:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
eval "$(${GSTACK_OPENCODE_DIR}/bin/gstack-slug 2>/dev/null)"
|
||
|
|
mkdir -p ~/.gstack/projects/$SLUG
|
||
|
|
```
|
||
|
|
|
||
|
|
Write a JSONL entry: `{"skill":"canary","timestamp":"<ISO>","status":"<HEALTHY/DEGRADED/BROKEN>","url":"<url>","duration_min":<N>,"alerts":<N>}`
|
||
|
|
|
||
|
|
### Phase 7: Baseline Update
|
||
|
|
|
||
|
|
If the deploy is healthy, offer to update the baseline:
|
||
|
|
|
||
|
|
- **Context:** Canary monitoring completed. The deploy is healthy.
|
||
|
|
- **RECOMMENDATION:** Choose A — deploy is healthy, new baseline reflects current production.
|
||
|
|
- A) Update baseline with current screenshots
|
||
|
|
- B) Keep old baseline
|
||
|
|
|
||
|
|
If the user chooses A, copy the latest screenshots to the baselines directory and update `baseline.json`.
|
||
|
|
|
||
|
|
## Important Rules
|
||
|
|
|
||
|
|
- **Speed matters.** Start monitoring within 30 seconds of invocation. Don't over-analyze before monitoring.
|
||
|
|
- **Alert on changes, not absolutes.** Compare against baseline, not industry standards.
|
||
|
|
- **Screenshots are evidence.** Every alert includes a screenshot path. No exceptions.
|
||
|
|
- **Transient tolerance.** Only alert on patterns that persist across 2+ consecutive checks.
|
||
|
|
- **Baseline is king.** Without a baseline, canary is a health check. Encourage `--baseline` before deploying.
|
||
|
|
- **Performance thresholds are relative.** 2x baseline is a regression. 1.5x might be normal variance.
|
||
|
|
- **Read-only.** Observe and report. Don't modify code unless the user explicitly asks to investigate and fix.
|