Skills reference

Every qa-team skill — what to type, what it needs, what it does.

Each skill is invoked by typing a command in Claude Code, Cursor, or Codex. This page covers what to type, what context each skill needs, and what comes out.


Workflows at a glance

Fresh URL — first time:

nitpick init → nitpick crawl → /qa-explore <url> → /qa-kickoff → /qa-write-tests → /qa-run

Daily — after the app is known:

/qa-scope <what changed> → /qa-run → /qa-triage → /qa-smoke → /qa-release-gate


/qa-scope

Plan which pages to test. Does not run any tests.

What to type:

/qa-scope I refactored the task creation form to use a new API endpoint
/qa-scope the billing page got a new payment provider
/qa-scope  ← (no description — will ask you)

Needs:

  • nitpick.yaml in the current directory
  • knowledge-base/page-graph.json (run nitpick crawl first)
  • A description of what changed (from your message, a PR description, or git context)

Output: A ranked table of affected pages with confidence scores and reasons. No tests run. Confirm the scope — /qa-run will pick it up.


/qa-run

Execute tests. The main workhorse.

What to type:

/qa-run                                        ← uses scope from /qa-scope
/qa-run /tasks/new                             ← targeted: one specific page
/qa-run /tasks/new /billing                    ← targeted: multiple pages
/qa-run full regression                        ← all pages in the KB
/qa-run smoke                                  ← critical pages only, no LLM

Needs:

  • nitpick.yaml in the current directory
  • API key set (ANTHROPIC_API_KEY or OPENAI_API_KEY) or claude CLI installed
  • knowledge-base/page-graph.json (run nitpick crawl first)
  • Auth state at knowledge-base/auth/ (created by nitpick crawl)
  • For iterative: a change description (from /qa-scope or typed inline)
  • For targeted: at least one URL path

What runs behind the scenes (per page): Pre-flight → Phase 0 (parse URL, ask about terminal guards) → Phase 2 (Playwright exploration) → Phase 3 (build UI model) → Phase 3.5 (your approval) → Phase 4 (generate tests) → Phase 5 (run + classify failures) → Phase 6 (write report + KB)

On second and later runs for the same page: Phase 2 starts from the existing model, Phase 3.5 auto-skips already-answered questions. Runs meaningfully faster.

Output: Unified pass/fail report. Bugs filed to KB. Chains to /qa-triage if failures exist.


/qa-explore

Build or refresh the UI model for one page. No tests generated or run.

What to type:

/qa-explore /dashboard
/qa-explore /tasks/new

Needs:

  • nitpick.yaml in the current directory
  • Auth state for the role that can see this page
  • The page must be reachable from base_url

Use when:

  • Testing a page for the first time before committing to a full run
  • The UI was rewritten and the existing model is stale
  • You want to inspect what nitpick knows before generating tests

Output: Derived UI Model written to knowledge-base/pages/<page_id>/. Surfaces diff vs. prior model if one exists.


/qa-kickoff

Review what nitpick learned about a page and give approval before tests are written.

What to type:

/qa-kickoff /tasks/new
/qa-kickoff                ← lists pages with unapproved models, lets you pick

Needs:

  • A page model already built (run /qa-explore or /qa-run to Phase 3 first)
  • You, present in the conversation — this is the human-in-the-loop gate

What it shows:

  • Every field, button, and validation rule it found
  • Conditional relationships it detected
  • Questions it’s unsure about (ambiguous labels, dynamic content)
  • What it plans to test

You can add:

  • Business rules it couldn’t discover (“email fields reject + signs”)
  • Terminal guards (“never click the Delete Account button”)
  • Focus areas (“prioritise the payment flow”)
  • Corrections to anything it got wrong

Your answers are persisted — next run auto-skips questions whose evidence hasn’t changed.

Output: Approval written to phase-outputs/page-understanding/<page_id>.approved.json. Required before /qa-write-tests.


/qa-write-tests

Generate Playwright tests from an approved model. Does not run them.

What to type:

/qa-write-tests /tasks/new
/qa-write-tests /billing

Needs:

  • Approved model at knowledge-base/pages/<page_id>/model.latest.json
  • Approval flag (approved: true) at phase-outputs/page-understanding/<page_id>.approved.json
  • Run /qa-explore then /qa-kickoff first if you haven’t already

Use when:

  • You want to inspect the spec before running it
  • You updated the model and need to regenerate tests
  • CI: generate specs offline, commit them, run separately

Output: tests/<page_id>.spec.ts written. Coverage breakdown shown. Does not execute.


/qa-smoke

Instant health check. No LLM. Zero cost.

What to type:

/qa-smoke

Needs:

  • scope.critical_pages populated in nitpick.yaml
  • Auth state for each role that accesses critical pages
  • Tests for critical pages already generated (run /qa-run on them first)
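scope.critical_pages is a plain list of paths. A minimal sketch — the paths are illustrative, not from your app:

```yaml
# nitpick.yaml — the key /qa-smoke reads; paths are examples
scope:
  critical_pages:
    - /dashboard
    - /tasks/new
    - /billing
```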

What it checks: Each critical page renders (heading visible, no auth redirect, no 5xx). Runs headless Chromium in parallel.

Use when: Before every merge. After every deploy. Any time you want a 5–15 second pulse check.

Output: Pass / fail per page. No report written. Chains to /qa-release-gate if all green.


/qa-triage

Walk the failures from the last run and get an action list.

What to type:

/qa-triage                    ← latest run
/qa-triage 2026-05-11T08-32   ← specific run ID

Needs:

  • A completed run at runs/<run_id>/reports/data.json
  • KB for deduplication lookups

What it does: Classifies each failure as timing flake, selector drift, data collision, or real app bug. Deduplicates against known open bugs. Scores severity. Tells you what to fix vs what to ignore.

Output: Prioritised action list. Known bugs marked. New bugs get a /qa-file-bug recommendation.


/qa-classify-failure

Deep-dive into one specific test failure.

What to type:

/qa-classify-failure /tasks/new:should submit valid task
/qa-classify-failure               ← picks the latest unclassified failure

Needs:

  • Run dir with test artifacts
  • The test name or page ID to drill into
  • KB for flake history

Output: Four-class verdict (timing / selector_drift / data_collision / app_bug) with confidence score and recommended fix path.


/qa-investigate-flake

Find the root cause of a test that keeps failing intermittently.

What to type:

/qa-investigate-flake /tasks/new
/qa-investigate-flake /tasks/new:should submit valid task

Needs:

  • Flake history in the KB (builds up over multiple runs)
  • Run history in runs/

What it analyses: Failure rate over recent runs, time-of-day clustering, provider correlation, model-version correlation.

Output: Root cause hypothesis. Recommended fix (timing threshold, selector update, test isolation, or flag as known flake).


/qa-release-gate

Go / no-go before shipping.

What to type:

/qa-release-gate

Needs:

  • scope.critical_pages in nitpick.yaml
  • Auth state for critical pages
  • KB with bug and flake state

What it checks: Runs smoke on critical pages, reads open critical bugs from KB, checks recent flake rate. Returns a verdict.

Output:

  • PASS — safe to ship
  • PASS WITH WARNINGS — minor issues, your call
  • BLOCK — open critical bugs or smoke failures, do not ship


/qa-canary

Post-deploy regression check against the last passing baseline.

What to type:

/qa-canary

Needs:

  • A clean baseline run before the deploy (the most recent passing smoke run)
  • Auth state for critical pages
  • Deploy accessible at base_url

Use when: Just after a deploy lands. Catches regressions the pre-deploy tests missed — environment differences, config changes, migration side-effects.

Output: Comparison against baseline. Regressions filed as bugs. Pass / fail verdict.


/qa-flow

Test a multi-actor flow end to end.

What to type:

/qa-flow hiring
/qa-flow                ← lists configured flows, lets you pick

Needs:

  • A flow defined in nitpick.yaml under flows:
  • Auth state for every role involved in the flow
  • Page models for all pages in the flow (run /qa-run on them first)
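A flow lives under the flows: key in nitpick.yaml. The stage fields below (actor, page, action) are an illustrative guess at the schema, not documented syntax — a hypothetical hiring flow might look like:

```yaml
# nitpick.yaml — hypothetical flow definition; the stage fields
# (actor, page, action) are assumptions, not a documented schema
flows:
  hiring:
    stages:
      - actor: admin
        page: /jobs/new
        action: create a job posting
      - actor: candidate
        page: /jobs
        action: apply to the posting
      - actor: admin
        page: /applications
        action: approve the application
```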

What it tests: Each stage in sequence, with cross-actor assertions — e.g. admin creates a record, candidate sees it, admin approves it. Handoff bugs (stage passes but the next actor can’t proceed) are flagged as the highest priority failure type.

Output: Per-stage pass/fail. Cross-actor assertion results. Handoff bugs filed to KB.


/qa-file-bug

Persist a bug to the knowledge base.

What to type:

/qa-file-bug                   ← interactive, asks for details
/qa-file-bug /tasks/new        ← pre-fills the page

Needs:

  • KB write access (mcp.kb.trust: read-write in nitpick.yaml)
  • Bug details: title, description, severity, steps to reproduce, expected vs actual
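The write-access setting nests as the key path suggests — a sketch, with the nesting inferred from mcp.kb.trust:

```yaml
# nitpick.yaml — grant KB write access
# (nesting inferred from the mcp.kb.trust key path)
mcp:
  kb:
    trust: read-write
```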

Use when: /qa-classify-failure returned an app_bug verdict. Or you found a bug manually and want it tracked.

Output: Bug written to KB with stable deduplication hash. Duplicate check run first — won’t create a duplicate if the same bug is already open.


/qa-fixtures

Manage login auth state files.

What to type:

/qa-fixtures

Needs:

  • Roles configured in nitpick.yaml with username_env and password_env
  • Credential env vars set in .env
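Role entries point at env vars rather than holding raw credentials. A sketch, assuming roles sit at the top level of nitpick.yaml and using made-up env var names:

```yaml
# nitpick.yaml — roles reference env vars, never raw credentials
# (top-level placement and env var names are assumptions)
roles:
  - name: admin
    username_env: ADMIN_USERNAME
    password_env: ADMIN_PASSWORD
  - name: candidate
    username_env: CANDIDATE_USERNAME
    password_env: CANDIDATE_PASSWORD
```

The matching values then go in .env (e.g. ADMIN_USERNAME=qa-admin@example.com).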

Use when:

  • A run mass-failed and every page redirected to login (auth expired)
  • You added a new role to nitpick.yaml
  • You changed credentials

What it does: Lists existing auth state files, validates each one with a probe, refreshes expired ones by re-running the login flow.


/qa-coverage-audit

See what’s tested, what’s stale, and what’s missing.

What to type:

/qa-coverage-audit

Needs:

  • knowledge-base/page-graph.json
  • KB with run history

Output: Every page bucketed as: healthy, stale (not tested recently), failing, untested, or model-less. Per-role coverage breakdown. Remedial actions ranked by severity.


/qa-prioritize

Rank a list of pages by risk when you can’t test everything.

What to type:

/qa-prioritize                        ← ranks all pages in scope
/qa-prioritize /tasks/new /billing /dashboard
/qa-prioritize --top 5                ← top 5 only

Needs:

  • KB with run history (flake rate, bug history, staleness)
  • nitpick.yaml with critical_pages

Scoring factors: Critical flag, open bugs, flake rate, time since last test, recent code changes.

Output: Ordered list with score and reason per page. Recommended chain commands.


/qa-resume

Continue a run that was interrupted.

What to type:

/qa-resume                         ← resumes latest interrupted run
/qa-resume 2026-05-11T08-32        ← specific run ID

Needs:

  • An interrupted run in runs/ with job-state.json
  • The same project dir and auth state

What it skips: Pages already marked passed. Zero LLM cost for skipped pages.


/qa-weekly-report

7-day trend digest.

What to type:

/qa-weekly-report
/qa-weekly-report --days 14        ← two-week window

Needs:

  • At least a few runs in runs/ within the window
  • KB with bug history

Output: Pass rate trend, flake rate trend, bug flow (opened vs closed), cost trend, pages ranked by volatility. Saved to disk for sharing.


/qa-careful

Safety wrapper for testing in sensitive environments.

What to type:

/qa-careful /dashboard             ← smoke this page with extra guards

Needs:

  • nitpick.yaml with project.environment_tag set, or a production URL
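Setting the tag is one key in nitpick.yaml — the value shown here is an example:

```yaml
# nitpick.yaml — tag the environment so /qa-careful treats it as sensitive
# (the value is an example)
project:
  environment_tag: production
```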

What it changes: Expands the terminal-action denylist, downgrades KB to read-only, asks for explicit confirmation before touching any page that matches a production URL pattern. Scope is limited to smoke / iterative / targeted — never full.


/qa-roadmap

Strategic 30/60/90-day QA plan.

What to type:

/qa-roadmap

Needs:

  • KB with at least a few weeks of run history
  • scope.critical_pages and ideally flows[] configured

Output: Config gaps, coverage gaps, and a 30/60/90-day action plan bucketed by horizon. Saved to disk.