Changelog

What changed in each release.

v2.1.0: 20 qa-team skills bundle

  • 20 SKILL.md files at YC-product depth (~7,500 lines total): qa-scope, qa-explore, qa-flow, qa-triage, qa-release-gate, qa-kickoff, qa-prioritize, qa-write-tests, qa-run, qa-resume, qa-classify-failure, qa-investigate-flake, qa-smoke, qa-canary, qa-coverage-audit, qa-weekly-report, qa-roadmap, qa-file-bug, qa-fixtures, qa-careful. Each follows the gstack-style structure: frontmatter (name, version, description, allowed-tools, triggers), numbered phases with precondition gates, decision-format AskUserQuestion blocks, KB context tables, failure modes, and worked examples.
  • Validated end-to-end on Brooklyn (React + HeadlessUI) with both backends: OpenAI Skills + gpt-5.4 (7/7 passing, kb_get_page organically invoked) and Claude Code subprocess + MCP (51/51 passing, 4 exploration passes, 36 fields captured, Phase 6 complete).
  • Skills live in skills/qa-team/ in the repo. nitpick init can auto-link them on setup.
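
As an illustration of the frontmatter convention described above, a skill header might look like this (the field names come from the release note; all values are invented for illustration):

```yaml
---
name: qa-smoke
version: 2.1.0
description: Fast smoke pass over the highest-priority pages.
allowed-tools:
  - kb_list_pages
  - kb_get_page
triggers:
  - "run a smoke check"
---
```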

v2.0.0: MCP knowledge base server

  • nitpick mcp serve: starts an MCP-over-stdio server backed by the project KB. Claude Code, Claude Desktop, Cursor, Cline, and any MCP-aware client can connect and query page models, bugs, flaky tests, and flows — without launching the full agent runtime.
  • 8 MCP tools exposed: kb_list_pages, kb_get_page, kb_search, kb_list_open_bugs, kb_get_flow (read tier) + kb_add_bug, kb_note_flaky, kb_put_resolved_question (write tier, read-write trust only).
  • Trust tiers: mcp.kb.trust: read-write | read-only | deny in nitpick.yaml. Enforced at the tool level — a read-only client sees 5 tools, not 8.
  • In-process KB tools: Anthropic and OpenAI agent loops get the same 8 kb_* tools as function calls (not stdio). No subprocess overhead.
  • LLM-assisted crawler: nitpick crawl --strategy agent launches an LLM-driven crawler agent for SPAs where the deterministic crawler under-discovers. --strategy auto falls back to the agent crawler automatically when the deterministic crawl discovers fewer pages than a threshold.
  • Headless (non-interactive) mode: --non-interactive injects a headless directive into Phase 3.5. The agent self-writes approved.json instead of blocking on stdin. Works on all backends (Anthropic, OpenAI, Claude Code).
  • Shape-tolerant honesty gate: accepts three evidence shapes ({headings, fields} flat / {passes:[...]} array / tagged array) + key synonyms (heading/headings, field/fields). Prevents false negatives when the agent writes valid but differently-shaped evidence.
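
The shape-tolerant check above can be sketched roughly as follows — a minimal sketch assuming the three evidence shapes and key synonyms named in the release note; the function name and exact field handling are illustrative, not Nitpick's actual code:

```typescript
// Evidence may arrive flat ({headings, fields}), as {passes: [...]},
// or as a tagged array; singular/plural key synonyms are accepted.
type Evidence =
  | Record<string, unknown>
  | { passes: Array<Record<string, unknown>> }
  | Array<Record<string, unknown>>;

function hasEvidence(ev: Evidence): boolean {
  // Plain or tagged array shape: any non-empty array counts.
  if (Array.isArray(ev)) return ev.length > 0;
  // {passes: [...]} array shape.
  const passes = (ev as Record<string, unknown>)["passes"];
  if (Array.isArray(passes)) return passes.length > 0;
  // Flat shape, accepting heading/headings and field/fields synonyms.
  const o = ev as Record<string, unknown>;
  for (const key of ["headings", "heading", "fields", "field"]) {
    const v = o[key];
    if (Array.isArray(v) && v.length > 0) return true;
  }
  return false;
}
```

The point of the tolerance is in the last loop: a valid pass written as `{heading: [...]}` no longer fails the gate just because the key is singular.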

v1.3.0: Async HTTP jobs with input polling

  • POST /jobs now returns 202 Accepted immediately with { job_id, run_dir, status: "queued", poll: "/jobs/:id" }. The run executes as an unawaited background promise. No more 30-minute blocked HTTP connections or proxy timeouts.
  • GET /jobs (new): enumerate all runs, newest first.
  • GET /jobs/:id reads job-state.json directly from disk — polling is cheap, and job state is visible across server restarts.
  • POST /jobs/:id/input (new): supply human-in-the-loop answers to the workflow’s pending_user_input slot.
  • POST /jobs/:id/cancel (new): best-effort transition to cancelled (respects the workflow state machine, so completed runs can’t be cancelled).
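
A minimal polling client for the flow above might look like this. The endpoint path comes from the release notes; the `JobState` shape and the set of terminal statuses are assumptions for illustration:

```typescript
interface JobState {
  job_id: string;
  status: string;
}

// Assumed terminal statuses; adjust to the actual workflow state machine.
const TERMINAL = new Set(["completed", "failed", "cancelled"]);

async function waitForJob(
  baseUrl: string,
  jobId: string,
  fetchFn: typeof fetch = fetch, // injectable for testing
  intervalMs = 2000,
): Promise<JobState> {
  for (;;) {
    // GET /jobs/:id reads job-state.json from disk, so frequent polls are cheap.
    const res = await fetchFn(`${baseUrl}/jobs/${jobId}`);
    const state = (await res.json()) as JobState;
    if (TERMINAL.has(state.status)) return state;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

Typical use: POST /jobs, read `job_id` from the 202 body, then `waitForJob(base, jobId)` until a terminal status comes back.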

v1.2.0: Resume interrupted runs

  • nitpick run --resume <run-id> reopens a crashed, cancelled, or completed run. Pages already marked passed are skipped (no LLM cost) — logged as Resume: page already passed, skipping. Failed / mid-flight pages re-run from scratch.
  • Use --resume latest for the most recent directory, or pass an exact runs/<id> name / absolute path.
  • Phase B (scope confirmation) is skipped on resume when scope was already confirmed — no re-asking the human for a decision they already made.
  • Workflow gets a new softReopen() primitive to rewind a terminal run back to exploring without losing page-level status.
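
The softReopen() idea can be sketched as below. The type shapes and status names are assumptions, not Nitpick's actual workflow types — the point is only that the run-level status rewinds while page-level status survives:

```typescript
type RunStatus = "exploring" | "passed" | "failed" | "cancelled";
type PageStatus = "passed" | "failed" | "pending";

interface Run {
  status: RunStatus;
  pages: Record<string, PageStatus>;
}

function softReopen(run: Run): Run {
  const terminal: RunStatus[] = ["passed", "failed", "cancelled"];
  if (!terminal.includes(run.status)) return run; // only terminal runs reopen
  // Pages already marked passed keep that status, so resume can skip them.
  return { ...run, status: "exploring" };
}
```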

v1.1.0: OpenAI Skills integration

  • Opt-in alternative OpenAI path using the Responses API + a server-hosted SKILL.md bundle instead of re-shipping the skill in every system prompt. Saves ~15 KB per call on long runs.
  • New nitpick skill subcommand tree: publish [--dir <path>], list, versions <skill_id>, promote <skill_id> <version>. Client-side enforces OpenAI’s 25 MB uncompressed cap and the “all files must share one top-level directory” rule when zipping.
  • Three new llm config fields: use_openai_skills (bool), skill_id (from publish), skill_version (optional).
  • OpenAIResponsesProvider speaks raw fetch to POST /v1/responses — the openai@4.x SDK doesn’t type environment.skills yet. Handles both function_call and local_shell_call output items. Auto-resolves name+description from GET /v1/skills/<id> when missing.
  • Anthropic + Claude Code paths untouched. Anthropic Skills is intentionally NOT adopted — its mandatory code-execution sandbox conflicts with Nitpick’s local-first, on-user-disk design.
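
The three config fields above might appear as follows (the field names come from the release note; the nesting under `llm` and the example values are assumptions):

```yaml
llm:
  use_openai_skills: true
  skill_id: skill_abc123      # from `nitpick skill publish`
  skill_version: "4"          # optional
```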

v1.0.0: Baseline — end-to-end agent pipeline

  • End-to-end OpenAI path: nitpick run produces real behavioural tests/<page_id>.spec.ts with expect() assertions executed against the live app.
  • Honesty gate on page status: a page is only marked passed if evidence was gathered, a non-probe spec was written with at least one assertion, and the agent returned cleanly.
  • Skill hardening: SKILL.md and the delegate prompt forbid fabricating evidence, require post-approval Phase 4 test generation, and include copy-pasteable probe templates.
  • Crawler auth fixed: post-login networkidle + reload + login-form re-check before saving storageState.
  • URL normalisation: the scope decider may return a page_id or a raw path; both resolve correctly, and double slashes are collapsed.
  • KB persistence after each page test.
  • .env auto-loading from the current directory.
  • nitpick prep subcommand for Playwright reinstall.
  • Free-form persistent instructions config field.
  • Optional login_url on roles.
  • Model picker in nitpick init with per-run cost estimates.
  • One-shot scripts/install.sh.
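
The honesty-gate rule above amounts to a three-way conjunction; a minimal sketch (field names are illustrative, not Nitpick's actual result shape):

```typescript
interface PageResult {
  evidenceGathered: boolean;    // page model evidence was actually collected
  specAssertionCount: number;   // expect() assertions in the generated spec
  isProbeSpec: boolean;         // probe templates don't count as real tests
  agentReturnedCleanly: boolean;
}

function honestyGate(r: PageResult): "passed" | "unverified" {
  const ok =
    r.evidenceGathered &&
    !r.isProbeSpec &&
    r.specAssertionCount >= 1 &&
    r.agentReturnedCleanly;
  return ok ? "passed" : "unverified";
}
```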

v0.1.2: Plumbing for diff-based approval

  • Added PageQuestion / ResolvedQuestion types on the derived UI model.
  • KB persistence for resolved questions across runs.
  • Delegate threads prior_model_path and resolved_questions_path to the skill.
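
Assumed shapes for the two types named above — the real fields in Nitpick's derived UI model may differ:

```typescript
// A question the agent raised about a page, and its resolved form
// (persisted in the KB so later runs don't re-ask it).
interface PageQuestion {
  page_id: string;
  question: string;
}

interface ResolvedQuestion extends PageQuestion {
  answer: string;
}
```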

v0.1.1: Feedback-driven hardening

  • LLM-backed scope inferrer (replaces keyword-only matching).
  • Shared retry helper: exponential backoff on 429 / 5xx / network errors, jittered, 5 attempts.
  • Parallel tool calls enabled in both Anthropic and OpenAI agent loops.
  • SPA-aware crawler rewrite: ARIA-tree inventory + menu expansion + click-and-observe.
  • OpenAI provider hardened to production quality.
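
The shared retry helper can be sketched as below — jittered exponential backoff, 5 attempts, retrying on 429 / 5xx / network errors, per the note above. The function name and the exact backoff formula are assumptions, not Nitpick's implementation:

```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseMs = 250,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.status;
      // Retry on 429, any 5xx, or errors with no HTTP status (network).
      const retryable =
        status === 429 || (status >= 500 && status < 600) || status === undefined;
      if (!retryable || i === attempts - 1) throw err;
      // Full jitter: uniform delay in [0, baseMs * 2^i).
      const delayMs = Math.random() * baseMs * 2 ** i;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}
```

Non-retryable errors (e.g. a 400) surface immediately instead of burning attempts.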

v0.1: First public release

  • CLI (init, crawl, run, status, providers, serve)
  • HTTP API (sync)
  • Claude Code skill bundle
  • Three providers: Anthropic, OpenAI, Claude Code
  • Bundled skills: prd-e2e-orchestrator, multi-role-e2e, nitpick
  • In-process agent runtime with tool use
  • Playwright-based crawler
  • Knowledge base with page graph, models, history, flaky registry, bugs
  • Unified reporter
  • Anthropic prompt caching