Changelog
What changed in each release.
v2.1.0: 20 qa-team skills bundle
- 20 SKILL.md files at YC-product depth (~7,500 lines total): `qa-scope`, `qa-explore`, `qa-flow`, `qa-triage`, `qa-release-gate`, `qa-kickoff`, `qa-prioritize`, `qa-write-tests`, `qa-run`, `qa-resume`, `qa-classify-failure`, `qa-investigate-flake`, `qa-smoke`, `qa-canary`, `qa-coverage-audit`, `qa-weekly-report`, `qa-roadmap`, `qa-file-bug`, `qa-fixtures`, `qa-careful`. Each follows gstack-style frontmatter (`name`, `version`, `description`, `allowed-tools`, `triggers`), numbered phases with precondition gates, decision-format `AskUserQuestion` blocks, KB context tables, failure modes, and worked examples.
- Validated end-to-end on Brooklyn (React + HeadlessUI) with both backends: OpenAI Skills + gpt-5.4 (7/7 passing, `kb_get_page` organically invoked) and Claude Code subprocess + MCP (51/51 passing, 4 exploration passes, 36 fields captured, Phase 6 complete).
- Skills live in `skills/qa-team/` in the repo. `nitpick init` can auto-link them on setup.
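As a rough illustration, the gstack-style frontmatter described above might look like the following sketch. Only the field names come from these release notes; every value (and the exact list/scalar shapes) is invented for the example:

```yaml
---
name: qa-triage
version: 1.0.0
description: Triage new failures into bug / flake / environment buckets.
allowed-tools:
  - kb_get_page
  - kb_list_open_bugs
  - kb_note_flaky
triggers:
  - "triage these failures"
---
```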
v2.0.0: MCP knowledge base server
- `nitpick mcp serve`: starts an MCP-over-stdio server backed by the project KB. Claude Code, Claude Desktop, Cursor, Cline, and any MCP-aware client can connect and query page models, bugs, flaky tests, and flows — without launching the full agent runtime.
- 8 MCP tools exposed: `kb_list_pages`, `kb_get_page`, `kb_search`, `kb_list_open_bugs`, `kb_get_flow` (read tier) + `kb_add_bug`, `kb_note_flaky`, `kb_put_resolved_question` (write tier, read-write trust only).
- Trust tiers: `mcp.kb.trust: read-write | read-only | deny` in `nitpick.yaml`. Enforced at the tool level — a read-only client sees 5 tools, not 8.
- In-process KB tools: Anthropic and OpenAI agent loops get the same 8 `kb_*` tools as function calls (not stdio). No subprocess overhead.
- LLM-assist crawler: `nitpick crawl --strategy agent` launches an LLM-driven crawler agent for SPAs where the deterministic crawler under-discovers. `--strategy auto` falls back to the agent automatically when the deterministic crawl returns fewer pages than a threshold.
- Headless (non-interactive) mode: `--non-interactive` injects a headless directive into Phase 3.5. The agent self-writes `approved.json` instead of blocking on stdin. Works on all backends (Anthropic, OpenAI, Claude Code).
- Shape-tolerant honesty gate: accepts three evidence shapes (`{headings, fields}` flat / `{passes: [...]}` array / tagged array) plus key synonyms (`heading`/`headings`, `field`/`fields`). Prevents false negatives when the agent writes valid but differently-shaped evidence.
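A minimal sketch of what a shape-tolerant reader for those three evidence shapes could look like. This is not the actual Nitpick source; the function and type names are hypothetical, but the accepted shapes and key synonyms are the ones listed above:

```typescript
type Evidence = { headings: string[]; fields: string[] };

// Accepts: flat {headings, fields} object, {passes: [...]} wrapper, or a bare
// (tagged) array of pass entries. Keys heading/headings and field/fields are
// treated as synonyms, and scalar values are promoted to one-element arrays.
function normalizeEvidence(raw: any): Evidence {
  const entries: any[] = Array.isArray(raw)
    ? raw                              // tagged array shape
    : Array.isArray(raw?.passes)
      ? raw.passes                     // {passes: [...]} shape
      : [raw];                         // flat object shape
  const out: Evidence = { headings: [], fields: [] };
  for (const e of entries) {
    for (const k of ["headings", "heading"]) {
      if (e?.[k] != null) out.headings.push(...([] as string[]).concat(e[k]));
    }
    for (const k of ["fields", "field"]) {
      if (e?.[k] != null) out.fields.push(...([] as string[]).concat(e[k]));
    }
  }
  return out;
}
```

The point of normalizing instead of validating a single schema is exactly the failure mode named above: the agent's evidence can be valid in substance but differently shaped, and a strict gate would mark it as missing.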
v1.3.0: Async HTTP jobs with input polling
- `POST /jobs` now returns `202 Accepted` immediately with `{ job_id, run_dir, status: "queued", poll: "/jobs/:id" }`. The run executes as an unawaited background promise. No more 30-minute blocked HTTP connections or proxy timeouts.
- `GET /jobs` (new): enumerate all runs, newest first.
- `GET /jobs/:id` reads `job-state.json` directly from disk — polling is cheap, and job state is visible across server restarts.
- `POST /jobs/:id/input` (new): supply human-in-the-loop answers to the workflow’s `pending_user_input` slot.
- `POST /jobs/:id/cancel` (new): best-effort transition to `cancelled` (respects the workflow state machine, so completed runs can’t be cancelled).
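A hedged sketch of the client-side polling loop this API implies. The helper name, the option defaults, and the assumption that `queued`/`running` are the only non-terminal statuses are mine, not part of the documented contract:

```typescript
type JobState = { job_id: string; status: string; poll?: string };

// In practice getJob would be something like:
//   (id) => fetch(`${base}/jobs/${id}`).then((r) => r.json())
// It is injected here so the loop is testable without a server.
async function pollUntilDone(
  getJob: (id: string) => Promise<JobState>,
  jobId: string,
  { intervalMs = 2000, maxPolls = 900 } = {}
): Promise<JobState> {
  for (let i = 0; i < maxPolls; i++) {
    const state = await getJob(jobId);
    // Polling is cheap server-side: GET /jobs/:id just reads job-state.json.
    if (state.status !== "queued" && state.status !== "running") return state;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`job ${jobId} not terminal after ${maxPolls} polls`);
}
```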
v1.2.0: Resume interrupted runs
- `nitpick run --resume <run-id>` reopens a crashed, cancelled, or completed run. Pages already marked `passed` are skipped (no LLM cost) — logged as `Resume: page already passed, skipping`. Failed / mid-flight pages re-run from scratch.
- Use `--resume latest` for the most recent directory, or pass an exact `runs/<id>` name / absolute path.
- Phase B (scope confirmation) is skipped on resume when scope was already confirmed — no re-asking the human for a decision they already made.
- Workflow gets a new `softReopen()` primitive to rewind a terminal run back to `exploring` without losing page-level status.
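A toy sketch of the `softReopen()` idea under stated assumptions: the state type, the terminal-status list, and the status names other than `exploring` and `passed` are invented for illustration:

```typescript
type RunState = { status: string; pages: Record<string, string> };

// Rewind a terminal run back to "exploring" so it can be resumed,
// while leaving per-page status untouched (passed pages stay skippable).
function softReopen(run: RunState): RunState {
  const terminal = ["completed", "cancelled", "failed"]; // assumed names
  if (!terminal.includes(run.status)) {
    throw new Error(`softReopen: run is not terminal (status=${run.status})`);
  }
  return { ...run, status: "exploring" };
}
```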
v1.1.0: OpenAI Skills integration
- Opt-in alternative OpenAI path using the Responses API + a server-hosted `SKILL.md` bundle instead of re-shipping the skill in every system prompt. Saves ~15 KB per call on long runs.
- New `nitpick skill` subcommand tree: `publish [--dir <path>]`, `list`, `versions <skill_id>`, `promote <skill_id> <version>`. Client-side enforces OpenAI’s 25 MB uncompressed cap and the “all files must share one top-level directory” rule when zipping.
- Three new `llm` config fields: `use_openai_skills` (bool), `skill_id` (from publish), `skill_version` (optional). `OpenAIResponsesProvider` speaks raw `fetch` to `POST /v1/responses` — the `openai@4.x` SDK doesn’t type `environment.skills` yet. Handles both `function_call` and `local_shell_call` output items. Auto-resolves `name` + `description` from `GET /v1/skills/<id>` when missing.
- Anthropic + Claude Code paths untouched. Anthropic Skills is intentionally NOT adopted — its mandatory code-execution sandbox conflicts with Nitpick’s local-first, on-user-disk design.
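For orientation, the three fields might sit in config like this. The field names come from this release; the `llm:` nesting and the placeholder values are assumptions:

```yaml
llm:
  use_openai_skills: true               # bool; opt-in to the server-hosted skill path
  skill_id: "<id from nitpick skill publish>"
  skill_version: "<optional pinned version>"
```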
v1.0.0: Baseline — end-to-end agent pipeline
- End-to-end OpenAI path: `nitpick run` produces real behavioural `tests/<page_id>.spec.ts` with `expect()` assertions executed against the live app.
- Honesty gate on page status: a page is only marked `passed` if evidence was gathered, a non-probe spec was written with at least one assertion, and the agent returned cleanly.
- Skill hardening: SKILL.md and the delegate prompt forbid fabricating evidence, require post-approval Phase 4 test generation, and include copy-pasteable probe templates.
- Crawler auth fixed: post-login `networkidle` + reload + login-form re-check before saving `storageState`.
- URL normalisation: the scope decider may return a page_id or a raw path; both resolve correctly. Double-slash collapse.
- KB persistence after each page test.
- `.env` auto-loading from the current directory. `nitpick prep` subcommand for Playwright reinstall.
- Free-form persistent `instructions` config field.
- Optional `login_url` on roles.
- Model picker in `nitpick init` with per-run cost estimates.
- One-shot `scripts/install.sh`.
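The honesty-gate rule stated above (evidence gathered, a non-probe spec with at least one assertion, clean agent return) can be sketched as a predicate. The type and function names here are illustrative, not the actual source, and "contains `expect(`" is a stand-in for however the real gate detects assertions:

```typescript
type PageResult = {
  evidenceGathered: boolean;
  specSource: string | null;   // contents of the written tests/<page_id>.spec.ts, if any
  isProbe: boolean;            // probe templates never count as real specs
  agentExitedCleanly: boolean;
};

// A page is "passed" only when all three conditions from the changelog hold.
function honestyGate(r: PageResult): "passed" | "failed" {
  const hasAssertion = !!r.specSource && r.specSource.includes("expect(");
  const ok = r.evidenceGathered && hasAssertion && !r.isProbe && r.agentExitedCleanly;
  return ok ? "passed" : "failed";
}
```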
v0.1.2: Plumbing for diff-based approval
- Added `PageQuestion` / `ResolvedQuestion` types on the derived UI model.
- KB persistence for resolved questions across runs.
- Delegate threads `prior_model_path` and `resolved_questions_path` to the skill.
v0.1.1: Feedback-driven hardening
- LLM-backed scope inferrer (replaces keyword-only matching).
- Shared retry helper: exponential backoff on 429 / 5xx / network errors, jittered, 5 attempts.
- Parallel tool calls enabled in both Anthropic and OpenAI agent loops.
- SPA-aware crawler rewrite: ARIA-tree inventory + menu expansion + click-and-observe.
- OpenAI hardened to production quality.
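The shared retry helper above is fully characterised in one line: jittered exponential backoff, 5 attempts, retry on 429 / 5xx / network errors. A hedged sketch of that behaviour (names and the error-shape convention are assumptions; the real helper lives in the Nitpick source):

```typescript
async function withRetry<T>(fn: () => Promise<T>, attempts = 5, baseMs = 500): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.status;
      // Retry on 429, any 5xx, or errors with no HTTP status (network failures).
      const retryable = status === 429 || (status >= 500 && status < 600) || status === undefined;
      if (!retryable) throw err; // 4xx other than 429 fails fast
      lastErr = err;
      if (i < attempts - 1) {
        const delay = baseMs * 2 ** i * (0.5 + Math.random()); // exponential + jitter
        await new Promise((r) => setTimeout(r, delay));
      }
    }
  }
  throw lastErr;
}
```

Jitter matters here because both agent loops now issue parallel tool calls: without it, simultaneous 429s retry in lockstep and collide again.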
v0.1: First public release
- CLI (`init`, `crawl`, `run`, `status`, `providers`, `serve`)
- HTTP API (sync)
- Claude Code skill bundle
- Three providers: Anthropic, OpenAI, Claude Code
- Bundled skills: prd-e2e-orchestrator, multi-role-e2e, nitpick
- In-process agent runtime with tool use
- Playwright-based crawler
- Knowledge base with page graph, models, history, flaky registry, bugs
- Unified reporter
- Anthropic prompt caching