Spec-Driven Development with AI, explained
AI assistants are great at generating plausible code and bad at noticing when their output drifts from what you actually asked for. Spec-Driven Development with explicit human approval gates fixes both. Here is the workflow that ships in Korva v1.0.
Most of the failure modes of AI-assisted coding aren’t model limitations. They’re workflow limitations. The model isn’t wrong about how to write the code — it’s wrong about what code to write, because nobody pinned down the spec before the AI started typing.
Spec-Driven Development (SDD) is a five-phase workflow that imposes structure before code. It’s not new — software engineering has been doing variants of this for fifty years — but it gets specifically interesting when you couple it with two human approval gates, because that’s exactly where AI assistants fall down.
The shape of the workflow
Five phases, two hard stops:
- Exploration — read the code, surface debt, identify constraints. No proposing.
- Specification ✋ — define what will be built in a fixed format. Stop and wait for approval.
- Design ✋ — define how it will be built. Stop and wait for approval.
- Implementation — write exactly what was designed. No “while I’m here” improvements.
- Verification — confirm the spec was met, run guardrails, save what was learned.
The two ✋ gates are the load-bearing beams. They’re where you (the human) actually look at what the AI is about to do and either approve or send it back. Without them, every other phase blurs together and you end up with the same problem you started with — code that “works” but isn’t what you wanted.
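One way to picture the two hard stops is as a tiny state machine that refuses to leave a gated phase until a human has approved it. This is a minimal sketch under stated assumptions: the names `WorkflowState`, `approve`, and `advance` are hypothetical and are not part of Korva's API.

```typescript
// Hypothetical model of the workflow; these names are illustrative, not Korva's API.
type Phase =
  | "exploration"
  | "specification"
  | "design"
  | "implementation"
  | "verification";

const PHASES: readonly Phase[] = [
  "exploration",
  "specification",
  "design",
  "implementation",
  "verification",
];

// The two phases that end in a hard stop for human approval.
const GATED: readonly Phase[] = ["specification", "design"];

interface WorkflowState {
  phase: Phase;
  approved: readonly Phase[]; // gated phases a human has signed off on
}

// Record a human approval for the current phase.
function approve(state: WorkflowState): WorkflowState {
  if (GATED.indexOf(state.phase) === -1) {
    throw new Error(`${state.phase} has no approval gate`);
  }
  return { phase: state.phase, approved: state.approved.concat(state.phase) };
}

// Move to the next phase; refuses to pass an unapproved gate.
function advance(state: WorkflowState): WorkflowState {
  const gated = GATED.indexOf(state.phase) !== -1;
  if (gated && state.approved.indexOf(state.phase) === -1) {
    throw new Error(`cannot leave ${state.phase} without approval`);
  }
  const next = PHASES[PHASES.indexOf(state.phase) + 1];
  if (next === undefined) {
    throw new Error("workflow is already in its final phase");
  }
  return { phase: next, approved: state.approved };
}
```

In this sketch, calling `advance` from `specification` throws until `approve` has been called, which is the behaviour the ✋ markers describe.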
Why exploration before proposing
The first phase looks like wasted time and isn’t. When you ask an AI “add refund support to the payments API”, the AI’s natural impulse is to start writing code immediately. That’s the wrong impulse, because the AI doesn’t yet know:
- Which adapter pattern your project uses
- Whether you have a RefundProcessor port already
- How you handle idempotency keys elsewhere
- Whether the existing PaymentService is already too long
- What past incidents have shaped your concurrency story
A proper Exploration phase produces a single short report:
    Found: 3 adapters, 1 use-case, no tests for PaymentService.
    Impact: any change requires updating service mocks.
    Debt: PaymentService is 200 LOC, splittable.
    Vault context: Decision #47 (Mar 2024) — Stripe over Adyen.
    Proceed to Phase 2 with the spec?

That’s it. No code. No design. Just enough context for the next phase to be specific.
The Specification phase as a contract
Phase 2 is where the AI commits — in writing — to what it’s going to build. Not how, what. The format is mandatory:
    ## Spec: Add refund support
    **Objective:** Allow partial and full refunds on paid orders.
    **Inputs:** orderId: string, amount?: Decimal
    **Outputs:** RefundResult { id, amount, status }
    **Business Rules:**
      1. Only orders in 'paid' status can be refunded.
      2. Refund amount ≤ original payment.
      3. Idempotent by orderId.
    **Affects:**
      - payment.service.ts (+ refund method)
      - payment.controller.ts (+ POST /payments/:id/refund)

This format works because it’s scannable. You read it in 30 seconds. You catch the misalignment in 30 seconds. (“Wait, why is amount optional? We always need a partial-vs-full discriminator.”) If you approve, the AI proceeds. If not, you send it back to revise. The AI cannot move past this phase without your explicit “approved” — that’s the load-bearing detail.
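Because the spec is a contract, its inputs and first two business rules translate almost directly into types and checks. The sketch below is one assumed reading of the refund spec, not code from Korva or from the project in the example: `Order`, `checkRefund`, and the use of integer cents in place of `Decimal` are all hypothetical choices made to keep it self-contained. Rule 3 (idempotency) is deliberately left out because it is a design-level concern.

```typescript
// Illustrative only: field names mirror the spec, the logic is an assumed reading of it.
type OrderStatus = "pending" | "paid" | "refunded";

interface Order {
  id: string;
  status: OrderStatus;
  paidAmountCents: number; // assumption: integer cents instead of the spec's Decimal
}

interface RefundRequest {
  orderId: string;
  amountCents?: number; // assumption: omitted means "refund the full payment"
}

type RuleCheck =
  | { ok: true; amountCents: number }
  | { ok: false; violated: string };

function checkRefund(order: Order, request: RefundRequest): RuleCheck {
  // Rule 1: only orders in 'paid' status can be refunded.
  if (order.status !== "paid") {
    return { ok: false, violated: "Rule 1: order is not in 'paid' status" };
  }
  const amountCents = request.amountCents ?? order.paidAmountCents;
  // Rule 2: refund amount must not exceed the original payment.
  if (amountCents > order.paidAmountCents) {
    return { ok: false, violated: "Rule 2: refund exceeds original payment" };
  }
  return { ok: true, amountCents };
}
```

Writing the rules this way also makes gaps in the spec visible: nothing above rejects a zero or negative amount, because the spec never says it should.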
The Design phase: how, not what
Once the spec is locked, Design is the second contract. The AI commits — again, in writing — to how it’s going to satisfy the spec:
    ## Technical Design: Add refund support
    ### New files
      - src/payments/domain/refund.entity.ts | Layer: Domain | Exports: Refund
      - src/payments/application/refund.use-case.ts | Layer: App | Exports: RefundUseCase
    ### Modified files
      - payment.service.ts (+ refund method)
      - payment.controller.ts (+ endpoint)
    ### API contracts
      - POST /payments/:id/refund | { amount?: number } → RefundResultDTO
    ### Dependency injection
      - RefundProcessorPort + StripeRefundAdapter (token: REFUND_PROCESSOR_PORT)
    ### Key decisions
      - Idempotency key = orderId (covers webhook retries; not orderId+day)

This is where the AI surfaces the architectural choices that aren’t obvious from the spec. “We’re going to use the existing port pattern.” “We’re going to put idempotency at the adapter level, not the use-case level.” “We’re going to use orderId only as the idempotency key, not orderId+day.”
You read it. You catch the things that don’t fit your team’s history. (“Actually, we use day-bucketed idempotency keys because we had a webhook replay incident in March — see Decision #47.”) You approve, or send it back.
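The idempotency-key decision is easier to review with the two options side by side. The following is a rough sketch, assuming a port shaped like the `RefundProcessorPort` named in the design. The method signature, the in-memory adapter, and both key functions are hypothetical; nothing here calls Stripe, and the port is synchronous only to keep the example short.

```typescript
// Hypothetical shapes: the design names RefundProcessorPort but not its methods.
interface RefundResult {
  id: string;
  amountCents: number;
  status: "succeeded";
}

interface RefundProcessorPort {
  // A real adapter would return a Promise; kept synchronous for brevity.
  refund(orderId: string, amountCents: number, idempotencyKey: string): RefundResult;
}

type KeyStrategy = (orderId: string, at: Date) => string;

// The design's choice: one key per order.
const orderOnlyKey: KeyStrategy = (orderId) => `refund:${orderId}`;

// The reviewer's alternative: one key per order per UTC day.
const dayBucketedKey: KeyStrategy = (orderId, at) =>
  `refund:${orderId}:${at.toISOString().slice(0, 10)}`;

// Stand-in adapter that deduplicates by idempotency key, as a payment provider would.
class InMemoryRefundAdapter implements RefundProcessorPort {
  private seen: { [key: string]: RefundResult | undefined } = {};
  private counter = 0;

  refund(orderId: string, amountCents: number, idempotencyKey: string): RefundResult {
    const previous = this.seen[idempotencyKey];
    if (previous !== undefined) {
      return previous; // replayed request: return the original result, do nothing new
    }
    this.counter += 1;
    const result: RefundResult = {
      id: `re_${orderId}_${this.counter}`,
      amountCents,
      status: "succeeded",
    };
    this.seen[idempotencyKey] = result;
    return result;
  }
}
```

In this sketch the order-only key absorbs every retry for an order, including a second legitimate partial refund, while the day-bucketed key lets a replay on a later day through as a new refund. Which trade-off is right depends on the team's history, which is the kind of question the Design gate exists to surface.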
Implementation: write only the design
Phase 4 is the easy part if you did the first three right. The AI implements exactly what was designed. No new files that weren’t in the design. No “while I’m here, I noticed PaymentService could be refactored” — that’s a separate task.
The temptation here, especially with capable models, is to expand scope. The AI sees a long file and wants to fix it. Don’t let it. The whole point of the gate-driven workflow is that scope is decided before implementation, not during.
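Scope creep can be checked mechanically by comparing the files an implementation touched against the files the approved design listed. This is a minimal sketch, not a Korva feature: `DesignScope` and `findOutOfScope` are hypothetical names, and the file lists are copied from the refund design example.

```typescript
// Hypothetical scope check; the file lists come from the refund design example.
interface DesignScope {
  newFiles: readonly string[];
  modifiedFiles: readonly string[];
}

const refundDesign: DesignScope = {
  newFiles: [
    "src/payments/domain/refund.entity.ts",
    "src/payments/application/refund.use-case.ts",
  ],
  modifiedFiles: ["payment.service.ts", "payment.controller.ts"],
};

// Returns every touched file that the approved design never mentioned.
function findOutOfScope(design: DesignScope, touched: readonly string[]): string[] {
  const allowed = design.newFiles.concat(design.modifiedFiles);
  return touched.filter((file) => allowed.indexOf(file) === -1);
}
```

The comparison is an exact string match, so a real check would need to normalize paths first; the design example mixes full paths and bare filenames.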
Verification: not optional
Phase 5 is where teams cheat the most. The implementation works locally; everyone wants to ship and move on. But Verification is where the workflow’s value compounds:
- Spec review — for each item in the spec, mark PASS or FAIL with the file:line that satisfies it.
- Anti-pattern scan — run Sentinel over the staged files. No layer leaks, no hardcoded secrets, no debug logs.
- Test list — enumerate the tests that should exist. Don’t generate them automatically. Just list them so the developer can decide which to write.
- Save what was learned — call vault_save for the design decisions, the patterns, the incidents avoided. This is what makes future AI sessions smarter.
The verification phase is also gated. The AI cannot mark “complete” without a vault_qa_checkpoint that records pass/fail per criterion and a numeric score (≥70 to pass).
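The shape of such a checkpoint can be sketched as a list of criteria reduced to a score. The article gives the threshold (70) but not how the score is computed, so the formula below, the percentage of criteria that passed, is purely an assumption, and `Criterion`, `Checkpoint`, and `buildCheckpoint` are hypothetical names rather than the `vault_qa_checkpoint` schema.

```typescript
// Assumed scoring: percentage of criteria that passed. The real formula is not documented here.
interface Criterion {
  description: string;
  passed: boolean;
  evidence?: string; // e.g. the file:line that satisfies the criterion
}

interface Checkpoint {
  score: number; // 0 to 100
  passed: boolean;
  failures: string[];
}

const PASS_THRESHOLD = 70; // from the text: a score of at least 70 passes

function buildCheckpoint(criteria: readonly Criterion[]): Checkpoint {
  if (criteria.length === 0) {
    throw new Error("a checkpoint needs at least one criterion");
  }
  const failures = criteria
    .filter((criterion) => !criterion.passed)
    .map((criterion) => criterion.description);
  const passedCount = criteria.length - failures.length;
  const score = Math.round((passedCount / criteria.length) * 100);
  return { score, passed: score >= PASS_THRESHOLD, failures };
}
```

Under this assumed formula, three passing criteria out of four clears the gate and two out of three does not.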
Why two gates instead of one
A common question: “couldn’t we just have one gate, between Spec and Implementation?” Empirically, no.
The Spec→Design boundary catches a different class of misalignment than the Design→Implementation boundary. Specs sound right and have terrible designs. Designs look reasonable and skip steps in the spec. The two gates filter different kinds of drift.
The cost of two gates is two 30-second reviews. The cost of one gate is, on average, one re-do per feature. The math is unambiguous.
Where Forge fits
Forge is the SDD implementation that ships in Korva v1.0. It persists the per-project phase in the local SQLite vault, gates the apply → verify and verify → archive transitions on quality checkpoints, and exposes everything via three MCP tools (vault_sdd_phase, vault_qa_checklist, vault_qa_checkpoint) that any compatible AI assistant can call.
The 5 phases are the public surface. Internally, the vault tracks 9 states (explore → propose → spec → design → tasks → apply → verify → archive → onboard) so Beacon — the local dashboard — can show you the audit trail of every gated transition. That’s important for compliance use cases (Teams+ tier, audit log) but doesn’t change the shape of the day-to-day workflow.
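The nine internal states and the two checkpoint-gated transitions can be modelled as an ordered list plus an audit log. This is a sketch under stated assumptions, not Forge's implementation: it assumes the states form a strictly linear sequence in the order listed, and `transition` and `AuditEntry` are hypothetical names.

```typescript
// Assumption: the nine states are strictly linear, in the order the text lists them.
const STATES = [
  "explore",
  "propose",
  "spec",
  "design",
  "tasks",
  "apply",
  "verify",
  "archive",
  "onboard",
] as const;

type State = (typeof STATES)[number];

// Per the text, apply -> verify and verify -> archive are gated on quality checkpoints.
const CHECKPOINT_REQUIRED: readonly State[] = ["apply", "verify"];

interface AuditEntry {
  from: State;
  to: State;
  gated: boolean; // whether this transition needed a passing checkpoint
}

// Moves one state forward, appending to the audit trail on success.
function transition(current: State, checkpointPassed: boolean, audit: AuditEntry[]): State {
  const next = STATES[STATES.indexOf(current) + 1];
  if (next === undefined) {
    throw new Error(`${current} is the final state`);
  }
  const gated = CHECKPOINT_REQUIRED.indexOf(current) !== -1;
  if (gated && !checkpointPassed) {
    throw new Error(`${current} -> ${next} blocked: checkpoint has not passed`);
  }
  audit.push({ from: current, to: next, gated });
  return next;
}
```

Recording `gated` on each entry is what would let a dashboard distinguish ordinary transitions from the ones that required a checkpoint.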
When to skip phases
For very small fixes — a typo, a one-liner, a comment — Phase 1 + Phase 4 is enough. The full workflow exists for changes that touch multiple layers, public APIs, or the domain model. Use judgement. SDD is a tool, not a religion.
Trying this
If you want to try the full SDD workflow with your AI assistant, the fastest path is to install Korva:
    curl -fsSL https://korva.dev/install | bash
    korva init
    korva setup --all

Open any of the 8 supported IDEs. The SDD scrolls auto-load when triggers match. The first time you ask the AI for a non-trivial feature, it should propose Phase 1 (Exploration) instead of jumping to code. If it doesn’t, gently say “let’s do this with SDD” — most current models recognize the workflow and switch into it.
If you would rather read than build, the Forge documentation walks through every phase with the worked refund example expanded, and forge-sdd is the canonical scroll the AI loads at session start.
The point isn’t that this workflow is novel — it isn’t. The point is that AI assistants make the gate-driven version cheap. The 30-second review is now a real 30 seconds, not the 30 minutes it took to read a hand-typed spec on paper. Use it.