Forge — Spec-Driven Development with quality gates
Forge is the 5-phase workflow that prevents AI from diving straight into code. Two human-approval gates — between Specification and Design, and between Design and Implementation — block advancement until criteria are met.
Updated: 2026-04-30
Instead of asking the AI to “do X”, you ask it to follow a five-phase loop with explicit human-approval gates. The state of every phase is persisted in the Vault, and quality checkpoints prevent the AI from advancing past the Apply or Verify states without meeting your criteria.
When to apply the full workflow
- New features or modules
- Public-facing API or interface changes
- Refactors that cross multiple layers
- Anything touching the domain layer
For small fixes you only need Phase 1 and Phase 4.
The five phases
Phase 1 — Exploration
Goal: understand the code and gather context. No proposing.
The AI reads relevant files, runs vault_search, and identifies existing patterns, dependencies, technical debt and constraints (layers, adapters, shared libraries).
Output: a brief analysis — Found · Impact · Debt · Vault context.
Rules: NO proposing · NO writing code · ONE focused question allowed if there’s ambiguity.
Phase 2 — Specification ✋ Requires developer approval
Goal: define exactly what will be built, in terms the developer can approve.
Output (mandatory format):
    ## Spec: [feature name]
    **Objective:** one sentence
    **Inputs:** paramName: type — description
    **Outputs:** returnType — description
    **Business Rules/Constraints:** numbered list
    **Affects:** files and changes
Rules: STOP at the end · wait for explicit approval · no code · no design · if multiple specs are viable, present at most 3 options.
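To make the mandatory format concrete, here is a minimal TypeScript sketch that models a spec as data and renders it in that layout. The names `FeatureSpec`, `SpecInput` and `renderSpec` are hypothetical and not part of Forge. The sample values come from the refund spec in the worked example further down, which omits input descriptions, so the sketch treats them as optional.

```typescript
// Hypothetical types: the fields mirror the headings of the mandatory format.
interface SpecInput {
  name: string;
  type: string;
  description?: string; // optional here because the worked example omits it
}

interface FeatureSpec {
  feature: string;
  objective: string;
  inputs: SpecInput[];
  outputs: { type: string; description: string };
  rules: string[];
  affects: string[];
}

// The dash separator used by the format ("paramName: type [dash] description").
const SEP = " \u2014 ";

// Render a spec as one heading line followed by one line per section.
function renderSpec(spec: FeatureSpec): string {
  const inputs = spec.inputs
    .map((i) => `${i.name}: ${i.type}` + (i.description ? SEP + i.description : ""))
    .join(", ");
  // Business rules are numbered starting from 1, one per line.
  const rules = spec.rules.map((r, idx) => `${idx + 1}. ${r}`);
  return [
    `## Spec: ${spec.feature}`,
    `**Objective:** ${spec.objective}`,
    `**Inputs:** ${inputs}`,
    `**Outputs:** ${spec.outputs.type}${SEP}${spec.outputs.description}`,
    `**Business Rules/Constraints:**`,
    ...rules,
    `**Affects:** ${spec.affects.join(", ")}`,
  ].join("\n");
}

// Values taken from the refund spec in the worked example.
const refundSpec: FeatureSpec = {
  feature: "Add refund support",
  objective: "allow partial and full refunds.",
  inputs: [
    { name: "orderId", type: "string" },
    { name: "amount?", type: "Decimal" },
  ],
  outputs: { type: "RefundResult", description: "id, amount, status" },
  rules: [
    "Only orders in 'paid' status.",
    "Refund amount <= original.",
    "Idempotent by orderId.",
  ],
  affects: ["payment.service.ts (+method)", "payment.controller.ts (+endpoint)"],
};

const renderedSpec = renderSpec(refundSpec);
```

Keeping the spec as structured data rather than free text makes it easy to check in Phase 5 that every business rule received a verdict.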
Phase 3 — Design ✋ Requires developer approval
Pre-actions: vault_context to load active patterns, vault_search "adapter pattern", vault_search "module structure".
Output:
    ## Technical Design: [feature name]
    ### New files: src/.../X.ts | Layer: Domain | Exports: ClassName
    ### Modified files: description of change
    ### API contracts: endpoints + DTOs
    ### Dependency injection changes
    ### Key decisions: rationale, cache keys, etc.
Rules: STOP at the end · respect layers · every new file declares layer + responsibility · flag conflicts with the approved spec before proceeding.
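The rule that every new file declares its layer can be checked mechanically. The sketch below parses one new-file entry in the `path | Layer: ... | Exports: ...` shape from the template and rejects entries with a missing field. `NewFileEntry` and `parseNewFile` are hypothetical names, and the sample path and class are invented for illustration only.

```typescript
// Hypothetical shape for one "New files" entry of the design template.
interface NewFileEntry {
  path: string;
  layer: string;
  exports: string;
}

// Parse "path | Layer: X | Exports: Y"; return null when any field is missing.
function parseNewFile(line: string): NewFileEntry | null {
  const [path, layerPart, exportsPart] = line.split("|").map((s) => s.trim());
  const layer = layerPart?.match(/^Layer:\s*(\S.*)$/)?.[1];
  const exported = exportsPart?.match(/^Exports:\s*(\S.*)$/)?.[1];
  if (!path || !layer || !exported) return null;
  return { path, layer, exports: exported };
}

// Illustrative entries (the path and class name are made up).
const complete = parseNewFile(
  "src/payments/refund.policy.ts | Layer: Domain | Exports: RefundPolicy"
);
const missingLayer = parseNewFile(
  "src/payments/refund.policy.ts | Exports: RefundPolicy"
);
```

Here `complete` is a populated entry and `missingLayer` is `null`, which is the point where a design should be sent back before approval.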
Phase 4 — Implementation
Goal: write code exactly as designed in Phase 3.
Rules: implement only what was designed · PAUSE and ask before adding anything not in the design · follow active scrolls · no “creative additions” · no “while I’m here” improvements · follow exact naming.
Checklist:
- ✓ Every Phase 3 file created/modified
- ✓ No layer violations
- ✓ No console.log in src/
- ✓ No any without // korva-ignore: reason
- ✓ DTO uppercase
- ✓ File names kebab-case
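Several items on this checklist lend themselves to automation. The following is a rough sketch of three of them (kebab-case file names, no `console.log` in `src/`, no `any` without a `korva-ignore` reason); it is not how Sentinel implements them. It assumes that dot-separated names such as `payment.service.ts` count as kebab-case, that the ignore comment sits on the same line as the `any`, and that a line-based regex is good enough for illustration. `SourceFile` and `checkFile` are hypothetical names.

```typescript
interface SourceFile {
  path: string; // e.g. "src/payments/payment.service.ts"
  content: string;
}

// Assumption: lowercase kebab-case segments, optionally separated by dots.
const KEBAB_FILE = /^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)*$/;

// Return one human-readable problem per violated checklist item.
function checkFile(file: SourceFile): string[] {
  const problems: string[] = [];
  const fileName = file.path.split("/").pop() ?? "";
  if (!KEBAB_FILE.test(fileName)) {
    problems.push(`${file.path}: file name is not kebab-case`);
  }

  file.content.split("\n").forEach((line, idx) => {
    const where = `${file.path}:${idx + 1}`;
    if (file.path.startsWith("src/") && line.includes("console.log")) {
      problems.push(`${where}: console.log in src/`);
    }
    // Naive detection of an `any` annotation; a real tool would walk the AST.
    const usesAny = /:\s*any\b/.test(line);
    const hasReason = /\/\/ korva-ignore: \S/.test(line);
    if (usesAny && !hasReason) {
      problems.push(`${where}: any without korva-ignore reason`);
    }
  });
  return problems;
}

const violations = checkFile({
  path: "src/payments/RefundService.ts",
  content: "const x: any = 1;\nconsole.log(x);",
});
const clean = checkFile({
  path: "src/payments/payment.service.ts",
  content: "const raw: any = input; // korva-ignore: third-party payload",
});
```

The first call reports three problems (file name, `any`, `console.log`) and the second reports none. The layer and DTO checks are left out because they depend on project conventions the text does not spell out.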
Phase 5 — Verification
Goal: confirm implementation matches spec. No new code — only validation.
Actions:
- Spec review (PASS/FAIL per item).
- Anti-pattern scan — runs Sentinel.
- Test list — enumerate the tests that should exist (don’t generate unless asked).
- vault_save what was learned.
Output:
    ## Verification: [feature name]
    ### Spec checklist
    ### Anti-pattern check
    ### Tests to write
    ### Saved to Vault
Rules: be honest about what’s missing · never PASS for something uncertain · always vault_save before closing.
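The "never PASS for something uncertain" rule is easy to encode: only confirmed evidence maps to PASS, and everything else fails. A small sketch under that reading follows. The `Evidence` categories, `SpecItemReview` and both functions are hypothetical, and the evidence values assigned to the refund rules are invented for illustration.

```typescript
type Evidence = "confirmed" | "missing" | "uncertain";
type Verdict = "PASS" | "FAIL";

interface SpecItemReview {
  item: string;
  evidence: Evidence;
}

// Only confirmed evidence passes; "uncertain" is treated like "missing".
function verdictFor(review: SpecItemReview): Verdict {
  return review.evidence === "confirmed" ? "PASS" : "FAIL";
}

// One checklist line per spec item.
function renderSpecChecklist(reviews: SpecItemReview[]): string {
  return reviews.map((r) => `- ${verdictFor(r)}: ${r.item}`).join("\n");
}

// Business rules from the refund spec; the evidence values are made up.
const refundReviews: SpecItemReview[] = [
  { item: "Only orders in 'paid' status", evidence: "confirmed" },
  { item: "Refund amount <= original", evidence: "confirmed" },
  { item: "Idempotent by orderId", evidence: "uncertain" },
];

const specChecklist = renderSpecChecklist(refundReviews);
```

With these inputs the third line reads `- FAIL: Idempotent by orderId`, so an unverified idempotency rule cannot slip through as a pass.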
Quality gates persisted in the Vault
The sdd_state table tracks the current phase per project. The transitions apply → verify and verify → archive are gated: they require a vault_qa_checkpoint call with gate_passed=true (score ≥ 70 + all mandatory criteria passed).
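The gate condition can be written as a single predicate. The sketch below only illustrates the rule stated above (score of at least 70 and every mandatory criterion passed). The real `vault_qa_checkpoint` signature is not shown in this document, so `Criterion` and `isGatePassed` are hypothetical names.

```typescript
interface Criterion {
  name: string;
  mandatory: boolean;
  passed: boolean;
}

// Threshold taken from the text: score >= 70.
const MIN_GATE_SCORE = 70;

// A gate passes only when the score meets the threshold AND
// no mandatory criterion has failed. Optional criteria never block.
function isGatePassed(score: number, criteria: Criterion[]): boolean {
  const mandatoryOk = criteria
    .filter((c) => c.mandatory)
    .every((c) => c.passed);
  return score >= MIN_GATE_SCORE && mandatoryOk;
}

// Illustrative criteria (the names are made up).
const sampleCriteria: Criterion[] = [
  { name: "no layer violations", mandatory: true, passed: true },
  { name: "tests listed", mandatory: false, passed: false },
];

const atThreshold = isGatePassed(70, sampleCriteria); // true
const belowThreshold = isGatePassed(69, sampleCriteria); // false
const mandatoryFailed = isGatePassed(95, [
  { name: "no layer violations", mandatory: true, passed: false },
]); // false
```

A high score does not compensate for a failed mandatory criterion, which is why the two conditions are joined with a logical AND.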
The internal nine-state machine — explore → propose → spec → design → tasks → apply → verify → archive → onboard — covers both the public 5 phases above and onboarding flows. Forge always speaks the public 5 phases; the 9 internal states give Beacon the audit substrate.
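As a rough illustration of gated transitions, the sketch below walks the nine states in the order listed and refuses the two gated steps unless a passing checkpoint is supplied. It assumes strictly linear transitions, which the text implies but does not state, and it does not model the `sdd_state` table or the mapping to the five public phases. `SddState` and `advance` are hypothetical names.

```typescript
// The nine internal states, in the order given above.
const SDD_STATES = [
  "explore", "propose", "spec", "design", "tasks",
  "apply", "verify", "archive", "onboard",
] as const;

type SddState = (typeof SDD_STATES)[number];

// Transitions the text describes as gated.
const GATED_TRANSITIONS: ReadonlySet<string> = new Set([
  "apply->verify",
  "verify->archive",
]);

// Move to the next state; gated transitions need gatePassed === true.
// Assumption: states advance one step at a time in listed order.
function advance(current: SddState, gatePassed: boolean): SddState {
  const next = SDD_STATES[SDD_STATES.indexOf(current) + 1];
  if (next === undefined) {
    throw new Error(`${current} is the last state`);
  }
  if (GATED_TRANSITIONS.has(`${current}->${next}`) && !gatePassed) {
    throw new Error(`${current} -> ${next} requires a passing QA checkpoint`);
  }
  return next;
}

const afterDesign = advance("design", false); // "tasks": not gated
const afterApply = advance("apply", true); // "verify": gate passed
```

Calling `advance("apply", false)` throws, which mirrors the requirement that the transition be preceded by a checkpoint with `gate_passed=true`.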
Why human-approval gates work
LLMs are great at generating plausible code. They are not great at noticing that the spec they just wrote subtly contradicts the spec the developer asked for. The two approval gates — between Spec and Design, and between Design and Implementation — make that mismatch a 30-second conversation instead of a multi-hour debugging session after the PR.
Worked example
1. Phase 1 (Exploration) AI: "I read src/payments/. Found: 3 adapters, 1 use-case, no tests. Impact: any change requires updating service mocks. Debt: PaymentService is 200 LOC, splittable. Vault context: Decision #47 (Mar 2024) — Stripe over Adyen. Proceed to Phase 2 with the spec?"
2. Phase 2 (Specification) ✋ APPROVAL
   AI: "## Spec: Add refund support
   Objective: allow partial and full refunds.
   Inputs: orderId: string, amount?: Decimal
   Outputs: RefundResult — id, amount, status
   Business Rules:
   1. Only orders in 'paid' status.
   2. Refund amount <= original.
   3. Idempotent by orderId.
   Affects: payment.service.ts (+method), payment.controller.ts (+endpoint)
   Approve?"
Developer: "✅ approved"
3. Phase 3 (Design) ✋ APPROVAL ...
4. Phase 4 (Implementation) AI writes exactly the design. No 'while I'm here'.
5. Phase 5 (Verification) AI runs Sentinel, lists missing tests, vault_saves the lessons learned.
Forge directory layout
forge/
├── WORKFLOW.md
├── phases/
│   ├── 01-exploration.md
│   ├── 02-specification.md
│   ├── 03-design.md
│   ├── 04-implementation.md
│   └── 05-verification.md
├── hive-mock/          → in-memory mock for tests (port :7438)
└── licensing-server/   → admin endpoints (handlers_admin.go)
Next
- MCP tools — vault_sdd_phase, vault_qa_checkpoint
- Sentinel — runs in Phase 5
- Lore — forge-sdd scroll