Forge — Spec-Driven Development with quality gates

Forge is the 5-phase workflow that prevents AI from diving straight into code. Two human-approval gates — between Specification and Design, and between Design and Implementation — block advancement until criteria are met.

Updated: 2026-04-30

Instead of asking the AI to “do X”, you ask it to follow a five-phase loop with explicit human-approval gates. The state of every phase is persisted in the Vault, and quality checkpoints block the internal apply → verify and verify → archive transitions until your criteria are met.

When to apply the full workflow

New features or modules
Public-facing API or interface changes
Refactors that cross multiple layers
Anything touching the domain layer

For small fixes, Phase 1 (Exploration) and Phase 4 (Implementation) are enough.
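The selection rule above can be sketched as a small function. This is only an illustration: `ChangeProfile`, `phasesFor` and the phase identifiers are hypothetical names, not part of Forge.

```typescript
// Hypothetical sketch: pick the phases to run from the criteria listed above.
type Phase =
  | "exploration"
  | "specification"
  | "design"
  | "implementation"
  | "verification";

// Each flag mirrors one bullet of "When to apply the full workflow".
interface ChangeProfile {
  newFeatureOrModule: boolean;
  publicApiChange: boolean;
  crossLayerRefactor: boolean;
  touchesDomainLayer: boolean;
}

const FULL_WORKFLOW: Phase[] = [
  "exploration",
  "specification",
  "design",
  "implementation",
  "verification",
];

// Small fixes only need Phase 1 and Phase 4.
const SMALL_FIX: Phase[] = ["exploration", "implementation"];

function phasesFor(change: ChangeProfile): Phase[] {
  const needsFullWorkflow =
    change.newFeatureOrModule ||
    change.publicApiChange ||
    change.crossLayerRefactor ||
    change.touchesDomainLayer;
  return needsFullWorkflow ? FULL_WORKFLOW : SMALL_FIX;
}
```

Any single criterion is enough to trigger the full workflow in this sketch; a team could just as well decide case by case.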

The five phases

Phase 1 — Exploration

Goal: understand the code and gather context. No proposing.

The AI reads relevant files, runs vault_search, and identifies existing patterns, dependencies, technical debt and constraints (layers, adapters, shared libraries).

Output: a brief analysis — Found · Impact · Debt · Vault context.

Rules: NO proposing · NO writing code · ONE focused question allowed if there’s ambiguity.

Phase 2 — Specification ✋ Requires developer approval

Goal: define exactly what will be built, in terms the developer can approve.

Output (mandatory format):

## Spec: [feature name]
**Objective:** one sentence
**Inputs:** paramName: type — description
**Outputs:** returnType — description
**Business Rules/Constraints:** numbered list
**Affects:** files and changes

Rules: STOP at the end · wait for explicit approval · no code · no design · if multiple specs are viable, present at most 3 options.
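Because the spec format is mandatory, a draft can be checked mechanically before it is shown to the developer. The sketch below only looks for the section labels of the template above; `missingSpecSections` is a hypothetical helper, not a Forge or Vault API.

```typescript
// Section labels copied from the mandatory spec format.
const REQUIRED_SPEC_SECTIONS = [
  "## Spec:",
  "**Objective:**",
  "**Inputs:**",
  "**Outputs:**",
  "**Business Rules/Constraints:**",
  "**Affects:**",
] as const;

// Returns the labels that do not appear anywhere in the draft.
// An empty array means every mandatory section is present.
function missingSpecSections(specMarkdown: string): string[] {
  return REQUIRED_SPEC_SECTIONS.filter(
    (label) => !specMarkdown.includes(label),
  );
}

// Example draft that omits its "Affects" section.
const draft = [
  "## Spec: Add refund support",
  "**Objective:** allow partial and full refunds.",
  "**Inputs:** orderId: string, amount?: Decimal",
  "**Outputs:** RefundResult",
  "**Business Rules/Constraints:** 1. Only orders in 'paid' status.",
].join("\n");

const missing = missingSpecSections(draft);
```

The check is purely structural: it cannot tell whether the objective really is one sentence or whether the business rules are complete, which is what the approval gate is for.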

Phase 3 — Design ✋ Requires developer approval

Pre-actions: vault_context to load active patterns, vault_search "adapter pattern", vault_search "module structure".

Output:

## Technical Design: [feature name]
### New files: src/.../X.ts | Layer: Domain | Exports: ClassName
### Modified files: description of change
### API contracts: endpoints + DTOs
### Dependency injection changes
### Key decisions: rationale, cache keys, etc.

Rules: STOP at the end · respect layers · every new file declares layer + responsibility · flag conflicts with the approved spec before proceeding.

Phase 4 — Implementation

Goal: write code exactly as designed in Phase 3.

Rules: implement only what was designed · PAUSE and ask before adding anything not in the design · follow active scrolls · no “creative additions” · no “while I’m here” improvements · follow exact naming.

Checklist:

✓ Every Phase 3 file created/modified
✓ No layer violations
✓ No console.log in src/
✓ No any without // korva-ignore: reason
✓ DTO uppercase
✓ File names kebab-case
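Three of the checklist items lend themselves to a quick text scan. The following is a heuristic sketch under stated assumptions, not how Sentinel works: `findViolations` is a hypothetical helper, the `any` detection is a rough regular expression rather than a type-aware check, the `// korva-ignore:` comment is assumed to sit on the same line, and dot-separated file names such as payment.service.ts are assumed to count as kebab-case.

```typescript
// Lowercase segments joined by "-", with "."-separated parts allowed
// (assumption: payment.service.ts is an acceptable kebab-case name).
const KEBAB_FILE = /^[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+(?:-[a-z0-9]+)*)*$/;

// Rough match for `: any`, `as any` and `<any>`; not a real type check.
const ANY_USAGE = /:\s*any\b|\bas\s+any\b|<any>/;

function findViolations(path: string, source: string): string[] {
  const violations: string[] = [];

  const fileName = path.split("/").pop() ?? path;
  if (!KEBAB_FILE.test(fileName)) {
    violations.push(`${path}: file name is not kebab-case`);
  }

  source.split("\n").forEach((line, index) => {
    const where = `${path}:${index + 1}`;
    if (path.startsWith("src/") && line.includes("console.log")) {
      violations.push(`${where}: console.log in src/`);
    }
    if (ANY_USAGE.test(line) && !line.includes("// korva-ignore:")) {
      violations.push(`${where}: any without // korva-ignore: reason`);
    }
  });

  return violations;
}
```

Layer violations and coverage of every Phase 3 file need knowledge of the approved design, so they are left out of this sketch.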

Phase 5 — Verification

Goal: confirm implementation matches spec. No new code — only validation.

Actions:

Spec review (PASS/FAIL per item).
Anti-pattern scan — runs Sentinel.
Test list — enumerate the tests that should exist (don’t generate unless asked).
vault_save what was learned.

Output:

## Verification: [feature name]
### Spec checklist
### Anti-pattern check
### Tests to write
### Saved to Vault

Rules: be honest about what’s missing · never PASS for something uncertain · always vault_save before closing.
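The rule "never PASS for something uncertain" can be made explicit in how the spec checklist is produced. This is a minimal sketch; `Evidence`, `SpecItem`, `verdict` and `renderSpecChecklist` are hypothetical names, and the bullet layout under the heading is an assumption.

```typescript
// How sure the reviewer is that a spec item is satisfied.
type Evidence = "confirmed" | "missing" | "uncertain";

interface SpecItem {
  rule: string;
  evidence: Evidence;
}

// Only confirmed items pass; uncertain items fail on purpose.
function verdict(item: SpecItem): "PASS" | "FAIL" {
  return item.evidence === "confirmed" ? "PASS" : "FAIL";
}

// Renders the "Spec checklist" section of the verification output.
function renderSpecChecklist(items: SpecItem[]): string {
  const lines = items.map((item) => `- ${verdict(item)}: ${item.rule}`);
  return ["### Spec checklist", ...lines].join("\n");
}
```

Treating "uncertain" as FAIL keeps the report conservative: an item moves to PASS only once there is evidence for it.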

Quality gates persisted in the Vault

The sdd_state table tracks the current phase per project. The transitions apply → verify and verify → archive are gated: each requires a vault_qa_checkpoint call with gate_passed=true, which means a score ≥ 70 and every mandatory criterion passed.
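The gate condition itself is simple enough to state in code. The sketch below only reproduces the rule described above (score of at least 70 and every mandatory criterion passed); `Criterion` and `gatePassed` are hypothetical names, and the real parameters of vault_qa_checkpoint are not shown here.

```typescript
interface Criterion {
  name: string;
  mandatory: boolean;
  passed: boolean;
}

// Threshold taken from the rule above.
const MIN_SCORE = 70;

// True only when the score reaches the threshold AND no mandatory
// criterion failed. Optional criteria do not block the gate.
function gatePassed(score: number, criteria: Criterion[]): boolean {
  const mandatoryOk = criteria
    .filter((criterion) => criterion.mandatory)
    .every((criterion) => criterion.passed);
  return score >= MIN_SCORE && mandatoryOk;
}
```

Note that a high score does not compensate for a failed mandatory criterion, and a failed optional criterion does not block the gate on its own.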

The internal nine-state machine — explore → propose → spec → design → tasks → apply → verify → archive → onboard — covers both the public 5 phases above and onboarding flows. Forge always speaks the public 5 phases; the 9 internal states give Beacon the audit substrate.
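A minimal sketch of the state machine follows. It assumes the states advance strictly in the order listed, which the text does not guarantee, and it models only the two gated transitions named earlier; `nextState` is a hypothetical helper, not the Vault implementation.

```typescript
// The nine internal states, in the order they are listed above.
const STATES = [
  "explore",
  "propose",
  "spec",
  "design",
  "tasks",
  "apply",
  "verify",
  "archive",
  "onboard",
] as const;

type State = (typeof STATES)[number];

// Transitions that require a passed quality checkpoint.
const GATED: ReadonlySet<string> = new Set([
  "apply->verify",
  "verify->archive",
]);

// Assumption: strictly linear progression through STATES.
function nextState(current: State, gatePassed: boolean): State {
  const next = STATES[STATES.indexOf(current) + 1];
  if (next === undefined) {
    throw new Error(`${current} is the last state`);
  }
  if (GATED.has(`${current}->${next}`) && !gatePassed) {
    throw new Error(
      `transition ${current} -> ${next} requires gate_passed=true`,
    );
  }
  return next;
}
```

For example, `nextState("design", false)` returns `"tasks"` because that transition is not gated, while `nextState("apply", false)` throws.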

Why human-approval gates work

LLMs are great at generating plausible code. They are not great at noticing that the spec they just wrote subtly contradicts the spec the developer asked for. The two approval gates — between Spec and Design, and between Design and Implementation — make that mismatch a 30-second conversation instead of a multi-hour debugging session after the PR.

Worked example

1. Phase 1 (Exploration)
   AI: "I read src/payments/.
        Found: 3 adapters, 1 use-case, no tests.
        Impact: any change requires updating service mocks.
        Debt: PaymentService is 200 LOC, splittable.
        Vault context: Decision #47 (Mar 2024) — Stripe over Adyen.
        Proceed to Phase 2 with the spec?"

2. Phase 2 (Specification) ✋ APPROVAL
   AI: "## Spec: Add refund support
        Objective: allow partial and full refunds.
        Inputs: orderId: string, amount?: Decimal
        Outputs: RefundResult — id, amount, status
        Business Rules: 1. Only orders in 'paid' status.
                        2. Refund amount <= original.
                        3. Idempotent by orderId.
        Affects: payment.service.ts (+method), payment.controller.ts (+endpoint)
        Approve?"

   Developer: "✅ approved"

3. Phase 3 (Design) ✋ APPROVAL
   ...

4. Phase 4 (Implementation)
   AI implements the design exactly as approved. No 'while I'm here'.

5. Phase 5 (Verification)
   AI runs Sentinel, lists missing tests, vault_saves the lessons learned.

Forge directory layout

forge/
├── WORKFLOW.md
├── phases/
│   ├── 01-exploration.md
│   ├── 02-specification.md
│   ├── 03-design.md
│   ├── 04-implementation.md
│   └── 05-verification.md
├── hive-mock/                  → in-memory mock for tests (port :7438)
└── licensing-server/           → admin endpoints (handlers_admin.go)

MCP tools — vault_sdd_phase, vault_qa_checkpoint
Sentinel — runs in Phase 5
Lore — forge-sdd scroll