AI Agent Quality Assurance System

8 AI Agents Modifying Code.
How Do We Ensure Quality?

The Harness Engineering system, evolved across 6 versions, answers this question.

227+ Decisions · 8 AI Agents · 6 Version Iterations

3-Layer Quality Assurance Architecture

Each layer operates independently, implementing a defense-in-depth strategy.

L0 — Safety Net

Automated defense via pre-commit hooks

Secret protection (protect-secrets)
Dangerous command blocking (block-dangerous-commands)
Edit/Write tool monitoring
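The L0 checks above can be sketched as one hook script. This is a minimal illustration under stated assumptions, not the project's actual hooks. The deny-list and secret-path patterns are hypothetical. The event shape is also an assumption, since the source describes this layer only as pre-commit hooks: a JSON object with `tool_name` and `tool_input` read from stdin, where exit code 2 means "block". That shape is modeled on the Claude Code tool-hook convention.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical deny-list standing in for block-dangerous-commands.
DANGEROUS_COMMANDS = [
    # rm with both r and f in one flag cluster: "rm -rf", "rm -fr", "rm -rvf"
    re.compile(r"\brm\s+-(?=[a-z]*r)(?=[a-z]*f)[a-z]+", re.IGNORECASE),
    # force push via --force or -f (--force-with-lease is not matched)
    re.compile(r"\bgit\s+push\b.*\s(--force|-f)(\s|$)"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]

# Hypothetical secret-file patterns standing in for protect-secrets.
SECRET_PATHS = [
    re.compile(r"(^|/)\.env($|\.)"),        # .env, app/.env, .env.local
    re.compile(r"\.(pem|key)$"),            # certificates and private keys
    re.compile(r"(^|/)id_(rsa|ed25519)$"),  # SSH private keys
]


def check_command(command: str) -> Optional[str]:
    """Return a block reason if the shell command matches the deny-list."""
    for pattern in DANGEROUS_COMMANDS:
        if pattern.search(command):
            return f"dangerous command matched {pattern.pattern!r}"
    return None


def check_path(path: str) -> Optional[str]:
    """Return a block reason if an Edit/Write targets a secret file."""
    for pattern in SECRET_PATHS:
        if pattern.search(path):
            return f"protected secret path {path!r}"
    return None


def evaluate(event: dict) -> Optional[str]:
    """Dispatch on the tool name (assumed payload keys: tool_name, tool_input)."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})
    if tool == "Bash":
        return check_command(tool_input.get("command", ""))
    if tool in ("Edit", "Write"):
        return check_path(tool_input.get("file_path", ""))
    return None  # other tools pass through this layer


def main() -> int:
    """Hook entry point: read one JSON event from stdin; return 2 to block."""
    reason = evaluate(json.load(sys.stdin))
    if reason:
        print(f"Blocked: {reason}", file=sys.stderr)
        return 2
    return 0

# When installed as a hook script, finish with: raise SystemExit(main())
```

Keeping each check a pure function lets the deny-lists be unit-tested without spawning the hook process.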

L1 — Enablement

Custom Subagent Architecture

agent.md tool permission matrix
Core SKILL.md separated from references/
SKILL files streamlined for all 7 agents
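A per-agent tool permission declared in agent.md can be sketched as frontmatter that lists allowed tools. The `tools:` field (comma-separated), the sample QA Engineer file, and the helper names are assumptions modeled on Claude Code subagent files, not the project's actual agent.md. This sketch also treats the field as a strict allowlist, so any unlisted tool is denied.

```python
from typing import Dict, Set

# Hypothetical agent.md for the QA Engineer: the matrix blocks Write/Edit and
# leaves Bash open, so Bash is listed and Write/Edit are not.
QA_AGENT_MD = """---
name: qa-engineer
description: Runs test suites and reports defects
tools: Read, Grep, Glob, Bash
---
You are the QA Engineer. Delegate code changes instead of editing files.
"""


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse a flat `key: value` block between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def allowed_tools(agent_md: str) -> Set[str]:
    """Tool names from the comma-separated `tools:` field."""
    raw = parse_frontmatter(agent_md).get("tools", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def can_use(agent_md: str, tool: str) -> bool:
    """Strict allowlist: a tool is usable only if agent.md names it."""
    return tool in allowed_tools(agent_md)
```

Declaring permissions in the agent file itself keeps the matrix next to the role definition, so each agent's limits can be reviewed in one place.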

L2 — Traceable Ops

Traceable operations via run_id

run_id tracking system
Log collection pipeline
KPI measurement and second-pass review
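The run_id tracking and KPI steps above can be sketched as JSON-lines logging keyed by run_id. Everything here is hypothetical: the run_id format, the field names (`agent`, `event`, `status`), and the rule that a run's last recorded `status` decides whether it counts as a success.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def new_run_id() -> str:
    """Hypothetical run_id format: UTC timestamp plus an 8-hex-digit random suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def log_event(run_id: str, agent: str, event: str, **fields: object) -> str:
    """Serialize one JSON-lines record; every record carries the run_id."""
    record = {"run_id": run_id, "agent": agent, "event": event, **fields}
    return json.dumps(record, sort_keys=True)


def events_for(lines: Iterable[str], run_id: str) -> List[dict]:
    """Trace one run: all records sharing the given run_id, in log order."""
    return [r for r in map(json.loads, lines) if r.get("run_id") == run_id]


def success_rate(lines: Iterable[str]) -> float:
    """KPI sketch: share of runs whose last recorded status is 'success'."""
    last_status: Dict[str, str] = {}
    for record in map(json.loads, lines):
        if "status" in record:
            last_status[record["run_id"]] = record["status"]
    if not last_status:
        return 0.0
    passed = sum(1 for status in last_status.values() if status == "success")
    return passed / len(last_status)
```

Because every record carries the run_id, one filter reconstructs a full run for the second-pass review, and the same log feeds the KPI calculation.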

Agent Permission Matrix

7 agents × Write/Edit · Bash · Coding model permissions

Agent	Write/Edit	Bash	Coding Model
CEO Agent	Blocked	Allowlist	—
Fullstack Dev	Blocked	Open	Subcontract (codex)
QA Engineer	Blocked	Open	Subcontract (codex)
DevOps Engineer	Allowed	Open	Direct edit
Board Advisor	Blocked	Limited	Subcontract (codex)
Historian	Allowed	Limited	Direct edit
Content Writer	Allowed	Limited	Direct edit
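The table can be encoded as data and enforced with a single lookup. This is a minimal sketch under stated assumptions. The table does not define what "Allowlist" and "Limited" permit, so both are modeled here as per-agent command-prefix allowlists, and the prefixes themselves are invented for illustration. Tools outside the two table columns pass through unchecked.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Permissions:
    write_edit: bool              # Write/Edit column
    bash: str                     # "open", "allowlist", or "limited"
    coding_model: Optional[str]   # informational: "subcontract (codex)", "direct edit", or None
    bash_prefixes: Tuple[str, ...] = ()  # hypothetical allowed command prefixes


MATRIX: Dict[str, Permissions] = {
    "CEO Agent": Permissions(False, "allowlist", None, ("git status", "git log")),
    "Fullstack Dev": Permissions(False, "open", "subcontract (codex)"),
    "QA Engineer": Permissions(False, "open", "subcontract (codex)"),
    "DevOps Engineer": Permissions(True, "open", "direct edit"),
    "Board Advisor": Permissions(False, "limited", "subcontract (codex)", ("git log", "git diff")),
    "Historian": Permissions(True, "limited", "direct edit", ("git log",)),
    "Content Writer": Permissions(True, "limited", "direct edit", ("ls",)),
}


def is_allowed(agent: str, tool: str, command: str = "") -> bool:
    """Check one tool call against the matrix (raises KeyError for unknown agents)."""
    perms = MATRIX[agent]
    if tool in ("Write", "Edit"):
        return perms.write_edit
    if tool == "Bash":
        if perms.bash == "open":
            return True
        # Restricted modes: the command must equal or start with an allowed prefix.
        return any(command == p or command.startswith(p + " ") for p in perms.bash_prefixes)
    return True  # tools outside the matrix are not restricted in this sketch
```

Raising on an unknown agent, rather than defaulting to a permission set, keeps a misconfigured agent from silently inheriting access.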

Version History

Evolution from v1.0 to v5.1 across 6 versions

v1.0 — Initial Safety Net

Introduced hook-based secret protection and dangerous-command blocking

v2.0 — Role Definition System

SKILL.md introduced to specify per-agent roles and behavior rules

v3.0 — Safety Net Reinforcement

Completed the L0 layer with protect-secrets and block-dangerous-commands

v4.0 — Custom Subagent Architecture

Introduced the agent.md-based tool permission matrix; achieved 100% role awareness

v5.0 — Traceable Operations

Completed the run_id tracking system, log collection, and KPI measurement framework

v5.1 — SKILL Diet

Introduced the core SKILL.md + references/ split and streamlined SKILL files from 680 to 134 lines; completed for all 7 agents. Migrated to the .claude/skills/ path.
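The core SKILL.md + references/ split can be sketched as on-demand loading: the short core file is always loaded, and a reference file is added only when a task asks for its topic. The per-agent folder layout under `.claude/skills/`, the `references/<topic>.md` naming, and the function name `build_skill_context` are hypothetical. The source names only the `.claude/skills/` path and the `references/` directory.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def build_skill_context(skill_dir: Path, topics: Iterable[str] = ()) -> str:
    """Always load the core SKILL.md; append references/<topic>.md only when requested."""
    parts: List[str] = [(skill_dir / "SKILL.md").read_text(encoding="utf-8")]
    for topic in topics:
        ref = skill_dir / "references" / f"{topic}.md"
        if ref.is_file():  # unknown topics are skipped rather than raising
            parts.append(ref.read_text(encoding="utf-8"))
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Usage: build a throwaway skill directory with one reference file.
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / ".claude" / "skills" / "qa-engineer"  # hypothetical agent folder
        (skill / "references").mkdir(parents=True)
        (skill / "SKILL.md").write_text("core rules", encoding="utf-8")
        (skill / "references" / "test-matrix.md").write_text("detailed matrix", encoding="utf-8")
        print(build_skill_context(skill))                   # core only
        print(build_skill_context(skill, ["test-matrix"]))  # core plus one reference
```

Loading only the core by default is what keeps each agent's baseline context short, while detailed material stays available when a task needs it.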

KPI — Internal Validation Benchmarks

All figures are based on internal smoke tests and Sprint D benchmarks.

False Positives

SKILL Diet completion benchmark (2026-03-19)

100%

E2E Pipeline Success Rate

v4.0 Phase 1B-2 validation benchmark