SidequestLab
AI Agent Quality Assurance System

8 AI Agents Modifying Code.
How Do We Ensure Quality?

The Harness Engineering system, evolved across 6 versions, answers this question.

227+ Decisions
9 AI Agents
6 Version Iterations

3-Layer Quality Assurance Architecture

Each layer operates independently, implementing a Defense in Depth strategy

L0 — Safety Net

Automated defense via pre-commit hooks

  • Secret protection (protect-secrets)
  • Dangerous command blocking (block-dangerous-commands)
  • Edit/Write tool monitoring
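
The checks listed above could be sketched roughly as follows. This is a minimal illustration, not the actual protect-secrets or block-dangerous-commands implementation: the pattern lists and the function names `is_dangerous_command` and `contains_secret` are assumptions made for the example.

```python
import re

# Hypothetical deny-list; the real block-dangerous-commands rules are not shown here.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-(?:rf|fr)\b"),           # recursive forced delete
    re.compile(r"\bgit\s+push\s+.*--force\b"),    # force push
    re.compile(r"\bchmod\s+-R\s+777\b"),          # world-writable tree
]

# Hypothetical secret shapes; the real protect-secrets rules are not shown here.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def is_dangerous_command(command: str) -> bool:
    """Return True if the shell command matches any deny-list pattern."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)


def contains_secret(text: str) -> bool:
    """Return True if the text looks like it contains a credential."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

A hook built this way would reject the action when either function returns True and let it through otherwise.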

L1 — Enablement

Custom Subagent Architecture

  • agent.md tool permission matrix
  • core SKILL.md + references/ separation architecture
  • SKILL.md streamlined for all 7 agents

L2 — Traceable Ops

Traceable operations via run_id

  • run_id tracking system
  • Log collection pipeline
  • KPI measurement and 2nd review
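
One way to picture run_id-based traceability is shown below. It is a sketch under assumptions: the record fields (`agent`, `event`, `ts`), the in-memory `sink` list standing in for the log collection pipeline, and the helper names are all hypothetical, not the system's actual schema.

```python
import json
import uuid
from datetime import datetime, timezone


def new_run_id() -> str:
    """Create a unique identifier for one agent run."""
    return uuid.uuid4().hex


def log_event(sink: list, run_id: str, agent: str, event: str) -> dict:
    """Append one JSON log line tagged with the run_id to the sink."""
    record = {
        "run_id": run_id,
        "agent": agent,
        "event": event,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    sink.append(json.dumps(record))
    return record


def events_for_run(sink: list, run_id: str) -> list:
    """Collect every logged record that belongs to the given run."""
    return [r for r in map(json.loads, sink) if r["run_id"] == run_id]
```

Because every record carries the same run_id, a reviewer can reconstruct what a single run did by filtering the collected logs on that one field.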

Agent Permission Matrix

7 agents × Write/Edit · Bash · Coding model permissions

Agent            Write/Edit  Bash       Coding Model
CEO Agent        Blocked     Allowlist
Fullstack Dev    Blocked     Open       Subcontract (codex)
QA Engineer      Blocked     Open       Subcontract (codex)
DevOps Engineer  Allowed     Open       Direct edit
Board Advisor    Blocked     Limited    Subcontract (codex)
Historian        Allowed     Limited    Direct edit
Content Writer   Allowed     Limited    Direct edit
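
The matrix above can be expressed as a lookup table, for example like this. The sketch assumes a deny-by-default policy for unknown agents, and the `Permission` type and `can_edit_directly` helper are hypothetical names; how agent.md actually encodes these permissions is not shown in this page. The CEO Agent's coding model is left as `None` because the table gives no value for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Permission:
    write_edit: bool             # True = Allowed, False = Blocked
    bash: str                    # "open", "limited", or "allowlist"
    coding_model: Optional[str]  # None where the table lists no value


# Values copied from the permission matrix.
MATRIX = {
    "CEO Agent": Permission(False, "allowlist", None),
    "Fullstack Dev": Permission(False, "open", "Subcontract (codex)"),
    "QA Engineer": Permission(False, "open", "Subcontract (codex)"),
    "DevOps Engineer": Permission(True, "open", "Direct edit"),
    "Board Advisor": Permission(False, "limited", "Subcontract (codex)"),
    "Historian": Permission(True, "limited", "Direct edit"),
    "Content Writer": Permission(True, "limited", "Direct edit"),
}


def can_edit_directly(agent: str) -> bool:
    """Return True if the agent may use Write/Edit; unknown agents are denied (assumption)."""
    perm = MATRIX.get(agent)
    return perm is not None and perm.write_edit
```

Keeping the matrix as data rather than scattered conditionals means a permission change is a one-line edit that is easy to review.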

Version History

Evolution from v1.0 to v5.1 across 6 versions

v1.0 — Initial Safety Net

Introduced hook-based secret protection and dangerous-command blocking

v2.0 — Role Definition System

SKILL.md introduced to specify per-agent roles and behavior rules

v3.0 — Safety Net Reinforcement

Completed the L0 layer with protect-secrets and block-dangerous-commands

v4.0 — Custom Subagent Architecture

Introduced the agent.md-based tool permission matrix; achieved 100% role awareness

v5.0 — Traceable Operations

Completed the run_id tracking system, log collection, and KPI measurement framework

v5.1 — SKILL Diet

Introduced the core SKILL.md + references/ separation. Streamlined from 680 to 134 lines, completed for all 7 agents. Migrated to the .claude/skills/ path

KPI — Internal Validation Benchmarks

All figures are based on internal smoke tests and Sprint D benchmarks

0 False Positives
SKILL Diet completion benchmark (2026-03-19)

100% E2E Pipeline Success Rate
v4.0 Phase 1B-2 validation benchmark

100% Role Awareness Rate
v4.0 smoke test benchmark