Abstract
This paper documents the migration of the CREATE Something Harness from legacy headless mode
patterns to Agent SDK best practices. We analyze the trade-offs between security, reliability,
and operational efficiency, drawing from empirical observation of a live Canon Redesign project
(21 features across 19 files). The migration replaces --dangerously-skip-permissions with explicit --allowedTools, adds runaway prevention via --max-turns,
and enables cost tracking through structured JSON output parsing.
1. Introduction
The CREATE Something Harness orchestrates autonomous Claude Code sessions for large-scale
refactoring and feature implementation. Prior to this migration, the harness used --dangerously-skip-permissions for tool access—a pattern that prioritized
convenience over security.
The Agent SDK documentation recommends explicit tool allowlists via --allowedTools.
This migration implements that recommendation and adds a turn limit, structured JSON output, and per-session model selection.
1.1 Heideggerian Framing
Per the CREATE Something philosophy, infrastructure should exhibit Zuhandenheit (ready-to-hand)—receding into transparent use. The harness should be invisible when working
correctly; failures should surface clearly with actionable context.
1.2 The Canon Redesign Project
The test project: removing --webflow-blue (#4353ff) from the Webflow Dashboard.
This brand color polluted focus states, buttons, links, nav, and logos—43 violations across 19 files.
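A violation count of this kind can be reproduced with a simple scan. The sketch below is a minimal illustration, assuming a violation is any literal occurrence of the `--webflow-blue` custom property or the `#4353ff` hex value. The `countViolations` helper and the in-memory file map are hypothetical and not part of the harness.

```typescript
// Hypothetical helper: counts literal uses of the banned color token.
// Assumption: a "violation" is any occurrence of --webflow-blue or #4353ff.
const BANNED_COLOR_PATTERN = /--webflow-blue|#4353ff/gi;

interface ViolationReport {
  total: number;
  byFile: Record<string, number>;
}

function countViolations(files: Record<string, string>): ViolationReport {
  const byFile: Record<string, number> = {};
  let total = 0;
  for (const path of Object.keys(files)) {
    const matches = files[path].match(BANNED_COLOR_PATTERN);
    const hits = matches ? matches.length : 0;
    if (hits > 0) {
      byFile[path] = hits;
      total += hits;
    }
  }
  return { total, byFile };
}
```

Note that a line defining the token with its hex value counts twice under this definition, so the numbers depend on how a violation is defined.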
2. Architecture
2.1 Harness Flow
┌─────────────────────────────────────────────────────────┐
│                     HARNESS RUNNER                      │
│                                                         │
│  Spec Parser ──► Issue Creation ──► Session Loop        │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Session 1 ──► Session 2 ──► Session 3 ──► ...    │  │
│  │      │             │             │                │  │
│  │      ▼             ▼             ▼                │  │
│  │  Checkpoint    Checkpoint    Checkpoint           │  │
│  │      │             │             │                │  │
│  │      ▼             ▼             ▼                │  │
│  │ Peer Review   Peer Review   Peer Review           │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                 BEADS (Human Interface)                 │
│                                                         │
│  bd progress  - Review checkpoints                      │
│  bd update    - Redirect priorities                     │
│  bd create    - Inject work                             │
└─────────────────────────────────────────────────────────┘
2.2 Session Spawning
Each session spawns Claude Code in headless mode with explicit configuration:
// packages/harness/src/session.ts
export async function runSession(
  issueId: string,
  prompt: string,
  options: SessionOptions = {}
): Promise<SessionResult> {
  const args = [
    '-p',
    '--allowedTools', HARNESS_ALLOWED_TOOLS,
    '--max-turns', options.maxTurns?.toString() ?? '100',
    '--output-format', 'json',
  ];

  if (options.model) {
    args.push('--model', options.model);
  }

  // Spawn claude process with captured stdout/stderr
  const result = await spawnClaude(args, prompt);

  // Parse structured JSON output
  const metrics = parseJsonOutput(result.stdout);

  return {
    issueId,
    outcome: determineOutcome(result),
    sessionId: metrics.sessionId,
    costUsd: metrics.costUsd,
    numTurns: metrics.numTurns,
  };
}
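The `parseJsonOutput` helper is not shown in the source. A minimal sketch of what it might do follows, assuming the result object printed by `claude -p --output-format json` carries `session_id`, `total_cost_usd`, and `num_turns` fields. Those field names, the `SessionMetrics` shape, and the null fallbacks are assumptions to verify against the CLI version in use.

```typescript
// Sketch only: the raw field names are assumed, not confirmed by the source.
interface SessionMetrics {
  sessionId: string | null;
  costUsd: number | null;
  numTurns: number | null;
}

function parseJsonOutput(stdout: string): SessionMetrics {
  const empty: SessionMetrics = { sessionId: null, costUsd: null, numTurns: null };
  let raw: unknown;
  try {
    raw = JSON.parse(stdout.trim());
  } catch {
    // Output was not JSON, for example when the process died early.
    return empty;
  }
  if (typeof raw !== 'object' || raw === null) {
    return empty;
  }
  const fields = raw as Record<string, unknown>;
  const sessionId = fields['session_id'];
  const costUsd = fields['total_cost_usd'];
  const numTurns = fields['num_turns'];
  return {
    sessionId: typeof sessionId === 'string' ? sessionId : null,
    costUsd: typeof costUsd === 'number' ? costUsd : null,
    numTurns: typeof numTurns === 'number' ? numTurns : null,
  };
}
```

Returning nulls instead of throwing keeps a metrics failure from masking the session outcome itself, which seems consistent with the goal of surfacing failures with actionable context.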
3. Migration Changes
3.1 Before: Legacy Pattern
const args = [
  '-p',
  '--dangerously-skip-permissions',
  '--output-format', 'json',
];
Characteristics:
- All tools available without restriction
- No runaway prevention
- No cost tracking
- No model selection
- Security relies entirely on session isolation
3.2 After: Agent SDK Pattern
const args = [
  '-p',
  '--allowedTools', HARNESS_ALLOWED_TOOLS,
  '--max-turns', '100',
  '--output-format', 'json',
  // Pass --model only when one is configured, as runSession does
  ...(options.model ? ['--model', options.model] : []),
];
Characteristics:
- Explicit tool allowlist (defense in depth)
- Turn limit prevents infinite loops
- JSON output enables metrics parsing
- Model selection for cost optimization
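A turn limit is most useful when the harness can tell a capped session apart from a completed one. The sketch below shows one way an outcome classifier could do that. It assumes the JSON result includes a `subtype` field that reads `error_max_turns` when the cap is hit, plus an `is_error` flag. Both are assumptions about the CLI output, and `classifyOutcome`, `RawSessionResult`, and the outcome labels are hypothetical names, not the harness's actual `determineOutcome`.

```typescript
// Hypothetical classifier; field names and the 'error_max_turns' value are
// assumptions about the CLI's JSON result and should be verified.
type SessionOutcome = 'success' | 'turn_limit' | 'failure';

interface RawSessionResult {
  exitCode: number;
  stdout: string;
}

function classifyOutcome(result: RawSessionResult): SessionOutcome {
  let value: unknown;
  try {
    value = JSON.parse(result.stdout);
  } catch {
    return 'failure'; // no parseable result object
  }
  if (typeof value !== 'object' || value === null) {
    return 'failure';
  }
  const fields = value as Record<string, unknown>;
  // Check the turn cap first so it is not reported as a generic failure.
  if (fields['subtype'] === 'error_max_turns') {
    return 'turn_limit';
  }
  if (result.exitCode !== 0 || fields['is_error'] === true) {
    return 'failure';
  }
  return 'success';
}
```

Separating `turn_limit` from `failure` would let the runner decide whether to resume the work in a new session or to flag the issue for human review.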
3.3 Tool Categories
4. Peer Review Pipeline
The harness runs three peer reviewers at checkpoint boundaries:
const REVIEWERS: ReviewerConfig[] = [
  {
    name: 'security',
    prompt: 'Review the code changes for security vulnerabilities...',
    model: 'haiku',
    timeout: 30000,
  },
  {
    name: 'architecture',
    prompt: 'Review the code changes for architectural concerns...',
    model: 'haiku',
    timeout: 30000,
  },
  {
    name: 'quality',
    prompt: 'Review the code changes for quality issues...',
    model: 'haiku',
    timeout: 30000,
  },
];
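The source does not show how the reviewers are executed. One plausible arrangement is sketched below, assuming the reviewers run concurrently, that `timeout` is in milliseconds, and that a timed-out or failed reviewer is recorded without blocking the checkpoint. `ReviewerSpec`, `ReviewOutcome`, `withTimeout`, and the `runReviewer` callback are hypothetical names introduced for illustration.

```typescript
// Sketch: run reviewers concurrently, each bounded by its own timeout.
// Assumption: `timeout` is in milliseconds and failures never block progress.
interface ReviewerSpec {
  name: string;
  prompt: string;
  model: string;
  timeout: number;
}

interface ReviewOutcome {
  reviewer: string;
  status: 'completed' | 'timed_out' | 'errored';
  findings: string[];
}

// Sentinel used to distinguish a timeout from any other rejection.
const REVIEW_TIMEOUT = new Error('reviewer timed out');

function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(REVIEW_TIMEOUT), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function runReviewers(
  reviewers: ReviewerSpec[],
  runReviewer: (spec: ReviewerSpec) => Promise<string[]>,
): Promise<ReviewOutcome[]> {
  return Promise.all(
    reviewers.map(async (spec): Promise<ReviewOutcome> => {
      try {
        const findings = await withTimeout(runReviewer(spec), spec.timeout);
        return { reviewer: spec.name, status: 'completed', findings };
      } catch (err) {
        const status = err === REVIEW_TIMEOUT ? 'timed_out' : 'errored';
        return { reviewer: spec.name, status, findings: [] };
      }
    }),
  );
}
```

The timeout here only stops waiting; it does not cancel the underlying work, so a real runner would also need to terminate the reviewer process.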
4.1 Observed Review Outcomes
Finding: Architecture reviewer surfaces legitimate concerns (token consistency,
pattern adherence) without blocking progress. This matches the intended "first-pass analysis" philosophy.
5. Empirical Observations
5.1 Security Improvements
Finding: No legitimate harness operations were blocked by the new restrictions.
The allowlist is sufficient for all observed work patterns.
5.2 Runaway Prevention
--max-turns 100 prevents infinite loops. Observed session turn counts:
5.3 Cost Visibility
5.4 Model Selection Impact
6. Trade-offs Analysis
6.1 Pros
6.2 Cons
7. Recommendations
7.1 Immediate Adoption
- Replace --dangerously-skip-permissions with --allowedTools: The security improvement has no operational cost.
- Set --max-turns 100: Provides headroom without enabling runaways.
- Parse JSON output for metrics: Even if not displayed, capture for future analysis.
- Use Haiku for peer reviews: 95% cost reduction with equivalent quality.
7.2 Future Work
- Implement --resume: Use captured session_id for task continuity within epics.
- Model auto-selection: Use task complexity to choose Haiku/Sonnet/Opus.
- Cost budgets: Set per-harness-run cost limits with automatic pause.
- Streaming output: Use --output-format stream-json for real-time progress.
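Two of these items can be sketched briefly. The code below is a hypothetical illustration, not existing harness code: `buildResumeArgs` appends `--resume` with a previously captured session ID, and `CostBudget` accumulates per-session cost so the runner can pause once a limit is reached. The class name, its methods, and the choice to treat missing costs as zero are all assumptions.

```typescript
// Hypothetical sketch of two future-work items: session resume and cost budgets.

// Builds CLI args that continue an earlier session via its captured session ID.
function buildResumeArgs(sessionId: string, baseArgs: string[]): string[] {
  return [...baseArgs, '--resume', sessionId];
}

class CostBudget {
  private readonly limitUsd: number;
  private spentUsd = 0;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Records a finished session's cost; an unknown cost is treated as zero.
  record(costUsd: number | null): void {
    this.spentUsd += costUsd === null ? 0 : costUsd;
  }

  // True once cumulative spend reaches the limit; the runner would pause here.
  isExhausted(): boolean {
    return this.spentUsd >= this.limitUsd;
  }

  remainingUsd(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }
}
```

A runner could call `record` after each session with the parsed cost and check `isExhausted` before spawning the next one. Treating unknown costs as zero is the lenient choice; a stricter budget might pause whenever a cost cannot be parsed.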
8. Conclusion
The Agent SDK migration improves the CREATE Something Harness without degrading operational
capability. The explicit tool allowlist provides defense-in-depth security, while --max-turns prevents runaway sessions.
The key insight: restrictive defaults with explicit exceptions are more
maintainable than permissive defaults with implicit risks.
This aligns with the Subtractive Triad:
- DRY: One allowlist, not per-session permission decisions
- Rams: Only necessary tools; each earns its place
- Heidegger: Infrastructure recedes; security becomes invisible when correct
Appendix A: Full Tool Allowlist
const HARNESS_ALLOWED_TOOLS = [
  // Core file operations
  'Read', 'Write', 'Edit', 'Glob', 'Grep', 'NotebookEdit',
  // Bash with granular patterns
  'Bash(git:*)', 'Bash(pnpm:*)', 'Bash(npm:*)', 'Bash(npx:*)',
  'Bash(node:*)', 'Bash(tsc:*)', 'Bash(wrangler:*)',
  'Bash(bd:*)', 'Bash(bv:*)', // Beads CLI
  'Bash(grep:*)', 'Bash(find:*)', 'Bash(ls:*)', 'Bash(cat:*)',
  'Bash(mkdir:*)', 'Bash(rm:*)', 'Bash(cp:*)', 'Bash(mv:*)',
  'Bash(echo:*)', 'Bash(test:*)',
  // Orchestration
  'Task', 'TodoWrite', 'WebFetch', 'WebSearch',
  // CREATE Something
  'Skill',
  // MCP Cloudflare
  'mcp__cloudflare__kv_get', 'mcp__cloudflare__kv_put',
  'mcp__cloudflare__kv_list', 'mcp__cloudflare__d1_query',
  'mcp__cloudflare__d1_list_databases',
  'mcp__cloudflare__r2_list_objects', 'mcp__cloudflare__r2_get_object',
  'mcp__cloudflare__r2_put_object', 'mcp__cloudflare__worker_list',
  'mcp__cloudflare__worker_get', 'mcp__cloudflare__worker_deploy',
].join(',');
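To reason about what a list like this permits, it can help to model the matching. The sketch below is a deliberately simplified model and not Claude Code's actual permission matcher: it assumes an entry such as `Bash(git:*)` allows any Bash command whose first word is `git`, and that every other entry must equal the tool name exactly. The `isToolAllowed` function is hypothetical.

```typescript
// Simplified model for reasoning about the allowlist only.
// This is NOT Claude Code's permission matcher; real matching rules may differ.
function isToolAllowed(allowlistCsv: string, tool: string, command?: string): boolean {
  const entries = allowlistCsv.split(',').map((entry) => entry.trim());
  for (const entry of entries) {
    const bashRule = /^Bash\((.+):\*\)$/.exec(entry);
    if (bashRule) {
      // 'Bash(git:*)' is modeled as: the command's first word must be 'git'.
      if (tool === 'Bash' && command !== undefined) {
        const firstWord = command.trim().split(/\s+/)[0];
        if (firstWord === bashRule[1]) {
          return true;
        }
      }
    } else if (entry === tool) {
      return true;
    }
  }
  return false;
}
```

Under this model, `isToolAllowed(HARNESS_ALLOWED_TOOLS, 'Bash', 'git status')` is allowed while a `curl` command is not, which matches the intent of granting shell access per command.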