PAPER-2026-002

Ralph Implementation: Overnight Autonomous Development

Fresh Claude Code instances work through user stories while you sleep, delivering production-ready features for about $6 in API costs instead of $800+ in developer time.

Research • 15 min read • Intermediate

Abstract

This paper documents the Ralph pattern for autonomous overnight development. Named after Geoffrey Huntley's Ralph Wiggum technique, Ralph spawns fresh Claude Code instances that iterate through user stories until a feature is complete. Each iteration gets a clean context window, which prevents context pollution across stories. We present the PRD-to-Ralph workflow, the ralph.sh implementation, and a cost analysis. The analysis shows about $6 in API costs for a 12-15 story feature, compared to 8+ hours of developer time ($800+ at $100/hour). A case study from the Kickstand project, where systematic autonomous work reduced 155 scripts to 13, validates the approach. The contribution is both practical (a working overnight development system) and philosophical (nondeterministic idempotence: different paths, same outcome).

Metric	Value
Cost for 20 iterations	~$6
Kickstand script reduction	155 to 13
Context per iteration	Fresh
Execution	Overnight, autonomous

1. What is Ralph?

Ralph is an iterative autonomous development pattern that spawns fresh Claude Code instances to work through user stories. Named after Geoffrey Huntley's Ralph Wiggum technique, the pattern exploits a key insight: each iteration benefits from a clean context window.

Traditional agent loops accumulate context as they work. By story 5, the context window is cluttered with implementation details from stories 1-4. Ralph avoids this by starting fresh each iteration. Claude reads the PRD, picks an incomplete story, implements it, commits, and exits. The next iteration starts with the full context window available.

The Core Loop

for iteration in 1..MAX_ITERATIONS:
  1. Read prd.json
  2. Find story where passes == false
  3. Spawn fresh Claude Code instance
  4. Claude implements story, commits, updates prd.json
  5. Log to progress.txt
  6. If all stories pass → done
  7. Next iteration
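The loop above can be sketched in bash as follows. This is a minimal sketch, not the actual ralph.sh. The variable names, the run_agent helper, and the prompt.md file are assumptions. jq (listed under prerequisites) counts the stories that still have "passes": false.

```shell
#!/usr/bin/env bash
# Minimal sketch of the Ralph core loop (NOT the actual ralph.sh).
# Assumed/hypothetical names: PRD_FILE, MAX_ITERATIONS, run_agent, prompt.md.
set -euo pipefail

PRD_FILE="${PRD_FILE:-prd.json}"
MAX_ITERATIONS="${MAX_ITERATIONS:-10}"

# Number of stories whose "passes" field is still false.
remaining_stories() {
  jq '[.stories[] | select(.passes == false)] | length' "$PRD_FILE"
}

# One new claude process per call, so every iteration starts with an empty
# context window. The agent is expected to implement one story, commit, and
# set "passes": true in the PRD.
run_agent() {
  claude --print --dangerously-skip-permissions "$(cat prompt.md)"
}

ralph_loop() {
  local i left
  for ((i = 1; i <= MAX_ITERATIONS; i++)); do
    left=$(remaining_stories)
    if ((left == 0)); then
      echo "All stories pass after $((i - 1)) iterations"
      return 0
    fi
    echo "Iteration $i: $left stories remaining"
    run_agent
  done
  # The last iteration may have finished the final story.
  left=$(remaining_stories)
  if ((left == 0)); then
    echo "All stories pass after $MAX_ITERATIONS iterations"
    return 0
  fi
  echo "Stopped with $left stories still failing" >&2
  return 1
}
```

This sketch decides completion by re-reading prd.json before every iteration. It does not scan the agent's output for ALL_STORIES_COMPLETE. Checking the file keeps the loop correct even if an agent forgets to print the signal.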

The PRD (Product Requirements Document) serves as Claude's task board. Each story lists acceptance criteria that must be satisfied before it is marked "passes": true. When all stories pass, Ralph exits.

Key Insight: Context Pollution

Context pollution is real. When working on a multi-file feature in a single session, Claude accumulates tokens about each implementation decision. These tokens are wasted when moving to unrelated stories.

By spawning fresh instances, Ralph ensures:

Each story gets Claude's full attention (no irrelevant context)
No "memory" of implementation details that don't matter
Cleaner, more focused work per iteration
Natural parallelization opportunity (though Ralph runs sequentially)

2. The PRD-to-Ralph Workflow

The workflow consists of three phases: PRD creation, Ralph execution, and result verification.

2.1 Creating the PRD

A PRD is a JSON file defining user stories with acceptance criteria:

{
  "title": "Agency Contact Form",
  "description": "Contact form with validation and D1 storage",
  "stories": [
    {
      "id": "contact-1",
      "title": "Create contact submissions D1 table",
      "acceptance": [
        "Migration file exists at migrations/XXXX_contact_submissions.sql",
        "Table has columns: id, name, email, message, created_at",
        "Migration applies without errors"
      ],
      "files": ["packages/agency/migrations/"],
      "passes": false
    },
    {
      "id": "contact-2",
      "title": "Add contact form API endpoint",
      "acceptance": [
        "POST /api/contact returns 200 on valid submission",
        "Returns 400 with errors on invalid email",
        "Stores submission in D1 contact_submissions table"
      ],
      "files": ["packages/agency/src/routes/api/contact/+server.ts"],
      "passes": false
    }
  ]
}

Story rules:

Rule	Why
One story = one context window	Keeps iterations focused
Max 3-5 files per story	Prevents scope creep
Acceptance criteria must be verifiable	Agent needs to know when done
Order by dependency	Foundation, Core, UI, Integration
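A script can enforce part of these rules before Ralph starts. The following lint_prd helper is hypothetical and not part of the monorepo. It flags stories with no acceptance criteria, more than 5 files, a non-boolean "passes" field, or a duplicated id. Dependency order, and whether the criteria are truly verifiable, still need human judgment.

```shell
#!/usr/bin/env bash
# Hypothetical PRD linter for the story rules above (not monorepo tooling).
# Checks only what a script can check; dependency order and whether criteria
# are verifiable still need human review. Requires jq.
set -euo pipefail

# Usage: lint_prd <prd.json>. Prints violations and returns 1 if any exist.
lint_prd() {
  local prd="$1" problems
  problems=$(jq -r '
    (.stories[] as $s
      | (if ($s.acceptance | length) == 0
           then "\($s.id): no acceptance criteria" else empty end),
        (if ($s.files | length) > 5
           then "\($s.id): lists \($s.files | length) files (max 5)" else empty end),
        (if ($s.passes | type) != "boolean"
           then "\($s.id): passes must be true or false" else empty end)),
    ([.stories[].id] | group_by(.) | .[] | select(length > 1)
      | "duplicate story id: \(.[0])")
  ' "$prd")
  if [[ -n "$problems" ]]; then
    printf '%s\n' "$problems"
    return 1
  fi
  echo "PRD OK, story count: $(jq '.stories | length' "$prd")"
}
```

Run `lint_prd prd.json` before launching Ralph, so that vague or oversized stories are caught while you are still at the keyboard.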

2.2 The /prd-to-ralph Skill

The CREATE SOMETHING monorepo provides a Claude Code skill (.claude/skills/prd-to-ralph.md) that converts feature descriptions into PRDs:

# In Claude Code session
"Use /prd-to-ralph to create a user authentication feature
 with login, signup, and password reset"

The skill asks clarifying questions, breaks the feature into atomic stories, writes testable acceptance criteria, and outputs prd.json.

2.3 Running Ralph

# Basic usage
./packages/agent-sdk/scripts/ralph.sh

# Custom iterations (for larger features)
./packages/agent-sdk/scripts/ralph.sh --max-iterations 20

# Custom PRD file
./packages/agent-sdk/scripts/ralph.sh --prd-file features/auth-prd.json

Ralph outputs progress to progress.txt and archives thread logs to .ralph-archive/. When all stories pass, it archives the completed PRD.
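This paper does not show how ralph.sh parses these flags. The plausible sketch below assumes a default of 10 iterations and prd.json as the default PRD; both defaults are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of parsing the flags shown above. The defaults (10 iterations,
# prd.json) are assumptions; check ralph.sh for its real defaults.
set -euo pipefail

# Sets MAX_ITERATIONS and PRD_FILE; returns 2 on bad input.
parse_args() {
  MAX_ITERATIONS=10
  PRD_FILE="prd.json"
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --max-iterations)
        if [[ ! "${2:-}" =~ ^[1-9][0-9]*$ ]]; then
          echo "--max-iterations needs a positive integer" >&2
          return 2
        fi
        MAX_ITERATIONS="$2"
        shift 2
        ;;
      --prd-file)
        if [[ -z "${2:-}" ]]; then
          echo "--prd-file needs a path" >&2
          return 2
        fi
        PRD_FILE="$2"
        shift 2
        ;;
      *)
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
  done
}

# In the script body: parse_args "$@"
```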

3. How ralph.sh Works

The script is a bash loop that orchestrates Claude Code instances. Here's the implementation architecture:

3.1 Architecture

ralph.sh
    |
    +-- reads prd.json (finds incomplete story)
    |
    +-- spawns claude --print --dangerously-skip-permissions
    |       |
    |       +-- Claude reads prd.json
    |       +-- Claude implements story
    |       +-- Claude commits changes
    |       +-- Claude updates prd.json (passes: true)
    |       +-- Claude logs to progress.txt
    |       +-- Claude exits
    |
    +-- checks if all stories complete
    |       |
    |       +-- if yes: exit loop
    |       +-- if no: next iteration
    |
    +-- archives thread log
    |
    +-- next iteration (fresh Claude instance)
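The per-iteration steps in the diagram (spawn a fresh instance, capture its output, archive the thread log) might look like the following sketch. The iteration-NNN.log naming inside .ralph-archive/, the run_agent helper, and prompt.md are assumptions for illustration. Only the claude flags and the ALL_STORIES_COMPLETE signal come from this paper.

```shell
#!/usr/bin/env bash
# Sketch of a single iteration. Hypothetical: log naming, run_agent, prompt.md.
set -euo pipefail

ARCHIVE_DIR="${ARCHIVE_DIR:-.ralph-archive}"

# A new claude process per call, so no context carries over between stories.
run_agent() {
  claude --print --dangerously-skip-permissions "$(cat prompt.md)"
}

# Usage: run_iteration <n>. Returns 0 if the agent printed ALL_STORIES_COMPLETE.
run_iteration() {
  local n="$1" log
  mkdir -p "$ARCHIVE_DIR"
  log="$ARCHIVE_DIR/iteration-$(printf '%03d' "$n").log"
  # tee shows output live and keeps a copy for debugging failed iterations.
  if ! run_agent 2>&1 | tee "$log"; then
    echo "iteration $n: agent exited with an error (see $log)" >&2
  fi
  grep -q "ALL_STORIES_COMPLETE" "$log"
}
```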

3.2 System Prompt

Each Claude instance receives a consistent system prompt:

You are an autonomous coding agent working on this project.

## Your Task
1. Read the PRD file (prd.json) and find a user story where "passes": false
2. Pick ONE story to implement (usually the first incomplete one)
3. Implement it according to the acceptance criteria
4. Run any relevant tests to verify your implementation
5. Commit your changes with a clear message: "feat: <story title>"
6. Update prd.json - set "passes": true for the completed story
7. Append to progress.txt

## Important Rules
- Complete ONE story per iteration, then stop
- Each story must be atomic and independently verifiable
- If all stories pass, output: ALL_STORIES_COMPLETE
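The prompt above hard-codes prd.json, while ralph.sh also accepts --prd-file. One way to keep the two consistent is to template the file name into the prompt. The sketch below uses a hypothetical build_prompt helper; the wording is the prompt shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the iteration prompt for a given PRD file.
set -euo pipefail

build_prompt() {
  local prd_file="${1:-prd.json}"
  cat <<EOF
You are an autonomous coding agent working on this project.

## Your Task
1. Read the PRD file ($prd_file) and find a user story where "passes": false
2. Pick ONE story to implement (usually the first incomplete one)
3. Implement it according to the acceptance criteria
4. Run any relevant tests to verify your implementation
5. Commit your changes with a clear message: "feat: <story title>"
6. Update $prd_file - set "passes": true for the completed story
7. Append to progress.txt

## Important Rules
- Complete ONE story per iteration, then stop
- Each story must be atomic and independently verifiable
- If all stories pass, output: ALL_STORIES_COMPLETE
EOF
}

# Example (not run here):
#   claude --print --dangerously-skip-permissions "$(build_prompt features/auth-prd.json)"
```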

3.3 Key Implementation Details

Detail	Implementation	Purpose
Fresh context	New `claude` process each iteration	Prevents context pollution
Autonomous mode	`--dangerously-skip-permissions`	No human confirmation needed
Output capture	`--print` flag + `tee`	Archives for debugging
Story selection	`jq` filters `passes == false`	Deterministic story ordering
Completion signal	`ALL_STORIES_COMPLETE` in output	Early exit when done
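A single jq filter makes the story-selection row concrete. This sketch (next_story is a hypothetical name) picks the first story, in file order, whose "passes" is false. That fixed order is what makes the selection deterministic.

```shell
#!/usr/bin/env bash
# Deterministic story selection with jq. next_story is a hypothetical name.
set -euo pipefail

# Usage: next_story [prd.json]. Prints "<id><TAB><title>" of the first
# incomplete story, or nothing once every story passes.
next_story() {
  jq -r 'first(.stories[] | select(.passes == false)) | "\(.id)\t\(.title)"' "${1:-prd.json}"
}
```

The choice depends only on prd.json. Restarting after a crash therefore selects the same next story, which is the resume behavior described in section 6.1.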

3.4 Output Files

File	Purpose
`prd.json`	User stories (updated as stories complete)
`progress.txt`	Short-term memory, iteration logs
`.ralph-archive/`	Thread logs, archived PRDs

4. Cost Analysis

Ralph's economics are compelling: $6 for overnight feature development compared to 8+ hours of developer time.

4.1 Ralph Cost Estimation

Iterations	Estimated Cost	Use Case
5	~$1.50	Small feature (3-4 stories)
10	~$3.00	Medium feature (6-8 stories)
20	~$6.00	Large feature (12-15 stories)
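The rows of this table scale linearly, which implies roughly $0.30 per iteration. Written as a planning formula (an inference from the table, not a measured rate):

```latex
C(N) \approx \$0.30 \times N
\quad\Rightarrow\quad
C(5) \approx \$1.50,\;\; C(10) \approx \$3.00,\;\; C(20) \approx \$6.00
```

Real per-iteration cost varies with story size and the model used, so treat this as a budgeting estimate.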

4.2 Comparison to Developer Time

For a 12-story feature requiring 8 hours of developer time at $100/hour:

Approach	Cost	Time	Availability
Developer	$800	8 hours	Business hours
Ralph	$6	Overnight	24/7
Savings	$794 (99.25%)	—	—

Key insight: Ralph runs overnight. You describe the feature before leaving work, run Ralph, and find completed code in the morning.

4.3 When Ralph Makes Sense

Scenario	Recommendation
Well-defined feature with clear stories	Use Ralph
Overnight autonomous work	Use Ralph
Sequential dependent stories	Use Ralph
3+ independent features simultaneously	Consider Gastown (parallel)
Quick test-fix loop (same session)	/ralph-loop (legacy)
Exploratory work, unclear requirements	Manual Claude Code session

5. Case Study: Kickstand

The Kickstand project demonstrates Ralph's effectiveness at scale. Kickstand is a venue intelligence automation system that had accumulated significant technical debt across multiple architectural phases.

5.1 Results

Metric	Before	After	Change
Active scripts	155	13	-92%
TypeScript errors	30	0	-100%
Health score	6.2	9.2	+48%

5.2 How Ralph Contributed

The systematic reduction from 155 to 13 scripts was achieved through Ralph-style autonomous work:

DRY pass: Unified duplicate implementations (Node.js + Workers)
Rams pass: Archived 153 orphan scripts that no longer served production
Heidegger pass: Reconnected documentation to actual system state

Each pass was decomposed into stories with clear acceptance criteria. Ralph iterated through them autonomously, with human review at story completion.

5.3 Economic Impact

Traditional approach: A senior developer auditing 155 scripts, consolidating to 13, fixing 30 TypeScript errors, and updating documentation would require 40+ hours at $150/hour = $6,000+.

Ralph approach: PRD creation (2 hours of human time, about $300 at the same $150/hour) + Ralph execution ($50-100 in API costs) = under $500 total.

Savings: $5,500+ (90%+ reduction)

6. Philosophical Grounding

6.1 Nondeterministic Idempotence

Ralph embodies nondeterministic idempotence: different paths, same outcome. Ralph might complete in 8 iterations or 12. Stories might complete in different orders. But the end result is the same: a working feature with all acceptance criteria satisfied.

Because progress lives on disk (prd.json and git commits), work survives crashes. If Ralph stops after iteration 5, you restart it and it resumes at the first story still marked "passes": false. The PRD is the source of truth.

6.2 Fresh Context as Zuhandenheit

In Heideggerian terms, context pollution causes the tool to become present-at-hand (Vorhandenheit): you notice the cluttered context, the irrelevant tokens, the sluggish responses. Fresh context per iteration keeps the tool ready-to-hand (Zuhandenheit): transparent, receding into use.

When Ralph works correctly, you don't think about it. You define the feature, run the script, and find working code. The infrastructure disappears; only the work remains.

6.3 The PRD as Task Board

The PRD is Claude's kanban board. Just like humans grab sticky notes from a board, Claude grabs stories from the PRD. The format is simple because it needs to be:

Machine-readable: ralph.sh queries it with jq, and Claude reads it directly
Human-readable: You write it without special tooling
Versionable: Git tracks changes, enabling bisection

7. Troubleshooting

Ralph Stops Early

Symptom: All stories show passes: true but feature isn't complete.

Cause: Acceptance criteria too vague. Claude marked them done when they weren't.

Fix: Write more specific acceptance criteria. "Form works" is bad. "Form renders at /login route" is good.

Same Error Repeating

Symptom: Multiple iterations hit the same error.

Cause: Missing context in CLAUDE.md or agents.md.

Fix: Add the learning to CLAUDE.md. Each fresh Claude Code instance loads CLAUDE.md when it starts, so every later iteration sees it.

Story Too Big

Symptom: Claude can't complete a story in one iteration.

Cause: Story scope exceeds context window capacity.

Fix: Break the story into smaller atomic pieces. If a story needs more than 5 files, split it.

8. Implementation

Ralph is production-deployed in the CREATE SOMETHING monorepo:

Script: packages/agent-sdk/scripts/ralph.sh
PRD skill: .claude/skills/prd-to-ralph.md
Template: packages/agent-sdk/templates/prd-template.json
Documentation: .claude/rules/ralph-patterns.md

Prerequisites:

Claude Code CLI installed (npm install -g @anthropic-ai/claude-code)
Git repository initialized
CLAUDE.md file in project root (project context)
jq installed for JSON parsing
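A preflight check for these prerequisites could look like the sketch below. check_prereqs is a hypothetical helper and not part of the monorepo. It treats the Git top-level directory as the project root when one exists.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the prerequisites listed above.
set -euo pipefail

# Prints each missing prerequisite and returns 1 if anything is missing.
check_prereqs() {
  local missing=0 cmd root
  for cmd in claude git jq; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd"
      missing=1
    fi
  done
  if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not inside a git repository"
    missing=1
  fi
  # CLAUDE.md belongs in the project root (the repo top level when in git).
  root=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
  if [[ ! -f "$root/CLAUDE.md" ]]; then
    echo "missing $root/CLAUDE.md"
    missing=1
  fi
  return "$missing"
}
```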

9. Conclusion

Ralph transforms overnight development from aspiration to practice. By spawning fresh Claude Code instances per story, the pattern prevents context pollution while maintaining systematic progress through feature requirements.

The economics are decisive: $6 for features that would cost $800+ in developer time. The Kickstand case study validates the approach at production scale, where systematic autonomous work reduced 155 scripts to 13.

Key takeaways:

Fresh context per iteration prevents pollution, so each story gets full attention
PRD as task board enables deterministic story selection and progress tracking
Nondeterministic idempotence ensures work survives crashes
Specific acceptance criteria are the bottleneck, so invest in PRD quality

Status: Production-deployed, actively used for CREATE SOMETHING development.

How to Apply This

Define your feature with clear boundaries
Use /prd-to-ralph or write prd.json manually
Ensure each story has specific, testable acceptance criteria
Run ./packages/agent-sdk/scripts/ralph.sh --max-iterations 10
Go to sleep (or dinner)
Check progress.txt and git log in the morning

Rule of thumb: Spend 30 minutes on PRD quality. It saves 3 hours of failed iterations.

Related Research

Subtractive Triad Audit: Kickstand — Case study of systematic codebase reduction using autonomous work

The Norvig Partnership — Empirical validation of AI-human collaboration achieving 20x productivity gains

Haiku Optimization — Intelligent model routing for cost-effective autonomous development

Attribution

The Ralph pattern is based on Geoffrey Huntley's Ralph Wiggum technique, adapted for CREATE SOMETHING's PRD-to-Ralph workflow.