PAPER-2026-002

Ralph Implementation: Overnight Autonomous Development

Fresh Claude Code instances working through user stories while you sleep—achieving production-ready features at $6 instead of $800+ in developer time.

Research 15 min read Intermediate

Abstract

This paper documents the Ralph pattern for autonomous overnight development. Named after Geoffrey Huntley's Ralph Wiggum technique, Ralph spawns fresh Claude Code instances that iterate through user stories until a feature is complete. Each iteration gets a clean context window—preventing context pollution across stories. We present the PRD-to-Ralph workflow, the ralph.sh implementation, and cost analysis showing $6 for 12-15 story features compared to 8+ hours of developer time ($800+ at $100/hour). Case study validation from the Kickstand project demonstrates 155 scripts reduced to 13 through systematic autonomous work. The contribution is both practical (a working overnight development system) and philosophical (nondeterministic idempotence—different paths, same outcome).

  • $6: cost for 20 iterations
  • 155 to 13: Kickstand script reduction
  • Fresh: context per iteration
  • Overnight: autonomous execution

1. What is Ralph?

Ralph is an iterative autonomous development pattern that spawns fresh Claude Code instances to work through user stories. Named after Geoffrey Huntley's Ralph Wiggum technique, the pattern exploits a key insight: each iteration benefits from a clean context window.

Traditional agent loops accumulate context as they work. By story 5, the context window is cluttered with implementation details from stories 1-4. Ralph solves this by starting fresh each iteration—Claude reads the PRD, picks an incomplete story, implements it, commits, and exits. The next iteration starts with full context capacity.

The Core Loop

for iteration in 1..MAX_ITERATIONS:
  1. Read prd.json
  2. Find story where passes == false
  3. Spawn fresh Claude Code instance
  4. Claude implements story, commits, updates prd.json
  5. Log to progress.txt
  6. If all stories pass → done
  7. Next iteration

The PRD (Product Requirements Document) serves as Claude's task board. Each story has acceptance criteria that must be satisfied for passes: true. When all stories pass, Ralph exits.
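The loop above can be sketched in a few lines. This is a minimal Python illustration, not the bash implementation in ralph.sh: `run_agent` is a hypothetical callable standing in for a fresh Claude Code instance, and it is assumed to set "passes": true in prd.json on disk when it finishes a story.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def next_incomplete(prd: dict) -> Optional[dict]:
    """Return the first story whose "passes" flag is false, or None."""
    return next((s for s in prd["stories"] if not s["passes"]), None)


def run_ralph(prd_path: Path, run_agent: Callable[[dict], None],
              max_iterations: int = 10) -> int:
    """Iterate until every story passes or the iteration budget runs out.

    Returns the number of iterations that actually ran an agent.
    """
    for iteration in range(1, max_iterations + 1):
        # Re-read the PRD every time: the file on disk is the only shared state.
        prd = json.loads(prd_path.read_text())
        story = next_incomplete(prd)
        if story is None:
            return iteration - 1  # all stories pass
        run_agent(story)  # hypothetical stand-in for a fresh Claude Code run
        with open(prd_path.parent / "progress.txt", "a") as log:
            log.write(f"iteration {iteration}: {story['id']}\n")
    return max_iterations
```

Because nothing is carried between iterations except the files on disk, each call to `run_agent` starts without any memory of earlier stories.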

Key Insight: Context Pollution

When working on a multi-file feature in a single session, Claude accumulates tokens about each implementation decision. Once the work moves to an unrelated story, those tokens still occupy the context window without helping the new task.

By spawning fresh instances, Ralph ensures:

  • Each story gets Claude's full attention (no irrelevant context)
  • No "memory" of implementation details that don't matter
  • Cleaner, more focused work per iteration
  • Natural parallelization opportunity (though Ralph runs sequentially)

2. The PRD-to-Ralph Workflow

The workflow consists of three phases: PRD creation, Ralph execution, and result verification.

2.1 Creating the PRD

A PRD is a JSON file defining user stories with acceptance criteria:

{
  "title": "Agency Contact Form",
  "description": "Contact form with validation and D1 storage",
  "stories": [
    {
      "id": "contact-1",
      "title": "Create contact submissions D1 table",
      "acceptance": [
        "Migration file exists at migrations/XXXX_contact_submissions.sql",
        "Table has columns: id, name, email, message, created_at",
        "Migration applies without errors"
      ],
      "files": ["packages/agency/migrations/"],
      "passes": false
    },
    {
      "id": "contact-2",
      "title": "Add contact form API endpoint",
      "acceptance": [
        "POST /api/contact returns 200 on valid submission",
        "Returns 400 with errors on invalid email",
        "Stores submission in D1 contact_submissions table"
      ],
      "files": ["packages/agency/src/routes/api/contact/+server.ts"],
      "passes": false
    }
  ]
}

Story rules:

| Rule | Why |
|---|---|
| One story = one context window | Keeps iterations focused |
| Max 3-5 files per story | Prevents scope creep |
| Acceptance criteria must be verifiable | Agent needs to know when done |
| Order by dependency | Foundation, Core, UI, Integration |
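Some of these rules can be checked mechanically before a run. The following is a hedged Python sketch; `validate_prd` is a hypothetical helper, not part of the Ralph tooling. It assumes the PRD shape shown above and treats 5 files as the hard upper bound. Dependency ordering and whether a criterion is truly verifiable still need human judgment.

```python
from typing import List

MAX_FILES_PER_STORY = 5  # assumption: upper end of the "3-5 files" rule


def validate_prd(prd: dict) -> List[str]:
    """Return human-readable problems; an empty list means no rule was tripped."""
    problems = []
    for story in prd.get("stories", []):
        sid = story.get("id", "<missing id>")
        # A story without acceptance criteria gives the agent no stopping point.
        if not story.get("acceptance"):
            problems.append(f"{sid}: no acceptance criteria")
        # Too many files suggests the story will not fit one iteration.
        if len(story.get("files", [])) > MAX_FILES_PER_STORY:
            problems.append(f"{sid}: touches more than {MAX_FILES_PER_STORY} files")
        # The loop relies on "passes" being a real boolean.
        if not isinstance(story.get("passes"), bool):
            problems.append(f"{sid}: 'passes' must be true or false")
    return problems
```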

2.2 The /prd-to-ralph Skill

The repository includes a Claude Code skill (.claude/skills/prd-to-ralph.md) that converts feature descriptions into PRDs:

# In Claude Code session
"Use /prd-to-ralph to create a user authentication feature
 with login, signup, and password reset"

The skill asks clarifying questions, breaks the feature into atomic stories, writes testable acceptance criteria, and outputs prd.json.

2.3 Running Ralph

# Basic usage
./packages/agent-sdk/scripts/ralph.sh

# Custom iterations (for larger features)
./packages/agent-sdk/scripts/ralph.sh --max-iterations 20

# Custom PRD file
./packages/agent-sdk/scripts/ralph.sh --prd-file features/auth-prd.json

Ralph outputs progress to progress.txt and archives thread logs to .ralph-archive/. When all stories pass, it archives the completed PRD.
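For readers porting the script, the two options shown above map onto a small argument parser. This Python sketch is an illustration only: the defaults (10 iterations, prd.json in the working directory) are assumptions, since the defaults of ralph.sh are not stated here.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options used in the examples above."""
    parser = argparse.ArgumentParser(prog="ralph")
    parser.add_argument("--max-iterations", type=int, default=10,
                        help="iteration budget (default is an assumption)")
    parser.add_argument("--prd-file", default="prd.json",
                        help="path to the PRD (default is an assumption)")
    return parser.parse_args(argv)
```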

3. How ralph.sh Works

The script is a bash loop that orchestrates Claude Code instances. Here's the implementation architecture:

3.1 Architecture

ralph.sh
    |
    +-- reads prd.json (finds incomplete story)
    |
    +-- spawns claude --print --dangerously-skip-permissions
    |       |
    |       +-- Claude reads prd.json
    |       +-- Claude implements story
    |       +-- Claude commits changes
    |       +-- Claude updates prd.json (passes: true)
    |       +-- Claude logs to progress.txt
    |       +-- Claude exits
    |
    +-- checks if all stories complete
    |       |
    |       +-- if yes: exit loop
    |       +-- if no: next iteration
    |
    +-- archives thread log
    |
    +-- next iteration (fresh Claude instance)

3.2 System Prompt

Each Claude instance receives a consistent system prompt:

You are an autonomous coding agent working on this project.

## Your Task
1. Read the PRD file (prd.json) and find a user story where "passes": false
2. Pick ONE story to implement (usually the first incomplete one)
3. Implement it according to the acceptance criteria
4. Run any relevant tests to verify your implementation
5. Commit your changes with a clear message: "feat: <story title>"
6. Update prd.json - set "passes": true for the completed story
7. Append to progress.txt

## Important Rules
- Complete ONE story per iteration, then stop
- Each story must be atomic and independently verifiable
- If all stories pass, output: ALL_STORIES_COMPLETE

3.3 Key Implementation Details

| Detail | Implementation | Purpose |
|---|---|---|
| Fresh context | New claude process each iteration | Prevents context pollution |
| Autonomous mode | --dangerously-skip-permissions | No human confirmation needed |
| Output capture | --print flag + tee | Archives for debugging |
| Story selection | jq filters passes == false | Deterministic story ordering |
| Completion signal | ALL_STORIES_COMPLETE in output | Early exit when done |
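The process spawning, output capture, and completion check in the table can be sketched together. This is a Python illustration rather than the bash in ralph.sh; passing the prompt as a positional argument to `claude --print` is an assumption about how the script hands over its instructions, and `run_iteration` is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

COMPLETION_SIGNAL = "ALL_STORIES_COMPLETE"


def claude_command(prompt: str) -> List[str]:
    """Command line for one non-interactive run, using the flags from the table."""
    return ["claude", "--print", "--dangerously-skip-permissions", prompt]


def run_iteration(command: Sequence[str], log_path: Path) -> bool:
    """Run one fresh process, tee its output to log_path, and report
    whether the completion signal appeared in the output."""
    signalled = False
    with open(log_path, "w") as log:
        with subprocess.Popen(command, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # live output, like tee
                log.write(line)         # archived copy for debugging
                if COMPLETION_SIGNAL in line:
                    signalled = True
    return signalled
```

A caller would invoke `run_iteration(claude_command(prompt), log_path)` once per iteration and stop the loop when it returns True.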

3.4 Output Files

| File | Purpose |
|---|---|
| prd.json | User stories (updated as stories complete) |
| progress.txt | Short-term memory, iteration logs |
| .ralph-archive/ | Thread logs, archived PRDs |
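The handling of these files can be illustrated as follows. This is a Python sketch under stated assumptions: the helper names and the stamped file-naming scheme inside .ralph-archive/ are hypothetical, since the naming used by ralph.sh is not described here.

```python
import shutil
from pathlib import Path


def append_progress(project: Path, message: str) -> None:
    """Append one line to progress.txt, the loop's short-term memory."""
    with open(project / "progress.txt", "a") as f:
        f.write(message.rstrip("\n") + "\n")


def archive_file(project: Path, source: Path, stamp: str) -> Path:
    """Move a thread log or completed PRD into .ralph-archive/.

    The "<stamp>-<name>" naming scheme is an assumption for illustration.
    """
    archive_dir = project / ".ralph-archive"
    archive_dir.mkdir(exist_ok=True)
    target = archive_dir / f"{stamp}-{source.name}"
    shutil.move(str(source), str(target))
    return target
```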

4. Cost Analysis

Ralph's economics are compelling: $6 for overnight feature development compared to 8+ hours of developer time.

4.1 Ralph Cost Estimation

| Iterations | Estimated Cost | Use Case |
|---|---|---|
| 5 | ~$1.50 | Small feature (3-4 stories) |
| 10 | ~$3.00 | Medium feature (6-8 stories) |
| 20 | ~$6.00 | Large feature (12-15 stories) |
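All three rows imply the same rate of roughly $0.30 per iteration ($1.50 / 5). A small Python sketch of that estimate follows; the linear model is an inference from the table, not a measured billing formula, and real cost will vary with story size.

```python
COST_PER_ITERATION_USD = 0.30  # implied by the table: $1.50 / 5 iterations


def estimate_cost(iterations: int) -> float:
    """Rough linear cost estimate in USD for a given iteration budget."""
    return round(iterations * COST_PER_ITERATION_USD, 2)
```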

4.2 Comparison to Developer Time

For a 12-story feature requiring 8 hours of developer time at $100/hour:

| Approach | Cost | Time | Availability |
|---|---|---|---|
| Developer | $800 | 8 hours | Business hours |
| Ralph | $6 | Overnight | 24/7 |
| Savings | $794 (99.25%) | | |
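The savings row follows directly from the two cost figures. As a worked check in Python (`savings` is a hypothetical helper written for this illustration):

```python
def savings(developer_cost: float, ralph_cost: float):
    """Return (absolute saving in USD, saving as a percentage of developer cost)."""
    saved = developer_cost - ralph_cost
    return saved, round(100 * saved / developer_cost, 2)


# 8 hours at $100/hour versus a 20-iteration Ralph run
print(savings(8 * 100, 6))  # (794, 99.25)
```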

Key insight: Ralph runs overnight. You describe the feature before leaving work, run Ralph, and find completed code in the morning.

4.3 When Ralph Makes Sense

| Scenario | Recommendation |
|---|---|
| Well-defined feature with clear stories | Use Ralph |
| Overnight autonomous work | Use Ralph |
| Sequential dependent stories | Use Ralph |
| 3+ independent features simultaneously | Consider Gastown (parallel) |
| Quick test-fix loop (same session) | /ralph-loop (legacy) |
| Exploratory work, unclear requirements | Manual Claude Code session |

5. Case Study: Kickstand

The Kickstand project demonstrates Ralph's effectiveness at scale. Kickstand is a venue intelligence automation system that had accumulated significant technical debt across multiple architectural phases.

5.1 Results

| Metric | Before | After | Change |
|---|---|---|---|
| Active scripts | 155 | 13 | -92% |
| TypeScript errors | 30 | 0 | -100% |
| Health score | 6.2 | 9.2 | +48% |

5.2 How Ralph Contributed

The systematic reduction from 155 to 13 scripts was achieved through Ralph-style autonomous work:

  • DRY pass: Unified duplicate implementations (Node.js + Workers)
  • Rams pass: Archived 153 orphan scripts that no longer served production
  • Heidegger pass: Reconnected documentation to actual system state

Each pass was decomposed into stories with clear acceptance criteria. Ralph iterated through them autonomously, with human review at story completion.

5.3 Economic Impact

Traditional approach: A senior developer auditing 155 scripts, consolidating to 13, fixing 30 TypeScript errors, and updating documentation would require 40+ hours at $150/hour = $6,000+.

Ralph approach: PRD creation (2 hours of human time, about $300 at the same $150/hour rate) + Ralph execution ($50-100 in API costs) = $350-400, under $500 total.

Savings: $5,500+ (90%+ reduction)

6. Philosophical Grounding

6.1 Nondeterministic Idempotence

Ralph embodies nondeterministic idempotence: different paths, same outcome. Ralph might complete in 8 iterations or 12. Stories might complete in different orders. But the end result is the same: a working feature with all acceptance criteria satisfied.

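This is why work survives crashes. If Ralph stops during iteration 5, you restart and it resumes at the first story still marked passes: false; stories that were already committed and marked as passing are not redone. The PRD is the source of truth, persisted to disk.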
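Crash recovery depends on prd.json never being left half-written. One way to guarantee that is to write a temporary file and rename it over the original. This Python sketch is an assumption about how such persistence could be done, not a description of how Claude or ralph.sh update the file; `mark_story_passed` and `remaining` are hypothetical helpers.

```python
import json
import os
from pathlib import Path
from typing import List


def mark_story_passed(prd_path: Path, story_id: str) -> None:
    """Set "passes": true for one story and persist the change atomically."""
    prd = json.loads(prd_path.read_text())
    for story in prd["stories"]:
        if story["id"] == story_id:
            story["passes"] = True
    tmp = prd_path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(prd, indent=2) + "\n")
    # Atomic rename: a reader sees the old file or the new one, never a partial write.
    os.replace(tmp, prd_path)


def remaining(prd_path: Path) -> List[str]:
    """Story ids a restarted run would still need to work on."""
    stories = json.loads(prd_path.read_text())["stories"]
    return [s["id"] for s in stories if not s["passes"]]
```

Marking the same story twice leaves the file unchanged, which is the idempotent half of the property: repeating a completed step does not alter the outcome.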

6.2 Fresh Context as Zuhandenheit

In Heideggerian terms, context pollution causes the tool to become present-at-hand (Vorhandenheit)—you notice the cluttered context, the irrelevant tokens, the sluggish responses. Fresh context per iteration keeps the tool ready-to-hand (Zuhandenheit)—transparent, receding into use.

When Ralph works correctly, you don't think about it. You define the feature, run the script, and find working code. The infrastructure disappears; only the work remains.

6.3 The PRD as Task Board

The PRD is Claude's kanban board. Just like humans grab sticky notes from a board, Claude grabs stories from the PRD. The format is simple because it needs to be:

  • Machine-readable: ralph.sh filters it with jq, and Claude reads it directly
  • Human-readable: You write it without special tooling
  • Versionable: Git tracks changes, enabling bisection

7. Troubleshooting

Ralph Stops Early

Symptom: All stories show passes: true but feature isn't complete.

Cause: Acceptance criteria too vague. Claude marked them done when they weren't.

Fix: Write more specific acceptance criteria. "Form works" is bad. "Form renders at /login route" is good.
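A crude lint can catch the most obvious vague criteria before a run. This Python sketch is a heuristic only: the phrase list is an assumption chosen for illustration, and a criterion that passes the check can still be unverifiable.

```python
import re
from typing import List, Tuple

# Assumed list of phrases that usually signal an unverifiable criterion.
VAGUE = re.compile(r"\b(works|working|looks good|is correct|properly|as expected)\b",
                   re.IGNORECASE)


def vague_criteria(prd: dict) -> List[Tuple[str, str]]:
    """Return (story id, criterion) pairs that match a vague phrase."""
    return [(story["id"], criterion)
            for story in prd["stories"]
            for criterion in story.get("acceptance", [])
            if VAGUE.search(criterion)]
```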

Same Error Repeating

Symptom: Multiple iterations hit the same error.

Cause: Missing context in CLAUDE.md or agents.md.

Fix: Add the learning to CLAUDE.md so future iterations know about it. Ralph reads CLAUDE.md at the start of each iteration.

Story Too Big

Symptom: Claude can't complete a story in one iteration.

Cause: Story scope exceeds context window capacity.

Fix: Break the story into smaller atomic pieces. If a story needs more than 5 files, split it.

8. Implementation

Ralph is production-deployed in the CREATE SOMETHING monorepo:

  • Script: packages/agent-sdk/scripts/ralph.sh
  • PRD skill: .claude/skills/prd-to-ralph.md
  • Template: packages/agent-sdk/templates/prd-template.json
  • Documentation: .claude/rules/ralph-patterns.md

Prerequisites:

  • Claude Code CLI installed (npm install -g @anthropic-ai/claude-code)
  • Git repository initialized
  • CLAUDE.md file in project root (project context)
  • jq installed for JSON parsing
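The prerequisites above can be verified before starting an overnight run. A minimal Python sketch follows; `missing_prerequisites` is a hypothetical helper, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def missing_prerequisites(
    project: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of unmet prerequisites; an empty list means ready to run."""
    missing = []
    for tool in ("claude", "jq"):
        if which(tool) is None:
            missing.append(f"{tool} not found on PATH")
    if not (project / ".git").exists():
        missing.append("no Git repository (.git missing)")
    if not (project / "CLAUDE.md").is_file():
        missing.append("CLAUDE.md missing from project root")
    return missing
```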

9. Conclusion

Ralph transforms overnight development from aspiration to practice. By spawning fresh Claude Code instances per story, the pattern prevents context pollution while maintaining systematic progress through feature requirements.

The economics are decisive: $6 for features that would cost $800+ in developer time. The Kickstand case study validates this at production scale—155 scripts reduced to 13 through systematic autonomous work.

Key takeaways:

  • Fresh context per iteration prevents pollution—each story gets full attention
  • PRD as task board enables deterministic story selection and progress tracking
  • Nondeterministic idempotence ensures work survives crashes
  • Specific acceptance criteria are the bottleneck—invest in PRD quality

Status: Production-deployed, actively used for CREATE SOMETHING development.

How to Apply This

  1. Define your feature with clear boundaries
  2. Use /prd-to-ralph or write prd.json manually
  3. Ensure each story has specific, testable acceptance criteria
  4. Run ./packages/agent-sdk/scripts/ralph.sh --max-iterations 10
  5. Go to sleep (or dinner)
  6. Check progress.txt and git log in the morning
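For the morning check in step 6, a short summary of prd.json shows at a glance what finished. This is a Python sketch with a hypothetical helper name; it assumes the PRD shape from section 2.1.

```python
import json
from pathlib import Path


def summarize(prd_path: Path) -> str:
    """One-line status: how many stories pass and which are still pending."""
    stories = json.loads(prd_path.read_text())["stories"]
    pending = [s["id"] for s in stories if not s["passes"]]
    line = f"{len(stories) - len(pending)}/{len(stories)} stories pass"
    return line if not pending else f"{line}; pending: {', '.join(pending)}"
```

Reading this alongside git log helps confirm that each story marked as passing has a matching commit.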

Rule of thumb: Spend 30 minutes on PRD quality. It saves 3 hours of failed iterations.

Related Research

Subtractive Triad Audit: Kickstand — Case study of systematic codebase reduction using autonomous work

The Norvig Partnership — Empirical validation of AI-human collaboration achieving 20x productivity gains

Haiku Optimization — Intelligent model routing for cost-effective autonomous development

Attribution

The Ralph pattern is based on Geoffrey Huntley's Ralph Wiggum technique, adapted for CREATE SOMETHING's PRD-to-Ralph workflow.