PAPER-2026-001

The Three-Tier Framework

A structural model for agent systems, realized through Model Context Protocol

With contributions from Joey (Senior System Architect, Webflow)

I. Abstract

This framework proposes a hierarchical ontology for understanding and building agent systems. It identifies three distinct tiers (Database, Automation, and Judgment) connected by typed Artifacts and spanned by four cross-cutting concerns: Touchpoints, Artifacts, Orchestration, and Insight.

The Model Context Protocol (MCP) encapsulates this structure naturally: its three primitives (Resources, Tools, Prompts) map directly to the three tiers. More precisely, the tier separations follow from MCP's control model distinctions (application-controlled, model-controlled, user-controlled). Who decides and what kind of work is being done converge.

Critically, the framework is not a simple stack. MCP's sampling mechanism reveals a recursive property: Automation can request Judgment, creating a feedback loop. This mirrors embodied cognition—the body doesn't just execute commands; it participates in thinking by encountering the world and asking for judgment.

The framework's most significant implication: policy itself is an artifact. System prompts, constraints, and behavioral rules flow through the same tiers as any other data—stored in Database, transformed by Automation, evaluated by Judgment. This enables versioned constraints, context-driven policy selection, and reflexive self-modification under human oversight.

II. The Framework

The Three-Tier Framework organizes agent systems into three functional layers, each corresponding to a distinct type of work and a distinct locus of control. These tiers are not merely organizational—they capture causal dependency. You cannot have judgment without process, cannot have process without substrate.

Four cross-cutting concerns span all three tiers: Touchpoints (where interaction happens), Artifacts (what flows between layers), Orchestration (how execution flows), and Insight (how the system perceives itself). These are not tiers—they don't do work the way tiers do—but they are essential to how the system operates.

| Tier | Question | MCP Primitive | Control Model |
|---|---|---|---|
| Database | What exists? | Resources | Application-controlled |
| Automation | What happens? | Tools | Model-controlled |
| Judgment | What should happen? | Prompts | User-controlled |

III. Definitions

The Three Tiers

Database Layer

  • What exists. The substrate of state, content, and record
  • Databases, applications, payment systems, websites, files, API endpoints
  • Control: Application-controlled
  • MCP: Resources primitive

Automation Layer

  • What happens. The agentic layer where LLM-driven functions execute
  • Tools, skills, and the harness that constrains agent behavior
  • Control: Model-controlled
  • MCP: Tools primitive

Judgment Layer

  • What should happen. The policy layer where constraints and human oversight determine quality
  • Constraints, weights, human oversight, policy definitions
  • Control: User-controlled
  • MCP: Prompts primitive

The Database layer contains everything that can be touched, queried, or persisted. The client application (not the model, not the user) decides when to fetch data and inject it into context—this is infrastructure-level decision-making.

The Automation layer is specifically model-controlled—the LLM decides when to invoke tools during its reasoning process. The "harness" and "ethos" of an agent belong here—not as external observation, but as constitutive rules that define what the agent can and cannot do. This distinguishes agentic automation from procedural Orchestration.

The Judgment layer is where decisions get made about correctness, relevance, and appropriate action. Human judgment and algorithmic evaluation occupy the same functional role: given this input, what is the right output? Policy defines the boundaries; the Insight concern makes the application of that policy legible.

Cross-Cutting Concerns

Artifacts

What flows between layers. Typed payloads that move through the system: RFI objects, submittal payloads, log summaries, decision records. Artifacts are the boundary contracts between tiers—they can be versioned, validated at transitions, and observed in flight.
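To make "boundary contract" concrete, here is a minimal TypeScript sketch of one Artifact: a versioned RFI payload checked at runtime when it crosses from one tier to the next. The `RfiArtifact` shape, its field names, and the `validateRfi` helper are hypothetical illustrations, not a Procore or MCP schema; in practice a schema library or JSON Schema validation would likely replace the hand-written checks.

```typescript
// Hypothetical RFI artifact; field names are illustrative, not a Procore schema.
interface RfiArtifact {
  kind: "rfi";
  schemaVersion: 1; // bump when the contract changes
  projectId: string;
  question: string;
  dueDate: string; // ISO-8601 date, e.g. "2026-03-01"
}

type ValidationResult =
  | { ok: true; artifact: RfiArtifact }
  | { ok: false; errors: string[] };

// Runtime check applied when an artifact crosses a tier boundary,
// e.g. when Automation hands a drafted RFI to Judgment for approval.
function validateRfi(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["artifact must be an object"] };
  }
  const { kind, schemaVersion, projectId, question, dueDate } = input as Record<string, unknown>;
  const errors: string[] = [];
  if (kind !== "rfi") errors.push("kind must be 'rfi'");
  if (schemaVersion !== 1) errors.push(`unsupported schemaVersion: ${String(schemaVersion)}`);
  if (typeof projectId !== "string" || projectId === "") {
    errors.push("projectId must be a non-empty string");
  }
  if (typeof question !== "string" || question.trim() === "") {
    errors.push("question must be a non-empty string");
  }
  if (typeof dueDate !== "string" || Number.isNaN(Date.parse(dueDate))) {
    errors.push("dueDate must be a parseable date string");
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, artifact: input as RfiArtifact };
}
```

Returning every error at once, rather than failing on the first, gives the Insight concern a complete record of why an artifact was rejected in flight.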

Touchpoints

Where interaction happens. The MCP server surface that spans all tiers. Every URI, webhook endpoint, embedded interface, and API surface is a touchpoint. This is not a layer but a cross-cutting concern—the membrane through which external systems and humans interact with the framework.

Orchestration

How execution flows. The procedural coordination that connects tiers and sequences operations: workflows, triggers, cron jobs, webhook handlers. Orchestration is application-controlled (like Database) but distinct in function: Database stores, Orchestration sequences. This distinguishes procedural automation (deterministic, testable) from agentic automation (probabilistic, model-controlled).
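The procedural/agentic split can be sketched as a deterministic pipeline in which the one model-controlled step is injected as a function. This is a minimal illustration under assumed names (`orchestrate`, `dailyLogPipeline`, `DailyLogRun`); it is not the API of Cloudflare Workflows or any other orchestration product.

```typescript
// Procedural orchestration: a fixed, deterministic sequence of named steps.
// Step names and the DailyLogRun shape are hypothetical.
interface Step<S> {
  name: string;
  run: (state: S) => S;
}

interface DailyLogRun {
  entries: string[];
  summary?: string;
  notified: boolean;
  trail: string[]; // step names in execution order
}

// Same steps, same order, every time: this is what makes it testable.
function orchestrate<S extends { trail: string[] }>(steps: Step<S>[], initial: S): S {
  let state = initial;
  for (const step of steps) {
    const next = step.run(state);
    state = { ...next, trail: [...next.trail, step.name] };
  }
  return state;
}

// The agentic step is injected. Orchestration decides *when* it runs,
// never *what* it produces.
function dailyLogPipeline(summarize: (entries: string[]) => string): Step<DailyLogRun>[] {
  return [
    { name: "clean", run: (s) => ({ ...s, entries: s.entries.filter((e) => e.trim() !== "") }) },
    { name: "summarize", run: (s) => ({ ...s, summary: summarize(s.entries) }) },
    { name: "notify", run: (s) => ({ ...s, notified: s.summary !== undefined }) },
  ];
}
```

Because the step list and its order are fixed, the pipeline can be tested with a stubbed summarizer; only the injected step is probabilistic.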

Insight

How the system perceives itself. The perceptual membrane that makes execution legible: observability, human-in-the-loop approval, audit trails, confidence scores, reasoning traces. Insight is not a processing tier—it watches work being done. Without Insight, policy modification is blind mutation. With Insight, every policy selection is traced, every constraint change is logged.

IV. MCP as Encapsulation

The Model Context Protocol defines three server primitives, each with a distinct control model—who decides when the primitive is used:

| MCP Primitive | Control Model | Description |
|---|---|---|
| Resources | Application-controlled | The client application decides when to fetch and inject data into context |
| Tools | Model-controlled | The LLM decides when to invoke functions during reasoning |
| Prompts | User-controlled | The human explicitly selects templates to guide interaction |

MCP also defines Sampling, a client capability that lets a server (typically while executing a Tool) request an LLM completion back through the Client. This creates the feedback loop where Automation can invoke Judgment.

Control Model Convergence

This control model distinction is the key. MCP's designers separated primitives by who decides. The framework separates tiers by what kind of work they do. They converge because the who and the what are correlated:

| MCP Primitive | Control Model | Framework Tier | Rationale |
|---|---|---|---|
| Resources | Application-controlled | Database | Data decisions are infrastructure concerns: what exists and when to surface it |
| Tools | Model-controlled | Automation | Action decisions are agent reasoning: what happens and when to execute |
| Prompts | User-controlled | Judgment | Policy and guidance decisions are human oversight: what should happen and why |

The mapping is not accidental. When building on MCP, you are instantiating this framework directly:

  • Workers exposing Resources → Database Layer (application-controlled data)
  • Workers exposing Tools → Automation Layer (model-controlled actions)
  • Prompts and skill definitions → Judgment Layer (user-controlled guidance)
  • Sampling requests → Feedback loop (Automation requesting Judgment)
  • MCP server endpoints → Touchpoints
  • JSON schemas and structured outputs → Artifacts

The framework is not an abstraction imposed on MCP—it is the structure that MCP's control model distinctions already assume.
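The mapping can be expressed as a lookup table. This sketch assumes the server advertises its primitives the way MCP servers do during initialization, as `resources`, `tools`, and `prompts` keys in a capabilities object; the `FRAMEWORK_MAP` constant and `tiersOf` helper are hypothetical names, and only those three keys are considered.

```typescript
// The primitive-to-tier mapping as data. Keys mirror MCP's server capability
// names; the Tier and ControlModel labels are this paper's vocabulary.
type McpPrimitive = "resources" | "tools" | "prompts";
type Tier = "Database" | "Automation" | "Judgment";
type ControlModel = "application" | "model" | "user";

const FRAMEWORK_MAP: Record<McpPrimitive, { tier: Tier; control: ControlModel }> = {
  resources: { tier: "Database", control: "application" },
  tools: { tier: "Automation", control: "model" },
  prompts: { tier: "Judgment", control: "user" },
};

// Given the capabilities a server advertises, list the tiers it instantiates.
function tiersOf(capabilities: Partial<Record<McpPrimitive, object>>): Tier[] {
  const primitives: McpPrimitive[] = ["resources", "tools", "prompts"];
  return primitives
    .filter((p) => capabilities[p] !== undefined)
    .map((p) => FRAMEWORK_MAP[p].tier);
}
```

A server that advertises resources and tools but no prompts instantiates Database and Automation, leaving Judgment to whoever selects prompts elsewhere in the system.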

V. The Control Model Hierarchy

The control models form a hierarchy of decision-making authority:

1. Users Set Boundaries

Through prompt selection, users define the operating constraints. This is the Judgment layer: policy, ethics, acceptable outcomes.

2. Models Act Within Boundaries

The agent reasons and selects tools, but only within the space the user has defined. This is the Automation layer: execution, skill invocation, agentic work.

3. Applications Provide Substrate

The infrastructure makes data available (or not), independent of what the model wants. This is the Database layer: state, records, content that exists.
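A minimal sketch of how the three control models compose when a model proposes a tool call. The types, the `procore://` URIs, and `runProposal` are hypothetical; the point is the ordering of checks: the user's boundary is enforced regardless of what the model proposes, and the application's data is either present or not, regardless of what the model wants.

```typescript
// How the three control models compose. All names here are hypothetical.
interface PromptSelection { allowedTools: string[] }         // user-controlled: Judgment
interface ToolProposal { tool: string; resourceUri: string } // model-controlled: Automation
type ResourceStore = Record<string, string>;                  // application-controlled: Database

type Outcome =
  | { status: "executed"; tool: string; data: string }
  | { status: "blocked"; reason: string };

function runProposal(user: PromptSelection, proposal: ToolProposal, store: ResourceStore): Outcome {
  // Users set boundaries: a proposal outside the selected tool set never runs.
  if (!user.allowedTools.includes(proposal.tool)) {
    return { status: "blocked", reason: `tool '${proposal.tool}' is outside the user's boundary` };
  }
  // Applications provide substrate: the data exists or it doesn't, whatever the model wants.
  const data: string | undefined = store[proposal.resourceUri];
  if (data === undefined) {
    return { status: "blocked", reason: `resource '${proposal.resourceUri}' is unavailable` };
  }
  // Models act within boundaries: the permitted proposal runs against available data.
  return { status: "executed", tool: proposal.tool, data };
}
```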

Failure Modes

The hierarchy also explains failure modes:

Judgment Failures

  • Wrong prompt selected
  • Poor constraints
  • Misaligned ethos
  • Agent acts correctly but produces wrong outcomes

Automation Failures

  • Tool errors
  • Skill bugs
  • Agent mistakes
  • Agent has right constraints but execution fails

Database Failures

  • Missing data
  • Stale state
  • Unavailable resources
  • Agent can't act because substrate is broken

Debugging Heuristic

Debugging follows the hierarchy: check Database first (is data there?), then Automation (did execution work?), then Judgment (was policy correct?). The causality flows upward—you cannot have judgment without process, cannot have process without substrate.
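The heuristic reduces to probing tiers bottom-up and reporting the first failure. A minimal sketch, in which the probe functions are hypothetical placeholders for real checks (is the record there? did the tool return? was the selected policy correct?):

```typescript
// Probe tiers bottom-up and report the first failure.
// Probes are hypothetical placeholders for real health checks.
type Tier = "Database" | "Automation" | "Judgment";
type Probe = () => boolean; // true = this tier looks healthy

const CHECK_ORDER: Tier[] = ["Database", "Automation", "Judgment"];

function firstFailingTier(probes: Record<Tier, Probe>): Tier | "none" {
  for (const tier of CHECK_ORDER) {
    // Causality flows upward, so a lower-tier failure explains symptoms above it.
    if (!probes[tier]()) return tier;
  }
  return "none";
}
```

A result of "Judgment" corresponds to the first failure mode above: the data was there and execution worked, but the policy was wrong.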

VI. The Recursive Property: Sampling as Feedback Loop

The framework is not a simple stack. MCP's sampling mechanism reveals that the hierarchy is actually a cycle with directional flow.

What Sampling Does

Sampling allows an MCP server, typically in the middle of executing a Tool, to request an LLM completion back through the Client. In effect, the tool says: "I need judgment to complete my work." The client proxies that request up to the LLM and returns the result.

The Flow

| Step | Actor | Action | Layer |
|---|---|---|---|
| 1 | User | Selects prompts, constraints | Judgment |
| 2 | Agent | Reasons and decides to call a tool | Automation |
| 3 | Tool | Executes, but needs LLM judgment to complete | Automation |
| 4 | Tool | Sends sampling request back to client | Automation → Judgment |
| 5 | Client | Proxies request to LLM | Judgment |
| 6 | LLM | Returns result to client | Judgment |
| 7 | Client | Forwards result to tool | Judgment → Automation |
| 8 | Tool | Completes and returns to agent | Automation |

The tool doesn't need its own LLM configuration—it piggybacks on the client's. This is why Samuel Colvin (Pydantic) calls it "extremely powerful": you can build MCP servers that perform agentic work without each server needing independent LLM access.
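The flow can be simulated synchronously to show how control crosses tiers. This is a sketch under stated assumptions: the LLM is a stub, `SamplingClient` and `triageTool` are hypothetical names, and real MCP sampling is an asynchronous JSON-RPC round trip (`sampling/createMessage`) rather than a direct method call. Steps 1 and 2 happen before the tool runs and are not modelled.

```typescript
// Synchronous simulation of steps 3-8 of the flow. The LLM is a stub; real
// MCP sampling is an async JSON-RPC request, not a direct method call.
type Layer = "Automation" | "Judgment" | "Automation → Judgment" | "Judgment → Automation";

interface TraceEntry {
  step: number;
  actor: "Tool" | "Client" | "LLM";
  layer: Layer;
}

class SamplingClient {
  readonly trace: TraceEntry[] = [];
  private readonly llm: (prompt: string) => string;

  constructor(llm: (prompt: string) => string) {
    this.llm = llm;
  }

  // Steps 5-7: proxy the tool's request to the LLM and hand the answer back.
  sample(prompt: string): string {
    this.trace.push({ step: 5, actor: "Client", layer: "Judgment" });
    const answer = this.llm(prompt);
    this.trace.push({ step: 6, actor: "LLM", layer: "Judgment" });
    this.trace.push({ step: 7, actor: "Client", layer: "Judgment → Automation" });
    return answer;
  }
}

// Steps 3-4 and 8: a tool that cannot finish without a judgment call.
// It holds no LLM credentials of its own; it borrows the client's.
function triageTool(client: SamplingClient, report: string): "urgent" | "routine" {
  client.trace.push({ step: 3, actor: "Tool", layer: "Automation" });
  client.trace.push({ step: 4, actor: "Tool", layer: "Automation → Judgment" });
  const answer = client.sample(`Is this site report urgent? Answer yes or no.\n${report}`);
  client.trace.push({ step: 8, actor: "Tool", layer: "Automation" });
  return answer.trim().toLowerCase() === "yes" ? "urgent" : "routine";
}
```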

Why This Matters Architecturally

Without Sampling

  • Every tool needing judgment must configure its own LLM access
  • Each server manages its own API keys and rate limits
  • Each server bears its own inference costs
  • Context the main agent already has is duplicated in each server

With Sampling

  • Tool delegates judgment back to the calling system
  • Client becomes a shared resource for reasoning
  • Centralized cost management
  • Context stays lean: specialized context lives in tools

Embodied Cognition Parallel

This recursive property mirrors what phenomenologists observed about human cognition. Paraphrasing Heidegger (Being and Time, 1927) and Merleau-Ponty (Phenomenology of Perception, 1945): cognition isn't a separate layer commanding the body; it's enacted through embodied action. The body isn't just executing; it's informing cognition through its encounters with the world.

The parallel: In the three-tier framework, Automation (the body/tools) doesn't just execute commands from Judgment. Through sampling, Automation can request Judgment. The tool encounters something in the world (Database) and needs judgment to proceed—so it asks.

This is why the intuition that "something is above Judgment" is both right and wrong:

  • Wrong because there's no fourth layer
  • Right because the system is recursive—Judgment sits "above" itself through the feedback loop

Implications for Design

Context Window Management

Colvin's example shows a research agent that calls a BigQuery tool. The tool has its own system prompt with SQL schema details. If that context lived in the main agent, it would bloat every request. By isolating it in the tool and using sampling, the main agent stays lean.
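A sketch of the request such a tool might send. The message shapes follow the MCP `sampling/createMessage` request (`messages`, `systemPrompt`, `includeContext`, `maxTokens`), reduced to the text-content subset; the warehouse schema, table names, and `buildSqlSamplingRequest` helper are hypothetical stand-ins for Colvin's BigQuery example.

```typescript
// Subset of the MCP sampling request shape (text content only).
interface TextContent { type: "text"; text: string }
interface SamplingMessage { role: "user" | "assistant"; content: TextContent }
interface CreateMessageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    systemPrompt?: string;
    includeContext?: "none" | "thisServer" | "allServers";
    maxTokens: number;
  };
}

// Specialized context lives in the tool, not in the main agent's prompt.
// Hypothetical schema for illustration.
const WAREHOUSE_SCHEMA = [
  "Table rfis(id STRING, project_id STRING, status STRING, opened_at TIMESTAMP)",
  "Table daily_logs(id STRING, project_id STRING, body STRING, logged_on DATE)",
].join("\n");

function buildSqlSamplingRequest(id: number, question: string): CreateMessageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [{ role: "user", content: { type: "text", text: question } }],
      // The schema travels only with this one request...
      systemPrompt: `Write a single SQL query. Schema:\n${WAREHOUSE_SCHEMA}`,
      // ...and the tool asks the client not to attach the main conversation.
      includeContext: "none",
      maxTokens: 400,
    },
  };
}
```

`includeContext: "none"` is a request, not a guarantee; the client still decides what context to attach. Either way, the schema never enters the main agent's prompt.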

Cost Distribution

The main client bears LLM costs, but tools can contribute specialized context. This is more efficient than each tool paying for its own inference.

Trust Boundaries

The client controls what sampling requests it honors. A tool can ask for LLM access, but the client decides whether to grant it. This maintains the user-controlled property of Judgment while allowing Automation to participate.
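A minimal sketch of that gate on the client side. The MCP specification recommends that a human be able to review and deny sampling requests; the specific rules here (a server allowlist, an automatic-approval token ceiling, a hard token limit) and all names are hypothetical.

```typescript
// Client-side gate for sampling requests. The allowlist and thresholds are
// hypothetical policy; the decision itself belongs to the client and its user.
interface SamplingAsk {
  serverId: string;
  maxTokens: number;
}

interface SamplingPolicy {
  trustedServers: string[];
  autoApproveTokenLimit: number; // above this, a human must confirm
  hardTokenLimit: number; // above this, always refuse
}

type Verdict = "grant" | "ask-user" | "deny";

function reviewSamplingAsk(ask: SamplingAsk, policy: SamplingPolicy): Verdict {
  if (!policy.trustedServers.includes(ask.serverId)) return "deny";
  if (ask.maxTokens > policy.hardTokenLimit) return "deny";
  if (ask.maxTokens > policy.autoApproveTokenLimit) return "ask-user";
  return "grant";
}
```

The "ask-user" verdict is where Judgment stays user-controlled: Automation may ask, but a person decides the borderline cases.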

VII. Policy as Artifact

The framework's most significant implication: policy is not external to the system—it is an artifact that flows through the tiers.

The Conventional View vs. The Artifact View

Conventional View

  • Policy (system prompts, constraints) treated as fixed scaffolding
  • Written once, applied always
  • Policy sits outside the system
  • It constrains but does not participate
  • A "harness" is one implementation: fixed rules bounding agent behavior

Artifact View

  • Policy is data flowing through tiers like any other artifact
  • Can be versioned, selected contextually, modified reflexively
  • Policy participates in the system
  • Stored in Database, transformed by Automation, evaluated by Judgment
  • Always under human oversight at the Judgment layer

The tiers operate on policy just as they operate on any artifact:

| Tier | Policy Operation |
|---|---|
| Database | Stores policy versions (prompts, constraints, ethos as versioned data) |
| Automation | Transforms or requests policy (a tool asks "give me the strict constraints" via sampling) |
| Judgment | Evaluates which policy to apply (selects constraints appropriate to context) |

What This Enables

Policy Versioning

Multiple constraint sets coexist as stored artifacts. A financial compliance policy. A creative exploration policy. A debugging policy. Version-controlled, auditable, selectable at runtime.
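A minimal in-memory sketch of policy versioning, in which every save appends a new immutable version. `PolicyStore` and its fields are hypothetical; a real deployment would more likely keep versions in a database table keyed by name and version.

```typescript
// Append-only policy store: saving never overwrites, it adds a version.
interface PolicyVersion {
  name: string;
  version: number;
  systemPrompt: string;
  constraints: string[];
}

class PolicyStore {
  private readonly versions: PolicyVersion[] = [];

  save(name: string, systemPrompt: string, constraints: string[]): PolicyVersion {
    const version = this.history(name).length + 1;
    const entry: PolicyVersion = { name, version, systemPrompt, constraints: [...constraints] };
    this.versions.push(entry);
    return entry;
  }

  // Full audit trail for one policy, oldest first.
  history(name: string): PolicyVersion[] {
    return this.versions.filter((v) => v.name === name);
  }

  // Latest version by default; a specific version for audits or rollback.
  get(name: string, version?: number): PolicyVersion | undefined {
    const h = this.history(name);
    return version === undefined ? h[h.length - 1] : h.find((v) => v.version === version);
  }
}
```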

Context-Driven Selection

A tool encountering sensitive data can request (via sampling) the appropriate policy. "I'm about to handle PII—give me the strict constraints." The client evaluates and returns the right policy for the moment.

Graduated Trust

Different operations invoke different constraint levels. Read operations get permissive policy. Write operations get restrictive policy. Delete operations require human approval policy. Policy isn't uniform—it's contextual.
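Context-driven selection and graduated trust can be combined in a single selection function. The operation kinds, the PII flag, and the policy names below are hypothetical labels for the levels described above, not a prescribed scheme.

```typescript
// Graduated, context-driven policy selection. Labels are hypothetical.
type Operation = "read" | "write" | "delete";

interface OperationContext {
  operation: Operation;
  containsPii: boolean; // set by a tool that encounters sensitive data
}

interface SelectedPolicy {
  name: string;
  requiresHumanApproval: boolean;
}

function selectPolicy(ctx: OperationContext): SelectedPolicy {
  // Deletes always escalate to a human, regardless of data sensitivity.
  if (ctx.operation === "delete") return { name: "human-approval", requiresHumanApproval: true };
  // Sensitive data tightens constraints even for reads ("give me the strict constraints").
  if (ctx.containsPii) return { name: "strict-pii", requiresHumanApproval: ctx.operation === "write" };
  return ctx.operation === "write"
    ? { name: "restrictive", requiresHumanApproval: false }
    : { name: "permissive", requiresHumanApproval: false };
}
```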

Self-Modification Through the Loop

The system can observe its own policy, request modifications through sampling, and apply new constraints. This isn't unconstrained self-modification—the Judgment layer (user-controlled) still decides what modifications to honor.

Multi-Agent Coordination

When multiple agents coordinate (via systems like Beads, Loom, or Agent Mail), they're not just passing task artifacts—they're passing policy artifacts.

Policy Propagation Example

Agent A (financial analysis) → passes policy artifact: "PII handling required" → Agent B (data processing) → applies received constraints, processes under financial compliance policy → Agent C (reporting)

The coordination layer isn't a fourth tier above Judgment. It's artifact-passing at the Judgment level. Agents share:

  • Task artifacts: what to do
  • Context artifacts: what's known
  • Policy artifacts: how to behave

This is why coordination systems like Beads (Yegge) work: they give agents shared memory that includes behavioral constraints, not just task state. Policy propagates through the agent graph as data.
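A sketch of the envelope such agents might exchange, with task, context, and policy artifacts travelling together. The field names and `handOff` helper are hypothetical and do not reflect the message formats of Beads, Loom, or Agent Mail; the one design choice illustrated is that each hop may add constraints but cannot silently drop inherited ones.

```typescript
// Envelope passed between agents: task, context, and policy artifacts together.
interface Envelope {
  task: string; // what to do
  context: Record<string, string>; // what's known
  policy: string[]; // how to behave
}

// Each hop may add constraints but never drops inherited ones.
function handOff(from: Envelope, nextTask: string, addedPolicy: string[] = []): Envelope {
  return {
    task: nextTask,
    context: { ...from.context },
    // Set removes duplicates while keeping first-seen order.
    policy: Array.from(new Set([...from.policy, ...addedPolicy])),
  };
}
```

In the example above, Agent A's "PII handling required" constraint reaches Agent C even though only Agent B added the financial compliance policy.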

Insight as Self-Perception

If policy is an artifact the system can modify, the Insight concern becomes essential—not as an MCP primitive, but as the perceptual loop that makes self-modification legible.

Without Insight

  • Policy changes are blind mutations
  • No audit trail of constraint evolution
  • No ability to diagnose behavioral drift

With Insight

  • Every policy selection is traced
  • Constraint changes are logged with context
  • The system perceives itself modifying itself

This is the reflexive loop that embodied cognition predicts: the body (Automation) doesn't just act—it perceives its own acting. The system doesn't just apply constraints—it watches itself choosing constraints.
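One way to make selection legible is to wrap the selector so that every choice is recorded, with its context, before it takes effect. The `traced` wrapper, `selectionCounts` helper, and `PolicyTraceRecord` shape are hypothetical; a production system would send these records to its observability pipeline rather than an in-memory array.

```typescript
// Insight as a wrapper: each policy selection is recorded with its context.
interface PolicyTraceRecord {
  at: string; // ISO timestamp
  requestedBy: string;
  reason: string;
  selected: string;
}

function traced<C>(
  select: (ctx: C) => string,
  describe: (ctx: C) => { requestedBy: string; reason: string },
  sink: PolicyTraceRecord[],
): (ctx: C) => string {
  return (ctx: C) => {
    const selected = select(ctx);
    sink.push({ at: new Date().toISOString(), ...describe(ctx), selected });
    return selected;
  };
}

// A simple drift signal over the trace: how often was each policy chosen?
function selectionCounts(sink: PolicyTraceRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const rec of sink) counts[rec.selected] = (counts[rec.selected] ?? 0) + 1;
  return counts;
}
```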

Risks and Mitigations

Self-modifying policy introduces risk. A system that can request looser constraints might game itself into unsafe behavior. Mitigations:

  1. User-controlled ceiling. The Judgment layer (Prompts) remains user-controlled. The system can request policy changes, but humans define what's requestable.
  2. Policy immutability tiers. Some constraints are mutable (formatting preferences), some are immutable (safety boundaries). Store them differently in Database.
  3. Insight as governance. Every policy modification is logged. Anomaly detection on constraint drift. The perceptual loop becomes the audit trail.

The framework doesn't solve this risk—it names it and provides the vocabulary to reason about it.
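The three mitigations can be sketched in one function: a change request is checked against the constraint's immutability tier and the user-defined set of requestable constraints, and every attempt, granted or refused, is logged. All names and the example constraints are hypothetical.

```typescript
// Guarded policy modification. Names and constraints are hypothetical.
interface Constraint { id: string; value: string; mutable: boolean }
interface ChangeRequest { id: string; newValue: string; requestedBy: string }
interface ChangeLogEntry { request: ChangeRequest; applied: boolean; reason: string }

function applyChange(
  constraints: Constraint[],
  req: ChangeRequest,
  requestable: string[], // 1. user-controlled ceiling: ids humans allow agents to request
  changeLog: ChangeLogEntry[], // 3. Insight as governance: every attempt is logged
): Constraint[] {
  const target = constraints.find((c) => c.id === req.id);
  let reason = "applied";
  if (!target) reason = "unknown constraint";
  else if (!target.mutable) reason = "immutable constraint"; // 2. immutability tier
  else if (!requestable.includes(req.id)) reason = "not requestable";
  const applied = reason === "applied";
  changeLog.push({ request: req, applied, reason });
  return applied
    ? constraints.map((c) => (c.id === req.id ? { ...c, value: req.newValue } : c))
    : constraints;
}
```

Note that immutability is checked before the requestable list, so a safety boundary cannot be loosened even if someone mistakenly marks it requestable.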

VIII. Properties

Causality

The framework captures dependency: Database feeds Automation feeds Judgment. You cannot have judgment without process, cannot have process without substrate. This is not just organization—it's a debugging heuristic.

When a system fails, ask:

  1. Did the Database layer fail to provide data?
  2. Did Automation execute incorrectly?
  3. Did Judgment apply wrong policy?

Blurriness

The boundaries between tiers are elastic, not rigid. An agent with tools and skills running in the background is already making judgments—the policy is embedded in how it decides which tool to call. The three tiers are a spectrum, with boundaries getting blurry as agents get more capable.

This blurriness is a feature, not a bug. It reflects the reality that sophisticated automation contains embedded judgment, and that judgment requires automation to act. The sampling mechanism makes this explicit: a tool can request judgment, collapsing the boundary between Automation and Judgment for that moment.

Brittleness vs. Variety

A deep, type-safe monorepo where all three tiers share a unified stack (e.g., Cloudflare Workers, TypeScript end-to-end) gives speed and correctness guarantees, but creates a single point of conceptual failure.

A polyglot stack with more variety is more resilient (different failure modes don't cascade the same way) but introduces translation costs at every Artifact boundary.

The framework does not prescribe which approach to take—it provides the vocabulary to reason about the tradeoff.

IX. Implementation: Cloudflare-First Architecture

For a unified stack approach using Cloudflare:

| Framework Element | Cloudflare Service |
|---|---|
| Database Layer | D1, KV, R2, Durable Objects |
| Automation Layer | Workers, Workflows, Queues |
| Judgment Layer | Workers AI, External LLM APIs |
| Touchpoints | Worker endpoints, MCP servers |
| Orchestration | Workers (procedural), Workflows |
| Insight | Logpush, Analytics, custom tracing |
| Artifacts | JSON schemas, structured outputs |

This provides type-safety from Database through Judgment, with MCP servers as the Touchpoint surface. The Cloudflare stack exemplifies the "Brittleness" property: a unified platform gives speed and correctness guarantees at the cost of platform dependency.

X. Applications

Two concrete applications demonstrate how the framework applies to real systems:

WORKWAY — Workflow Automation for Construction

Database: Procore projects, RFIs, daily logs, submittals
Automation: AI skills (draft RFI, summarize logs, review submittals)
Judgment: Policy definitions, human approval gates, trust boundaries
Orchestration: Workflow triggers, webhook handlers, notification dispatchers
Insight: Execution traces, approval audit logs, confidence scores
Touchpoints: MCP server endpoints for Procore, Slack, email
Artifacts: RFI objects, summary reports, compliance flags

CREATE SOMETHING — Custom MCP Development

Database: Client systems (Salesforce, HubSpot, internal tools)
Automation: Custom MCP servers connecting systems to agents
Judgment: Agent policy, skill constraints, trust boundaries
Orchestration: Connection flows, OAuth handlers, request routing
Insight: Agent observability, decision traces, HITL surfaces
Touchpoints: MCP server URIs, OAuth surfaces
Artifacts: Integration payloads, structured responses

XI. Conclusion

The three-tier framework—Database, Automation, Judgment—provides a structural model for reasoning about agent systems. MCP encapsulates it naturally through its primitives (Resources, Tools, Prompts) and control model distinctions (application-controlled, model-controlled, user-controlled).

Four cross-cutting concerns span the tiers: Touchpoints (interface surface), Artifacts (boundary contracts), Orchestration (procedural flow), and Insight (perceptual membrane). These are not tiers—they don't do work the way tiers do—but they are essential to how the system operates.

The framework is not a simple hierarchy. MCP's sampling mechanism reveals the recursive property: Automation can request Judgment, closing the loop. The tool encounters the world and asks for judgment. This mirrors embodied cognition—the body doesn't just execute; it participates in thinking.

The policy-as-artifact insight extends this further: the constraints that govern agent behavior are not external scaffolding but data flowing through the tiers. Policy can be versioned, selected contextually, and modified reflexively—always under human oversight at the Judgment layer. Multi-agent coordination becomes policy-passing: agents share not just tasks but behavioral constraints.

Artifacts flow between tiers as typed boundary contracts. Touchpoints span all tiers as the interaction surface. Orchestration sequences procedural work. Insight makes the system legible to itself and to humans. Sampling allows lower tiers to reach back up when they need judgment. And policy itself participates in this flow.

This is not an abstraction imposed on systems—it is the shape that MCP already assumes. The framework names it.

References

  1. Heidegger, M. Being and Time (1927). Trans. J. Macquarrie & E. Robinson. New York: Harper & Row, 1962.
  2. Merleau-Ponty, M. Phenomenology of Perception (1945). Trans. C. Smith. London: Routledge & Kegan Paul, 1962.
  3. Anthropic. "Model Context Protocol Specification." 2024. modelcontextprotocol.io
  4. Colvin, S. "MCP Sampling: How It Works and Why It Matters." 2025.
  5. Yegge, S. "Beads: Agent-Native Persistence for Cross-Session Memory." 2025.