The Proof Surface | CREATE SOMETHING

Executive Thesis

Proof is the product once work leaves chat.

A model response can sound right. A tool call can succeed. A trace can show every step. A deploy can pass. A Linear issue can close. None of those facts, alone, gives a buyer or operator a usable answer to the practical question:

What happened, what changed, who owns the next decision, and what evidence should we trust?

The missing layer is the Proof Surface: the business-readable layer that turns agent work into inspectable operating receipts.

The Proof Surface answers four questions:

What can run?
What must wait?
What must stop?
What proves the decision?

This is not a replacement for traces, evals, runbooks, or contracts. It is the layer that makes those artifacts legible to the people who have to inherit the workflow.

The practical recommendation is simple: before expanding agent authority, define the proof surface the operator will inspect after the work runs.

What This Paper Gives You

Use this paper when an AI workflow is already capable enough to act, but not yet legible enough to inherit.

It gives you three practical outputs:

A distinction between raw evidence and proof receipts.
A public/private evidence boundary for workflow status surfaces.
A starter proof template that can be applied to one workflow before autonomy expands.

The target reader is the operator, founder, product lead, client sponsor, or builder who needs to know whether agent work can leave chat without losing accountability.

Why Proof Needs A Surface

Most AI workflow demos end at the moment the agent responds.

Real operations begin after that moment.

The operator needs to know whether a customer-facing reply was only drafted or actually sent. The founder needs to know whether a production deploy touched the canonical domain or only a preview alias. The client sponsor needs to know which evidence is safe to share and which logs must stay private. The next teammate needs to know why an action stopped instead of pretending to finish.

Without a proof surface, the system creates a familiar failure mode: execution exists, but accountability is scattered.

The chat transcript has the explanation.
The CI run has the command output.
The trace has the runtime steps.
The issue tracker has the assignment.
The deploy system has the URL.
The client page has only a vague status.

Nobody is lying, but nobody can read the work as one operating path.

The Proof Surface brings those fragments into one inspectable object. It does not expose every private detail. It shows enough for a buyer, operator, or reviewer to understand the state of the work without receiving credentials, raw logs, private client data, or implementation noise.

Proof Is Not The Same As Evidence

Evidence is the underlying material.

Proof is the interpreted receipt.

For agent workflows, evidence can include:

command output
test results
trace IDs
eval scores
screenshots
deploy IDs
source URLs
database snapshots
policy decisions
approval notes
blocked-state records

Those artifacts matter, but most of them are not buyer-readable. A trace can prove runtime behavior to an engineer while still failing to explain the workflow to an operator. A deploy ID can prove promotion to a release owner while saying nothing about whether the customer-facing action was allowed. A screenshot can show a page rendered while hiding whether the data behind it was stale.

Proof is the layer that says what the evidence means.

Evidence	Proof receipt
A passing validation command	The workflow met its release gate.
A trace ID	The run followed the expected path.
A blocked-state JSON file	The action stopped for a named reason.
A deploy URL	The change is visible on a specific surface.
An approval comment	A named owner accepted a risky action.
A rollback note	The team knows how to recover if the change fails.

This distinction keeps teams from oversharing private evidence while still giving stakeholders a trustworthy view of the work.
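
The evidence-to-receipt mapping in the table above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names `Evidence`, `ProofReceipt`, `RECEIPT_TEMPLATES`, and `to_receipt` are hypothetical, and the sketch assumes the evidence has already been checked and is positive (for example, the validation command passed).

```python
from dataclasses import dataclass

# Hypothetical mapping from an evidence kind to the claim it supports.
# The wording mirrors the table above; the keys are illustrative names.
RECEIPT_TEMPLATES = {
    "validation_command": "The workflow met its release gate.",
    "trace_id": "The run followed the expected path.",
    "blocked_state": "The action stopped for a named reason.",
    "deploy_url": "The change is visible on a specific surface.",
    "approval_comment": "A named owner accepted a risky action.",
    "rollback_note": "The team knows how to recover if the change fails.",
}


@dataclass(frozen=True)
class Evidence:
    kind: str     # e.g. "trace_id"
    pointer: str  # where the raw artifact lives; stays private


@dataclass(frozen=True)
class ProofReceipt:
    claim: str          # business-readable interpretation
    evidence_kind: str  # what backs the claim, without the raw artifact


def to_receipt(evidence: Evidence) -> ProofReceipt:
    """Interpret one piece of evidence; refuse kinds with no agreed meaning."""
    if evidence.kind not in RECEIPT_TEMPLATES:
        raise ValueError(f"no receipt wording defined for {evidence.kind!r}")
    return ProofReceipt(RECEIPT_TEMPLATES[evidence.kind], evidence.kind)
```

The receipt carries the claim and the kind of evidence behind it, but not the pointer itself, so the raw artifact stays on the private side.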

The Four Visible States

The Proof Surface makes agent work readable through four states.

State	Meaning	Receipt
Run	Bounded work was allowed to proceed.	What action ran, against which object, under which rule.
Wait	A named owner must approve before impact.	The approval note, decision owner, and pending action.
Stop	The workflow exceeded scope, lacked data, or hit policy.	The reason code and recovery path.
Receipt	The result is preserved for review and handoff.	The link, command, trace, note, or delivery record that proves the state.

These states are intentionally small.

Operators do not need to read an entire policy bundle to understand whether the system acted correctly. They need the operating state first. From there, the deeper evidence can remain attached for reviewers who need it.

The state language also prevents fake autonomy. An agent that stops with a reason is more trustworthy than an agent that guesses to preserve the appearance of completion.
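
One way to keep the four states honest is to require each state to carry its receipt fields. The sketch below assumes illustrative field names (`target_object`, `reason_code`, `proof_pointer`, and so on) that paraphrase the Receipt column above; they are not a defined schema.

```python
from enum import Enum


class State(Enum):
    RUN = "run"
    WAIT = "wait"
    STOP = "stop"
    RECEIPT = "receipt"


# Assumed field names, paraphrasing the "Receipt" column of the table above.
REQUIRED_FIELDS = {
    State.RUN: ("action", "target_object", "rule"),
    State.WAIT: ("approval_note", "decision_owner", "pending_action"),
    State.STOP: ("reason_code", "recovery_path"),
    State.RECEIPT: ("proof_pointer",),
}


def missing_fields(state: State, receipt: dict) -> list[str]:
    """Return the required fields that are absent or empty for this state."""
    return [name for name in REQUIRED_FIELDS[state] if not receipt.get(name)]
```

A stop that names a reason but no recovery path would be reported as incomplete rather than silently accepted.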

Public Status, Private Evidence

The proof surface has to separate what is visible from what is sensitive.

This is where many agent systems fail. They either expose too little and become opaque, or expose too much and leak credentials, raw client data, private logs, or internal reasoning that should never become a customer artifact.

A useful proof surface has two layers:

Layer	Audience	Contents
Public or client-safe status	Buyer, operator, client sponsor, next teammate	Workflow state, owner decision, safe summary, next action, non-sensitive links.
Private evidence packet	Builder, reviewer, release owner, security owner	Commands, trace IDs, raw validation output, secrets-adjacent context, rollback notes, detailed logs.

The separation is not cosmetic. It preserves ownership.

The buyer should see enough to trust the system. The operator should see enough to act. The builder should preserve enough to debug. Sensitive proof should stay behind the right boundary.

That is the difference between transparency and leakage.
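
The two-layer split can be enforced with an allow-list rather than a block-list. The following sketch is one possible shape, assuming hypothetical field names in `PUBLIC_FIELDS`; the point is that anything not explicitly marked client-safe defaults to the private packet.

```python
# Assumed allow-list of fields that may appear on the client-safe layer.
PUBLIC_FIELDS = ("workflow", "state", "owner_decision", "summary", "next_action", "links")


def split_record(record: dict) -> tuple[dict, dict]:
    """Split one record into (public_status, private_packet).

    Anything not explicitly allow-listed is treated as private, so a newly
    added field cannot reach the public layer by accident.
    """
    public = {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
    private = {k: v for k, v in record.items() if k not in PUBLIC_FIELDS}
    return public, private
```

Defaulting to private is a design choice in this sketch: a forgotten field makes the public status slightly thinner instead of leaking a log or credential context.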

How This Fits The Existing Stack

The Proof Surface follows the sequence already established by the CREATE SOMETHING research trail.

The Workflow Trust Layer names what can run, what waits, and what stops.

The Policy OS Contract Bundle defines the portable artifacts that govern capability, behavior, outcomes, regressions, and operations.

The Eval Evidence Layer makes traces, evals, approval receipts, and blocked states measurable enough to change release decisions.

The Proof Surface makes the result readable to the people who inherit the work.

Those layers should not collapse into one document.

Layer	Primary question
Workflow Trust Layer	What is the workflow allowed to do?
Policy OS Contract Bundle	Which artifacts govern the workflow?
Eval Evidence Layer	Which measurements change release decisions?
Proof Surface	What can a human inspect after work runs?

The proof surface is the user-facing edge of governance. It is where policy, evidence, and handoff become understandable.

The Proof Path: Connect, Verify, Coordinate, Control

A practical proof surface follows the same path as governed delivery.

1. Connect

Name the system, account owner, and authority boundary.

This is where the workflow states what it can read, what it can write, and which system owns the source of truth. A connection without ownership is just latent risk.

The proof receipt should answer:

Which account or system is involved?
Who owns access?
Which authority was granted?
Which authority was not granted?

2. Verify

Check the claim before repeating it.

Verification can be a command, route response, screenshot, trace, eval, or direct source read. The important point is that the agent does not merely narrate confidence. It attaches evidence.

The proof receipt should answer:

What was checked?
When was it checked?
What passed or failed?
What is still unverified?

3. Coordinate

Keep ownership, status, and evidence with the work.

Agent work often spans multiple tools and sessions. Without coordination, the next operator inherits a scattered story. A useful proof surface preserves the handoff as an artifact, not as oral tradition.

The proof receipt should answer:

Who owns the next decision?
Which issue, delivery record, or runbook carries the state?
What is blocked?
What can continue safely?

4. Control

Ship the run, wait, stop, and rollback paths.

Control means the workflow can act inside bounds and stop outside them. It also means a future operator can recover. A proof surface without rollback context is incomplete whenever production, revenue, customer trust, or account authority is involved.

The proof receipt should answer:

What can run automatically?
What requires approval?
What stops with a reason?
How does the team recover?
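
The four steps and their questions can be treated as a checklist that a receipt either answers or leaves open. The sketch below is a hypothetical helper, not part of any existing tool: `PROOF_PATH` copies the questions from the sections above, and `unanswered` reports which ones still have no answer.

```python
# The questions are copied from the Connect, Verify, Coordinate, and Control
# sections above; the dictionary layout itself is an assumption.
PROOF_PATH = {
    "connect": [
        "Which account or system is involved?",
        "Who owns access?",
        "Which authority was granted?",
        "Which authority was not granted?",
    ],
    "verify": [
        "What was checked?",
        "When was it checked?",
        "What passed or failed?",
        "What is still unverified?",
    ],
    "coordinate": [
        "Who owns the next decision?",
        "Which issue, delivery record, or runbook carries the state?",
        "What is blocked?",
        "What can continue safely?",
    ],
    "control": [
        "What can run automatically?",
        "What requires approval?",
        "What stops with a reason?",
        "How does the team recover?",
    ],
}


def unanswered(answers: dict) -> dict:
    """Map each step to the questions that still have no non-empty answer."""
    gaps = {}
    for step, questions in PROOF_PATH.items():
        given = answers.get(step, {})
        open_questions = [q for q in questions if not str(given.get(q, "")).strip()]
        if open_questions:
            gaps[step] = open_questions
    return gaps
```

An empty result means every question has some answer. It does not mean the answers are correct; that judgment stays with the named owner.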

Delivery Records As Proof Surfaces

A delivery record is one of the most useful proof surfaces because it is naturally business-readable.

It can show:

the workflow model
the current status
the owner boundary
the visible decisions
the private evidence boundary
the next action

For a recruiter-gated workflow pilot, the delivery record can show the business model, agent boundary, remaining owner decisions, and client-safe proof without exposing private sourcing data or account credentials.

For a backend handoff, the delivery record can separate account ownership, credentials, app administration, database state, acceptance checks, and transfer options.

The pattern is the same in both cases:

Visible status for the client or operator.
Private evidence for the builder or release owner.
Named authority for the next decision.
A receipt trail that survives the handoff.

This is why proof surfaces are not marketing pages. They are operating artifacts with a public-safe face.

A Worked Example: Support Recovery

Consider an ecommerce support workflow.

A customer writes in because an order has the wrong shipping address. The order is paid, the shipment is not yet fulfilled, and the warehouse cutoff has not passed.

The agent has access to four sources of record:

the support case
the customer record
the order record
the warehouse cutoff state

The raw capability looks simple: read the case, inspect the order, draft a reply, write an internal order note, and notify the warehouse.

The proof surface is what makes the workflow safe to inspect.

Proof field	Support recovery example
Workflow	Address correction before fulfillment cutoff.
Current state	Run.
Named owner	Support lead owns exceptions; warehouse owns cutoff.
Allowed action	Add an internal order note and draft the customer reply.
Approval-needed action	Credit, refund within the support lane, cancellation, or shipment reroute after cutoff.
Blocked action	Any unapproved payment change, any refund above the support lane, or an address rewrite after warehouse cutoff.
Evidence summary	Order is paid, unfulfilled, and inside cutoff; address format validated.
Private evidence pointer	Case URL, order lookup, warehouse cutoff check, command output, trace ID.
Public receipt	"Address correction drafted; warehouse note prepared; no payment action taken."
Next action	Support lead reviews the drafted reply or lets the bounded note proceed.

This example matters because the same agent capability can produce three different proof states.

If the order is unfulfilled and inside cutoff, the workflow can run. If the request includes a goodwill credit, it must wait for a named owner. If the customer asks for a refund above the support lane, it must stop with a reason.

The model may be the same in all three cases. The tools may be the same. The proof surface is what changes the operating state.
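
The three outcomes described above can be written as a small decision function. This is a sketch under stated assumptions: the `Request` fields and reason codes are hypothetical, stop conditions are checked before wait conditions, and any case the example does not cover falls through to a stop with a reason.

```python
from dataclasses import dataclass


@dataclass
class Request:
    paid: bool
    fulfilled: bool
    inside_cutoff: bool
    wants_goodwill_credit: bool = False
    refund_above_support_lane: bool = False


def decide(req: Request) -> tuple[str, str]:
    """Return (state, reason_code) for an address-correction request."""
    # Stop conditions first: nothing downstream may override them.
    if req.refund_above_support_lane:
        return "stop", "refund_above_support_lane"
    # Wait: the action is possible, but a named owner must approve it.
    if req.wants_goodwill_credit:
        return "wait", "goodwill_credit_needs_named_owner"
    # Run: the bounded lane described in the example.
    if req.paid and not req.fulfilled and req.inside_cutoff:
        return "run", "unfulfilled_inside_cutoff"
    # Assumption: cases the example does not cover stop rather than guess.
    return "stop", "outside_bounded_lane"
```

The same inputs and tools produce different states only because the rules differ, which is the point of the example: the boundary, not the model, determines the operating state.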

Database, Automation, Judgment

The Proof Surface maps cleanly to the Three-Tier Framework of Database, Automation, and Judgment.

Database: what exists

The Database layer stores the proof material:

workflow records
delivery records
source objects
account ownership
trace links
eval results
command output
approval decisions
blocked-state records
rollback notes

The proof surface should not pretend to be the source of truth for everything. It should point to the source of truth and summarize the current state safely.

Automation: what happened

The Automation layer produces the receipts:

route checks
validation commands
deploys
agent runs
MCP tool calls
CI checks
evidence packaging
handoff updates

Automation makes work fast. The proof surface makes automation inspectable.

Judgment: what should happen

The Judgment layer interprets the receipt:

Is this safe to publish?
Does this need owner approval?
Should this stop?
Is the evidence sufficient?
Does the workflow graduate, narrow scope, or roll back?

The proof surface carries this judgment back to the user in a form they can act on.

What A Good Proof Surface Includes

A useful proof surface is compact, but it is not vague.

It should include:

Workflow name: the business path, not the tool name.
Current state: run, wait, stop, or receipt.
Named owner: the person or role that owns approval.
Allowed action: what the system may do without review.
Blocked action: what the system refused or deferred.
Evidence summary: what was checked and what passed.
Private evidence pointer: where detailed proof lives.
Next action: what should happen now.
Rollback or recovery note: what to do if the path fails.

It should avoid:

raw secrets
full logs with private data
unsupported status claims
vague "completed" labels
hidden approval assumptions
unlinked evidence
screenshots treated as the only proof

The test is simple: a new operator should be able to inspect the surface and understand what can happen next without asking the original agent to reconstruct the story.
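
The include and avoid lists above lend themselves to a simple lint pass before a surface is published. The sketch below is illustrative only: the field names in `REQUIRED` and the words in `VAGUE_LABELS` are assumptions, and it checks only for missing fields and vague or unknown state labels, not for secrets or private data.

```python
# Assumed field names for the nine items a proof surface should include.
REQUIRED = (
    "workflow", "state", "named_owner", "allowed_action", "blocked_action",
    "evidence_summary", "private_evidence_pointer", "next_action", "recovery_note",
)
VALID_STATES = {"run", "wait", "stop", "receipt"}
# Assumed examples of status words that claim completion without a receipt.
VAGUE_LABELS = {"done", "completed", "complete", "ok"}


def lint_surface(surface: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [f"missing: {field}" for field in REQUIRED if not surface.get(field)]
    state = str(surface.get("state", "")).lower()
    if state in VAGUE_LABELS:
        problems.append(f"vague status label: {state!r}")
    elif state and state not in VALID_STATES:
        problems.append(f"unknown state: {state!r}")
    return problems
```

A check like this cannot tell whether the evidence is sufficient. It only catches surfaces that are structurally unable to answer the new operator's questions.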

A Starter Proof Surface Template

A first proof surface can be written as a small operating record.

# Business path, not the tool name.
workflow: support-recovery.address-correction
owner:
  approval: support_lead         # named owner for must_wait actions
  source_account: ecommerce_ops
state: run                       # one of: run, wait, stop, receipt
boundary:
  can_run:                       # allowed without review
    - read_support_case
    - read_order
    - read_warehouse_cutoff
    - write_internal_order_note
    - draft_customer_reply
  must_wait:                     # needs the named owner's approval
    - issue_credit
    - post_customer_reply_without_template_match
    - reroute_after_cutoff
  must_stop:                     # stops with a reason
    - refund_above_support_lane
    - missing_order_record
    - payment_state_unclear
evidence:
  public_summary: "Order is paid, unfulfilled, and inside warehouse cutoff."
  private_packet:                # pointers only; raw evidence stays private
    - case_url
    - order_lookup_result
    - warehouse_cutoff_check
    - trace_id
receipt:
  public: "Address correction prepared; no payment action taken."
  private: "validation-output-2026-06-22.md"
next_decision:
  owner: support_lead
  action: review_customer_reply
rollback:
  note: "Remove internal note and escalate to warehouse owner if cutoff state changes."

This record is deliberately small. It is not trying to replace the contract bundle, runbook, trace, or eval ledger. It is the visible operating object that tells the next human how to read the work.
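
To show how the boundary section of such a record might be consulted at run time, here is a minimal Python sketch. The `BOUNDARY` dictionary copies the entries from the template above. The function `classify_action` is hypothetical, and its rule that unlisted actions stop by default is an assumption, not something the template states.

```python
# Entries copied from the template's boundary section.
BOUNDARY = {
    "can_run": {
        "read_support_case", "read_order", "read_warehouse_cutoff",
        "write_internal_order_note", "draft_customer_reply",
    },
    "must_wait": {
        "issue_credit", "post_customer_reply_without_template_match",
        "reroute_after_cutoff",
    },
    "must_stop": {
        "refund_above_support_lane", "missing_order_record",
        "payment_state_unclear",
    },
}


def classify_action(action: str, boundary: dict = BOUNDARY) -> str:
    """Return "run", "wait", or "stop" for a requested action or condition."""
    # Check the most restrictive list first so a stop always wins.
    if action in boundary["must_stop"]:
        return "stop"
    if action in boundary["must_wait"]:
        return "wait"
    if action in boundary["can_run"]:
        return "run"
    # Assumption: anything the record does not name is out of scope.
    return "stop"
```

Treating unknown actions as stops keeps the record authoritative: new capability has to be written into the boundary before the agent can use it.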

Common Failure Modes

Status Without Receipts

The page says "done," but no evidence is attached.

This is the most common failure. It creates confidence without transferability. A proof surface should avoid status claims that cannot be followed to evidence.

Evidence Without Interpretation

The team has logs, traces, screenshots, and deploy IDs, but no one has translated them into an operating conclusion.

This creates the opposite problem: too much evidence, not enough proof. The proof surface must say what the evidence means.

Public Proof With Private Leakage

The system exposes raw logs, secrets-adjacent output, private customer data, or credential context because it treats transparency as unrestricted visibility.

The fix is to split public-safe status from private evidence packets.

Approval Without An Owner

The workflow says an action needs approval, but it does not name who can approve it.

That is not a wait state. It is an abandoned state. A useful proof surface names the owner or stops with a reason.
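
The rule "name the owner or stop with a reason" can be made mechanical. In this sketch, `resolve_wait`, the `approvers` mapping, and the `no_named_approver` reason code are all hypothetical names chosen for illustration.

```python
def resolve_wait(pending_action: str, approvers: dict) -> dict:
    """Return a wait record only when a named owner exists; otherwise stop."""
    owner = approvers.get(pending_action)
    if owner:
        return {
            "state": "wait",
            "pending_action": pending_action,
            "decision_owner": owner,
        }
    # No one can approve this action, so waiting would never resolve.
    return {
        "state": "stop",
        "pending_action": pending_action,
        "reason_code": "no_named_approver",
    }
```

For example, `resolve_wait("issue_credit", {"issue_credit": "support_lead"})` yields a wait record naming `support_lead`, while an action with no entry in the mapping becomes a stop.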

Agent Continuity Without Handoff

The agent can continue in its own context, but the human team cannot inherit the work.

Long-running agent systems need ownership, checkpoints, and evidence that survive tool boundaries. Otherwise continuity exists only inside the model session.

A One-Workflow Starting Point

Teams do not need to build a full proof platform before using agents.

Start with one workflow your team already protects by hand.

Choose a workflow with:

a visible owner
repeated handoffs
real risk when it fails
at least one system connection
at least one approval boundary
evidence that can be checked

Then create a first proof surface with five sections:

Workflow: name the path and owner.
Boundary: list what can run, wait, and stop.
Evidence: summarize what was checked.
Receipt: link the delivery record, issue, trace, or validation output.
Next decision: name the owner and action.

If this feels too heavy, the workflow probably is not ready for more autonomy. If it feels clarifying, the proof surface is doing its job.

Conclusion

Agent systems need more than capability, contracts, and metrics.

They need a surface where humans can inspect the work.

The Proof Surface is that layer. It turns private evidence into public-safe receipts. It keeps owners visible. It explains why work ran, waited, or stopped. It lets delivery records, runbooks, evals, traces, and release evidence become one operating story.

The result is not just a better report.

It is a safer delegation path: work can leave chat without leaving accountability behind.