Research Papers

40 papers — methodology, data, and conclusions you can verify

case-study 16 min read intermediate

The Analyzer MCP: A Policy-Grounded Review Architecture

A review system, not just a site analyzer

How CREATE SOMETHING turned Webflow template review into a multi-surface MCP system that joins Designer state, published-site evidence, policy ingestion, and governed review output.

Analyzer MCP, Webflow, MCP, Three-Tier Framework, Policy as Artifact, Review Systems, Observability
research 22 min read intermediate

Composio in the MCP Delivery System

Composio accelerates connectivity; CREATE SOMETHING retains the outcome layer

A decision-grade analysis of why Composio is included for commodity connectivity, how the wrap pattern protects brand and margin, and how delivery remains aligned to Database, Automation, and Judgment control boundaries.

Composio, MCP, Three-Tier Framework, Wrap Pattern, Agent Outcome Stack, Policy as Artifact
research 15 min read intermediate

Braintrust Trace Unsurfacing: Finding What Normal Aggregates Hide

A trace audit that turns hidden reliability structure into ranked experiments

How a 1,000-row Braintrust trace snapshot exposed clustered permission failures, routing misses, and latent control-plane stalls that aggregate reliability metrics hid.

Braintrust, Observability, MCP, Reliability, Experiment Design, Dashboarding
research 15 min read intermediate

The Wrap Pattern: Commodity Integration as Invisible Infrastructure

When MCP consumption is commoditized, the strategic response is to wrap — not build — the plumbing

A structural pattern for integrating commodity MCP vendors as invisible infrastructure while preserving the client-facing surface, the Intelligence Layer margin, and the Three-Tier alignment.

MCP, Wrap Pattern, Commodity Integration, Creation Moat, Three-Tier Framework, Invisible Infrastructure, Agent Architecture, Cloudflare Workers, Model Context Protocol
methodology 10 min read intermediate

The Webflow Way, Automated

Agent-Ready Template Reviews on Published Sites (WebMCP + Review Snippet)

A case study on exposing Webflow Way QA signals to agents from a published template preview, aligned to WebMCP-style in-browser tools.

Webflow, Template Marketplace, QA, Template review, WebMCP, MCP, Agents, Interactions, Accessibility, SEO
methodology 16 min read intermediate

Open-Weight Models in Client MCP Work

A decision framework for when to use OpenAI gpt-oss (and safeguard) versus hosted frontier models in client education and implementation.

Guidance for consultancies building MCP integrations: how to choose between OpenAI open-weight models (gpt-oss-20b/120b, gpt-oss-safeguard) and hosted models, with concrete patterns for education, production, and compliance.

MCP, Model Context Protocol, OpenAI, Open-weight models, gpt-oss, gpt-oss-safeguard, Client implementation, Education, Deployment, Safety, Responses API, Cloudflare Workers AI
research 25 min read advanced

The Three-Tier Framework: Database, Rules, Policy

A structural model for agent systems, realized through Model Context Protocol

A hierarchical ontology that identifies three tiers connected by typed Artifacts and spanning four cross-cutting concerns, with MCP as the natural encapsulation.

Three-Tier Framework, MCP, Model Context Protocol, Agent Systems, Database, Rules, Policy, Policy as Artifact, Sampling, Embodied Cognition, Cloudflare Workers
methodology 15 min read intermediate

Observability Infrastructure: Making AI Operations Visible

Tracing infrastructure, LLM generation, and agent coordination as one surface

A three-layer observability architecture for AI-native systems: infrastructure tracing, LLM generation tracking, and agent coordination unified through shared vocabulary.

Observability, Langfuse, Cloudflare Workers, MCP, AI Agents, Tracing, Monitoring
research 18 min read intermediate

The Andon Protocol

When to pull the cord: obligation-based escalation, with a concrete path to deployment

AI-native structured escalation for agent harnesses and multi-agent systems. v3.1 adds Silent Running Detection, cost-parameter defaults and worked examples, Resolution Surface design for batch review, and a three-phase implementation plan. The canonical boundary between Automation and Judgment in the Three-Tier Framework.

Andon, Three-Tier Framework, Judgment, Automation, HITL, Kaizen
case-study 15 min read intermediate

Ground: Verification-First Code Analysis

How computed claims replaced guesswork in an 80+ package monorepo

Case study: How Ground saved 8+ hours analyzing an 80+ package monorepo by preventing AI hallucination in code analysis.

Ground, Code Analysis, Hallucination Prevention, Monorepo
methodology 10 min read intermediate

Tufte for Mobile: Design Intent Across Screen Sizes

Applying Edward Tufte's principles to preserve meaning in responsive design

A methodology demonstrating how wireframe intent survives responsive transformation through five Tufte principles: data-ink ratio, sparklines, direct labeling, information density, and small multiples.

Tufte, Mobile, Responsive Design, Data Visualization, Data-ink Ratio, Wireframes, Information Design, Canon
case-study 18 min read intermediate

Ground: Evidence-Based Claims for AI Code Analysis

Computation-Constrained Verification Prevents False Positives in Agentic Development

A tool that blocks AI agents from claiming code is dead, duplicated, or orphaned without first computing the evidence. Now with AI-native features: batch analysis, incremental diff mode, structured fix output, and fix verification. Rated 10/10 by agent testing across two production codebases.

Ground, Evidence-Based Claims, DRY Violations, Dead Code Detection, Cloudflare Workers, MCP, Subtractive Triad, AI-Native, Incremental Analysis, Autonomous Agents