Research Papers
40 papers — methodology, data, and conclusions you can verify
The Analyzer MCP: A Policy-Grounded Review Architecture
A review system, not just a site analyzer
How CREATE SOMETHING turned Webflow template review into a multi-surface MCP system that joins Designer state, published-site evidence, policy ingestion, and governed review output.
Composio in the MCP Delivery System
Composio accelerates connectivity; CREATE SOMETHING retains the outcome layer
A decision-grade analysis of why Composio is included for commodity connectivity, how the wrap pattern protects brand and margin, and how delivery remains aligned to Database, Automation, and Judgment control boundaries.
Braintrust Trace Unsurfacing: Finding What Normal Aggregates Hide
A trace audit that turns hidden reliability structure into ranked experiments
How a 1,000-row Braintrust trace snapshot exposed clustered permission failures, routing misses, and latent control-plane stalls that aggregate reliability metrics hid.
The Wrap Pattern: Commodity Integration as Invisible Infrastructure
When MCP consumption is commoditized, the strategic response is to wrap — not build — the plumbing
A structural pattern for integrating commodity MCP vendors as invisible infrastructure while preserving the client-facing surface, the Intelligence Layer margin, and the Three-Tier alignment.
The Webflow Way, Automated
Agent-Ready Template Reviews on Published Sites (WebMCP + Review Snippet)
A case study on exposing Webflow Way QA signals to agents from a published template preview, aligned to WebMCP-style in-browser tools.
Open-Weight Models in Client MCP Work
A decision framework for when to use OpenAI gpt-oss (and gpt-oss-safeguard) versus hosted frontier models in client education and implementation.
Guidance for consultancies building MCP integrations: how to choose between OpenAI open-weight models (gpt-oss-20b/120b, gpt-oss-safeguard) and hosted models, with concrete patterns for education, production, and compliance.
The Three-Tier Framework: Database, Rules, Policy
A structural model for agent systems, realized through Model Context Protocol
A hierarchical ontology identifying three tiers connected by typed Artifacts and spanning four cross-cutting concerns, with MCP as the natural encapsulation.
Observability Infrastructure: Making AI Operations Visible
Tracing infrastructure, LLM generation, and agent coordination as one surface
A three-layer observability architecture for AI-native systems: infrastructure tracing, LLM generation tracking, and agent coordination unified through shared vocabulary.
The Andon Protocol
When to pull the cord: obligation-based escalation, with a concrete path to deployment
AI-native structured escalation for agent harnesses and multi-agent systems. v3.1 adds Silent Running Detection, cost-parameter defaults and worked examples, Resolution Surface design for batch review, and a three-phase implementation plan. The canonical boundary between Automation and Judgment in the Three-Tier Framework.
Ground: Verification-First Code Analysis
How computed claims replaced guesswork in an 80+ package monorepo
A case study: how Ground saved 8+ hours analyzing an 80+ package monorepo by preventing AI hallucination in code analysis.
Tufte for Mobile: Design Intent Across Screen Sizes
Applying Edward Tufte's principles to preserve meaning in responsive design
A methodology demonstrating how wireframe intent survives responsive transformation through five Tufte principles: data-ink ratio, sparklines, direct labeling, information density, and small multiples.
Ground: Evidence-Based Claims for AI Code Analysis
Computation-Constrained Verification Prevents False Positives in Agentic Development
A tool that blocks AI agents from claiming code is dead, duplicated, or orphaned without first computing the evidence. Now with AI-native features: batch analysis, incremental diff mode, structured fix output, and fix verification. Rated 10/10 by agent testing across two production codebases.