Experiment

Meeting Capture

Tools Recede, Understanding Remains

December 2025 · Swift + Cloudflare Workers · Heidegger, Rams

╔══════════════════════════════════════════════════════════════════╗
║  MEETING CAPTURE                                                 ║
║                                                                  ║
║  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌─────────┐  ║
║  │  ((o))   │     │    ~     │     │   ___    │     │   [D1]  │  ║
║  │  AUDIO   │ ──► │ WHISPER  │ ──► │  TEXT    │ ──► │  STORE  │  ║
║  └──────────┘     └──────────┘     └──────────┘     └─────────┘  ║
║                                                                  ║
║     Swift          Workers AI        Transcript       Database   ║
║    (local)        (Cloudflare)                                   ║
╚══════════════════════════════════════════════════════════════════╝

Hypothesis

A meeting transcription tool with fewer features will achieve higher actual utility than feature-rich alternatives—because tools that recede into transparent use (Zuhandenheit) serve understanding better than tools that demand attention.

Success Criteria

  • Zero interactions during meeting — Pass: the tool requires no attention while recording
  • Transcription completes <60s post-meeting — Pass: 15-30s for typical meetings
  • Auto-detection works for Zoom, Meet, Teams — Tested with Zoom; Meet/Teams detection implemented
  • Auto-stop when meeting ends — Partial: manual stop required; detection unreliable
  • System audio capture — Fail: macOS Sonoma requires a kernel extension; microphone only

Measured Results

Metric                        Target     Actual    Status
Setup time                    <5 min     ~3 min    Pass
Transcription latency         <60s       15-30s    Pass
Audio file size (30 min)      <10MB      ~500KB    Pass
Interactions during meeting   0          0         Pass
Post-meeting interactions     1 (stop)   1         Partial
System audio capture          Yes        No        Fail

Grounding: Heidegger on Tools

In Being and Time (1927), Heidegger distinguishes two modes of encountering equipment:

Zuhandenheit (ready-to-hand)

The tool disappears into use. When hammering, we think about the nail—not the hammer. The tool serves the work; attention flows through it.

Vorhandenheit (present-at-hand)

The tool becomes an object of attention. When the hammer breaks, we suddenly see the hammer. It intrudes into consciousness.

"The less we just stare at the hammer-thing, and the more we seize hold of it and use it, the more primordial does our relationship to it become."

— Heidegger, Being and Time §15

Feature-rich tools like Granola and Otter drift from Zuhandenheit to Vorhandenheit through accumulated complexity: speaker identification, action-item extraction, calendar sync, team sharing. Each feature is individually justified; collectively they demand attention.

Method: Subtractive Triad

Following Rams's tenth principle ("As little design as possible") and the Subtractive Triad, we asked three questions for each potential feature:

DRY (Implementation)

"Have I built this before?"

→ Cloudflare D1, R2, Workers AI already exist. Use them.

Rams (Artifact)

"Does this feature earn its existence?"

→ Speaker diarization? Action items? No—complexity without proportional value.

Heidegger (System)

"Does this serve the whole?"

→ Store in D1 to join the CREATE SOMETHING knowledge graph.

Features Removed

  • Real-time transcription — Complexity without benefit; post-meeting suffices
  • Speaker identification — Adds 3x complexity; rarely needed for personal use
  • Action item extraction — AI hallucination risk exceeds value
  • Calendar integration — Meeting detection handles this
  • Team sharing — Personal tool; single user
  • Web dashboard — curl suffices for queries

Features Retained

  • Auto-detect meetings — Zoom, Google Meet, Teams via NSWorkspace + AppleScript
  • Record microphone — AVFoundation; system audio requires kernel extension
  • Upload on stop — Multipart form to Cloudflare Worker
  • Transcribe via Whisper — @cf/openai/whisper-large-v3-turbo
  • Store in D1 — Searchable via SQL (a query sketch follows)
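"Searchable via SQL" can be as small as a single query helper. The sketch below is hypothetical: the transcripts table and its r2_key, created_at, and text columns are assumptions, and it uses plain LIKE matching because FTS5 virtual tables proved unstable on D1 (see Expected Challenges).

```typescript
// Hypothetical query helper; table and column names are assumptions.
// Plain LIKE matching, since D1 FTS5 virtual tables proved unstable
// in this experiment (see Expected Challenges).
async function searchTranscripts(db: D1Database, term: string) {
  return db
    .prepare(
      "SELECT r2_key, created_at, text FROM transcripts " +
        "WHERE text LIKE ? ORDER BY created_at DESC",
    )
    .bind(`%${term}%`)
    .all();
}
```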

Implementation

macOS Menubar App

Swift + SwiftUI · ~400 LOC

  • NSWorkspace for app monitoring
  • AppleScript → System Events for call detection
  • AVFoundation for microphone capture
  • URLSession for multipart upload

Cloudflare Worker

TypeScript · ~200 LOC

  • R2 for audio storage
  • Workers AI (Whisper) for transcription
  • D1 for transcript storage
  • ctx.waitUntil() for background processing (sketched below)
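Composed, the Worker is a single fetch handler. What follows is a minimal sketch, not the experiment's verbatim code: the binding names (AUDIO_BUCKET, MEETINGS_DB, AI) and the transcripts schema are illustrative assumptions.

```typescript
// Minimal sketch of the Worker. Binding names and the transcripts
// schema are assumptions for illustration.
export interface Env {
  AUDIO_BUCKET: R2Bucket;   // raw recordings
  MEETINGS_DB: D1Database;  // transcripts table
  AI: Ai;                   // Workers AI binding
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("method not allowed", { status: 405 });
    }

    const form = await request.formData();
    const audio = form.get("audio");
    if (!(audio instanceof File)) {
      return new Response("missing audio field", { status: 400 });
    }

    // Persist first; the background task re-reads from R2 instead of
    // holding the buffer (see "ArrayBuffer detaches in async context").
    const key = `recordings/${Date.now()}-${audio.name}`;
    await env.AUDIO_BUCKET.put(key, await audio.arrayBuffer());

    // Transcription continues after the response has been sent.
    ctx.waitUntil(transcribe(env, key));
    return Response.json({ key }, { status: 202 });
  },
};

async function transcribe(env: Env, key: string): Promise<void> {
  const obj = await env.AUDIO_BUCKET.get(key);
  if (!obj) return;
  const result = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
    audio: toBase64(await obj.arrayBuffer()), // turbo takes Base64, not byte arrays
  });
  await env.MEETINGS_DB
    .prepare("INSERT INTO transcripts (r2_key, created_at, text) VALUES (?, ?, ?)")
    .bind(key, new Date().toISOString(), (result as { text?: string }).text ?? "")
    .run();
}

// Chunked encoding: spreading a whole recording into String.fromCharCode
// at once overflows the call stack on long files.
function toBase64(buf: ArrayBuffer): string {
  const bytes = new Uint8Array(buf);
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return btoa(binary);
}
```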

Technical Learnings

Whisper requires Base64, not ArrayBuffer

@cf/openai/whisper-large-v3-turbo expects Base64-encoded audio, while the standard Whisper model accepts an integer array. This distinction caused "Type mismatch of '/audio'" errors until resolved; the model documentation does not make it clear.
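A sketch of the shape difference, assuming an Ai binding as in the Worker sketch above; both input schemas are as described in this learning.

```typescript
// Same audio bytes, two input shapes.
async function runBoth(ai: Ai, bytes: Uint8Array) {
  // Base model: plain integer array.
  const base = await ai.run("@cf/openai/whisper", {
    audio: Array.from(bytes),
  });
  // Turbo model: Base64 string; the array form fails with
  // "Type mismatch of '/audio'".
  const turbo = await ai.run("@cf/openai/whisper-large-v3-turbo", {
    audio: btoa(String.fromCharCode(...bytes)), // chunk this for long recordings
  });
  return { base, turbo };
}
```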

Unsigned apps cannot run AppleScript

macOS blocks unsigned apps from querying System Events, even when Automation permission is granted. Ad-hoc signing with the com.apple.security.automation.apple-events entitlement resolves this without a Developer ID.

ArrayBuffer detaches in async context

An ArrayBuffer captured by a task scheduled with ctx.waitUntil() is detached by the time the task runs, leaving the audio data unusable. Solution: store the audio in R2 first and re-fetch it inside the background handler.
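A sketch of the pattern, assuming an R2 binding named AUDIO_BUCKET; the commented-out line is the failing variant.

```typescript
interface Env {
  AUDIO_BUCKET: R2Bucket; // assumed binding name
}

async function scheduleTranscription(
  buf: ArrayBuffer,
  key: string,
  env: Env,
  ctx: ExecutionContext,
): Promise<void> {
  // Failing variant: capturing `buf` in the deferred task; it is
  // detached by the time the task runs, after the response is sent.
  // ctx.waitUntil(transcribeBuffer(buf));

  // Working pattern: persist first, re-read inside the background task.
  await env.AUDIO_BUCKET.put(key, buf);
  ctx.waitUntil((async () => {
    const obj = await env.AUDIO_BUCKET.get(key);
    if (!obj) return;
    const fresh = await obj.arrayBuffer(); // freshly read, not detached
    // ...run Whisper on `fresh` and store the transcript, as sketched above
  })());
}
```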

Limitations & Failures

System audio capture not implemented

macOS Sonoma requires a kernel extension or audio driver (like BlackHole) for system audio capture. This experiment captures microphone only—participant voices are clear, but shared content audio is not captured. A production tool would need to integrate with an audio routing solution.

Auto-stop detection unreliable

Zoom's helper processes (CptHost, caphost) don't terminate predictably when a meeting ends. AppleScript window-count detection has false positives. Current implementation requires manual stop. Acceptable for personal use; would need improvement for broader adoption.

Transcription accuracy varies

Whisper performs well on clear speech but struggles with overlapping voices, technical jargon, and non-English content. No quantitative WER measurement was performed in this experiment.

Reproducibility

Prerequisites

  • macOS 14+ (Sonoma) with Xcode 15+
  • Cloudflare account with Workers AI access
  • Microphone permission granted to app
  • Automation permission for System Events

Expected Challenges

  • Code signing: Without ad-hoc signing, AppleScript will fail silently
  • Whisper input format: Use Base64, not ArrayBuffer for turbo model
  • D1 FTS5: Full-text search virtual tables may cause database corruption

Starting Prompt

Build a macOS menubar app that:
1. Detects when Zoom/Meet/Teams starts a call
2. Records microphone audio
3. Uploads to a Cloudflare Worker on stop
4. Transcribes via Whisper and stores in D1

Use: Swift/SwiftUI, AVFoundation, NSWorkspace
Constraints: No UI during recording, <500 LOC total

Conclusion

The hypothesis is partially validated. A minimal transcription tool achieves Zuhandenheit for its core function—zero attention during meetings, automatic transcription after. The tool recedes.

However, the manual stop requirement and microphone-only capture represent failures to fully disappear. A tool that demands even one interaction per meeting has not completely achieved ready-to-hand status.

Less, but better. The discipline of removal produced a functional tool in ~600 LOC across two components. Whether it is "better" depends on the use case: for personal meeting notes, this suffices; for team collaboration, the removed features become necessary.