The shape

              ┌──────────────────────────────────────────────────┐
              │              Your application code               │
              │     (Python / TS / Go / Rust / Java / .NET /     │
              │                  Unity .NET runtime)             │
              └────────────────────────┬─────────────────────────┘
                                       │ idiomatic methods
              ┌────────────────────────▼─────────────────────────┐
              │             ClutchCall language SDK              │
              │  ┌────────┬────────┬──────────┬───────┬───────┐  │
              │  │ Voice  │Streams │ Robotics │ Games │ Data  │  │
              │  └────────┴────────┴──────────┴───────┴───────┘  │
              │              MoqtClient (substrate)              │
              └────────────────────────┬─────────────────────────┘
                                       │ C ABI: clutchcall_*
              ┌────────────────────────▼─────────────────────────┐
              │      C++23 core (one binary, all languages)      │
              │  - MoQT framer / parser                          │
              │  - subgroup-stream + datagram scheduler          │
              │  - capability routing + namespace auth           │
              │  - APM pipeline (AEC, AGC2, resampler)           │
              └────────────────────────┬─────────────────────────┘
                                       │ QUIC (UDP/443 + UDP/4433)
              ┌────────────────────────▼─────────────────────────┐
              │              ClutchCall relay mesh               │
              │   Shard-per-core · SO_REUSEPORT + eBPF dispatch  │
              │  Capability-based fan-out · namespace gating     │
              └────────────────────────┬─────────────────────────┘
                                       │ control plane
              ┌────────────────────────▼─────────────────────────┐
              │         ClutchCall engine (per modality)         │
              │  Voice telephony · streams ingest/transcode ·    │
              │  agent runtime · webhook events · analytics      │
              └──────────────────────────────────────────────────┘
The typed modalities sit on top of a single MoQT client, which sits on top of the C++ core, which speaks QUIC to the relay mesh. The relay handles fan-out and capability routing; the engine handles the control plane.

The substrate: MoQT over QUIC

Every modality reduces to MoQT tracks. A publisher names a track by (namespace, name), attaches a capability tag, and writes frames. The relay fans the track out to every subscriber whose subscription matches the namespace and capability. The publisher doesn’t know who’s subscribed; the subscriber doesn’t know who the publisher is.
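+------------+---------------------------------------+--------------------------------------+
| Track kind | Carries                               | Wire model                           |
+------------+---------------------------------------+--------------------------------------+
| Audio      | Continuous voice (Opus / PCM / G.711) | subgroup stream                      |
| Video      | Encoded video; group per keyframe     | subgroup stream                      |
| Frame      | Opaque binary with per-frame priority | subgroup OR datagram (datagram=true) |
| Text       | Reliable ordered messages             | subgroup stream                      |
+------------+---------------------------------------+--------------------------------------+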
Subgroup streams give in-order, reliable delivery per group; datagrams trade ordering and retransmission for a sub-millisecond latency floor. The modality picks the right lane per channel; see each modality’s concept page.
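
The track model and fan-out rule above can be sketched in a few lines. This is an illustration only, not the SDK's API: the names Track, Subscription, Relay, and wire_model are hypothetical, and only the (namespace, name) identity, the capability tag, and the lane table come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Track:
    namespace: str
    name: str
    capability: str
    kind: str = "frame"      # "audio" | "video" | "frame" | "text"
    datagram: bool = False   # only meaningful for kind == "frame"


@dataclass(frozen=True)
class Subscription:
    subscriber_id: str
    namespace: str
    capability: str


def wire_model(track: Track) -> str:
    """Pick the lane per the table: only Frame tracks may opt into datagrams."""
    if track.kind == "frame" and track.datagram:
        return "datagram"
    return "subgroup stream"


@dataclass
class Relay:
    """Hypothetical in-memory stand-in for the relay's routing table."""
    subscriptions: list[Subscription] = field(default_factory=list)

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def fan_out(self, track: Track) -> list[str]:
        """Return the subscribers whose namespace and capability both match."""
        return [
            s.subscriber_id
            for s in self.subscriptions
            if s.namespace == track.namespace and s.capability == track.capability
        ]
```

Note that fan_out never hands the publisher a subscriber list and never hands subscribers a publisher identity, which mirrors the decoupling described above.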

The relay mesh

The relay is the same binary as the engine, with the built-in relay role enabled. It runs shard-per-core: there is one shard per core, every shard binds the same UDP port with SO_REUSEPORT, and an eBPF program dispatches each incoming QUIC packet to the shard that owns the connection’s destination CID. Each shard:
  • accepts QUIC handshakes (ECDSA P-256)
  • maintains its share of MoQT sessions
  • forwards published frames to subscribers without leaving the shard
  • gates each publish/subscribe through the namespace auth hook (JWT verify with namespace-scoped claims)
The relay’s data path is zero-copy where the QUIC sequencer allows, GSO-batched on send, and uses UDP_GRO on receive. POPs are reachable through DNS round-robin under relay.clutchcall.dev and directly through per-edge POP hostnames (relay-us.clutchcall.dev, relay-uk.clutchcall.dev, …).
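
To make the destination-CID dispatch concrete, here is a user-space sketch of the decision the eBPF program makes. Header parsing follows the QUIC invariants (RFC 8999). Two things are assumptions rather than documented behaviour: that the relay issues fixed 8-byte connection IDs, and that a shard encodes its index in the first CID byte. The function names are hypothetical.

```python
SHORT_HEADER_CID_LEN = 8  # assumption: the relay issues fixed 8-byte CIDs


def extract_dcid(packet: bytes, short_cid_len: int = SHORT_HEADER_CID_LEN) -> bytes:
    """Extract the Destination Connection ID per the QUIC invariants."""
    if not packet:
        raise ValueError("empty packet")
    if packet[0] & 0x80:
        # Long header: flags(1) version(4) dcid_len(1) dcid(dcid_len) ...
        if len(packet) < 6:
            raise ValueError("truncated long header")
        dcid_len = packet[5]
        dcid = packet[6:6 + dcid_len]
    else:
        # Short header: flags(1) dcid(short_cid_len) ...
        # The length is not on the wire; the server must know what it issued.
        dcid_len = short_cid_len
        dcid = packet[1:1 + dcid_len]
    if len(dcid) != dcid_len:
        raise ValueError("truncated DCID")
    return dcid


def shard_for(dcid: bytes, n_shards: int) -> int:
    """Hypothetical ownership rule: the first CID byte carries the shard index.

    A client's first Initial packet carries a client-chosen DCID, so it lands
    on an arbitrary shard, which can then issue a CID that encodes itself.
    """
    if not dcid:
        return 0
    return dcid[0] % n_shards
```

Because the mapping depends only on bytes in the packet header, every packet of a connection reaches the same shard without any shared state between shards.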

Two ports, two stacks

The relay runs two QUIC stacks on two ports rather than multiplexing on one:
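+------+---------------+-----------------------------------------------+
| Port | Stack         | Carries                                       |
+------+---------------+-----------------------------------------------+
| 443  | QUIC · MoQT   | MoQT (audio / video / frame / text tracks)    |
| 4433 | QUIC · HTTP/3 | HTTP/3 + WebTransport (REST, MCP, signalling) |
+------+---------------+-----------------------------------------------+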
Clients find the right port via RFC 9460 HTTPS records. The split keeps each stack’s SO_REUSEPORT + eBPF DCID dispatch intact and avoids the roughly 300-500 lines of userspace ALPN-sniffing routing code that single-port multiplexing would need.
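
As a rough illustration of port discovery, the sketch below chooses an endpoint from HTTPS records that have already been parsed (the Python standard library cannot query this record type itself). ServiceRecord and pick_endpoint are hypothetical names, and the ALPN token a client would request for MoQT is an assumption; "h3" is the registered token for HTTP/3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceRecord:
    priority: int            # SvcPriority: lower is preferred, 0 means AliasMode
    target: str              # TargetName
    alpn: tuple[str, ...]    # "alpn" SvcParam
    port: int = 443          # "port" SvcParam; HTTPS records default to 443


def pick_endpoint(records: list[ServiceRecord], alpn: str) -> Optional[tuple[str, int]]:
    """Return (host, port) of the most preferred record that offers `alpn`."""
    candidates = [r for r in records if r.priority > 0 and alpn in r.alpn]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: r.priority)
    return best.target, best.port
```

With one record advertising the MoQT token on the default port and another advertising "h3" with port=4433, the same hostname resolves to the right stack for each client without either stack inspecting the other's traffic.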

Why one core

Two reasons:
  • Identical wire envelopes. The relay only ever sees the same MoQT framing regardless of which SDK published. There is no per-language parser to keep in sync, only the C++ core, which every SDK imports via FFI / WASM.
  • Audio without a copy. µ-law / A-law to 16-bit PCM conversion is done in SIMD inside the core. Browser (WASM) and Node / Python / Go / Rust / Java / .NET (native FFI) call into the same code path with the same latency profile.
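
For reference, the per-sample arithmetic behind µ-law to 16-bit PCM expansion looks like this. It is a scalar sketch of the standard G.711 µ-law decode, not the core's SIMD implementation, and the function names are made up for the example.

```python
import struct

_BIAS = 0x84  # 132, the bias G.711 µ-law adds before compression


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one µ-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF               # µ-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07     # 3-bit segment number
    mantissa = u & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + _BIAS) << exponent) - _BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(payload: bytes) -> bytes:
    """Decode a µ-law payload to little-endian 16-bit PCM."""
    samples = [ulaw_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each input byte maps to exactly one output sample with no dependence on its neighbours, which is what makes the conversion a natural fit for SIMD lanes.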

Connection lifecycle

  1. Connect. The SDK dials MoQT on relay.clutchcall.dev:443 over QUIC (or over WebTransport in the browser). The tenant token is presented as the first MoQT envelope.
  2. Handshake. The relay’s namespace_auth hook verifies the token and stamps a namespace scope on the session.
  3. Publish / subscribe. SDK opens MoQT publish and subscribe requests on demand. The relay routes by namespace and capability.
  4. Auto-reconnect. If the link drops, the SDK reconnects with capped exponential backoff and re-establishes every publication and subscription transparently. Application code doesn’t see it.
The control plane (call originate, live-input create, room token mint, etc.) runs over HTTP/3 on port 4433, exposed as tRPC procedures on the engine. Modality clients hold both a MoQT client (data plane) and an HTTPS client (control plane) under the hood.
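
The auto-reconnect step in the lifecycle can be sketched as a capped exponential backoff loop. The base delay, cap, attempt limit, and the dial / restore callbacks are all assumptions chosen for illustration; the SDK's real values and hooks are not documented here.

```python
import itertools
import time
from typing import Callable, Iterator, TypeVar

S = TypeVar("S")


def backoff_delays(base: float = 0.25, cap: float = 8.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield base, base*factor, ... and never more than cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconnect(dial: Callable[[], S], restore: Callable[[S], None], *,
              max_attempts: int = 8,
              sleep: Callable[[float], None] = time.sleep) -> S:
    """Redial until a session comes up, then replay publications and subscriptions."""
    last_error = None
    for delay in itertools.islice(backoff_delays(), max_attempts):
        try:
            session = dial()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)          # wait before the next attempt
            continue
        restore(session)          # re-establish every publish / subscribe
        return session
    raise ConnectionError("relay unreachable") from last_error
```

Passing sleep in as a parameter keeps the loop testable without real waiting. A production version would usually add random jitter to each delay so that many clients do not redial in lockstep.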

Where things run

  • Browsers. TypeScript SDK speaks MoQT directly over native WebTransport (no FFI, no custom framing). The C++ core is compiled to WebAssembly via Emscripten for audio APM + framing fast paths.
  • Native runtimes. SDK loads clutchcall_core_ffi.so / .dylib / .dll via the language’s native loader: JNI (Java), P/Invoke (.NET), CGO (Go), libloading (Rust), ctypes (Python).
  • Unity. .NET runtime SDK plus a com.clutchcall.transport UPM package that exposes INetworkInterface over the games modality — drop-in for Unity Netcode for GameObjects / Entities. See Netcode (Unity).
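
For the native-runtime case, loading the core through ctypes might look roughly like the following. The library file names follow the list above; the helper names and the decision to return None when the library is missing are choices made for this sketch. No clutchcall_* symbols are called, because their signatures are not given on this page.

```python
import ctypes
import sys
from typing import Optional

# Shared-library suffix per platform prefix, as reported by sys.platform.
_SUFFIX = {"linux": ".so", "darwin": ".dylib", "win32": ".dll"}


def core_library_name(platform: str = sys.platform) -> str:
    """Return the core's file name for the given platform string."""
    for prefix, suffix in _SUFFIX.items():
        if platform.startswith(prefix):
            return "clutchcall_core_ffi" + suffix
    raise OSError(f"unsupported platform: {platform}")


def load_core(path: Optional[str] = None) -> Optional[ctypes.CDLL]:
    """Load the shared library, or return None if it cannot be found."""
    try:
        return ctypes.CDLL(path or core_library_name())
    except OSError:
        return None
```

The other bindings (JNI, P/Invoke, CGO, libloading) do the same job with their own loaders; only the file being loaded is shared.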

Direct-media (voice)

For server-side AI calls (default_app=AI_BIDIRECTIONAL_STREAM), the voice path uses direct-media between the carrier and the agent runtime. The gateway negotiates SIP signalling with the carrier, then publishes the SDP answer pointing at the agent runtime’s RTP socket; RTP flows straight from the carrier to the runtime. The gateway is signalling-only on that path. For SIP/RTP-only calls (no AI bridge), the gateway still terminates RTP and runs a local VAD. So the same call_sid can take either RTP path depending on default_app.
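
The path selection described above reduces to a small decision on default_app. In this sketch the enum, the function names, and the treatment of every value other than AI_BIDIRECTIONAL_STREAM as gateway-terminated are assumptions; the text names only that one value explicitly.

```python
from enum import Enum


class RtpPath(Enum):
    DIRECT_MEDIA = "carrier -> agent runtime"
    GATEWAY_TERMINATED = "carrier -> gateway"


def rtp_path(default_app: str) -> RtpPath:
    """AI calls bypass the gateway for media; everything else terminates on it."""
    if default_app == "AI_BIDIRECTIONAL_STREAM":
        return RtpPath.DIRECT_MEDIA
    return RtpPath.GATEWAY_TERMINATED


def sdp_media_address(default_app: str,
                      gateway_rtp: tuple[str, int],
                      runtime_rtp: tuple[str, int]) -> tuple[str, int]:
    """Choose the RTP socket the gateway advertises in its SDP answer."""
    if rtp_path(default_app) is RtpPath.DIRECT_MEDIA:
        return runtime_rtp
    return gateway_rtp
```

In both cases the gateway still handles SIP signalling; only the address placed in the SDP answer changes, and with it where the carrier sends RTP.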

Legacy RPC (still supported)

The original control plane was a method-id RPC envelope over QUIC:
+----------------+----------------+----------------------+
| u32 length LE  | u32 method_id  | serde envelope body  |
+----------------+----------------+----------------------+
ClutchCallClient (dial, originate_bulk, hangup, barge, push_audio, …) used this surface. It is kept for backwards compatibility; new code should use the Voice modality. See Envelope Format for the full wire detail.
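
A minimal framing sketch for the envelope above, using the struct module. Two details are assumptions because the diagram does not state them: that method_id is little-endian like the length field, and that length counts only the body bytes. The body is treated as opaque bytes; see Envelope Format for the authoritative layout.

```python
import struct

# u32 length (LE per the diagram) + u32 method_id (LE assumed here).
_HEADER = struct.Struct("<II")


def encode_envelope(method_id: int, body: bytes) -> bytes:
    """Prefix an already-serialized body with the 8-byte header."""
    return _HEADER.pack(len(body), method_id) + body


def decode_envelope(buf: bytes) -> tuple[int, bytes, bytes]:
    """Return (method_id, body, remaining bytes); raise if no full frame is buffered."""
    if len(buf) < _HEADER.size:
        raise ValueError("incomplete header")
    length, method_id = _HEADER.unpack_from(buf)
    end = _HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete body")
    return method_id, buf[_HEADER.size:end], buf[end:]
```

Returning the remaining bytes lets a reader loop over a stream buffer that holds several envelopes back to back.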

Code generation

Method IDs and DTO definitions in every language come from one IDL (api/clutchcall.json) compiled by apirpc_compiler.py. The modality clients’ wire formats are generated from the same IDL, so adding a modality method needs only an IDL edit and a compiler run.
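
To show the idea of IDL-driven generation, here is a toy generator that turns a JSON description into Python method-ID constants. It is not apirpc_compiler.py, and the schema it reads (a top-level "methods" list of objects with "name" and "id") is invented for the example; the real layout of api/clutchcall.json is not shown on this page.

```python
import json


def generate_method_ids(idl_text: str) -> str:
    """Emit one METHOD_<NAME> constant per IDL method, ordered by id."""
    idl = json.loads(idl_text)
    lines = ["# Generated file; edit the IDL instead."]
    seen = {}  # method id -> name, used to reject duplicate ids
    for method in sorted(idl["methods"], key=lambda m: m["id"]):
        name, method_id = method["name"], method["id"]
        if method_id in seen:
            raise ValueError(
                f"method id {method_id} used by {seen[method_id]} and {name}"
            )
        seen[method_id] = name
        lines.append(f"METHOD_{name.upper()} = {method_id}")
    return "\n".join(lines) + "\n"
```

Running one generator per target language over the same file is what keeps method IDs from drifting between SDKs.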