# Architecture

> The MoQT relay mesh, the C++ core, the modality layer, and where each piece runs.

## The shape

```
              ┌──────────────────────────────────────────────────┐
              │              Your application code               │
              │     (Python / TS / Go / Rust / Java / .NET /     │
              │                  Unity .NET runtime)             │
              └────────────────────────┬─────────────────────────┘
                                       │ idiomatic methods
              ┌────────────────────────▼─────────────────────────┐
              │             ClutchCall language SDK              │
              │  ┌────────┬────────┬──────────┬───────┬───────┐  │
              │  │ Voice  │Streams │ Robotics │ Games │ Data  │  │
              │  └────────┴────────┴──────────┴───────┴───────┘  │
              │              MoqtClient (substrate)              │
              └────────────────────────┬─────────────────────────┘
                                       │ C ABI: clutchcall_*
              ┌────────────────────────▼─────────────────────────┐
              │      C++23 core (one binary, all languages)      │
              │  - MoQT framer / parser                          │
              │  - subgroup-stream + datagram scheduler          │
              │  - capability routing + namespace auth           │
              │  - APM pipeline (AEC, AGC2, resampler)           │
              └────────────────────────┬─────────────────────────┘
                                       │ QUIC (UDP/443 + UDP/4433)
              ┌────────────────────────▼─────────────────────────┐
              │              ClutchCall relay mesh               │
              │   Shard-per-core · SO_REUSEPORT + eBPF dispatch  │
              │  Capability-based fan-out · namespace gating     │
              └────────────────────────┬─────────────────────────┘
                                       │ control plane
              ┌────────────────────────▼─────────────────────────┐
              │         ClutchCall engine (per modality)         │
              │  Voice telephony · streams ingest/transcode ·    │
              │  agent runtime · webhook events · analytics      │
              └──────────────────────────────────────────────────┘
```

The typed modalities sit on top of a single MoQT client, which sits on top
of the C++ core, which speaks QUIC to the relay mesh. The relay handles
fan-out and capability routing; the engine handles the control plane.

## The substrate: MoQT over QUIC

Every modality reduces to **MoQT tracks**. A publisher names a track by
*(namespace, name)*, attaches a capability tag, and writes frames. The relay
fans the track out to every subscriber whose subscription matches the
namespace and capability. The publisher doesn't know who's subscribed; the
subscriber doesn't know who the publisher is.
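
The fan-out rule can be modelled in a few lines. This is a toy in-memory sketch, not the relay's code: `ToyRelay`, its methods, and the exact-match rule on namespace are all assumptions made for illustration (the real relay may match namespaces differently).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyRelay:
    """In-memory stand-in for the relay's fan-out table (hypothetical)."""

    # (namespace, capability) -> list of subscriber inboxes
    subscribers: dict = field(default_factory=lambda: defaultdict(list))

    def subscribe(self, namespace: str, capability: str) -> list:
        """Register interest and return the inbox that will receive frames."""
        inbox: list = []
        self.subscribers[(namespace, capability)].append(inbox)
        return inbox

    def publish(self, namespace: str, name: str, capability: str, frame: bytes) -> int:
        """Deliver one frame to every matching inbox; return how many matched."""
        # Assumption: exact match on namespace and capability.
        inboxes = self.subscribers.get((namespace, capability), [])
        for inbox in inboxes:
            inbox.append((name, frame))
        return len(inboxes)
```

Neither side holds a reference to the other: the publisher only learns how many inboxes matched, and a subscriber only sees the track name and the frame.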

| Track kind     | Carries                                                  | Wire model         |
| -------------- | -------------------------------------------------------- | ------------------ |
| **Audio**      | Continuous voice (Opus / PCM / G.711)                    | subgroup stream    |
| **Video**      | Encoded video; group per keyframe                        | subgroup stream    |
| **Frame**      | Opaque binary with per-frame priority                    | subgroup OR datagram (`datagram=true`) |
| **Text**       | Reliable ordered messages                                | subgroup stream    |

Subgroup streams give in-order, reliable delivery per group; datagrams
give up ordering and reliability in exchange for a sub-millisecond latency
floor. The modality picks the right lane per channel; see each modality's
concept page.
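
The table above maps onto a small lane-selection rule. The sketch below is illustrative only: `TrackKind`, `Lane`, and `pick_lane` are hypothetical names, and rejecting `datagram=True` for non-frame tracks is an assumption drawn from the table listing the datagram option for Frame tracks alone.

```python
from enum import Enum


class TrackKind(Enum):
    AUDIO = "audio"
    VIDEO = "video"
    FRAME = "frame"
    TEXT = "text"


class Lane(Enum):
    SUBGROUP_STREAM = "subgroup stream"
    DATAGRAM = "datagram"


def pick_lane(kind: TrackKind, datagram: bool = False) -> Lane:
    """Choose the wire model for a track, following the table above."""
    if datagram:
        # Assumption: only Frame tracks may opt into the datagram lane.
        if kind is not TrackKind.FRAME:
            raise ValueError(f"{kind.value} tracks use subgroup streams")
        return Lane.DATAGRAM
    return Lane.SUBGROUP_STREAM
```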

## The relay mesh

The relay is the same binary as the engine, with the built-in relay role
enabled. It runs shard-per-core: each shard binds the same UDP port with
`SO_REUSEPORT`, and an eBPF program dispatches incoming QUIC packets to the
shard that owns the connection's destination connection ID (DCID). Each shard:

- accepts QUIC handshakes (ECDSA P-256)
- maintains its share of MoQT sessions
- forwards published frames to subscribers without leaving the shard
- gates each publish/subscribe through the **namespace auth** hook
  (JWT verify with namespace-scoped claims)
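
One common way to make DCID dispatch cheap is to have each shard stamp its own index into the connection IDs it issues, so the dispatcher only reads one byte. The sketch below models that idea in user-space Python. It is an assumption, not the relay's actual scheme: this page does not say how a DCID maps to a shard, the real dispatch runs as an eBPF program in the kernel, and `CID_LEN`, `issue_cid`, and `shard_for_dcid` are hypothetical.

```python
import os

CID_LEN = 8  # assumed connection ID length in bytes


def issue_cid(shard: int) -> bytes:
    """Build a connection ID whose first byte names the owning shard."""
    if not 0 <= shard < 256:
        raise ValueError("this toy scheme supports at most 256 shards")
    return bytes([shard]) + os.urandom(CID_LEN - 1)


def shard_for_dcid(dcid: bytes, num_shards: int) -> int:
    """Pick the shard for a packet from its destination connection ID."""
    if not dcid:
        raise ValueError("empty DCID")
    # Client-chosen DCIDs on first packets are random, so the modulo simply
    # spreads new handshakes across shards.
    return dcid[0] % num_shards
```

Because every later packet on a connection carries a relay-issued CID, it keeps landing on the shard that holds the session state.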

The relay's data path is **zero-copy where the QUIC sequencer allows**,
GSO-batched on send and GRO-coalesced (`UDP_GRO`) on receive.

POPs are addressable by DNS round-robin under `relay.clutchcall.dev` and
by the per-edge POP code (`relay-us.clutchcall.dev`,
`relay-uk.clutchcall.dev`, …).

## Two ports, two stacks

The relay runs **two QUIC stacks on two ports** rather than multiplexing on
one:

| Port | Stack            | Carries                                |
| ---- | ---------------- | -------------------------------------- |
| 443  | QUIC · MoQT      | MoQT (audio / video / frame / text tracks)     |
| 4433 | QUIC · HTTP/3    | HTTP/3 + WebTransport (REST, MCP, signalling) |

Clients find the right port via RFC 9460 HTTPS records. The split keeps
each stack's `SO_REUSEPORT` + eBPF DCID dispatch intact and avoids the
~300-500 lines of userspace ALPN-sniffing routing code that single-port
multiplexing would need.

## Why one core

Two reasons.

**Identical wire envelopes.** The relay only ever sees the same MoQT
framing regardless of which SDK published. No per-language parser to
keep in sync — only the C++ core, which every SDK imports via FFI / WASM.

**Audio without a copy.** µ-law / A-law to 16-bit PCM conversion is done
in SIMD inside the core. Browser (WASM) and Node / Python / Go / Rust /
Java / .NET (native FFI) call into the same code path with the same
latency profile.
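
For reference, this is the conversion in scalar form: the standard G.711 µ-law expansion, written in Python. It is not the core's code (which, per the above, is SIMD C++), and the function and table names are chosen for this example only.

```python
def mulaw_to_pcm16(sample: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~sample & 0xFF  # mu-law bytes are stored bit-inverted
    # 4-bit mantissa plus bias, scaled by the 3-bit exponent.
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    # The top bit is the sign; remove the bias on the way out.
    return (0x84 - magnitude) if (u & 0x80) else (magnitude - 0x84)


# There are only 256 possible inputs, so a lookup table covers all of them.
MULAW_TABLE = [mulaw_to_pcm16(b) for b in range(256)]


def decode_mulaw(payload: bytes) -> list[int]:
    """Decode a mu-law payload to a list of 16-bit PCM samples."""
    return [MULAW_TABLE[b] for b in payload]
```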

## Connection lifecycle

1. **Connect.** SDK dials MoQT on `relay.clutchcall.dev:443` over QUIC
   (or over WebTransport in the browser). The tenant token is presented
   as the first MoQT envelope.
2. **Handshake.** The relay's namespace_auth hook verifies the token
   and stamps a namespace scope on the session.
3. **Publish / subscribe.** SDK opens MoQT publish and subscribe
   requests on demand. The relay routes by namespace and capability.
4. **Auto-reconnect.** If the link drops, the SDK reconnects with
   capped exponential backoff and **re-establishes every publication
   and subscription** transparently. Application code doesn't see it.
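
Step 4 can be sketched as a retry loop. This is a minimal illustration under stated assumptions, not the SDK's implementation: the base delay, cap, and attempt count are made-up values, and `dial` and `restore` are hypothetical callables standing in for whatever the SDK uses to open a session and replay one publication or subscription.

```python
import time


def backoff_delays(base: float = 0.25, cap: float = 8.0, attempts: int = 8):
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * 2 ** attempt)


def reconnect(dial, restore, tracks, delays, sleep=time.sleep):
    """Retry dial() until it succeeds, then replay every track via restore()."""
    for delay in delays:
        try:
            session = dial()
        except ConnectionError:
            sleep(delay)  # wait before the next attempt
            continue
        for track in tracks:
            restore(session, track)  # re-establish each publication/subscription
        return session
    raise ConnectionError("reconnect attempts exhausted")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.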

The control plane (call originate, live-input create, room token mint,
etc.) runs over **HTTP/3 on port 4433**, as tRPC procedures on the engine.
Modality clients hold both a MoQT client (data plane) and an HTTPS client
(control plane) under the hood.

## Where things run

- **Browsers.** TypeScript SDK speaks MoQT directly over native
  WebTransport (no FFI, no custom framing). The C++ core is compiled to
  WebAssembly via Emscripten for audio APM + framing fast paths.
- **Native runtimes.** The SDK loads `clutchcall_core_ffi.so` / `.dylib` /
  `.dll` via the language's native loader: JNI (Java), P/Invoke (.NET), cgo
  (Go), `libloading` (Rust), `ctypes` (Python).
- **Unity.** .NET runtime SDK plus a `com.clutchcall.transport` UPM
  package that exposes `INetworkInterface` over the games modality —
  drop-in for Unity Netcode for GameObjects / Entities. See
  [Netcode (Unity)](/modalities/netcode/details).

## Direct-media (voice)

For server-side AI calls (`default_app=AI_BIDIRECTIONAL_STREAM`), the
voice path uses **direct-media** between the carrier and the agent
runtime. The gateway negotiates SIP signalling with the carrier, then
publishes the SDP answer pointing at the agent runtime's RTP socket;
RTP flows straight from the carrier to the runtime. The gateway is
signalling-only on that path.

For SIP/RTP-only calls (no AI bridge), the gateway still terminates RTP
and runs a local VAD. So the same `call_sid` can take either RTP
path depending on `default_app`.

## Legacy RPC (still supported)

The original control plane was a method-id RPC envelope over QUIC:

```
+----------------+----------------+----------------------+
| u32 length LE  | u32 method_id  | serde envelope body  |
+----------------+----------------+----------------------+
```

`ClutchCallClient` (`dial`, `originate_bulk`, `hangup`, `barge`,
`push_audio`, …) used this surface. It's kept for backwards compatibility;
new code should use the [Voice modality](/modalities/voice/details). See
[Envelope Format](/rpc/envelope-format) for the full wire detail.
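
A rough encoder/decoder for the header in the diagram, for orientation only. Two things are assumed here because the diagram does not state them: that `method_id` is little-endian like the length, and that the length counts everything after the length field (`method_id` plus body). Check the Envelope Format page before relying on either. The body is treated as opaque bytes.

```python
import struct

# u32 length (LE per the diagram) + u32 method_id (LE is an assumption).
HEADER = struct.Struct("<II")


def encode_envelope(method_id: int, body: bytes) -> bytes:
    """Frame one message. Assumes length covers method_id + body."""
    return HEADER.pack(4 + len(body), method_id) + body


def decode_envelope(buf: bytes) -> tuple[int, bytes, bytes]:
    """Parse one message; return (method_id, body, remaining bytes)."""
    if len(buf) < HEADER.size:
        raise ValueError("short header")
    length, method_id = HEADER.unpack_from(buf)
    if length < 4:
        raise ValueError("length too small to hold method_id")
    end = 4 + length  # the length field itself is not counted
    if len(buf) < end:
        raise ValueError("truncated body")
    return method_id, buf[HEADER.size:end], buf[end:]
```

Returning the unparsed remainder lets a caller decode several envelopes from a single receive buffer.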

## Code generation

Method IDs and DTO definitions in every language come from one IDL
(`api/clutchcall.json`) compiled by `apirpc_compiler.py`. The
modality clients' wire formats are similarly generated from the same
IDL, so a new modality method only needs an IDL edit + compiler run.
