Voice client and the
optional browser helpers from it.
Voice
The top-level client. It hands out three scoped helpers — calls,
audioBridge, and agents — that carry your org id and the transport.
Control-plane origin (no trailing slash). Calls go over HTTPS with Authorization: Bearer <apiKey>.
API key with voice:* scopes.
Default org / tenant id applied to every call.
Relay hostname for the audio bridge. Defaults to relay.clutchcall.dev.
Override the fetch implementation (TypeScript only).
Inject a custom WebTransport factory for the audio bridge (TypeScript, for Node tests).
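To make the first option concrete, here is a minimal sketch of the request shape it describes: an origin without a trailing slash plus a Bearer header. The helper name `controlPlaneRequest` and its parameters are hypothetical and only illustrate the convention; they are not part of the SDK, and the SDK's internal request code may differ.

```typescript
// Sketch only: illustrates the documented request shape, not the SDK's internals.
// `controlPlaneRequest` and its parameter names are hypothetical.
function controlPlaneRequest(baseUrl: string, apiKey: string, path: string) {
  // The origin is documented as having no trailing slash; strip one defensively.
  const base = baseUrl.replace(/\/+$/, "");
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return {
    url: `${base}${suffix}`,
    // Every control-plane call carries the API key as a Bearer token.
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

Normalizing the origin once avoids double slashes if a configured value does end in `/`.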
Calls — control plane
Reachable as v.calls. Originates calls and fetches their current state.
calls.originate(args) → Promise<Call>
Place an outbound call over a SIP trunk. Returns a Call handle once the call
is dialing.
E.164 destination.
Caller-id, E.164.
SIP trunk to route over.
AI agent id to attach automatically on answer.
Seconds to ring before giving up. Server-clamped 5..120, default 30.
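The parameter list above can be checked client-side before a call is placed. The sketch below assumes the argument names mirror the Call fields documented on this page (`to`, `from`, `trunkId`, `agent`); `ringTimeoutSec` is a hypothetical name for the ring-timeout parameter, and none of these helpers are part of the SDK.

```typescript
// Assumed argument shape. `ringTimeoutSec` is a hypothetical field name.
interface OriginateArgsSketch {
  to: string;              // E.164 destination
  from: string;            // caller-id, E.164
  trunkId: string;         // SIP trunk to route over
  agent?: string;          // AI agent id to attach on answer
  ringTimeoutSec?: number; // seconds to ring before giving up
}

// Mirrors the documented server-side clamp: 5..120 seconds, default 30.
function effectiveRingTimeout(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return 30;
  return Math.min(120, Math.max(5, requested));
}

// Loose E.164 check: "+", a non-zero digit, then up to 14 more digits.
function isE164(n: string): boolean {
  return /^\+[1-9]\d{1,14}$/.test(n);
}

// Collects human-readable problems instead of throwing on the first one.
function checkOriginateArgs(args: OriginateArgsSketch): string[] {
  const problems: string[] = [];
  if (!isE164(args.to)) problems.push("to must be E.164");
  if (!isE164(args.from)) problems.push("from must be E.164");
  return problems;
}
```

Since the server clamps the ring timeout anyway, `effectiveRingTimeout` is only useful for predicting the value that will actually apply.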
calls.get({ sid }) → Promise<Call>
Fetch the current state of a call by its sid. Use it to read status after
originate().
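One common way to use calls.get() is to poll until the call reaches a final state. The sketch below is written against a minimal structural type rather than the real client so it stays self-contained. Treating completed, failed, and no_answer as terminal is an assumption based on the status names, and `waitUntilFinished` is a hypothetical helper, not an SDK function.

```typescript
type CallStatusSketch =
  | "dialing" | "ringing" | "in_progress"
  | "completed" | "failed" | "no_answer";

// Minimal structural type for the part of v.calls used here.
interface CallsLike {
  get(args: { sid: string }): Promise<{ sid: string; status: CallStatusSketch }>;
}

// Assumption: these three statuses are final.
const TERMINAL_STATUSES: ReadonlySet<CallStatusSketch> =
  new Set<CallStatusSketch>(["completed", "failed", "no_answer"]);

// Poll calls.get() until a terminal status shows up or maxPolls is exhausted.
async function waitUntilFinished(
  calls: CallsLike,
  sid: string,
  intervalMs = 1000,
  maxPolls = 300,
): Promise<CallStatusSketch> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const current = await calls.get({ sid });
    if (TERMINAL_STATUSES.has(current.status)) return current.status;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`call ${sid} still active after ${maxPolls} polls`);
}
```

With the real client this would be called as `waitUntilFinished(v.calls, call.sid)`, provided v.calls matches the structural type above.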
Call — one call handle
Returned by originate() and get(). Read-only fields plus two mutating
methods.
| Field | Type | Notes |
|---|---|---|
| sid | string | The only identifier you need to address audio. |
| status | CallStatus | dialing … completed (see below). |
| to | string | E.164 destination. |
| from | string | Caller-id (from_ in Python). |
| startedAt | string | ISO-8601 start time. |
| trunkId | string? | SIP trunk (trunk_id in Python). |
| agent | string? | Attached AI agent id, if any. |
call.transfer(args | string) → Promise<void>
Transfer to a PSTN number or re-attach a different agent. Provide exactly
one of to / agent. Passing a bare string is shorthand for { to }.
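The "exactly one of to / agent" rule and the bare-string shorthand can be expressed as a small normalizer. This is an illustrative sketch of the documented contract, not the SDK's own validation; `normalizeTransfer` and `TransferArgsSketch` are hypothetical names.

```typescript
type TransferArgsSketch = { to?: string; agent?: string };

// Accepts the same inputs as call.transfer() and returns the explicit object form.
function normalizeTransfer(args: TransferArgsSketch | string): TransferArgsSketch {
  // A bare string is shorthand for { to }.
  const explicit: TransferArgsSketch = typeof args === "string" ? { to: args } : args;
  const hasTo = explicit.to !== undefined;
  const hasAgent = explicit.agent !== undefined;
  // Both set, or neither set, violates "exactly one of to / agent".
  if (hasTo === hasAgent) {
    throw new TypeError("transfer: provide exactly one of `to` or `agent`");
  }
  return explicit;
}
```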
call.hangup() → Promise<void>
End the call and drop both audio tracks.
AudioBridge — data plane
Reachable as v.audioBridge (v.audio_bridge in Python). It opens the
bidirectional audio bridge for one call and hands back a handle you hold for the
call’s lifetime.
audioBridge.attach(callSid, opts) → Promise<AudioBridge>
Open the bridge as the server: subscribe to the caller’s audio (uplink) and
publish audio back (downlink). Pass onUplink to receive caller frames.
Callback for inbound caller audio. Receives one encoded frame per invocation.
Audio codec. Default opus.
Sample rate in Hz. Default 48000.
Channel count. Default 1.
Frame duration in ms. Default 20.
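The defaults above (48000, 1, and 20 ms) determine how much audio each frame carries. The sketch below works through that arithmetic. The option names `sampleRate`, `channels`, and `frameMs` are assumptions, since the real option names are not shown here. Opus frames are variable-size once encoded, so the byte count only applies to uncompressed pcm16.

```typescript
// Hypothetical option names; the defaults match the ones documented above.
interface FrameFormatSketch {
  sampleRate?: number; // Hz
  channels?: number;
  frameMs?: number;    // frame duration in milliseconds
}

// Samples per channel in one frame: sampleRate * frameMs / 1000.
function samplesPerFrame(fmt: FrameFormatSketch = {}): number {
  const { sampleRate = 48000, frameMs = 20 } = fmt;
  return (sampleRate * frameMs) / 1000;
}

// Bytes in one uncompressed pcm16 frame: 2 bytes per sample per channel.
function pcm16BytesPerFrame(fmt: FrameFormatSketch = {}): number {
  const channels = fmt.channels ?? 1;
  return samplesPerFrame(fmt) * channels * 2;
}

console.log(samplesPerFrame());    // 960
console.log(pcm16BytesPerFrame()); // 1920
```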
audioBridge.attachCaller(callSid, opts) → Promise<AudioBridge>
The browser-caller mirror of attach(): subscribe to the downlink (cloud → caller)
and publish the uplink (mic → cloud). Pass onDownlink to receive audio for playback.
(TypeScript only; this is the softphone path.)
AudioBridge methods
| Method | Direction | Notes |
|---|---|---|
| publishDownlink(frame, timestampUs?) | server → caller | Push one encoded frame to the caller (e.g. 20 ms Opus). |
| publishUplink(frame, timestampUs?) | caller → cloud | Browser-caller side; push one encoded mic frame. |
| onUplink(cb) | - | Swap the inbound-caller consumer mid-call. |
| onDownlink(cb) | - | Browser-caller side; swap the inbound-cloud consumer. |
| close() | - | Tear down both tracks and the underlying MoQT session. |
| callSid | - | The sid this bridge belongs to. |
timestampUs is optional; if omitted the SDK stamps a monotonic microsecond
clock. In Python the bridge exposes publish_downlink(frame, timestamp_us=…)
and close().
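If you would rather pass timestampUs explicitly than rely on the SDK clock, one option is to derive media-time stamps from the frame count. The sketch below does that against a minimal structural type. Using Uint8Array for frames and the return type of publishDownlink are both assumptions, and `publishFrames` is a hypothetical helper.

```typescript
// Minimal structural type for the one bridge method used here.
// Assumption: frames are Uint8Array and the call may or may not return a promise.
interface DownlinkLike {
  publishDownlink(frame: Uint8Array, timestampUs?: number): void | Promise<void>;
}

// Publishes frames back-to-back, stamping each one frameMs after the previous.
// Returns the timestamp the next frame should carry.
async function publishFrames(
  bridge: DownlinkLike,
  frames: Uint8Array[],
  frameMs = 20,
  startUs = 0,
): Promise<number> {
  let timestampUs = startUs;
  for (const frame of frames) {
    await bridge.publishDownlink(frame, timestampUs);
    timestampUs += frameMs * 1000; // milliseconds to microseconds
  }
  return timestampUs;
}
```

Feeding the returned value back in as `startUs` on the next batch keeps the stamps continuous. This helper does not pace the sends in real time; it only assigns timestamps.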
Agents — AI-agent attach
Reachable as v.agents. Bind a server-side speech-to-speech agent to a live
call; the engine wires the audio bridge end-to-end, so you do not open an
AudioBridge yourself.
agents.attach(callSid, agent) → Promise<void>
Browser softphone helpers
These ship on the moqt subpath and turn a microphone into encoded Opus and
play encoded Opus back, so the softphone never touches a WebRTC transport for
media.
captureMicrophone(publication, opts?) → Promise<MicCapture>
Capture the mic, run AEC / AGC / noise-suppression, Opus-encode, and write each
encoded frame to a MoQT audio publication. The returned stop() tears down
the capture graph (it does not close the publication). The simplest softphone
loop forwards captured frames into the bridge with publishUplink:
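A minimal sketch of such an adapter follows. The AudioPublication shape used here (a single `write(frame, timestampUs?)` method) is a guess, because this page does not show the real interface. Check the SDK's type definitions and rename the method to match. `publicationFromBridge` is a hypothetical helper.

```typescript
// Hypothetical: the real AudioPublication interface may use a different method name.
interface AudioPublicationSketch {
  write(frame: Uint8Array, timestampUs?: number): void | Promise<void>;
}

// Minimal structural type for the caller-side bridge method used here.
interface UplinkLike {
  publishUplink(frame: Uint8Array, timestampUs?: number): void | Promise<void>;
}

// Wraps a caller-side bridge so each encoded mic frame goes out on the uplink.
function publicationFromBridge(bridge: UplinkLike): AudioPublicationSketch {
  return {
    write: (frame, timestampUs) => bridge.publishUplink(frame, timestampUs),
  };
}

// Intended use in the browser (not executed here):
//   const bridge = await v.audioBridge.attachCaller(callSid, { onDownlink });
//   const mic = await captureMicrophone(publicationFromBridge(bridge));
//   ...
//   mic.stop();      // tears down the capture graph only
//   bridge.close();  // tears down the tracks and the session
```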
captureMicrophone writes onto an AudioPublication. The snippet above adapts
it to the bridge’s publishUplink; if you work with MoqtClient directly you
pass the publication returned by client.publishAudio(...).
OpusPlayer
Decode received Opus with WebCodecs and render through an AudioWorklet ring
buffer (silence-padded on underrun). Construct with an AudioContext, start()
once, then push() each frame you receive.
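To illustrate what "silence-padded on underrun" means, here is a small ring buffer sketch over decoded PCM samples. It is not OpusPlayer's implementation, and the class name `SilencePaddedRing` is hypothetical. It only shows the technique: the reader always gets a full block, and any samples the buffer cannot supply come back as zeros.

```typescript
class SilencePaddedRing {
  private readonly buf: Float32Array;
  private readPos = 0;
  private size = 0; // samples currently buffered

  constructor(capacity: number) {
    this.buf = new Float32Array(capacity);
  }

  // Append decoded samples; when the buffer is full the oldest sample is overwritten.
  push(samples: Float32Array): void {
    for (let i = 0; i < samples.length; i++) {
      const writePos = (this.readPos + this.size) % this.buf.length;
      this.buf[writePos] = samples[i];
      if (this.size < this.buf.length) {
        this.size++;
      } else {
        // Overwrote the oldest sample, so the read position moves forward.
        this.readPos = (this.readPos + 1) % this.buf.length;
      }
    }
  }

  // Fill `out` completely. Returns how many real samples were copied;
  // the remainder of `out` is zeroed (silence) on underrun.
  pull(out: Float32Array): number {
    const n = Math.min(out.length, this.size);
    for (let i = 0; i < n; i++) {
      out[i] = this.buf[(this.readPos + i) % this.buf.length];
    }
    out.fill(0, n);
    this.readPos = (this.readPos + n) % this.buf.length;
    this.size -= n;
    return n;
  }
}
```

In an AudioWorklet, `pull` would be called from `process()` with each 128-sample output block, so a late frame produces a short gap of silence rather than stalling the render thread.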
Types and events
| Type | Values |
|---|---|
| CallStatus | dialing, ringing, in_progress, completed, failed, no_answer |
| AudioCodec | opus, pcm16, g711_ulaw, g711_alaw |
To read a call’s current status, use calls.get({ sid }). The
data plane is event-driven: the onUplink / onDownlink callbacks fire per
audio frame, and the underlying MoQT session auto-reconnects and replays the
publish/subscribe on link loss.
Other languages
The Go, Rust, Java, and C# SDKs expose the same publish/subscribe primitives via MoqtClient on the voice/<sid>/{uplink,downlink} tracks; the typed
Calls/AudioBridge/Agents convenience wrappers are TypeScript-first with
Python parity. See Realtime Tracks for the raw
publishAudio / subscribeAudio surface in every language.
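When working with the raw primitives, the track names follow the voice/<sid>/{uplink,downlink} pattern above, and which one you publish or subscribe to depends on your side of the call. The sketch below encodes that mapping as described for attach() and attachCaller(). The helper names are hypothetical and not part of any SDK.

```typescript
type VoiceDirection = "uplink" | "downlink";

// Builds a track name following the documented voice/<sid>/<direction> pattern.
function voiceTrack(sid: string, direction: VoiceDirection): string {
  return `voice/${sid}/${direction}`;
}

// Server side: subscribe to caller audio (uplink), publish audio back (downlink).
// Caller side: the mirror image.
function tracksFor(role: "server" | "caller", sid: string) {
  return role === "server"
    ? { subscribe: voiceTrack(sid, "uplink"), publish: voiceTrack(sid, "downlink") }
    : { subscribe: voiceTrack(sid, "downlink"), publish: voiceTrack(sid, "uplink") };
}
```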
