The voice modality lives in its own subpath. Import the Voice client from it; the optional browser helpers come from the moqt subpath.
import { Voice } from "@clutchcall/sdk/voice";
// browser softphone helpers live on the moqt subpath:
import { captureMicrophone, OpusPlayer } from "@clutchcall/sdk/moqt";

Voice

The top-level client. It hands out three scoped helpers — calls, audioBridge, and agents — that carry your org id and the transport.
const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",   // control-plane origin
  apiKey:  process.env.CLUTCHCALL_CREDENTIALS!,
  orgId:   "org_abc",
});

v.calls;        // control plane
v.audioBridge;  // data plane
v.agents;       // AI-agent attach
baseUrl
string
required
Control-plane origin (no trailing slash). Calls go over HTTPS with Authorization: Bearer <apiKey>.
apiKey
string
required
API key with voice:* scopes.
orgId
string
required
Default org / tenant id applied to every call.
relayHost
string
Relay hostname for the audio bridge. Defaults to relay.clutchcall.dev.
fetch
function
Override the fetch implementation (TypeScript only).
webTransport
WebTransportFactory
Inject a custom WebTransport factory for the audio bridge (TypeScript, for Node tests).
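The option notes imply two small invariants: the origin carries no trailing slash, and every control-plane request sends a bearer header. The following is a minimal sketch of how a caller might enforce both before constructing the client. `normalizeOrigin` and `bearerHeaders` are hypothetical helper names used for illustration, not SDK exports.

```typescript
// Hypothetical helpers (not part of the SDK).

/** Strip trailing slashes so the origin matches the "no trailing slash" rule. */
function normalizeOrigin(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

/** Build the Authorization header described for control-plane requests. */
function bearerHeaders(apiKey: string): Record<string, string> {
  if (apiKey.length === 0) {
    throw new Error("apiKey is required");
  }
  return { Authorization: `Bearer ${apiKey}` };
}
```

Normalizing the origin up front is convenient when the value comes from an environment variable that may or may not end in a slash.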

Calls — control plane

Reachable as v.calls. Originates calls and fetches their current state.

calls.originate(args) → Promise<Call>

Place an outbound call over a SIP trunk. Returns a Call handle once the call is dialing.
const call = await v.calls.originate({
  to:      "+15551234567",
  from:    "+15558675309",
  trunkId: "trunk_main",
  agent:   "healthcare-assistant",  // optional: engine attaches the bridge on answer
  ringTimeoutSec: 30,               // optional: server-clamped 5..120
});
console.log(call.sid, call.status);
to
string
required
E.164 destination.
from
string
required
Caller-id, E.164.
trunkId
string
required
SIP trunk to route over.
agent
string
AI agent id to attach automatically on answer.
ringTimeoutSec
number
Seconds to ring before giving up. Server-clamped 5..120, default 30.
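Two of the originate() arguments have constraints that can be checked before the request is sent: the numbers must be E.164, and ringTimeoutSec is clamped to 5..120 with a default of 30. The sketch below mirrors those rules on the client. Both function names are hypothetical, and the server remains the source of truth (for example, the source does not say how fractional timeouts are rounded).

```typescript
// Hypothetical pre-flight checks; not part of the SDK.

/** E.164: "+" followed by at most 15 digits, the first of which is non-zero. */
function isE164(value: string): boolean {
  return /^\+[1-9]\d{1,14}$/.test(value);
}

/** Mirror the documented clamp: default 30, bounded to 5..120 seconds. */
function effectiveRingTimeout(ringTimeoutSec?: number): number {
  if (ringTimeoutSec === undefined || Number.isNaN(ringTimeoutSec)) {
    return 30;
  }
  return Math.min(120, Math.max(5, ringTimeoutSec));
}
```

Checking locally gives a faster, clearer error than a rejected request, and `effectiveRingTimeout` predicts the timeout the engine is documented to apply.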

calls.get({ sid }) → Promise<Call>

Fetch the current state of a call by its sid. Use it to read status after originate().
const call = await v.calls.get({ sid });

Call — one call handle

Returned by originate() and get(). Read-only fields plus two mutating methods.
| Field | Type | Notes |
| --- | --- | --- |
| sid | string | The only identifier you need to address audio. |
| status | CallStatus | dialing → completed (see below). |
| to | string | E.164 destination. |
| from | string | Caller-id (from_ in Python). |
| startedAt | string | ISO-8601 start time. |
| trunkId | string? | SIP trunk (trunk_id in Python). |
| agent | string? | Attached AI agent id, if any. |

call.transfer(args | string) → Promise<void>

Transfer to a PSTN number or re-attach a different agent. Provide exactly one of to / agent. Passing a bare string is shorthand for { to }.
await call.transfer({ to: "+15557654321" });   // forward to PSTN
await call.transfer("+15557654321");           // shorthand
await call.transfer({ agent: "billing-bot" }); // re-attach an agent
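
The "exactly one of to / agent" rule and the bare-string shorthand can be expressed as a small normalizer. This is a sketch of the documented contract, not the SDK's internal code; `TransferArgs` and `normalizeTransfer` are names assumed for illustration.

```typescript
// Hypothetical shape of the transfer arguments, inferred from the examples above.
interface TransferArgs {
  to?: string;
  agent?: string;
}

/** Expand the string shorthand and enforce "exactly one of to / agent". */
function normalizeTransfer(args: TransferArgs | string): TransferArgs {
  const resolved: TransferArgs = typeof args === "string" ? { to: args } : args;
  const hasTo = resolved.to !== undefined;
  const hasAgent = resolved.agent !== undefined;
  if (hasTo === hasAgent) {
    // Either both were given or neither was.
    throw new Error("transfer: provide exactly one of `to` or `agent`");
  }
  return resolved;
}
```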

call.hangup() → Promise<void>

End the call and drop both audio tracks.
await call.hangup();

AudioBridge — data plane

Reachable as v.audioBridge (v.audio_bridge in Python). It opens the bidirectional audio bridge for one call and hands back a handle you hold for the call’s lifetime.

audioBridge.attach(callSid, opts) → Promise<AudioBridge>

Open the bridge as the server: subscribe the caller’s audio (uplink), publish audio back (downlink). Pass onUplink to receive caller frames.
const bridge = await v.audioBridge.attach(call.sid, {
  codec: "opus",          // default
  sampleRate: 48000,      // default
  channels: 1,            // default
  frameMs: 20,            // default
  onUplink: (frame, tsUs) => myAsr.feed(frame),
});

// push synthesized audio back to the caller:
myTts.onChunk((opus) => bridge.publishDownlink(opus));
onUplink
function
Callback for inbound caller audio. Receives one encoded frame per invocation.
codec
AudioCodec
Default opus.
sampleRate
number
Default 48000.
channels
number
Default 1.
frameMs
number
Frame duration in ms. Default 20.
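
The defaults above fix the frame geometry: at 48000 Hz, mono, 20 ms, each frame covers 960 samples. The arithmetic is sketched below. The helper names are hypothetical, and the byte count applies only to uncompressed pcm16, since Opus frames vary in size.

```typescript
/** Samples covered by one frame, across all channels. */
function samplesPerFrame(sampleRate: number, frameMs: number, channels = 1): number {
  return ((sampleRate * frameMs) / 1000) * channels;
}

/** Size of one uncompressed pcm16 frame: two bytes per sample. */
function pcm16BytesPerFrame(sampleRate: number, frameMs: number, channels = 1): number {
  return samplesPerFrame(sampleRate, frameMs, channels) * 2;
}

console.log(samplesPerFrame(48000, 20));    // 960
console.log(pcm16BytesPerFrame(48000, 20)); // 1920
```

These numbers are useful for sizing buffers on the ASR or TTS side of the bridge.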

audioBridge.attachCaller(callSid, opts) → Promise<AudioBridge>

The browser-caller mirror of attach(): subscribe downlink (cloud → caller), publish uplink (mic → cloud). Pass onDownlink to receive audio for playback. (TypeScript; this is the softphone path.)
const bridge = await v.audioBridge.attachCaller(call.sid, {
  codec: "opus",
  onDownlink: (frame, tsUs) => player.push(tsUs, frame),
});

AudioBridge methods

| Method | Direction | Notes |
| --- | --- | --- |
| publishDownlink(frame, timestampUs?) | server → caller | Push one encoded frame to the caller (e.g. 20 ms Opus). |
| publishUplink(frame, timestampUs?) | caller → cloud | Browser-caller side; push one encoded mic frame. |
| onUplink(cb) | | Swap the inbound-caller consumer mid-call. |
| onDownlink(cb) | | Browser-caller side; swap the inbound-cloud consumer. |
| close() | | Tear down both tracks and the underlying MoQT session. |
| callSid | | The sid this bridge belongs to. |
timestampUs is optional; if omitted the SDK stamps a monotonic microsecond clock. In Python the bridge exposes publish_downlink(frame, timestamp_us=…) and close().
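
If you supply timestampUs yourself (for example, when a TTS engine produces frames faster than real time), one option is to derive the timestamp from the frame index so that it stays monotonic. The sketch below assumes timestamps should advance by one frame duration per frame, which the source does not state; `makeFrameClock` is a hypothetical helper.

```typescript
/**
 * Return a function that yields 0, stepUs, 2 * stepUs, ... microseconds,
 * where stepUs is one frame duration. Hypothetical helper, not an SDK export.
 */
function makeFrameClock(frameMs: number, startUs = 0): () => number {
  const stepUs = frameMs * 1000;
  let nextUs = startUs;
  return () => {
    const currentUs = nextUs;
    nextUs += stepUs;
    return currentUs;
  };
}

// Usage sketch: const tick = makeFrameClock(20);
// bridge.publishDownlink(opusFrame, tick());
```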

Agents — AI-agent attach

Reachable as v.agents. Bind a server-side speech-to-speech agent to a live call; the engine wires the audio bridge end-to-end, so you do not open an AudioBridge yourself.

agents.attach(callSid, agent) → Promise<void>

await v.agents.attach(call.sid, "healthcare-assistant");
You can attach an agent two ways: pass agent to originate() (attached on answer) or call agents.attach() against a live sid (e.g. to hand a call from a human to a bot, or swap one bot for another).

Browser softphone helpers

These ship on the moqt subpath and turn a microphone into encoded Opus and play encoded Opus back, so the softphone never touches a WebRTC transport for media.

captureMicrophone(publication, opts?) → Promise<MicCapture>

Capture the mic, run AEC / AGC / noise-suppression, Opus-encode, and write each encoded frame to a MoQT audio publication. The returned stop() tears down the capture graph (it does not close the publication). The simplest softphone loop forwards captured frames into the bridge with publishUplink:
const bridge = await v.audioBridge.attachCaller(call.sid, {
  codec: "opus",
  onDownlink: (frame, tsUs) => player.push(tsUs, frame),
});

// forward each captured mic frame onto the uplink track
const mic = await captureMicrophone(
  { write: (tsUs, frame) => bridge.publishUplink(frame, tsUs) } as any,
  { audioConstraints: { echoCancellation: true, autoGainControl: true, noiseSuppression: true } },
);
// later:
mic.stop();
captureMicrophone writes onto an AudioPublication. The snippet above adapts it to the bridge’s publishUplink; if you work with MoqtClient directly you pass the publication returned by client.publishAudio(...).
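
The `as any` cast in the snippet above can be narrowed with a small typed adapter. The interfaces below are assumptions inferred from that snippet (a `write(tsUs, frame)` sink and a `publishUplink(frame, tsUs)` method); the frame type is assumed to be `Uint8Array`, and the real SDK types may carry additional members.

```typescript
// Assumed shapes; the SDK's AudioPublication and AudioBridge may differ.
interface UplinkSink {
  publishUplink(frame: Uint8Array, timestampUs?: number): void | Promise<void>;
}

interface FrameWriter {
  write(timestampUs: number, frame: Uint8Array): void;
}

/** Adapt a bridge-like object to the writer shape, swapping the argument order. */
function uplinkWriter(sink: UplinkSink): FrameWriter {
  return {
    write(timestampUs, frame) {
      // Log rejected publishes instead of leaving an unhandled rejection.
      Promise.resolve(sink.publishUplink(frame, timestampUs)).catch((err) => {
        console.error("publishUplink failed", err);
      });
    },
  };
}
```

If the real AudioPublication type requires more than `write`, a cast is still needed at the call site, but the argument swap is now type-checked in one place.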

OpusPlayer

Decode received Opus with WebCodecs and render through an AudioWorklet ring buffer (silence-padded on underrun). Construct with an AudioContext, start() once, then push() each frame you receive.
import { OpusPlayer } from "@clutchcall/sdk/moqt";

const ctx = new AudioContext();
const player = new OpusPlayer(ctx, { sampleRate: 48000, channels: 1 });
await player.start();

// feed it the frames from the downlink subscription:
// onDownlink: (frame, tsUs) => player.push(tsUs, frame)

player.close();
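
To illustrate what "silence-padded on underrun" means, here is a minimal ring buffer whose read always fills the output and writes zeros where no decoded audio is available. It is a conceptual sketch, not OpusPlayer's implementation; the class name and the drop-when-full policy are assumptions.

```typescript
/** Fixed-capacity sample ring; reads past the buffered data yield silence. */
class SilencePaddedRing {
  private readonly buf: Float32Array;
  private readPos = 0;
  private writePos = 0;
  private filled = 0;

  constructor(capacity: number) {
    this.buf = new Float32Array(capacity);
  }

  /** Append samples; returns how many fit (the rest are dropped when full). */
  write(samples: Float32Array): number {
    const n = Math.min(samples.length, this.buf.length - this.filled);
    for (let i = 0; i < n; i++) {
      this.buf[this.writePos] = samples[i];
      this.writePos = (this.writePos + 1) % this.buf.length;
    }
    this.filled += n;
    return n;
  }

  /** Fill `out` completely; returns how many samples were real audio. */
  read(out: Float32Array): number {
    const n = Math.min(out.length, this.filled);
    for (let i = 0; i < n; i++) {
      out[i] = this.buf[this.readPos];
      this.readPos = (this.readPos + 1) % this.buf.length;
    }
    out.fill(0, n); // underrun: pad the remainder with silence
    this.filled -= n;
    return n;
  }
}
```

Padding with zeros keeps the audio callback on schedule when the network delivers frames late, at the cost of a brief gap in playback.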

Types and events

| Type | Values |
| --- | --- |
| CallStatus | dialing \| ringing \| in_progress \| completed \| failed \| no_answer |
| AudioCodec | opus \| pcm16 \| g711_ulaw \| g711_alaw |
The control-plane surface is request/response — there is no socket of call events. To track a call’s lifecycle, re-fetch with calls.get({ sid }). The data plane is event-driven: the onUplink / onDownlink callbacks fire per audio frame, and the underlying MoQT session auto-reconnects and replays the publish/subscribe on link loss.
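
Because lifecycle tracking means re-fetching, a polling loop is the natural pattern. The sketch below polls until the call reaches a final state. It assumes that completed, failed, and no_answer are terminal (the source lists the statuses but does not say which are final), and `CallReader`, `isTerminalStatus`, and `waitForTerminalStatus` are hypothetical names.

```typescript
type PolledStatus =
  | "dialing" | "ringing" | "in_progress"
  | "completed" | "failed" | "no_answer";

// Assumption: these three statuses are final.
const TERMINAL_STATUSES: ReadonlySet<PolledStatus> =
  new Set<PolledStatus>(["completed", "failed", "no_answer"]);

function isTerminalStatus(value: PolledStatus): boolean {
  return TERMINAL_STATUSES.has(value);
}

/** Minimal view of v.calls needed for polling. */
interface CallReader {
  get(args: { sid: string }): Promise<{ sid: string; status: PolledStatus }>;
}

/** Re-fetch the call every intervalMs until it is terminal or maxPolls is hit. */
async function waitForTerminalStatus(
  calls: CallReader,
  sid: string,
  intervalMs = 1000,
  maxPolls = 120,
): Promise<PolledStatus> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const current = (await calls.get({ sid })).status;
    if (isTerminalStatus(current)) {
      return current;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`call ${sid} did not reach a terminal status after ${maxPolls} polls`);
}
```

The poll cap bounds the wait so a call stuck in a non-terminal state does not hold the loop open indefinitely.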
Hold the AudioBridge handle (and, in Python, any subscription it owns) for the whole call. If it is garbage-collected, the engine calls into freed memory and both tracks go silent. Call close() explicitly when the call ends.
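
One way to satisfy both requirements (keep the handle referenced, always call close()) is to scope the bridge to a function with try/finally. `withBridge` is a hypothetical wrapper, and `Closable` is the minimal shape it assumes of the handle.

```typescript
interface Closable {
  close(): void | Promise<void>;
}

/**
 * Open a bridge, run `use` with it, and close it afterwards even if `use`
 * throws. The handle stays referenced for as long as `use` is pending.
 */
async function withBridge<B extends Closable, R>(
  open: () => Promise<B>,
  use: (bridge: B) => Promise<R>,
): Promise<R> {
  const bridge = await open();
  try {
    return await use(bridge);
  } finally {
    await bridge.close();
  }
}

// Usage sketch:
// await withBridge(
//   () => v.audioBridge.attach(call.sid, { onUplink }),
//   async (bridge) => { /* run until the call ends */ },
// );
```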

Other languages

The Go, Rust, Java, and C# SDKs expose the same publish/subscribe primitives via MoqtClient on the voice/<sid>/{uplink,downlink} tracks; the typed Calls/AudioBridge/Agents convenience wrappers are TypeScript-first with Python parity. See Realtime Tracks for the raw publishAudio / subscribeAudio surface in every language.
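
When working with the raw primitives, the track path follows the voice/<sid>/{uplink,downlink} pattern. A small sketch for building it is shown below; `voiceTrackName` is a hypothetical helper, and the source does not say whether MoqtClient takes the path as one string or as a separate namespace and name.

```typescript
type VoiceDirection = "uplink" | "downlink";

/** Build "voice/<sid>/<direction>" for a call's audio track. */
function voiceTrackName(sid: string, direction: VoiceDirection): string {
  if (sid.length === 0 || sid.includes("/")) {
    throw new Error("sid must be non-empty and must not contain '/'");
  }
  return `voice/${sid}/${direction}`;
}
```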