Place an outbound call, attach a server-side speech-to-speech agent, wait for it
to connect, and let the engine drive the whole conversation. No audio bridge in
your process — the agent owns both legs.
1. Originate with an agent. Pass agent to originate(); the engine attaches the bridge on answer.
2. Wait for connect or failure. Poll calls.get() until the call leaves the ringing states.
3. Hang up when done. The agent runs the call; you decide when to end it.
TypeScript

import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey: process.env.CLUTCHCALL_CREDENTIALS!,
  orgId: "org_abc",
});

async function callWithAgent(to: string) {
  const call = await v.calls.originate({
    to,
    from: "+15558675309",
    trunkId: "trunk_main",
    agent: "appointment-reminder",
    ringTimeoutSec: 25,
  });

  // wait for connect / failure
  for (;;) {
    const c = await v.calls.get({ sid: call.sid });
    if (c.status === "in_progress") break;
    if (c.status === "failed" || c.status === "no_answer") {
      console.log("call ended early:", c.status);
      return c.status;
    }
    await new Promise((r) => setTimeout(r, 500));
  }

  console.log("agent is on the line for", call.sid);
  // … the agent drives the conversation; end it whenever you like:
  // await call.hangup();
  return "in_progress";
}

await callWithAgent("+15551234567");

Python

import os
import time

from clutchcall.voice import Voice

v = Voice(
    base_url="https://engine.clutchcall.dev",
    api_key=os.environ["CLUTCHCALL_CREDENTIALS"],
    org_id="org_abc",
)

def call_with_agent(to: str) -> str:
    call = v.calls.originate(
        to=to,
        from_="+15558675309",
        trunk_id="trunk_main",
        agent="appointment-reminder",
        ring_timeout_sec=25,
    )

    # wait for connect / failure
    while True:
        c = v.calls.get(sid=call.sid)
        if c.status == "in_progress":
            break
        if c.status in ("failed", "no_answer"):
            print("call ended early:", c.status)
            return c.status
        time.sleep(0.5)

    print("agent is on the line for", call.sid)
    return "in_progress"

call_with_agent("+15551234567")
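The polling loops above run until the call reaches one of the three statuses they check for, with no overall deadline. A minimal sketch of a bounded poll follows, assuming you want to give up after a fixed time budget. `waitForStatus` and `PollOptions` are hypothetical helpers, not part of the SDK; the status source is passed in as a function so the helper makes no assumptions about the client.

```typescript
// Hypothetical helper, not part of the SDK: polls until `done` accepts a
// status or the time budget runs out.
interface PollOptions {
  intervalMs: number; // delay between polls
  timeoutMs: number;  // overall budget before giving up
}

async function waitForStatus(
  getStatus: () => Promise<string>,
  done: (status: string) => boolean,
  options: PollOptions,
): Promise<string> {
  const deadline = Date.now() + options.timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (done(status)) return status;
    // Stop if the next poll would land past the deadline.
    if (Date.now() + options.intervalMs > deadline) {
      throw new Error(`timed out waiting for call; last status: ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, options.intervalMs));
  }
}

// Possible use with the client from the example above:
//   const settled = ["in_progress", "failed", "no_answer"];
//   await waitForStatus(
//     async () => (await v.calls.get({ sid: call.sid })).status,
//     (s) => settled.includes(s),
//     { intervalMs: 500, timeoutMs: 30_000 },
//   );
```

The 30-second budget in the usage comment is an arbitrary illustration; something slightly above `ringTimeoutSec` is probably a sensible starting point.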
When you want full control of the brain, omit agent from originate() and run the audio
bridge yourself. Caller audio (uplink) flows into your ASR; your TTS output is
pushed back as downlink. This is the pattern for custom dialog logic, IVR, or a
model the platform doesn’t host.
import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey: process.env.CLUTCHCALL_CREDENTIALS!,
  orgId: "org_abc",
});

async function customAgent(to: string) {
  const call = await v.calls.originate({ to, from: "+15558675309", trunkId: "trunk_main" });

  // pcm16 so the model gets raw audio; the bridge transcodes the PSTN leg.
  // myModel stands for your own ASR / dialog / TTS pipeline; it is not part of the SDK.
  const bridge = await v.audioBridge.attach(call.sid, {
    codec: "pcm16",
    sampleRate: 16000,
    onUplink: (pcm, tsUs) => myModel.appendAudio(pcm),
  });

  // your model emits audio chunks → push them straight to the caller
  myModel.onAudioOut((pcm) => bridge.publishDownlink(pcm));

  // your model decides the call is over
  myModel.onDone(async () => {
    await bridge.close();
    await call.hangup();
  });

  return call.sid;
}
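The example above leaves `myModel` undefined. One way to read the shape it expects is sketched below: the three method names (`appendAudio`, `onAudioOut`, `onDone`) come from that example, while the `DialogModel` interface, the `EchoModel` stand-in, and the choice of `Int16Array` for PCM chunks are assumptions made for illustration.

```typescript
// Assumed shape of `myModel`; the PCM chunk type is a guess, not the SDK's.
interface DialogModel {
  appendAudio(pcm: Int16Array): void;
  onAudioOut(cb: (pcm: Int16Array) => void): void;
  onDone(cb: () => void | Promise<void>): void;
}

// Stand-in model for local testing: echoes caller audio back and signals
// "done" once it has handled `maxChunks` chunks.
class EchoModel implements DialogModel {
  private readonly maxChunks: number;
  private seen = 0;
  private audioOut: (pcm: Int16Array) => void = () => {};
  private done: () => void | Promise<void> = () => {};

  constructor(maxChunks: number) {
    this.maxChunks = maxChunks;
  }

  appendAudio(pcm: Int16Array): void {
    if (this.seen >= this.maxChunks) return; // ignore audio after "done"
    this.seen += 1;
    this.audioOut(pcm);
    if (this.seen === this.maxChunks) void this.done();
  }

  onAudioOut(cb: (pcm: Int16Array) => void): void {
    this.audioOut = cb;
  }

  onDone(cb: () => void | Promise<void>): void {
    this.done = cb;
  }
}
```

Swapping `new EchoModel(50)` in for `myModel` would give a loopback call that ends after 50 uplink chunks, which can be handy for checking the bridge wiring before a real model is attached.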
Keep the bridge reference alive for the call’s lifetime. Letting it be
garbage-collected drops both tracks. Always bridge.close() before
call.hangup() so the tracks tear down cleanly.
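A small sketch of both rules, assuming a long-lived server process: a module-level map keeps each bridge reachable for the life of its call, and a single teardown helper enforces the close-then-hangup order. `Closable`, `Hangable`, `trackBridge`, and `endCall` are hypothetical names; only `close()` and `hangup()` are taken from the example above.

```typescript
// Minimal shapes for this sketch; the real SDK objects may expose more.
interface Closable { close(): Promise<void>; }
interface Hangable { hangup(): Promise<void>; }

// Holding bridges in a module-level map keeps them reachable, so they are
// not garbage-collected while the call is live.
const liveBridges = new Map<string, Closable>();

function trackBridge(sid: string, bridge: Closable): void {
  liveBridges.set(sid, bridge);
}

// Close the bridge first, then hang up. The finally block still hangs up
// if close() rejects, so a failed teardown does not leave the call open.
async function endCall(sid: string, call: Hangable): Promise<void> {
  const bridge = liveBridges.get(sid);
  liveBridges.delete(sid);
  try {
    if (bridge) await bridge.close();
  } finally {
    await call.hangup();
  }
}
```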
A full in-browser caller: capture the mic as encoded Opus, publish it as uplink,
and play decoded downlink Opus through WebCodecs. Media rides MoQT over QUIC —
there is no WebRTC transport and no SFU in this path.
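Before starting a call from the browser, it can help to check that the page has the building blocks this path relies on. The sketch below assumes the SDK needs WebCodecs (`AudioEncoder`, `AudioDecoder`), `AudioContext`, `getUserMedia`, and, for QUIC, the browser's `WebTransport` API. The `WebTransport` dependency in particular is an assumption, not something stated here, and `missingSoftphoneFeatures` is a hypothetical helper.

```typescript
// Returns the names of missing features; an empty array means all were found.
// The scope is passed in (rather than read from globalThis) so the check
// can be exercised outside a browser.
function missingSoftphoneFeatures(scope: Record<string, unknown>): string[] {
  // Assumed requirements; adjust to whatever the SDK actually documents.
  const requiredGlobals = ["AudioContext", "AudioEncoder", "AudioDecoder", "WebTransport"];
  const missing = requiredGlobals.filter((name) => typeof scope[name] === "undefined");

  const nav = scope["navigator"] as
    | { mediaDevices?: { getUserMedia?: unknown } }
    | undefined;
  if (typeof nav?.mediaDevices?.getUserMedia !== "function") {
    missing.push("navigator.mediaDevices.getUserMedia");
  }
  return missing;
}

// In a browser:
//   const missing = missingSoftphoneFeatures(globalThis as unknown as Record<string, unknown>);
//   if (missing.length > 0) { /* show a fallback UI listing `missing` */ }
```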
import { Voice } from "@clutchcall/sdk/voice";
import { captureMicrophone, OpusPlayer } from "@clutchcall/sdk/moqt";

// BROWSER_SCOPED_KEY is a placeholder for a key your app supplies to the browser.
const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey: BROWSER_SCOPED_KEY,
  orgId: "org_abc",
});

async function softphone(to: string) {
  const call = await v.calls.originate({ to, from: "+15558675309", trunkId: "trunk_main" });

  // downlink: decode and play Opus frames from the far end
  const ctx = new AudioContext();
  const player = new OpusPlayer(ctx, { sampleRate: 48000, channels: 1 });
  await player.start();

  const bridge = await v.audioBridge.attachCaller(call.sid, {
    codec: "opus",
    onDownlink: (frame, tsUs) => player.push(tsUs, frame),
  });

  // uplink: publish encoded mic frames as they are captured
  const mic = await captureMicrophone(
    { write: (tsUs, frame) => bridge.publishUplink(frame, tsUs) } as any,
    { audioConstraints: { echoCancellation: true, autoGainControl: true, noiseSuppression: true } },
  );

  async function hangUp() {
    mic.stop();
    player.close();
    await bridge.close();
    await call.hangup();
  }

  return { sid: call.sid, hangUp };
}
captureMicrophone runs the browser’s AEC / AGC / noise-suppression and diverts
the encoded Opus frames onto the uplink track — raw PCM never crosses the
wire.
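To put a rough number on what sending Opus instead of raw PCM saves, the arithmetic below compares the two. The 48 kHz mono figures match the `OpusPlayer` configuration above. The 16-bit sample width and the 24 kbit/s Opus bitrate are assumptions, since the SDK's actual encoder settings are not stated here.

```typescript
// Bits per second for uncompressed PCM audio.
function pcmBitsPerSecond(sampleRate: number, channels: number, bitsPerSample: number): number {
  return sampleRate * channels * bitsPerSample;
}

// 48 kHz mono, as in the OpusPlayer config; 16-bit samples are assumed.
const rawBps = pcmBitsPerSecond(48000, 1, 16);

// Assumed Opus speech bitrate; the encoder's real target may differ.
const assumedOpusBps = 24000;

console.log(rawBps);                  // 768000
console.log(rawBps / assumedOpusBps); // 32
```

Under these assumptions the encoded stream is about 1/32 the size of the raw one.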
Start a call with an AI agent, then escalate to a human by transferring the live
audio — the sid is preserved across the handoff, so any recording or analytics
keyed on it stay intact.
import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey: process.env.CLUTCHCALL_CREDENTIALS!,
  orgId: "org_abc",
});

async function handoffFlow(to: string) {
  // 1. start with a triage bot
  const call = await v.calls.originate({
    to,
    from: "+15558675309",
    trunkId: "trunk_main",
    agent: "triage-bot",
  });

  // Steps 2 and 3 are alternatives shown back to back; a real flow would
  // normally pick one of them.

  // 2. the bot decides it needs a human → re-attach to a different agent…
  await call.transfer({ agent: "human-queue" });

  // 3. …or forward the live audio to an on-call agent's phone
  await call.transfer({ to: "+15557654321" });

  // 4. wrap up
  await call.hangup();
}
When you transfer to a PSTN number the bridge performs a SIP REFER to hand the
audio off; when you transfer to an agent the engine re-attaches the bridge to
the new agent in place. Either way the original sid keeps addressing the call.
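Because both transfer shapes go through the same `transfer()` call, a fallback chain is easy to express. The sketch below tries each target in order and returns the first that succeeds. It assumes `transfer()` rejects when a target cannot take the call, which is not confirmed here, and `TransferTarget`, `Transferable`, and `escalate` are hypothetical names.

```typescript
// The two target shapes used in the example above.
type TransferTarget = { agent: string } | { to: string };

// Minimal shape for this sketch; the real call object may expose more.
interface Transferable {
  transfer(target: TransferTarget): Promise<void>;
}

// Try each target in order and return the first one that accepts the call.
// Assumes transfer() rejects on failure (an assumption, see lead-in).
async function escalate(
  call: Transferable,
  targets: TransferTarget[],
): Promise<TransferTarget> {
  let lastError: unknown;
  for (const target of targets) {
    try {
      await call.transfer(target);
      return target;
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next target
    }
  }
  throw new Error(`all transfer targets failed: ${String(lastError)}`);
}

// Possible use: prefer the agent queue, fall back to the on-call phone.
//   await escalate(call, [{ agent: "human-queue" }, { to: "+15557654321" }]);
```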