Voice — Recipes - ClutchCall

Longer, end-to-end examples that combine several voice methods into realistic mini-apps. Each assumes a configured Voice client (see SDK methods).

1. Outbound AI voice agent

Place an outbound call, attach a server-side speech-to-speech agent, wait for it to connect, and let the engine drive the whole conversation. No audio bridge in your process — the agent owns both legs.

Originate with an agent

Pass agent to originate(); the engine attaches the bridge on answer.

Wait for connect or failure

Poll calls.get() until the call leaves the ringing states.

Hang up when done

The agent runs the call; you decide when to end it.

The TypeScript version comes first, followed by the same recipe in Python.

import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey:  process.env.CLUTCHCALL_CREDENTIALS!,
  orgId:   "org_abc",
});

async function callWithAgent(to: string) {
  const call = await v.calls.originate({
    to,
    from: "+15558675309",
    trunkId: "trunk_main",
    agent: "appointment-reminder",
    ringTimeoutSec: 25,
  });

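  // Wait for connect or failure, but not forever: any status other than the
  // ones checked below (e.g. while ringing) keeps polling until the deadline.
  const deadline = Date.now() + (25 + 10) * 1000; // ringTimeoutSec plus 10 s slack
  for (;;) {
    const c = await v.calls.get({ sid: call.sid });
    if (c.status === "in_progress") break;
    if (c.status === "failed" || c.status === "no_answer") {
      console.log("call ended early:", c.status);
      return c.status;
    }
    if (Date.now() > deadline) {
      console.log("gave up waiting; last status:", c.status);
      return c.status;
    }
    await new Promise((r) => setTimeout(r, 500));
  }

  console.log("agent is on the line for", call.sid);
  // … the agent drives the conversation; end it whenever you like:
  // await call.hangup();
  return "in_progress";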
}

await callWithAgent("+15551234567");

import os, time
from clutchcall.voice import Voice

v = Voice(
    base_url="https://engine.clutchcall.dev",
    api_key=os.environ["CLUTCHCALL_CREDENTIALS"],
    org_id="org_abc",
)

def call_with_agent(to: str) -> str:
    call = v.calls.originate(
        to=to, from_="+15558675309", trunk_id="trunk_main",
        agent="appointment-reminder", ring_timeout_sec=25,
    )
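    deadline = time.monotonic() + 25 + 10  # ring_timeout_sec plus 10 s slack
    while True:
        c = v.calls.get(sid=call.sid)
        if c.status == "in_progress":
            break
        if c.status in ("failed", "no_answer"):
            print("call ended early:", c.status)
            return c.status
        if time.monotonic() > deadline:
            print("gave up waiting; last status:", c.status)
            return c.status
        time.sleep(0.5)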

    print("agent is on the line for", call.sid)
    return "in_progress"

call_with_agent("+15551234567")
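The fixed 500 ms poll above is the simplest thing that works. If you poll many calls at once, a growing interval with a hard deadline keeps request volume bounded. A minimal sketch that assumes nothing about the SDK beyond an async getter; pollUntil and PollOptions are hypothetical names, not SDK exports:

```typescript
// Hypothetical helper, not an SDK export: poll an async getter until a
// predicate accepts the result, backing off between polls, with a deadline.
interface PollOptions {
  intervalMs: number;    // delay before the second poll
  maxIntervalMs: number; // upper bound for the growing delay
  timeoutMs: number;     // overall budget before giving up
}

async function pollUntil<T>(
  get: () => Promise<T>,
  isDone: (value: T) => boolean,
  opts: PollOptions,
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  let delay = opts.intervalMs;
  for (;;) {
    const value = await get();
    if (isDone(value)) return value;
    // don't start a sleep that would end past the deadline
    if (Date.now() + delay > deadline) {
      throw new Error(`pollUntil: gave up after ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
    // grow the delay by 1.5x per round, capped at maxIntervalMs
    delay = Math.min(Math.round(delay * 1.5), opts.maxIntervalMs);
  }
}
```

With the real client, the getter would be () => v.calls.get({ sid: call.sid }) and the predicate the same terminal check as this recipe (in_progress, failed, no_answer). The 1.5x growth factor is an arbitrary choice; the cap keeps a slow-to-answer call from being checked too rarely.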

2. Bring-your-own ASR + TTS bridge

When you want full control of the call's logic, don't attach a server-side agent (no agent option, no agents.attach); run the audio bridge yourself instead. Caller audio (uplink) flows into your ASR; your TTS output is pushed back as downlink. This is the pattern for custom dialog logic, IVR, or a model the platform doesn’t host.

import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey:  process.env.CLUTCHCALL_CREDENTIALS!,
  orgId:   "org_abc",
});

async function customAgent(to: string) {
  // myModel stands in for your own ASR → dialog → TTS pipeline; it is not an SDK object.
  const call = await v.calls.originate({ to, from: "+15558675309", trunkId: "trunk_main" });

  // pcm16 so the model gets raw audio; the bridge transcodes the PSTN leg.
  const bridge = await v.audioBridge.attach(call.sid, {
    codec: "pcm16",
    sampleRate: 16000,
    onUplink: (pcm, tsUs) => myModel.appendAudio(pcm),
  });

  // your model emits audio chunks → push them straight to the caller
  myModel.onAudioOut((pcm) => bridge.publishDownlink(pcm));

  // your model decides the call is over
  myModel.onDone(async () => {
    await bridge.close();
    await call.hangup();
  });

  return call.sid;
}
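With codec "pcm16" and sampleRate 16000, downlink audio is presumably 16-bit signed samples at 16 kHz, so a 20 ms frame is 16000 × 0.020 = 320 samples. If your TTS produces 32-bit float samples in [-1, 1] (the format Web Audio uses), something has to convert them between myModel.onAudioOut and bridge.publishDownlink. A minimal sketch, assuming mono audio, 20 ms frames, and that publishDownlink accepts an Int16Array (or its bytes); floatToPcm16Frames is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper (not an SDK API): convert float samples in [-1, 1]
// into fixed-duration 16-bit PCM frames. Assumes mono audio.
function floatToPcm16Frames(
  samples: Float32Array,
  sampleRate = 16000,
  frameMs = 20,
): Int16Array[] {
  // samples per frame: 320 at 16 kHz / 20 ms
  const frameLen = Math.round((sampleRate * frameMs) / 1000);
  const frames: Int16Array[] = [];
  for (let start = 0; start < samples.length; start += frameLen) {
    const frame = new Int16Array(frameLen); // a short tail is zero-padded
    const end = Math.min(start + frameLen, samples.length);
    for (let i = start; i < end; i++) {
      const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range input
      // negative side scales to -32768, positive side to 32767
      frame[i - start] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    }
    frames.push(frame);
  }
  return frames;
}
```

Padding the last frame with silence keeps every frame the same size; in a streaming pipeline you would more likely carry leftover samples into the next chunk. Int16Array uses the platform's byte order (little-endian on common hardware), so check the byte layout the bridge expects if you pass new Uint8Array(frame.buffer).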

Keep the bridge reference alive for the call’s lifetime. Letting it be garbage-collected drops both tracks. Always bridge.close() before call.hangup() so the tracks tear down cleanly.
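onDone may not be the only path to hanging up: a user action, a timeout, or an error handler could race it, yet the order must still be bridge first, call second. A once-only teardown enforces both. A minimal sketch; BridgeLike, CallLike, and makeTeardown are hypothetical names, and the interfaces cover only the close() and hangup() calls used in this recipe:

```typescript
// Structural stand-ins for the SDK objects: only the methods used here.
interface BridgeLike { close(): Promise<void>; }
interface CallLike { hangup(): Promise<void>; }

// Returns a teardown that is safe to call more than once: the first call
// closes the bridge and then hangs up; later calls get the same promise.
function makeTeardown(bridge: BridgeLike, call: CallLike): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (!pending) {
      pending = (async () => {
        try {
          await bridge.close(); // tear the tracks down first
        } finally {
          await call.hangup();  // hang up even if close() threw
        }
      })();
    }
    return pending;
  };
}
```

In this recipe you would build it right after attach(), e.g. const teardown = makeTeardown(bridge, call), and pass teardown to myModel.onDone. Hanging up in finally is a deliberate choice: if bridge.close() fails, the call still ends instead of staying up with no audio path, and the close error still reaches the caller.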

3. Browser softphone (no WebRTC transport)

A full in-browser caller: capture the mic as encoded Opus, publish it as uplink, and play decoded downlink Opus through WebCodecs. Media rides MoQT over QUIC — there is no WebRTC transport and no SFU in this path.

Originate the call

Get a sid from the control plane.

Start playback

Create and start() an OpusPlayer.

Attach as the caller

attachCaller() subscribes downlink, publishes uplink.

Capture the mic

captureMicrophone() forwards encoded frames onto uplink.

import { Voice } from "@clutchcall/sdk/voice";
import { captureMicrophone, OpusPlayer } from "@clutchcall/sdk/moqt";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey:  BROWSER_SCOPED_KEY, // placeholder: your own browser-scoped key, not the server credentials used in the other recipes
  orgId:   "org_abc",
});

async function softphone(to: string) {
  const call = await v.calls.originate({ to, from: "+15558675309", trunkId: "trunk_main" });

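  // Most browsers keep a new AudioContext suspended until a user gesture,
  // so call softphone() from a click (or similar) handler.
  const ctx = new AudioContext();
  const player = new OpusPlayer(ctx, { sampleRate: 48000, channels: 1 });
  await player.start();

  const bridge = await v.audioBridge.attachCaller(call.sid, {
    codec: "opus",
    onDownlink: (frame, tsUs) => player.push(tsUs, frame),
  });

  // Adapter: captureMicrophone writes (tsUs, frame) pairs to a sink;
  // forward each encoded frame onto the uplink.
  const mic = await captureMicrophone(
    { write: (tsUs, frame) => bridge.publishUplink(frame, tsUs) } as any,
    { audioConstraints: { echoCancellation: true, autoGainControl: true, noiseSuppression: true } },
  );

  async function hangUp() {
    mic.stop();
    player.close();
    await bridge.close();
    await call.hangup();
    await ctx.close(); // release the browser's audio resources
  }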

  return { sid: call.sid, hangUp };
}

captureMicrophone applies the browser’s echo cancellation (AEC), automatic gain control (AGC), and noise suppression, then diverts the encoded Opus frames onto the uplink track, so raw PCM never crosses the wire.
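In the softphone, onDownlink hands each frame to player.push together with its timestamp tsUs in microseconds. How OpusPlayer uses that timestamp is not documented here; one plausible approach is to anchor the first frame to the audio clock plus a small playout delay, schedule later frames by their offset from it, and drop frames that arrive too late. A sketch of that mapping only; PlayoutClock and its 60 ms default are assumptions, and ctxNow would be AudioContext.currentTime in a browser:

```typescript
// Hypothetical sketch, not OpusPlayer's implementation: map media
// timestamps (microseconds) onto an AudioContext-style clock (seconds).
class PlayoutClock {
  private baseTsUs: number | null = null; // tsUs of the first frame seen
  private baseCtxTime = 0;                // context time that frame plays at
  private readonly playoutDelaySec: number;

  constructor(playoutDelaySec = 0.06) {
    // headroom that absorbs network jitter before the first frame plays
    this.playoutDelaySec = playoutDelaySec;
  }

  /** Start time (context seconds) for the frame stamped tsUs,
   *  or null if that moment has already passed. */
  schedule(tsUs: number, ctxNow: number): number | null {
    if (this.baseTsUs === null) {
      this.baseTsUs = tsUs;
      this.baseCtxTime = ctxNow + this.playoutDelaySec;
    }
    const when = this.baseCtxTime + (tsUs - this.baseTsUs) / 1e6;
    return when >= ctxNow ? when : null; // late frames are dropped
  }
}
```

A fixed delay trades a little latency for smoother playback; a production jitter buffer would usually adapt the delay and also handle reordering and clock drift between sender and sound card, which this sketch ignores.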

4. Live bot-to-human handoff

Start a call with an AI agent, then escalate to a human by transferring the live audio — the sid is preserved across the handoff, so any recording or analytics keyed on it stay intact.

import { Voice } from "@clutchcall/sdk/voice";

const v = new Voice({
  baseUrl: "https://engine.clutchcall.dev",
  apiKey:  process.env.CLUTCHCALL_CREDENTIALS!,
  orgId:   "org_abc",
});

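async function handoffFlow(to: string, escalation: "agent" | "phone") {
  // 1. start with a triage bot
  const call = await v.calls.originate({
    to, from: "+15558675309", trunkId: "trunk_main",
    agent: "triage-bot",
  });

  // 2. once your logic decides the caller needs a human, escalate ONE way:
  if (escalation === "agent") {
    // re-attach the call to a different agent, in place
    await call.transfer({ agent: "human-queue" });
  } else {
    // forward the live audio to an on-call agent's phone
    await call.transfer({ to: "+15557654321" });
  }

  // 3. wrap up when the conversation is over (not right after the transfer,
  //    or you would cut off the human you just brought in):
  // await call.hangup();
  return call.sid;
}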

When you transfer to a PSTN number the bridge performs a SIP REFER to hand the audio off; when you transfer to an agent the engine re-attaches the bridge to the new agent in place. Either way the original sid keeps addressing the call.
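The two transfer shapes, { agent } and { to }, map to two different mechanisms. If your escalation logic builds the target at runtime, checking it before calling call.transfer() catches a malformed number or an ambiguous target early. A sketch; TransferTarget and describeTransfer are hypothetical names, and the E.164 check (a plus sign, a first digit from 1 to 9, at most 15 digits) is an assumption about what your trunk accepts:

```typescript
// Hypothetical types and helper (not SDK exports) mirroring the two
// transfer shapes: { agent } re-attaches in place, { to } uses SIP REFER.
type TransferTarget = { agent: string } | { to: string };

// E.164: "+", then a country code starting 1-9, at most 15 digits total.
const E164 = /^\+[1-9]\d{1,14}$/;

function describeTransfer(target: TransferTarget): "reattach" | "sip-refer" {
  if ("agent" in target) {
    // a target carrying both fields is ambiguous; refuse it
    if ("to" in target) throw new Error("pass either agent or to, not both");
    if (!target.agent) throw new Error("agent name must be non-empty");
    return "reattach";
  }
  if (!E164.test(target.to)) {
    throw new Error(`not an E.164 number: ${target.to}`);
  }
  return "sip-refer";
}
```

Calling describeTransfer(target) first and then call.transfer(target) keeps the validation separate from the SDK call, so the same check can run in tests without a live call.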

See also

SDK methods: full signatures for everything used above.

Cookbook: single-task snippets to lift into your own code.

Turn detection: VAD and barge-in tuning for AI voice agents.

Details: wire model, codecs, and the architecture.
