Inference — Cookbook

Copy-pasteable answers to “how do I X” for the inference modality. Everything rides the Voice surface: voice.agents.attach plus the turn-detection knobs, with voice.calls.originate and voice.audioBridge.attach where a recipe needs them. TypeScript is primary; Python is shown where it differs.

Attach a speech-to-speech agent to a call

The one-liner: bind a registered agent to a live call sid.

import { Voice } from "@clutchcall/sdk/voice";

const voice = new Voice({ baseUrl: BASE_URL, apiKey: KEY, orgId: ORG });
await voice.agents.attach(callSid, "support-s2s");

Originate and attach in one call

Pass agent to originate so the model is bridged the moment the callee answers.

const call = await voice.calls.originate({
  to:      "+15551234567",
  from:    "+15558675309",
  trunkId: "trunk_main",
  agent:   "support-s2s",
});

Attach with an inline agent spec

Skip the registry — select the model leg and turn-detection policy inline.

await voice.agents.attach(callSid, {
  mode:          "speech_to_speech",
  modelCodec:    "pcm16",
  inputRateHz:   16000,
  outputRateHz:  24000,
  turnDetection: { responseMinSpeechMs: 600, bargeConfirmMs: 300 },
});

Pick the codec the model wants

Most speech-to-speech models ingest raw PCM16. Set the model codec; the caller leg is transcoded at the bridge.

await voice.agents.attach(callSid, {
  mode:       "speech_to_speech",
  modelCodec: "pcm16",   // what the model ingests, not the caller-leg codec
});

A PSTN caller leg may be g711_ulaw / g711_alaw; a browser leg is opus. You never set the caller codec here — only what the model expects.
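To make "transcoded at the bridge" concrete, here is a minimal sketch of what decoding a g711_ulaw frame to PCM16 involves. The bridge does this for you, so you never need this code in an attach flow. The function names are hypothetical and not part of the SDK, and resampling from the 8 kHz G.711 rate to the model's input rate is not shown.

```typescript
// Standard G.711 mu-law expansion of one 8-bit code to a signed 16-bit sample.
function ulawToPcm16(code: number): number {
  const u = ~code & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole caller-leg frame (one byte per sample) to PCM16 samples.
function decodeUlawFrame(frame: Uint8Array): Int16Array {
  return Int16Array.from(frame, (byte) => ulawToPcm16(byte));
}
```

A 20 ms G.711 frame is 160 bytes at 8 kHz and decodes to 160 PCM16 samples (320 bytes) before any resampling.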

Tighten end-of-turn for snappier replies

Lower the trailing-silence threshold so the agent answers sooner after the caller stops.

await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { silenceThresholdMs: 300 },   // default 500
});

Below ~300 ms, end-of-turn fires on inter-word pauses and the agent interrupts mid-sentence. Don’t go lower.
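The trade-off can be seen in a small simulation. This is a sketch of a generic trailing-silence detector over per-frame voice-activity flags, not the platform's actual implementation; endOfTurnMs and the 20 ms frame size are assumptions for illustration.

```typescript
// Returns the time (ms from start) at which end-of-turn would fire, or null.
// voiced[i] is true when frame i contains speech.
function endOfTurnMs(
  voiced: boolean[],
  frameMs: number,
  silenceThresholdMs: number,
): number | null {
  let heardSpeech = false;
  let silentMs = 0;
  for (let i = 0; i < voiced.length; i++) {
    if (voiced[i]) {
      heardSpeech = true;
      silentMs = 0;            // any speech resets the trailing-silence timer
      continue;
    }
    if (!heardSpeech) continue; // leading silence never ends a turn
    silentMs += frameMs;
    if (silentMs >= silenceThresholdMs) return (i + 1) * frameMs;
  }
  return null;
}

// 200 ms of speech, a 400 ms mid-sentence pause, 200 ms of speech, then silence.
const utterance: boolean[] = [
  ...new Array<boolean>(10).fill(true),
  ...new Array<boolean>(20).fill(false),
  ...new Array<boolean>(10).fill(true),
  ...new Array<boolean>(30).fill(false),
];

const firesAt300 = endOfTurnMs(utterance, 20, 300); // fires inside the pause
const firesAt500 = endOfTurnMs(utterance, 20, 500); // waits for the real end
```

With a 300 ms threshold the detector fires during the 400 ms pause, which is the mid-sentence interruption described above; with the 500 ms default it waits until the caller has actually finished.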

Stop backchannels from triggering a reply

Continuers like “mhm” / “ok” / “right” shouldn’t make the model answer. Raise the minimum speech length that counts as a turn.

await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { responseMinSpeechMs: 700 },   // default 600; speech shorter than this value is dropped
});
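As a rough illustration of what the minimum speech length does, the sketch below filters hypothetical speech segments by duration. SpeechSegment and segmentsThatCountAsTurns are invented for this example; the real filter runs inside the platform and its exact boundary behaviour is an assumption here.

```typescript
interface SpeechSegment {
  text: string;       // what the caller said (for illustration only)
  durationMs: number; // how long the caller was speaking
}

// Keep only segments long enough to count as a turn; shorter ones are dropped.
function segmentsThatCountAsTurns(
  segments: SpeechSegment[],
  responseMinSpeechMs: number,
): SpeechSegment[] {
  return segments.filter((s) => s.durationMs >= responseMinSpeechMs);
}

const heard: SpeechSegment[] = [
  { text: "mhm", durationMs: 250 },
  { text: "ok right", durationMs: 650 },
  { text: "can you check my order status", durationMs: 1800 },
];

const turnsAtDefault = segmentsThatCountAsTurns(heard, 600);
const turnsAtRaised = segmentsThatCountAsTurns(heard, 700);
```

In this example the 650 ms "ok right" still triggers a reply at the 600 ms default and is dropped once the threshold is raised to 700 ms.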

Make barge-in forgiving (hold-and-confirm)

Require speech to be sustained before cancelling the agent, so a muttered continuer mid-reply doesn’t kill the turn.

await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { bargeConfirmMs: 400 },   // sustain 400ms before cancelling
});
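Hold-and-confirm can be pictured as a timer that only fires after speech has been continuous for bargeConfirmMs. The following is a sketch under that assumption, using per-frame voice-activity flags and a 20 ms frame size; bargeInAtMs is a hypothetical helper, not an SDK function.

```typescript
// Returns the time (ms from start of agent playback) at which the agent would
// be cancelled, or null if caller speech never stays continuous long enough.
function bargeInAtMs(
  voiced: boolean[],
  frameMs: number,
  bargeConfirmMs: number,
): number | null {
  let sustainedMs = 0;
  for (let i = 0; i < voiced.length; i++) {
    if (!voiced[i]) {
      sustainedMs = 0;         // a gap resets the confirmation window
      continue;
    }
    sustainedMs += frameMs;
    if (sustainedMs >= bargeConfirmMs) return (i + 1) * frameMs;
  }
  return null;
}

// 100 ms quiet, a 200 ms "mhm", 500 ms quiet, then a 600 ms real interruption.
const duringReply: boolean[] = [
  ...new Array<boolean>(5).fill(false),
  ...new Array<boolean>(10).fill(true),
  ...new Array<boolean>(25).fill(false),
  ...new Array<boolean>(30).fill(true),
];

const forgiving = bargeInAtMs(duringReply, 20, 400); // ignores the "mhm"
const instant = bargeInAtMs(duringReply, 20, 0);     // fires on the first voiced frame
```

With a 400 ms confirmation window the muttered continuer is ignored and only the sustained interruption cancels the agent; with 0 the very first voiced frame does.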

Make barge-in instant (browser / AEC legs)

When the caller leg has client-side echo cancellation, cut the agent on the first speech frame.

await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { bargeConfirmMs: 0 },   // 0 = fire immediately on onset
});

Stop the agent self-triggering on a no-AEC SIP/PSTN leg

When there’s no echo cancellation, the agent’s own audio bleeds into the mic. Raise the post-audio mic guard so it doesn’t read its own voice as barge-in.

await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { ttsGuardMs: 400 },   // default 200 — longer guard for noisy legs
});
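The three barge-in recipes above can be folded into one helper that picks knobs by leg type. This is a sketch: LegKind, TurnDetection, and turnDetectionFor are hypothetical names, the local TurnDetection type only mirrors the field names used on this page (the SDK's exported type may differ), and pairing hold-and-confirm with the longer guard on no-AEC legs is an assumption rather than a documented default.

```typescript
type LegKind = "browser_aec" | "sip_pstn_no_aec";

// Local mirror of the turn-detection knobs used on this page (assumed shape).
interface TurnDetection {
  silenceThresholdMs?: number;
  responseMinSpeechMs?: number;
  bargeConfirmMs?: number;
  ttsGuardMs?: number;
}

// Choose barge-in knobs for a leg; explicit overrides win over the preset.
function turnDetectionFor(
  leg: LegKind,
  overrides: TurnDetection = {},
): TurnDetection {
  const preset: TurnDetection =
    leg === "browser_aec"
      ? { bargeConfirmMs: 0 }                     // AEC present: cut on onset
      : { bargeConfirmMs: 400, ttsGuardMs: 400 }; // no AEC: confirm + longer guard
  return { ...preset, ...overrides };
}

const browserKnobs = turnDetectionFor("browser_aec");
const pstnKnobs = turnDetectionFor("sip_pstn_no_aec", { silenceThresholdMs: 300 });
```

The result can be passed as the turnDetection field of an inline agent spec, for example `{ mode: "speech_to_speech", turnDetection: pstnKnobs }`.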

Inject a pre-roll prompt before the model speaks

Attach the audio bridge and push a downlink clip before the first model turn.

const bridge = await voice.audioBridge.attach(call.sid, { codec: "pcm16" });
bridge.publishDownlink(prerollPcm);   // "connecting you now…"
await voice.agents.attach(call.sid, { mode: "speech_to_speech" });
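The snippet above leaves prerollPcm undefined. In practice it would be a recorded clip; as a stand-in for testing the flow, the sketch below synthesises a short PCM16 tone. tonePcm16 is a hypothetical helper, and both the sample rate the bridge expects and the buffer type publishDownlink accepts are assumptions to check against the SDK reference.

```typescript
// Generate a mono PCM16 sine tone. Amplitude is a fraction of full scale.
function tonePcm16(
  freqHz: number,
  durationMs: number,
  sampleRateHz: number,
  amplitude = 0.3,
): Int16Array {
  const sampleCount = Math.round((sampleRateHz * durationMs) / 1000);
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const phase = (2 * Math.PI * freqHz * i) / sampleRateHz;
    samples[i] = Math.round(amplitude * 32767 * Math.sin(phase));
  }
  return samples;
}

// 250 ms of 440 Hz at 24 kHz (assumed rate): 6000 samples, 12000 bytes.
const prerollTone = tonePcm16(440, 250, 24000);
```

If the bridge takes raw bytes rather than samples, `new Uint8Array(prerollTone.buffer)` exposes the same data as bytes in the platform's native byte order (little-endian on common hardware).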

Record the caller leg while the agent runs

Tap the uplink without disturbing the agent bridge.

TypeScript:

const bridge = await voice.audioBridge.attach(call.sid, {
  codec: "pcm16",
  onUplink: (frame) => recorder.feed(frame),
});

Python:

bridge = voice.audio_bridge.attach(
    call.sid, codec="pcm16",
    on_uplink=lambda frame, ts_us: recorder.feed(frame),
)
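The recorder object in these snippets is not defined on this page. A minimal sketch of one follows: it buffers PCM16 frames and wraps them in a standard 44-byte WAV header. Pcm16Recorder is a hypothetical class, and it assumes each uplink frame arrives as a Uint8Array of little-endian mono PCM16 at a sample rate you know; verify both against the SDK reference.

```typescript
class Pcm16Recorder {
  private chunks: Uint8Array[] = [];
  private byteLength = 0;

  // Copy the frame so later reuse of the caller's buffer cannot corrupt the recording.
  feed(frame: Uint8Array): void {
    this.chunks.push(frame.slice());
    this.byteLength += frame.byteLength;
  }

  // Build a complete WAV file (RIFF header + PCM data) from everything fed so far.
  toWav(sampleRateHz: number, channels = 1): Uint8Array {
    const header = new DataView(new ArrayBuffer(44));
    const ascii = (offset: number, s: string): void => {
      for (let i = 0; i < s.length; i++) header.setUint8(offset + i, s.charCodeAt(i));
    };
    const blockAlign = channels * 2;                       // bytes per sample frame
    ascii(0, "RIFF");
    header.setUint32(4, 36 + this.byteLength, true);       // RIFF chunk size
    ascii(8, "WAVE");
    ascii(12, "fmt ");
    header.setUint32(16, 16, true);                        // fmt chunk size
    header.setUint16(20, 1, true);                         // format 1 = linear PCM
    header.setUint16(22, channels, true);
    header.setUint32(24, sampleRateHz, true);
    header.setUint32(28, sampleRateHz * blockAlign, true); // byte rate
    header.setUint16(32, blockAlign, true);
    header.setUint16(34, 16, true);                        // bits per sample
    ascii(36, "data");
    header.setUint32(40, this.byteLength, true);           // data chunk size

    const out = new Uint8Array(44 + this.byteLength);
    out.set(new Uint8Array(header.buffer), 0);
    let offset = 44;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  }
}
```

Create one recorder per call, pass `(frame) => recorder.feed(frame)` as the uplink callback, and write `recorder.toWav(rate)` to disk once the call completes.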

Watch turn boundaries and barge events

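Subscribe to call status to track the agent's lifecycle on the call. The handler below covers the two call-level statuses, in_progress (agent bridged) and completed; per-turn commit and barge events are not shown in this snippet.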

call.onStatus((s) => {
  if (s === "in_progress") console.log("agent bridged, listening");
  if (s === "completed")   console.log("call done");
});
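If a status can be reported more than once, a one-shot wrapper keeps side effects from running twice. This is a sketch: StatusSource models only the onStatus method used above, onceStatus is a hypothetical helper, and whether the SDK repeats statuses or offers an unsubscribe handle is not covered here.

```typescript
// Minimal shape of the call object as used on this page (assumed).
interface StatusSource {
  onStatus(callback: (status: string) => void): void;
}

// Run fn the first time the call reports the target status, and never again.
function onceStatus(call: StatusSource, target: string, fn: () => void): void {
  let fired = false;
  call.onStatus((status) => {
    if (fired || status !== target) return;
    fired = true;
    fn();
  });
}
```

For example, `onceStatus(call, "completed", () => saveRecording())` runs a hypothetical saveRecording exactly once, however many times the status is delivered.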

Detach the agent but keep the call up

Hand the call back to a human (or another app) by attaching a bare bridge and not re-attaching the agent.

await voice.audioBridge.attach(call.sid, { codec: "pcm16", onUplink: (frame) => humanLeg.feed(frame) });
// agent no longer drives the conversation; you own the audio now

Attach a speech-to-speech agent in Python

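The same originate-and-attach flow in Python, with snake_case names. The caller number is passed as from_ because from is a reserved word in Python.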

from clutchcall.voice import Voice

voice = Voice(base_url=BASE_URL, api_key=KEY, org_id=ORG)
call  = voice.calls.originate(
    to="+15551234567", from_="+15558675309",
    trunk_id="trunk_main", agent="support-s2s",
)

Related

Inference — Recipes — full worked examples
Inference — SDK Methods — every method and the full knob table
Turn Detection & Barge-In — the policy these knobs drive
