The inference modality has no standalone typed client — it rides the Voice surface. You attach a speech-to-speech agent to a call with voice.agents.attach, configure turn detection and the model codec, and (optionally) tap the audio bridge. This page is the inference-relevant slice of the Voice SDK plus the turn-detection knobs.
A dedicated typed Inference client is in Preview; see the bottom of this page. For shipping code today, use the Voice surface documented here.

Import

The agent path lives on the voice subpath:
import { Voice } from "@clutchcall/sdk/voice";

const voice = new Voice({ baseUrl: BASE_URL, apiKey: KEY, orgId: ORG });
Python, Go, Rust, Java, and .NET bindings expose the same shapes (snake_case in Python, PascalCase methods in Go/.NET).

voice.agents.attach

Bind a speech-to-speech agent to a live (or just-originated) call. The runtime bridges the call’s uplink / downlink audio tracks to the model and owns turn-taking for the duration of the call.
await voice.agents.attach(callSid, agent);
callSid (string, required): The call sid to bind the agent to, from originate() or an inbound answer.
agent (string | AgentSpec, required): Either an agent id (string) registered in the control plane, or an inline agent spec selecting the speech-to-speech model and its turn-detection config.
Returns Promise<void> — resolves once the agent is bridged. Audio begins flowing immediately; the first agent utterance is gated by turn detection.
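As a minimal sketch of the inline-spec form, the snippet below passes an AgentSpec object instead of a registered agent id. The local type aliases mirror the shapes documented on this page (TurnDetection is reduced to a subset of its fields) so the example type-checks without the SDK installed. The AgentAttacher interface and the attachInline helper are hypothetical names used only for illustration, and the turn-detection values are example choices, not recommendations from the SDK.

```typescript
// Local mirrors of the shapes documented on this page (assumption: kept
// structural so the sketch compiles on its own).
type AudioCodec = "opus" | "pcm16" | "g711_ulaw" | "g711_alaw";

type TurnDetection = {
  bargeConfirmMs?: number;
  ttsGuardMs?: number;
  silenceThresholdMs?: number;
};

type AgentSpec = {
  mode: "speech_to_speech";
  modelCodec?: AudioCodec;
  inputRateHz?: number;
  outputRateHz?: number;
  turnDetection?: TurnDetection;
};

// Structural stand-in for voice.agents (hypothetical interface name).
interface AgentAttacher {
  attach(callSid: string, agent: string | AgentSpec): Promise<void>;
}

// Hypothetical helper: attach an inline spec with example settings for a
// noisy leg (longer barge confirmation, longer post-agent mic guard).
async function attachInline(agents: AgentAttacher, callSid: string): Promise<AgentSpec> {
  const spec: AgentSpec = {
    mode: "speech_to_speech",
    modelCodec: "pcm16",
    inputRateHz: 16000,
    outputRateHz: 24000,
    turnDetection: { bargeConfirmMs: 400, ttsGuardMs: 300, silenceThresholdMs: 500 },
  };
  await agents.attach(callSid, spec); // resolves once the agent is bridged
  return spec;
}
```

With the real client, the call would look like attachInline(voice.agents, callSid), assuming voice.agents satisfies the interface above.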

Attaching at originate time

originate accepts an agent directly, so the common case is one call:
const call = await voice.calls.originate({
  to:      "+15551234567",
  from:    "+15558675309",
  trunkId: "trunk_main",
  agent:   "support-s2s",     // attach the speech-to-speech agent inline
});

The agent spec

An inline AgentSpec selects the model leg and the turn-detection policy. The shape mirrors the agent config the control plane stores.
type AgentSpec = {
  // The model the runtime serves on the call's audio legs. Describe the model
  // as a speech-to-speech model in your registry; do not name a vendor here.
  mode: "speech_to_speech";
  modelCodec?: AudioCodec;        // what the model ingests/emits (default "pcm16")
  inputRateHz?: number;           // model input rate (default 16000)
  outputRateHz?: number;          // model output rate (default 24000)
  turnDetection?: TurnDetection;  // turn-taking + barge-in policy (below)
};

type AudioCodec = "opus" | "pcm16" | "g711_ulaw" | "g711_alaw";
ClutchCall resamples between the caller leg rate (commonly 8 kHz on PSTN) and the model’s inputRateHz / outputRateHz. You set the model’s rates; the caller leg is handled at the bridge.
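To make the rate conversion concrete, here is a minimal linear-interpolation resampler for mono 16-bit PCM. It only illustrates what moving between an 8 kHz caller leg and a 16 kHz model input involves; this page does not say which algorithm the bridge uses, and a production resampler would low-pass filter before downsampling. The function name resampleLinear is hypothetical.

```typescript
// Linear-interpolation resampler for mono PCM16 samples.
// Illustration only: not the bridge's actual algorithm.
function resampleLinear(input: Int16Array, fromHz: number, toHz: number): Int16Array {
  if (fromHz <= 0 || toHz <= 0) throw new RangeError("rates must be positive");
  if (input.length === 0) return new Int16Array(0);

  const outLength = Math.round((input.length * toHz) / fromHz);
  const out = new Int16Array(outLength);
  const step = fromHz / toHz; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours so the tail repeats the last input sample.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return out;
}

// 8 kHz caller audio upsampled to a 16 kHz model input: twice as many samples.
const callerFrame = Int16Array.from([0, 100, 200, 300]);
const modelFrame = resampleLinear(callerFrame, 8000, 16000);
// modelFrame holds [0, 50, 100, 150, 200, 250, 300, 300]
```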

Turn detection

The TurnDetection block is the heart of this modality — it owns the client-commit gate, backchannel suppression, and hold-and-confirm barge-in. Drop it into the agent spec, or set it server-side in the agent config.
type TurnDetection = {
  responseMinSpeechMs?: number;   // bursts shorter than this = backchannel, dropped
  commitMinIntervalMs?: number;   // cooldown between turn commits
  cancelMinIntervalMs?: number;   // cooldown between barge-ins
  bargeConfirmMs?: number;        // sustain speech this long before cancelling the agent
  ttsGuardMs?: number;            // raise the mic gate this long after agent audio
  silenceThresholdMs?: number;    // trailing silence before end-of-turn fires
  minSpeechMs?: number;           // discard candidate utterances shorter than this
  prefixPaddingMs?: number;       // audio kept before the speech-start marker
};
| Field | Default (ms) | What it does | When to change |
| --- | --- | --- | --- |
| responseMinSpeechMs | 600 | Speech shorter than this is a backchannel; it never commits a turn | Lower (400) to let very short utterances answer; raise to ignore more continuers |
| commitMinIntervalMs | 1500 | Minimum gap between two turn commits | Raise if the agent over-answers rapid speech |
| cancelMinIntervalMs | 400 | Minimum gap between barge-ins | Raise on noisy trunks that false-trigger |
| bargeConfirmMs | 300 | Speech must be sustained this long over the agent before it is cancelled | 0 = cancel on first frame (browser/AEC legs); raise on noisy legs |
| ttsGuardMs | 200 | Mic gate is raised for this long after each agent chunk | Raise on no-AEC SIP/PSTN legs where TTS bleeds into the mic |
| silenceThresholdMs | 500 | Trailing silence before end-of-turn fires | Lower (300) for snappier replies; higher (800) for thinkers |
| minSpeechMs | 300 | Discard candidate utterances shorter than this | Raise to 500 if line noise causes false triggers |
| prefixPaddingMs | 200 | Audio kept before the speech-start marker | Raise if first syllables get clipped |
silenceThresholdMs below ~300 fires end-of-turn on inter-word pauses — the agent will interrupt the caller mid-sentence. Keep it at or above 300.
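One way to keep overrides honest is to merge them over the defaults listed above and reject a silenceThresholdMs under 300 before the spec leaves your code. The sketch below is a client-side guard only; the helper name resolveTurnDetection is hypothetical, and whether the server itself rejects such values is not stated on this page.

```typescript
type TurnDetection = {
  responseMinSpeechMs?: number;
  commitMinIntervalMs?: number;
  cancelMinIntervalMs?: number;
  bargeConfirmMs?: number;
  ttsGuardMs?: number;
  silenceThresholdMs?: number;
  minSpeechMs?: number;
  prefixPaddingMs?: number;
};

// Defaults copied from the field reference above.
const TURN_DETECTION_DEFAULTS: Required<TurnDetection> = {
  responseMinSpeechMs: 600,
  commitMinIntervalMs: 1500,
  cancelMinIntervalMs: 400,
  bargeConfirmMs: 300,
  ttsGuardMs: 200,
  silenceThresholdMs: 500,
  minSpeechMs: 300,
  prefixPaddingMs: 200,
};

// Hypothetical helper: fill in defaults, then validate the merged result.
function resolveTurnDetection(overrides: TurnDetection = {}): Required<TurnDetection> {
  const resolved: Required<TurnDetection> = { ...TURN_DETECTION_DEFAULTS, ...overrides };
  for (const key of Object.keys(resolved) as Array<keyof TurnDetection>) {
    const value = resolved[key];
    if (!Number.isFinite(value) || value < 0) {
      throw new RangeError(`${key} must be a non-negative number of milliseconds`);
    }
  }
  // Guidance from this page: below ~300 ms, inter-word pauses end the turn.
  if (resolved.silenceThresholdMs < 300) {
    throw new RangeError("silenceThresholdMs should be at least 300");
  }
  return resolved;
}

// Browser/AEC leg: cancel the agent on the first frame of caller speech.
const browserLeg = resolveTurnDetection({ bargeConfirmMs: 0 });
```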

Tapping the audio bridge (optional)

You usually don’t touch raw frames — the runtime bridges the model leg for you. When you do need to observe or inject audio (recording sidecar, custom DSP), attach the Voice AudioBridge to the same call.
const bridge = await voice.audioBridge.attach(call.sid, {
  codec: "pcm16",
  onUplink: (frame, tsUs) => recorder.feed(frame),   // caller audio (also fed to the model)
});
AudioBridge methods relevant here:
| Method | Signature | Notes |
| --- | --- | --- |
| publishDownlink | (frame: Uint8Array, timestampUs?: bigint) | Inject audio toward the caller (e.g. a pre-roll prompt before the model speaks). |
| publishUplink | (frame: Uint8Array, timestampUs?: bigint) | Inject audio toward the model (browser caller path). |
| close | () | Detach the tap. The agent stays attached. |
See Voice — SDK Methods for the full AudioBridge surface.
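When injecting a pre-roll prompt, the audio usually has to be cut into frames first. The sketch below splits a mono PCM16 buffer into fixed-duration frames and computes a microsecond timestamp for each one. The 20 ms frame size, the splitPcm16 name, and the TimedFrame shape are assumptions for illustration; this page does not specify the frame size the bridge expects.

```typescript
type TimedFrame = { frame: Uint8Array; timestampUs: number };

// Split mono PCM16 bytes into fixed-duration frames (assumed 20 ms by default).
// The final frame may be shorter if the buffer is not a whole number of frames.
function splitPcm16(
  pcm: Uint8Array,
  sampleRateHz: number,
  frameMs: number = 20,
  startUs: number = 0,
): TimedFrame[] {
  const samplesPerFrame = (sampleRateHz * frameMs) / 1000;
  if (!Number.isInteger(samplesPerFrame) || samplesPerFrame <= 0) {
    throw new RangeError("sampleRateHz * frameMs must give a whole number of samples");
  }
  const bytesPerFrame = samplesPerFrame * 2; // 16-bit samples, one channel

  const frames: TimedFrame[] = [];
  for (let offset = 0, i = 0; offset < pcm.length; offset += bytesPerFrame, i++) {
    frames.push({
      // subarray is a view over the same memory; no audio is copied.
      frame: pcm.subarray(offset, Math.min(offset + bytesPerFrame, pcm.length)),
      timestampUs: startUs + i * frameMs * 1000,
    });
  }
  return frames;
}

// Possible use with the bridge (timestampUs is a bigint in the signature above):
//   for (const f of splitPcm16(prompt, 8000)) {
//     bridge.publishDownlink(f.frame, BigInt(f.timestampUs));
//   }
```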

Events

The agent leg surfaces turn-boundary and barge events through the call’s status stream. Subscribe with the Voice call handle:
call.onStatus((s) => {
  // dialing → ringing → in_progress → completed | failed | no_answer
  if (s === "in_progress") console.log("agent bridged, listening");
});
| Event | Fires when |
| --- | --- |
| turn committed | The turn detector decided a real turn ended; the model is generating its reply. |
| barge-in confirmed | Sustained speech over the agent passed bargeConfirmMs; the in-flight reply is cancelled. |
| backchannel dropped | A short continuer was suppressed; the agent kept the floor (diagnostic). |
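The relationship between these events and the turn-detection thresholds can be sketched with two small pure functions. This is one reading of the field descriptions on this page, not the runtime's implementation: the order in which the runtime applies its checks is not documented here, and the names GateConfig, decideCommit, and confirmBargeIn are hypothetical.

```typescript
type GateConfig = {
  responseMinSpeechMs: number;
  commitMinIntervalMs: number;
  cancelMinIntervalMs: number;
  bargeConfirmMs: number;
};

// Documented defaults for the four fields used here.
const DEFAULT_GATE: GateConfig = {
  responseMinSpeechMs: 600,
  commitMinIntervalMs: 1500,
  cancelMinIntervalMs: 400,
  bargeConfirmMs: 300,
};

type CommitDecision = "turn_committed" | "backchannel_dropped" | "commit_cooldown";

// Decide what a finished caller utterance does when the agent is not speaking.
function decideCommit(
  speechMs: number,
  nowMs: number,
  lastCommitMs: number | null,
  cfg: GateConfig = DEFAULT_GATE,
): CommitDecision {
  if (speechMs < cfg.responseMinSpeechMs) return "backchannel_dropped";
  if (lastCommitMs !== null && nowMs - lastCommitMs < cfg.commitMinIntervalMs) {
    return "commit_cooldown"; // too soon after the previous commit
  }
  return "turn_committed";
}

// Decide whether caller speech overlapping the agent cancels the reply.
function confirmBargeIn(
  overlapMs: number,
  nowMs: number,
  lastCancelMs: number | null,
  cfg: GateConfig = DEFAULT_GATE,
): boolean {
  if (lastCancelMs !== null && nowMs - lastCancelMs < cfg.cancelMinIntervalMs) {
    return false; // still inside the barge-in cooldown
  }
  return overlapMs >= cfg.bargeConfirmMs;
}
```

Under these assumptions, a 250 ms "mm-hm" is dropped as a backchannel, while 350 ms of sustained speech over the agent confirms a barge-in at the default bargeConfirmMs of 300.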

Preview: a dedicated typed Inference client

Preview. A standalone Inference client (a typed handle that wraps agent attach, the commit gate, and audio bridging behind one object) is in design. The shape below is forward-looking and may change — for shipping code, attach via voice.agents.attach as shown above.
// PREVIEW — not yet stable
import { Inference } from "@clutchcall/sdk/inference";

const session = await Inference.attach(call.sid, {
  model: "speech-to-speech",
  codec: "pcm16",
  turnDetection: { responseMinSpeechMs: 600, bargeConfirmMs: 300 },
  onTurn:    (t) => console.log("turn latency ms:", t.latencyMs),
  onBargeIn: ()  => console.log("caller interrupted"),
});
await session.close();