voice.agents.attach, configure turn detection and the
model codec, and (optionally) tap the audio bridge. This page is the
inference-relevant slice of the Voice SDK plus the turn-detection knobs.
A dedicated typed
Inference client is in Preview; see the bottom of this
page. For shipping code today, use the Voice surface documented here.
Import
The agent path lives on the voice subpath.
voice.agents.attach
Bind a speech-to-speech agent to a live (or just-originated) call. The runtime
bridges the call’s uplink / downlink audio tracks to the model and owns
turn-taking for the duration of the call.
Parameters:
- The call sid to bind the agent to, from originate() or an inbound answer.
- Either an agent id (string) registered in the control plane, or an inline
agent spec selecting the speech-to-speech model and its turn-detection config.
Returns Promise<void>, which resolves once the agent is bridged. Audio begins
flowing immediately; the first agent utterance is gated by turn detection.
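As a minimal sketch of the two ways to identify the agent, the following models attach against a local stand-in rather than the real SDK. Only the "agent id or inline spec" choice and the Promise<void> return come from this page; the parameter order, the AgentSpec field named model, the example ids, and the FakeAgents class are assumptions made for illustration.

```typescript
// Hypothetical shapes: field names other than inputRateHz / outputRateHz
// and the turn-detection knobs are assumed, not taken from the SDK.
interface AgentSpec {
  model: string; // assumed field name for the speech-to-speech model
  inputRateHz: number;
  outputRateHz: number;
  turnDetection?: { silenceThresholdMs?: number };
}

// An agent is referenced either by id or by an inline spec.
type AgentRef = string | AgentSpec;

interface AgentsApi {
  attach(callSid: string, agent: AgentRef): Promise<void>;
}

// In-memory stand-in so the sketch runs without a live call.
class FakeAgents implements AgentsApi {
  readonly bound = new Map<string, AgentRef>();

  async attach(callSid: string, agent: AgentRef): Promise<void> {
    if (callSid.length === 0) throw new Error("callSid is required");
    this.bound.set(callSid, agent);
  }
}

async function demo(): Promise<FakeAgents> {
  const agents = new FakeAgents();
  // Form 1: an agent id registered in the control plane (id is made up).
  await agents.attach("call-1", "agent-support");
  // Form 2: an inline spec carrying its own turn-detection override.
  await agents.attach("call-2", {
    model: "example-s2s",
    inputRateHz: 16000,
    outputRateHz: 24000,
    turnDetection: { silenceThresholdMs: 300 },
  });
  return agents;
}
```

Because attach resolves once the agent is bridged, awaiting it before doing anything else on the call keeps later steps from racing the bridge setup.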
Attaching at originate time
originate accepts an agent directly, so the common case is a single call.
The agent spec
An inline AgentSpec selects the model leg and the turn-detection policy. The
shape mirrors the agent config the control plane stores.
ClutchCall resamples between the caller leg rate (commonly 8 kHz on PSTN)
and the model’s inputRateHz / outputRateHz. You set the model’s rates; the
caller leg is handled at the bridge.
Turn detection
The TurnDetection block is the heart of this modality: it owns the
client-commit gate, backchannel suppression, and hold-and-confirm barge-in. Drop
it into the agent spec, or set it server-side in the agent config.
| Field | Default | What it does | When to change |
|---|---|---|---|
| responseMinSpeechMs | 600 | Speech shorter than this is a backchannel; it never commits a turn | Lower (400) to let very short utterances answer; raise to ignore more continuers |
| commitMinIntervalMs | 1500 | Minimum gap between two turn commits | Raise if the agent over-answers rapid speech |
| cancelMinIntervalMs | 400 | Minimum gap between barge-ins | Raise on noisy trunks that false-trigger |
| bargeConfirmMs | 300 | Speech must be sustained this long over the agent before it is cancelled | 0 = cancel on first frame (browser/AEC legs); raise on noisy legs |
| ttsGuardMs | 200 | Mic gate is raised for this long after each agent chunk | Raise on no-AEC SIP/PSTN legs where TTS bleeds into the mic |
| silenceThresholdMs | 500 | Trailing silence before end-of-turn fires | Lower (300) for snappier replies; higher (800) for thinkers |
| minSpeechMs | 300 | Discard candidate utterances shorter than this | Raise to 500 if line noise causes false triggers |
| prefixPaddingMs | 200 | Audio kept before the speech-start marker | Raise if first syllables get clipped |
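To make the interaction between the thresholds concrete, here is a small sketch that encodes the defaults from the table above and walks one utterance through the length and rate-limit checks. The field names and default values come from the table; the TurnDetection interface shape, the withOverrides and classifyUtterance helpers, and the order of the checks are assumptions, and the runtime's real decision logic may differ.

```typescript
// Field names and defaults are copied from the table; everything else
// (helper names, decision labels, check order) is illustrative only.
interface TurnDetection {
  responseMinSpeechMs: number;
  commitMinIntervalMs: number;
  cancelMinIntervalMs: number;
  bargeConfirmMs: number;
  ttsGuardMs: number;
  silenceThresholdMs: number;
  minSpeechMs: number;
  prefixPaddingMs: number;
}

const DEFAULT_TURN_DETECTION: TurnDetection = {
  responseMinSpeechMs: 600,
  commitMinIntervalMs: 1500,
  cancelMinIntervalMs: 400,
  bargeConfirmMs: 300,
  ttsGuardMs: 200,
  silenceThresholdMs: 500,
  minSpeechMs: 300,
  prefixPaddingMs: 200,
};

// Start from the defaults and change only the knobs that need tuning.
function withOverrides(overrides: Partial<TurnDetection>): TurnDetection {
  return { ...DEFAULT_TURN_DETECTION, ...overrides };
}

type UtteranceDecision = "discard" | "backchannel" | "rate-limited" | "commit";

// Simplified reading of the table: too short to count, short enough to be a
// continuer, too soon after the previous commit, or a real turn.
function classifyUtterance(
  cfg: TurnDetection,
  speechMs: number,
  msSinceLastCommit: number,
): UtteranceDecision {
  if (speechMs < cfg.minSpeechMs) return "discard";
  if (speechMs < cfg.responseMinSpeechMs) return "backchannel";
  if (msSinceLastCommit < cfg.commitMinIntervalMs) return "rate-limited";
  return "commit";
}
```

Under the defaults, a 450 ms utterance is classified as a backchannel. After lowering responseMinSpeechMs to 400, as the table suggests for very short answers, the same utterance commits a turn.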
Tapping the audio bridge (optional)
You usually don’t touch raw frames: the runtime bridges the model leg for you. When you do need to observe or inject audio (recording sidecar, custom DSP), attach the Voice AudioBridge to the same call.
AudioBridge methods relevant here:
| Method | Signature | Notes |
|---|---|---|
| publishDownlink | (frame: Uint8Array, timestampUs?: bigint) | Inject audio toward the caller (e.g. a pre-roll prompt before the model speaks). |
| publishUplink | (frame: Uint8Array, timestampUs?: bigint) | Inject audio toward the model (browser caller path). |
| close | () | Detach the tap. The agent stays attached. |
See Voice SDK Methods (listed under Related) for the full AudioBridge surface.
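The method signatures in the table can be written down as a TypeScript interface and exercised against an in-memory stand-in. The method names and parameter lists follow the table; the void return types, the RecordingBridge class, and the 16-bit mono PCM frame format are assumptions for illustration, so check the actual frame encoding before injecting audio.

```typescript
// Signatures follow the table above; return types are assumed to be void.
interface AudioBridge {
  publishDownlink(frame: Uint8Array, timestampUs?: bigint): void;
  publishUplink(frame: Uint8Array, timestampUs?: bigint): void;
  close(): void;
}

interface RecordedFrame {
  bytes: number;
  timestampUs?: bigint;
}

// Hypothetical stand-in that records what was published instead of
// sending audio anywhere, so the sketch runs without a live call.
class RecordingBridge implements AudioBridge {
  readonly downlink: RecordedFrame[] = [];
  readonly uplink: RecordedFrame[] = [];
  closed = false;

  publishDownlink(frame: Uint8Array, timestampUs?: bigint): void {
    this.ensureOpen();
    this.downlink.push({ bytes: frame.byteLength, timestampUs });
  }

  publishUplink(frame: Uint8Array, timestampUs?: bigint): void {
    this.ensureOpen();
    this.uplink.push({ bytes: frame.byteLength, timestampUs });
  }

  close(): void {
    this.closed = true;
  }

  private ensureOpen(): void {
    if (this.closed) throw new Error("bridge is closed");
  }
}

// Assumed format: 20 ms of 16-bit mono PCM at 8 kHz is
// 160 samples * 2 bytes = 320 bytes of silence.
const preRoll = new Uint8Array(320);

const bridge = new RecordingBridge();
bridge.publishDownlink(preRoll, BigInt(0)); // toward the caller
bridge.publishUplink(new Uint8Array(320)); // toward the model, no timestamp
bridge.close(); // detaches the tap only
```

In this stand-in, close only marks the tap as detached, mirroring the table's note that the agent stays attached after the tap is closed.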
Events
The agent leg surfaces turn-boundary and barge events through the call’s status stream. Subscribe with the Voice call handle.
| Event | Fires when |
|---|---|
| turn committed | The turn detector decided a real turn ended; the model is generating its reply. |
| barge-in confirmed | Sustained speech over the agent passed bargeConfirmMs; the in-flight reply is cancelled. |
| backchannel dropped | A short continuer was suppressed; the agent kept the floor (diagnostic). |
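One way to consume these events for tuning is to tally them per call. The sketch below models the three rows of the table as a discriminated union; the kind strings, the AgentEvent and TurnStats types, and the tally helper are hypothetical, and the real status-stream payloads may be named and shaped differently.

```typescript
// Hypothetical identifiers for the three events in the table above.
type AgentEvent =
  | { kind: "turn_committed" }
  | { kind: "barge_in_confirmed" }
  | { kind: "backchannel_dropped" };

interface TurnStats {
  turns: number;
  bargeIns: number;
  backchannels: number;
}

// Count each event type so the turn-detection knobs can be tuned from
// observed behaviour rather than guesswork.
function tally(events: AgentEvent[]): TurnStats {
  const stats: TurnStats = { turns: 0, bargeIns: 0, backchannels: 0 };
  for (const event of events) {
    switch (event.kind) {
      case "turn_committed":
        stats.turns += 1;
        break;
      case "barge_in_confirmed":
        stats.bargeIns += 1;
        break;
      case "backchannel_dropped":
        stats.backchannels += 1;
        break;
    }
  }
  return stats;
}

const sample: AgentEvent[] = [
  { kind: "turn_committed" },
  { kind: "backchannel_dropped" },
  { kind: "backchannel_dropped" },
  { kind: "barge_in_confirmed" },
];
const stats = tally(sample);
```

A high backchannel count relative to committed turns may indicate that responseMinSpeechMs is set too high for the callers on that leg.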
Preview: a dedicated typed Inference client
Preview. A standalone
Inference client (a typed handle that wraps agent
attach, the commit gate, and audio bridging behind one object) is in design.
Its shape is forward-looking and may change; for shipping code,
attach via voice.agents.attach as described above.
Related
- Inference — Details — wire model, the commit gate, turn-latency metric
- Inference — Cookbook — short task snippets
- Voice — SDK Methods — the full call + audio-bridge surface

