# Inference — SDK Methods

> Attach a speech-to-speech agent to a call, tune turn detection and codecs, and drive the duplex audio bridge.

The inference modality has **no standalone typed client** — it rides the
[Voice](/modalities/voice/sdk-methods) surface. You attach a speech-to-speech
agent to a call with `voice.agents.attach`, configure turn detection and the
model codec, and (optionally) tap the audio bridge. This page is the
inference-relevant slice of the Voice SDK plus the turn-detection knobs.

> **NOTE:**
> A dedicated typed `Inference` client is **Preview** — see the bottom of this
>   page. For shipping code today, use the Voice surface documented here.

## Import

The agent path lives on the `voice` subpath:

  <Tab title="TypeScript">
```ts
import { Voice } from "@clutchcall/sdk/voice";

const voice = new Voice({ baseUrl: BASE_URL, apiKey: KEY, orgId: ORG });
```
  </Tab>
  <Tab title="Python">
```python
from clutchcall.voice import Voice

voice = Voice(base_url=BASE_URL, api_key=KEY, org_id=ORG)
```
  </Tab>

Go, Rust, Java, and .NET bindings expose the same shapes, with names adapted to
each language's conventions (snake_case in Python, PascalCase methods in Go and .NET).

## `voice.agents.attach`

Bind a speech-to-speech agent to a live (or just-originated) call. The runtime
bridges the call's `uplink` / `downlink` audio tracks to the model and owns
turn-taking for the duration of the call.

  <Tab title="TypeScript">
```ts
await voice.agents.attach(callSid, agent);
```
  </Tab>
  <Tab title="Python">
```python
voice.agents.attach(call_sid, agent)
```
  </Tab>

<ParamField path="callSid" type="string" required>
  The SID of the call to bind the agent to, from `originate()` or an inbound answer.
</ParamField>

<ParamField path="agent" type="string | AgentSpec" required>
  Either an agent id (string) registered in the control plane, or an inline
  agent spec selecting the speech-to-speech model and its turn-detection config.
</ParamField>

**Returns** `Promise<void>` (TypeScript), which resolves once the agent is bridged.
Audio begins flowing immediately; the first agent utterance is gated by turn detection.

### Attaching at originate time

`originate` accepts an `agent` directly, so the common case is one call:

  <Tab title="TypeScript">
```ts
const call = await voice.calls.originate({
  to:      "+15551234567",
  from:    "+15558675309",
  trunkId: "trunk_main",
  agent:   "support-s2s",     // attach the speech-to-speech agent inline
});
```
  </Tab>
  <Tab title="Python">
```python
call = voice.calls.originate(
    to="+15551234567",
    from_="+15558675309",
    trunk_id="trunk_main",
    agent="support-s2s",
)
```
  </Tab>

## The agent spec

An inline `AgentSpec` selects the model leg and the turn-detection policy. The
shape mirrors the agent config the control plane stores.

```ts
type AgentSpec = {
  // The model the runtime serves on the call's audio legs. Describe the model
  // as a speech-to-speech model in your registry; do not name a vendor here.
  mode: "speech_to_speech";
  modelCodec?: AudioCodec;        // what the model ingests/emits (default "pcm16")
  inputRateHz?: number;           // model input rate (default 16000)
  outputRateHz?: number;          // model output rate (default 24000)
  turnDetection?: TurnDetection;  // turn-taking + barge-in policy (below)
};

type AudioCodec = "opus" | "pcm16" | "g711_ulaw" | "g711_alaw";
```
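
To see which values take effect when optional fields are omitted, the documented defaults can be applied by hand. The sketch below is illustrative only: `resolveAgentSpec` is a hypothetical helper, not an SDK function, and it leaves out `turnDetection` so the snippet stays self-contained.

```ts
// Local copies of the types above, trimmed to the model-leg fields.
type AudioCodec = "opus" | "pcm16" | "g711_ulaw" | "g711_alaw";

type AgentSpec = {
  mode: "speech_to_speech";
  modelCodec?: AudioCodec;
  inputRateHz?: number;
  outputRateHz?: number;
};

// Hypothetical helper (not part of the SDK): fill in the documented defaults
// so logs and tests can show the effective model-leg configuration.
function resolveAgentSpec(spec: AgentSpec): Required<AgentSpec> {
  return {
    mode: spec.mode,
    modelCodec: spec.modelCodec ?? "pcm16",
    inputRateHz: spec.inputRateHz ?? 16000,
    outputRateHz: spec.outputRateHz ?? 24000,
  };
}

// Only the output rate is overridden; codec and input rate fall back to defaults.
const effective = resolveAgentSpec({ mode: "speech_to_speech", outputRateHz: 16000 });
console.log(effective);
```

Whether the runtime resolves defaults on the client or on the server is not stated here, so treat the helper as a debugging aid rather than a description of SDK behavior.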

> **NOTE:**
> ClutchCall resamples between the caller leg rate (commonly 8 kHz on PSTN)
>   and the model's `inputRateHz` / `outputRateHz`. You set the model's rates; the
>   caller leg is handled at the bridge.
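
To make the rate relationship concrete, the arithmetic below compares the size of one mono PCM16 frame at the caller rate and at the default model rates. The 20 ms frame duration is an assumption chosen for illustration; this page does not state the frame size the bridge uses.

```ts
// Bytes in one mono PCM16 frame: samples = rate * duration, 2 bytes per sample.
function pcm16FrameBytes(rateHz: number, frameMs: number): number {
  const samples = (rateHz * frameMs) / 1000;
  return samples * 2;
}

const FRAME_MS = 20; // assumption: illustrative frame duration, not confirmed by this page

const callerLeg = pcm16FrameBytes(8000, FRAME_MS);    // 160 samples -> 320 bytes
const modelInput = pcm16FrameBytes(16000, FRAME_MS);  // 320 samples -> 640 bytes
const modelOutput = pcm16FrameBytes(24000, FRAME_MS); // 480 samples -> 960 bytes

console.log({ callerLeg, modelInput, modelOutput });
```

The same 20 ms of audio is twice as large on the model input side and three times as large on the model output side as on an 8 kHz caller leg decoded to PCM16.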

## Turn detection

The `TurnDetection` block is the heart of this modality — it owns the
client-commit gate, backchannel suppression, and hold-and-confirm barge-in. Drop
it into the agent spec, or set it server-side in the agent config.

```ts
type TurnDetection = {
  responseMinSpeechMs?: number;   // bursts shorter than this = backchannel, dropped
  commitMinIntervalMs?: number;   // cooldown between turn commits
  cancelMinIntervalMs?: number;   // cooldown between barge-ins
  bargeConfirmMs?: number;        // sustain speech this long before cancelling the agent
  ttsGuardMs?: number;            // raise the mic gate this long after agent audio
  silenceThresholdMs?: number;    // trailing silence before end-of-turn fires
  minSpeechMs?: number;           // discard candidate utterances shorter than this
  prefixPaddingMs?: number;       // audio kept before the speech-start marker
};
```

| Field | Default (ms) | What it does | When to change |
| --- | --- | --- | --- |
| `responseMinSpeechMs` | 600 | Speech shorter than this is treated as a backchannel and never commits a turn | Lower (400) to let very short utterances count as turns; raise to ignore more continuers |
| `commitMinIntervalMs` | 1500 | Minimum gap between two turn commits | Raise if the agent over-answers rapid speech |
| `cancelMinIntervalMs` | 400 | Minimum gap between barge-ins | Raise on noisy trunks that false-trigger |
| `bargeConfirmMs` | 300 | Speech must be sustained this long over the agent before it is cancelled | `0` = cancel on first frame (browser/AEC legs); raise on noisy legs |
| `ttsGuardMs` | 200 | Mic gate is raised for this long after each agent chunk | Raise on no-AEC SIP/PSTN legs where TTS bleeds into the mic |
| `silenceThresholdMs` | 500 | Trailing silence before end-of-turn fires | Lower (300) for snappier replies; raise (800) for callers who pause to think |
| `minSpeechMs` | 300 | Discard candidate utterances shorter than this | Raise to 500 if line noise causes false triggers |
| `prefixPaddingMs` | 200 | Audio kept before the speech-start marker | Raise if first syllables get clipped |
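
As a way to reason about how the speech-length thresholds interact, the sketch below models them as a simple classifier using the defaults from the table. It is a simplified mental model, not the runtime's implementation: the ordering of the checks and the names `classifyBurst` and `bargeInConfirmed` are assumptions, and the cooldown and silence fields are left out.

```ts
type Gate = { minSpeechMs: number; responseMinSpeechMs: number; bargeConfirmMs: number };

// Defaults taken from the table above.
const DEFAULT_GATE: Gate = { minSpeechMs: 300, responseMinSpeechMs: 600, bargeConfirmMs: 300 };

type BurstOutcome = "discarded" | "backchannel" | "turn_candidate";

// Assumed ordering: too-short audio is discarded first, then short speech is
// treated as a backchannel, and only longer speech can commit a turn.
function classifyBurst(speechMs: number, gate: Gate = DEFAULT_GATE): BurstOutcome {
  if (speechMs < gate.minSpeechMs) return "discarded";
  if (speechMs < gate.responseMinSpeechMs) return "backchannel";
  return "turn_candidate";
}

// Speech over the agent must be sustained for bargeConfirmMs before the reply
// is cancelled; with bargeConfirmMs = 0 the first overlapping frame qualifies.
function bargeInConfirmed(overlapMs: number, gate: Gate = DEFAULT_GATE): boolean {
  return overlapMs >= gate.bargeConfirmMs;
}

console.log(classifyBurst(150)); // "discarded"
console.log(classifyBurst(450)); // "backchannel"
console.log(classifyBurst(900)); // "turn_candidate"
console.log(classifyBurst(450, { ...DEFAULT_GATE, responseMinSpeechMs: 400 })); // "turn_candidate"
console.log(bargeInConfirmed(120)); // false
```

Lowering `responseMinSpeechMs` to 400 moves a 450 ms utterance from the backchannel band into the turn band, which is the trade-off the "When to change" column describes.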

> **WARNING:**
> `silenceThresholdMs` below ~300 ms fires end-of-turn on inter-word pauses, so the
>   agent will interrupt the caller mid-sentence. Keep it at or above 300 ms.
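
The warning above can be caught before a config ships with a small pre-flight check. This is a sketch under stated assumptions: `turnDetectionWarnings` is a hypothetical helper, and nothing here implies the SDK performs this validation itself.

```ts
const MIN_SAFE_SILENCE_MS = 300; // floor recommended by the warning above
const DEFAULT_SILENCE_MS = 500;  // documented default

// Hypothetical pre-flight check (not an SDK function): returns human-readable
// warnings for a turn-detection config, or an empty array when it looks safe.
function turnDetectionWarnings(td: { silenceThresholdMs?: number }): string[] {
  const warnings: string[] = [];
  const silence = td.silenceThresholdMs ?? DEFAULT_SILENCE_MS;
  if (silence < MIN_SAFE_SILENCE_MS) {
    warnings.push(
      `silenceThresholdMs=${silence} is below ${MIN_SAFE_SILENCE_MS} ms; ` +
        "end-of-turn may fire on inter-word pauses",
    );
  }
  return warnings;
}

console.log(turnDetectionWarnings({ silenceThresholdMs: 200 }).length); // 1
console.log(turnDetectionWarnings({}).length);                          // 0
```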

## Tapping the audio bridge (optional)

You usually don't touch raw frames — the runtime bridges the model leg for you.
When you *do* need to observe or inject audio (recording sidecar, custom DSP),
attach the Voice `AudioBridge` to the same call.

  <Tab title="TypeScript">
```ts
const bridge = await voice.audioBridge.attach(call.sid, {
  codec: "pcm16",
  onUplink: (frame, tsUs) => recorder.feed(frame),   // caller audio (also fed to the model)
});
```
  </Tab>
  <Tab title="Python">
```python
bridge = voice.audio_bridge.attach(
    call.sid,
    codec="pcm16",
    on_uplink=lambda frame, ts_us: recorder.feed(frame),
)
```
  </Tab>

`AudioBridge` methods relevant here:

| Method | Signature | Notes |
| --- | --- | --- |
| `publishDownlink` | `(frame: Uint8Array, timestampUs?: bigint)` | Inject audio toward the caller (e.g. a pre-roll prompt before the model speaks). |
| `publishUplink` | `(frame: Uint8Array, timestampUs?: bigint)` | Inject audio toward the model (browser caller path). |
| `close` | `()` | Detach the tap. The agent stays attached. |

See [Voice — SDK Methods](/modalities/voice/sdk-methods) for the full
`AudioBridge` surface.
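
When injecting a pre-recorded prompt, the buffer first has to be cut into frames with timestamps. The sketch below splits a mono PCM16 buffer into fixed-duration frames and assigns microsecond timestamps as `bigint`, matching the `timestampUs` parameter type in the table. The 20 ms frame size, timestamps counted from zero, and the `splitPcm16` helper itself are assumptions; check the Voice reference for what the bridge actually expects.

```ts
type TimedFrame = { frame: Uint8Array; timestampUs: bigint };

// Hypothetical helper: split mono PCM16 audio into fixed-duration frames.
// A trailing partial frame is dropped rather than padded.
function splitPcm16(pcm: Uint8Array, rateHz: number, frameMs: number = 20): TimedFrame[] {
  const samplesPerFrame = Math.floor((rateHz * frameMs) / 1000);
  const bytesPerFrame = samplesPerFrame * 2; // 2 bytes per 16-bit sample
  const frameUs = BigInt(frameMs * 1000);
  const frames: TimedFrame[] = [];
  let timestampUs = BigInt(0); // assumption: timestamps are relative to the first frame
  for (let offset = 0; offset + bytesPerFrame <= pcm.length; offset += bytesPerFrame) {
    frames.push({ frame: pcm.subarray(offset, offset + bytesPerFrame), timestampUs });
    timestampUs += frameUs;
  }
  return frames;
}

// One second of silence at 16 kHz: 32000 bytes -> 50 frames of 640 bytes each.
const preRoll = splitPcm16(new Uint8Array(32000), 16000);
console.log(preRoll.length, preRoll[0].frame.length);
```

With a bridge from `voice.audioBridge.attach`, each entry could then be passed along as `bridge.publishDownlink(f.frame, f.timestampUs)`. Pacing the calls in real time is left out of this sketch.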

## Events

The agent leg surfaces turn-boundary and barge-in events through the call's status
stream. Subscribe with the Voice call handle:

  <Tab title="TypeScript">
```ts
call.onStatus((s) => {
  // dialing → ringing → in_progress → completed | failed | no_answer
  if (s === "in_progress") console.log("agent bridged, listening");
});
```
  </Tab>
  <Tab title="Python">
```python
call.on_status(lambda s: print("status:", s))
```
  </Tab>

| Event | Fires when |
| --- | --- |
| turn committed | The turn detector decided a real turn ended; the model is generating its reply. |
| barge-in confirmed | Sustained speech over the agent passed `bargeConfirmMs`; the in-flight reply is cancelled. |
| backchannel dropped | A short continuer was suppressed; the agent kept the floor (diagnostic). |
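
The status lifecycle noted in the TypeScript comment above (`dialing` → `ringing` → `in_progress` → `completed | failed | no_answer`) can be tracked with a small helper. This is a sketch: the `CallStatus` union is inferred from that comment, and `isTerminal` and `summarizeCall` are hypothetical names rather than SDK exports.

```ts
// Status values as listed in the onStatus comment above.
type CallStatus = "dialing" | "ringing" | "in_progress" | "completed" | "failed" | "no_answer";

function isTerminal(status: CallStatus): boolean {
  return status === "completed" || status === "failed" || status === "no_answer";
}

// Summarize a recorded status history: did the agent leg ever bridge,
// and has the call reached a terminal state?
function summarizeCall(history: CallStatus[]): { bridged: boolean; ended: boolean; final?: CallStatus } {
  const last = history.length > 0 ? history[history.length - 1] : undefined;
  return {
    bridged: history.indexOf("in_progress") !== -1,
    ended: last !== undefined && isTerminal(last),
    final: last,
  };
}

// In real code the history would be filled from the status callback,
// e.g. call.onStatus((s) => history.push(s)).
const answered = summarizeCall(["dialing", "ringing", "in_progress", "completed"]);
const unanswered = summarizeCall(["dialing", "ringing", "no_answer"]);
console.log(answered, unanswered);
```

A call that ends in `no_answer` never reaches `in_progress`, so `bridged` stays false and no agent turn events should be expected for it.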

## Preview: a dedicated typed Inference client

> **NOTE:**
> **Preview.** A standalone `Inference` client (a typed handle that wraps agent
>   attach, the commit gate, and audio bridging behind one object) is in design.
>   The shape below is **forward-looking and may change** — for shipping code,
>   attach via `voice.agents.attach` as shown above.
> 
>   ```ts
>   // PREVIEW — not yet stable
>   import { Inference } from "@clutchcall/sdk/inference";
> 
>   const session = await Inference.attach(call.sid, {
>     model: "speech-to-speech",
>     codec: "pcm16",
>     turnDetection: { responseMinSpeechMs: 600, bargeConfirmMs: 300 },
>     onTurn:    (t) => console.log("turn latency ms:", t.latencyMs),
>     onBargeIn: ()  => console.log("caller interrupted"),
>   });
>   await session.close();
>   ```

## Related

- [Inference — Details](/modalities/inference/details) — wire model, the commit gate, turn-latency metric
- [Inference — Cookbook](/modalities/inference/cookbook) — short task snippets
- [Voice — SDK Methods](/modalities/voice/sdk-methods) — the full call + audio-bridge surface
