# Inference — Cookbook

> Short task recipes for serving a speech-to-speech agent: attach, tune turn detection, codecs, barge-in, and latency.

Copy-pasteable answers to "how do I X" for the inference modality. Everything
rides the [Voice](/modalities/voice/sdk-methods) surface — `voice.agents.attach`
plus the turn-detection knobs. TypeScript primary; Python shown where it differs.

## Attach a speech-to-speech agent to a call

The one-liner: bind a registered agent to a live call sid.

```ts
import { Voice } from "@clutchcall/sdk/voice";

const voice = new Voice({ baseUrl: BASE_URL, apiKey: KEY, orgId: ORG });
await voice.agents.attach(callSid, "support-s2s");
```

## Originate and attach in one call

Pass `agent` to `originate` so the model is bridged the moment the callee answers.

```ts
const call = await voice.calls.originate({
  to:      "+15551234567",
  from:    "+15558675309",
  trunkId: "trunk_main",
  agent:   "support-s2s",
});
```

## Attach with an inline agent spec

Skip the registry — select the model leg and turn-detection policy inline.

```ts
await voice.agents.attach(callSid, {
  mode:          "speech_to_speech",
  modelCodec:    "pcm16",
  inputRateHz:   16000,
  outputRateHz:  24000,
  turnDetection: { responseMinSpeechMs: 600, bargeConfirmMs: 300 },
});
```
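The inline spec is a plain object, so you can type and check it locally before you attach. Here is a minimal sketch. The `TurnDetection` / `InlineAgentSpec` interfaces and the `withDocumentedDefaults` helper are hypothetical and are not SDK exports. The helper fills in only the defaults stated on this page (`silenceThresholdMs` 500, `responseMinSpeechMs` 600, `ttsGuardMs` 200). It leaves `bargeConfirmMs` unset because no default is documented here.

```typescript
// Hypothetical local types mirroring the inline spec fields used on this page.
// NOT exported by @clutchcall/sdk; they only document the shape.
interface TurnDetection {
  silenceThresholdMs?: number;  // trailing silence that ends a turn
  responseMinSpeechMs?: number; // shortest speech that counts as a turn
  bargeConfirmMs?: number;      // sustained speech needed to cancel the agent
  ttsGuardMs?: number;          // mic guard after the agent's own audio
}

interface InlineAgentSpec {
  mode: "speech_to_speech";
  modelCodec?: string; // e.g. "pcm16"
  inputRateHz?: number;
  outputRateHz?: number;
  turnDetection?: TurnDetection;
}

// Defaults as stated on this page; bargeConfirmMs has no documented default.
const DOCUMENTED_DEFAULTS = {
  silenceThresholdMs: 500,
  responseMinSpeechMs: 600,
  ttsGuardMs: 200,
} as const;

// Fill in documented defaults and reject values this page warns against.
function withDocumentedDefaults(spec: InlineAgentSpec): InlineAgentSpec {
  const td: TurnDetection = { ...DOCUMENTED_DEFAULTS, ...spec.turnDetection };
  if (td.silenceThresholdMs !== undefined && td.silenceThresholdMs < 300) {
    throw new RangeError("silenceThresholdMs below ~300 ms fires on inter-word pauses");
  }
  if (td.bargeConfirmMs !== undefined && td.bargeConfirmMs < 0) {
    throw new RangeError("bargeConfirmMs must be >= 0 (0 = cancel on onset)");
  }
  return { ...spec, turnDetection: td };
}

const spec = withDocumentedDefaults({
  mode: "speech_to_speech",
  modelCodec: "pcm16",
  turnDetection: { bargeConfirmMs: 300 },
});
console.log(spec.turnDetection); // merged object: documented defaults plus bargeConfirmMs
```

Rejecting `silenceThresholdMs` below 300 is a local guard that mirrors the end-of-turn warning on this page. The SDK itself may accept lower values.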

## Pick the codec the model wants

Most speech-to-speech models ingest raw PCM16. Set the model codec; the caller
leg is transcoded at the bridge.

```ts
await voice.agents.attach(callSid, {
  mode:       "speech_to_speech",
  modelCodec: "pcm16",   // what the model ingests, not the caller leg's codec
});
```

> **NOTE:**
> A PSTN caller leg may be `g711_ulaw` / `g711_alaw`; a browser leg is `opus`.
>   You never set the caller codec here — only what the **model** expects.
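Here is a concrete picture of "transcoded at the bridge." A PSTN leg in `g711_ulaw` carries 8-bit companded samples. These must be expanded to 16-bit linear PCM before a PCM16 model can ingest them. The sketch below is the standard ITU-T G.711 μ-law expansion. It illustrates the conversion and is not ClutchCall's bridge code. A real bridge would also resample from G.711's 8 kHz to the model's input rate.

```typescript
// Standard G.711 mu-law -> 16-bit linear PCM expansion (ITU-T G.711).
// Illustrative only: the bridge performs this conversion for you.
const BIAS = 0x84; // 132, the bias added during mu-law encoding

function ulawByteToPcm16(byte: number): number {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return sign ? -magnitude : magnitude;
}

function ulawToPcm16(frame: Uint8Array): Int16Array {
  const out = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) out[i] = ulawByteToPcm16(frame[i]);
  return out;
}

// A 20 ms G.711 frame at 8 kHz is 160 bytes; it expands to 160 PCM16 samples.
const pcm = ulawToPcm16(new Uint8Array(160).fill(0xff)); // 0xff is mu-law silence
console.log(pcm.length, pcm[0]); // 160 0
```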

## Tighten end-of-turn for snappier replies

Lower the trailing-silence threshold so the agent answers sooner after the
caller stops.

```ts
await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { silenceThresholdMs: 300 },   // default 500
});
```

> **WARNING:**
> Below ~300 ms, end-of-turn fires on inter-word pauses and the agent starts replying
>   while the caller is still mid-sentence. Don't go lower.
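Roughly, `silenceThresholdMs` is how long the caller must stay quiet after speaking before the turn counts as over. The sketch below models that with one voice-activity flag per frame. The 20 ms frame size, the boolean VAD input, and the `endOfTurnFrame` helper are assumptions for illustration. This is not the bridge's actual detector.

```typescript
const FRAME_MS = 20; // assumed frame duration

// Returns the index of the frame on which end-of-turn fires, or -1 if it never
// does. `vad` holds one voice-activity flag per frame of caller audio.
function endOfTurnFrame(vad: boolean[], silenceThresholdMs: number): number {
  let heardSpeech = false;
  let silenceMs = 0;
  for (let i = 0; i < vad.length; i++) {
    if (vad[i]) {
      heardSpeech = true;
      silenceMs = 0;             // any speech restarts the trailing-silence clock
    } else if (heardSpeech) {
      silenceMs += FRAME_MS;
      if (silenceMs >= silenceThresholdMs) return i;
    }
  }
  return -1;
}

const run = (n: number, v: boolean): boolean[] => new Array<boolean>(n).fill(v);

// 400 ms of speech, a 240 ms inter-word pause, 400 ms more speech, then silence.
const frames = [...run(20, true), ...run(12, false), ...run(20, true), ...run(30, false)];

console.log(endOfTurnFrame(frames, 500)); // 76: after 500 ms of the final silence
console.log(endOfTurnFrame(frames, 200)); // 29: inside the inter-word pause
```

At 300 ms the 240 ms pause would not trip this model. That gap is the margin the warning protects.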

## Stop backchannels from triggering a reply

Continuers like "mhm" / "ok" / "right" shouldn't make the model answer. Raise
the minimum speech length that counts as a turn.

```ts
await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { responseMinSpeechMs: 700 },   // default 600 — anything shorter is dropped
});
```
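`responseMinSpeechMs` behaves like a length filter on candidate turns. A speech segment shorter than the threshold is dropped instead of being answered. Here is a minimal sketch. It assumes speech has already been segmented into start/end timestamps in milliseconds. The `SpeechSegment` type and the `turnsToAnswer` helper are hypothetical.

```typescript
// Hypothetical segment type: one contiguous run of caller speech.
interface SpeechSegment {
  startMs: number;
  endMs: number;
  text?: string; // optional, for readability in the example
}

// Keep only segments long enough to count as a turn; shorter ones
// (backchannels like "mhm" or "ok") are dropped and never answered.
function turnsToAnswer(segments: SpeechSegment[], responseMinSpeechMs: number): SpeechSegment[] {
  return segments.filter((s) => s.endMs - s.startMs >= responseMinSpeechMs);
}

const heard: SpeechSegment[] = [
  { startMs: 0,    endMs: 350,  text: "mhm" },                         // 350 ms
  { startMs: 2000, endMs: 2650, text: "okay, right" },                 // 650 ms
  { startMs: 5000, endMs: 7400, text: "can you resend the invoice?" }, // 2400 ms
];

// At 600 ms "okay, right" still triggers a reply; at 700 ms only the question does.
console.log(turnsToAnswer(heard, 600).map((s) => s.text));
console.log(turnsToAnswer(heard, 700).map((s) => s.text));
```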

## Make barge-in forgiving (hold-and-confirm)

Require speech to be sustained before cancelling the agent, so a muttered
continuer mid-reply doesn't kill the turn.

```ts
await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { bargeConfirmMs: 400 },   // sustain 400ms before cancelling
});
```
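Under hold-and-confirm, an onset during agent playback cancels the reply only if the caller keeps talking for `bargeConfirmMs`. A shorter burst is ignored and the reply continues. With `bargeConfirmMs: 0`, used in the instant recipe that follows, the first speech frame cancels. The sketch below models this over per-frame VAD flags captured while the agent speaks. The 20 ms frame and the `bargeInFrame` helper are illustrative assumptions.

```typescript
const FRAME_MS = 20; // assumed frame duration

// Returns the frame index at which the agent's reply is cancelled, or -1 if
// the reply plays out. `vad` holds one voice-activity flag per frame captured
// while the agent is speaking.
function bargeInFrame(vad: boolean[], bargeConfirmMs: number): number {
  let sustainedMs = 0;
  for (let i = 0; i < vad.length; i++) {
    if (!vad[i]) {
      sustainedMs = 0;           // burst ended before confirming: ignore it
      continue;
    }
    sustainedMs += FRAME_MS;
    // With bargeConfirmMs = 0 the first speech frame already passes this check.
    if (sustainedMs >= bargeConfirmMs) return i;
  }
  return -1;
}

const run = (n: number, v: boolean): boolean[] => new Array<boolean>(n).fill(v);

// During a reply: a 200 ms "mhm", a pause, then 600 ms of real interruption.
const duringReply = [...run(5, false), ...run(10, true), ...run(15, false), ...run(30, true)];

console.log(bargeInFrame(duringReply, 400)); // 49: cancels on the sustained speech
console.log(bargeInFrame(duringReply, 0));   // 5: the "mhm" alone cancels the reply
```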

## Make barge-in instant (browser / AEC legs)

When the caller leg has client-side echo cancellation, cut the agent on the
first speech frame.

```ts
await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { bargeConfirmMs: 0 },   // 0 = fire immediately on onset
});
```

## Stop the agent self-triggering on a no-AEC SIP/PSTN leg

When there's no echo cancellation, the agent's own audio bleeds into the mic.
Raise the post-audio mic guard so it doesn't read its own voice as barge-in.

```ts
await voice.agents.attach(callSid, {
  mode: "speech_to_speech",
  turnDetection: { ttsGuardMs: 400 },   // default 200; longer guard for echo-prone legs
});
```
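One way to picture `ttsGuardMs`: after the agent's downlink audio stops, uplink speech onsets inside the guard window are treated as echo. This keeps the tail of the agent's own voice from registering as barge-in. The sketch below is a hypothetical model of that rule only. Times are in milliseconds on one clock, and measuring the window from the end of the agent's audio is an assumption. It does not classify onsets that occur during playback, and it is not the bridge's implementation.

```typescript
// Hypothetical post-audio mic guard. Returns true if an uplink onset lands
// inside the guard window right after the agent's audio ended, i.e. it should
// be ignored as the agent's own echo rather than treated as barge-in.
function isEchoCandidate(onsetMs: number, lastDownlinkEndMs: number, ttsGuardMs: number): boolean {
  const sinceAgentAudio = onsetMs - lastDownlinkEndMs;
  return sinceAgentAudio >= 0 && sinceAgentAudio < ttsGuardMs;
}

// Agent audio ends at t = 10000 ms; an echo tail shows up 250 ms later.
console.log(isEchoCandidate(10250, 10000, 200)); // false: slips past the default guard
console.log(isEchoCandidate(10250, 10000, 400)); // true: suppressed by the longer guard
console.log(isEchoCandidate(10600, 10000, 400)); // false: real caller speech gets through
```

The trade-off is that a longer guard also delays how soon genuine caller speech right after the agent finishes can count as a turn.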

## Inject a pre-roll prompt before the model speaks

Attach the audio bridge and push a downlink clip before the first model turn.

```ts
const bridge = await voice.audioBridge.attach(call.sid, { codec: "pcm16" });
bridge.publishDownlink(prerollPcm);   // "connecting you now…"
await voice.agents.attach(call.sid, { mode: "speech_to_speech" });
```
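`prerollPcm` is left to you. If you build the clip yourself, it helps to know the sizes involved. PCM16 uses 2 bytes per sample, so one frame is `rateHz * frameMs / 1000 * 2` bytes. The sketch below slices a clip into fixed 20 ms frames. The 16 kHz rate and the 20 ms frame are assumptions, and so is whether `publishDownlink` wants one buffer or a sequence of frames. The `frameBytes` / `toFrames` helpers are hypothetical.

```typescript
// PCM16 = 16-bit signed little-endian samples, 2 bytes each.
const BYTES_PER_SAMPLE = 2;

function frameBytes(rateHz: number, frameMs: number): number {
  return (rateHz * frameMs / 1000) * BYTES_PER_SAMPLE;
}

// Split a PCM16 clip into fixed-size frames; the last frame is zero-padded
// (silence) so every frame has the same length.
function toFrames(clip: Uint8Array, rateHz = 16000, frameMs = 20): Uint8Array[] {
  const size = frameBytes(rateHz, frameMs);
  const frames: Uint8Array[] = [];
  for (let off = 0; off < clip.length; off += size) {
    const frame = new Uint8Array(size);          // zero-filled = silence
    frame.set(clip.subarray(off, off + size));
    frames.push(frame);
  }
  return frames;
}

// 1.5 s of 16 kHz PCM16 = 48,000 bytes -> 75 frames of 640 bytes.
const clip = new Uint8Array(16000 * 1.5 * BYTES_PER_SAMPLE);
const frames = toFrames(clip);
console.log(frameBytes(16000, 20), frames.length); // 640 75
```

If the bridge does take frames, you would call `bridge.publishDownlink(frame)` once per frame. Confirm that behavior before relying on it.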

## Record the caller leg while the agent runs

Tap the uplink without disturbing the agent bridge.

<Tabs>
  <Tab title="TypeScript">
```ts
const bridge = await voice.audioBridge.attach(call.sid, {
  codec: "pcm16",
  onUplink: (frame) => recorder.feed(frame),
});
```
  </Tab>
  <Tab title="Python">
```python
bridge = voice.audio_bridge.attach(
    call.sid, codec="pcm16",
    on_uplink=lambda frame, ts_us: recorder.feed(frame),
)
```
  </Tab>
</Tabs>
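The `recorder` in the tabs is yours to supply. Below is a minimal in-memory version. It assumes each uplink frame arrives as raw little-endian PCM16 bytes (`Uint8Array`) at 16 kHz mono. The frame type, the rate, and the `WavRecorder` class are assumptions for this sketch, not SDK guarantees. It prepends a standard 44-byte RIFF/WAVE header so the result opens in ordinary audio tools.

```typescript
// Hypothetical recorder: buffers PCM16 frames and emits a WAV file on finish().
class WavRecorder {
  private chunks: Uint8Array[] = [];
  private bytes = 0;
  private readonly sampleRate: number;
  private readonly channels: number;

  constructor(sampleRate = 16000, channels = 1) {
    this.sampleRate = sampleRate;
    this.channels = channels;
  }

  feed(frame: Uint8Array): void {
    this.chunks.push(frame.slice()); // copy: the caller may reuse its buffer
    this.bytes += frame.length;
  }

  finish(): Uint8Array {
    const out = new Uint8Array(44 + this.bytes);
    const v = new DataView(out.buffer);
    const ascii = (off: number, s: string) => {
      for (let i = 0; i < s.length; i++) out[off + i] = s.charCodeAt(i);
    };
    ascii(0, "RIFF");
    v.setUint32(4, 36 + this.bytes, true);                  // RIFF chunk size
    ascii(8, "WAVE");
    ascii(12, "fmt ");
    v.setUint32(16, 16, true);                              // fmt chunk size for PCM
    v.setUint16(20, 1, true);                               // format 1 = linear PCM
    v.setUint16(22, this.channels, true);
    v.setUint32(24, this.sampleRate, true);
    v.setUint32(28, this.sampleRate * this.channels * 2, true); // byte rate
    v.setUint16(32, this.channels * 2, true);               // block align
    v.setUint16(34, 16, true);                              // bits per sample
    ascii(36, "data");
    v.setUint32(40, this.bytes, true);                      // data chunk size
    let off = 44;
    for (const c of this.chunks) { out.set(c, off); off += c.length; }
    return out;
  }
}

const recorder = new WavRecorder();
recorder.feed(new Uint8Array(640));     // one 20 ms frame of silence at 16 kHz
console.log(recorder.finish().length);  // 684
```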

## Watch turn boundaries and barge events

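Subscribe to call status to follow the bridged agent. The callback below reacts to `in_progress` (agent bridged and listening) and `completed` (call over). These are call-level statuses, so check [Inference — SDK Methods](/modalities/inference/sdk-methods) for any finer-grained turn or barge-in signals.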

```ts
call.onStatus((s) => {
  if (s === "in_progress") console.log("agent bridged, listening");
  if (s === "completed")   console.log("call done");
});
```

## Detach the agent but keep the call up

Hand the call back to a human (or another app) by attaching a bare bridge and
not re-attaching the agent.

```ts
// Wrap feed in an arrow so `this` stays bound to humanLeg.
await voice.audioBridge.attach(call.sid, { codec: "pcm16", onUplink: (frame) => humanLeg.feed(frame) });
// agent no longer drives the conversation; you own the audio now
```

## Attach a speech-to-speech agent in Python

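The originate-and-attach flow in Python. Names are snake_case, and `from` becomes `from_` because `from` is a reserved word in Python.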

```python
from clutchcall.voice import Voice

voice = Voice(base_url=BASE_URL, api_key=KEY, org_id=ORG)
call  = voice.calls.originate(
    to="+15551234567", from_="+15558675309",
    trunk_id="trunk_main", agent="support-s2s",
)
```

## Related

- [Inference — Recipes](/modalities/inference/recipes) — full worked examples
- [Inference — SDK Methods](/modalities/inference/sdk-methods) — every method and the full knob table
- [Turn Detection & Barge-In](/concepts/turn-detection) — the policy these knobs drive
