# Audio Frames

> Bidirectional audio over QUIC unidirectional streams.

`AudioFrame` carries a single packet of voice — typically 20 ms of µ-law or
PCM. It rides on the same envelope as every other RPC, but on **uni-streams**
to keep latency low and avoid pairing every send with an ack.

## `AudioFrame` schema

| Field             | Type     | Notes                                                    |
| ----------------- | -------- | -------------------------------------------------------- |
| `call_sid`        | `string` | Identifies the call this packet belongs to.              |
| `payload`         | `string` | Raw codec bytes (length-prefixed). Treated as opaque.    |
| `codec`           | `string` | `"PCMU"` (G.711 µ-law), `"PCMA"` (G.711 A-law), or `"PCM16"`. |
| `sequence_number` | `uint64` | Monotonic per `(call_sid, direction)`. Used to detect loss. |
| `end_of_stream`   | `bool`   | Final frame; the gateway will close the audio uni-stream. |

`method_id` for an audio frame is always `2991054320` (`0xb247_ddf0`).

## Frame layout on the wire

```
+----------------+----------------+----------------------+
| u32 length LE  | method_id      | AudioFrame envelope  |
+----------------+----------------+----------------------+
```

This is the standard [envelope format](/rpc/envelope-format) — audio is not
special.
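
As a rough illustration of the layout above, the sketch below packs and unpacks frames with Python's `struct` module. Only the `u32` little-endian length comes from the diagram. The sketch assumes that `method_id` is also a little-endian `u32` and that the length counts every byte after the length field (`method_id` plus the envelope body); check both against the envelope-format page. `encode_frame` and `decode_frames` are hypothetical helper names, and the body is treated as already-serialized, opaque bytes.

```python
import struct

AUDIO_FRAME_METHOD_ID = 2991054320  # value given in the schema section


def encode_frame(method_id: int, body: bytes) -> bytes:
    """Prefix an already-serialized body with a u32 LE length and a method_id.

    Assumptions (not confirmed by this page): method_id is a little-endian
    u32, and the length covers method_id + body.
    """
    return struct.pack("<II", 4 + len(body), method_id) + body


def decode_frames(buf: bytes):
    """Split buf into (method_id, body) pairs; also return the unconsumed tail."""
    frames = []
    offset = 0
    while len(buf) - offset >= 4:
        (length,) = struct.unpack_from("<I", buf, offset)
        if length < 4:
            raise ValueError("frame too short to hold a method_id")
        if len(buf) - offset - 4 < length:
            break  # partial frame: keep the bytes until more data arrives
        (method_id,) = struct.unpack_from("<I", buf, offset + 4)
        body = buf[offset + 8 : offset + 4 + length]
        frames.append((method_id, body))
        offset += 4 + length
    return frames, buf[offset:]
```

Returning the unconsumed tail lets a reader append the next chunk of stream bytes to it and call `decode_frames` again, which matters because a QUIC stream read can end in the middle of a frame.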

## Outbound (your mic → trunk)

Open exactly **one** client-initiated unidirectional stream per call and
write framed `AudioFrame`s back-to-back. Don't open a stream per packet —
that overwhelms the gateway's flow-control budget within seconds.

```
client uni-stream  ─────[frame][frame][frame] … [frame eos=true]─────▶
```

After sending the frame with `end_of_stream = true`, close the stream. The
gateway will not accept more frames on it.
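
One way to prepare the frames for a single outbound stream is sketched below: it slices an audio buffer into fixed-size payloads, numbers them, and marks only the last one with `end_of_stream`. The `AudioFrame` dataclass here is an in-memory stand-in that mirrors the schema fields, not the wire serialization, and `frames_for_call` is a hypothetical helper. Starting the sequence at 0 is an assumption; this page only says the number is monotonic.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class AudioFrame:
    """In-memory mirror of the schema fields (illustrative only)."""
    call_sid: str
    payload: bytes
    codec: str
    sequence_number: int
    end_of_stream: bool = False


def frames_for_call(call_sid: str, audio: bytes, codec: str = "PCMU",
                    frame_bytes: int = 160) -> Iterator[AudioFrame]:
    """Yield consecutive frames for one call; only the last has end_of_stream set."""
    chunks = [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]
    for seq, chunk in enumerate(chunks):
        is_last = seq == len(chunks) - 1
        # Assumption: sequence numbers start at 0 and increase by 1 per frame.
        yield AudioFrame(call_sid, chunk, codec, seq, end_of_stream=is_last)
```

The final chunk may be shorter than `frame_bytes` when the buffer length is not a multiple of the frame size.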

### Pacing

For µ-law @ 8 kHz with a 20 ms ptime, payload = 160 bytes. Send one frame
every 20 ms (50 fps). Faster pacing is buffered by the trunk and arrives
late on the far end; slower pacing causes audible gaps.
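
A pacing loop along these lines could look like the sketch below. It schedules each send against an absolute deadline (`start + i * ptime`) instead of sleeping a fixed 20 ms after every send, so small timing errors do not accumulate over a long call. `paced_send` and its `send` callback are hypothetical; `send` stands in for whatever writes one framed `AudioFrame` to the call's uni-stream.

```python
import asyncio
import time

PTIME_S = 0.020  # 20 ms per frame, i.e. 50 frames per second


async def paced_send(frames, send, ptime_s: float = PTIME_S) -> None:
    """Call send(frame) once per ptime, using absolute deadlines.

    `send` is a hypothetical callable supplied by the caller.
    """
    start = time.monotonic()
    for i, frame in enumerate(frames):
        # Time remaining until this frame's slot; negative if we are behind.
        delay = start + i * ptime_s - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        send(frame)
```

If the sender falls behind, this version sends the overdue frames immediately until it catches up. Whether to do that or to drop stale frames is a design choice this page does not settle.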

## Inbound (trunk → your speaker)

The gateway opens server-initiated uni-streams. After your `EventStreamRequest`
subscription, every uni-stream is multiplexed: each frame's `method_id`
determines whether it's audio (`2991054320`) or a `CallEvent` (`959835745`).

A typical demuxer:

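```python
AUDIO_FRAME = 2991054320  # AudioFrame method_id
CALL_EVENT = 959835745    # CallEvent method_id

async def demux(incoming_uni_streams):
    # incoming_uni_streams, read_frame, parse_*, speaker, and on_event are
    # placeholders for your own QUIC and decoding code.
    async for frame in incoming_uni_streams:
        method_id, body = read_frame(frame)
        if method_id == AUDIO_FRAME:
            af = parse_audio_frame(body)
            speaker.play(af.payload)
        elif method_id == CALL_EVENT:
            ev = parse_call_event(body)
            on_event(ev)
        # Frames with any other method_id are skipped.
```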

## Codec choices

| Codec   | Bitrate     | Use when                                        |
| ------- | ----------- | ----------------------------------------------- |
| `PCMU`  | 64 kbit/s   | Talking to PSTN/SIP. The default for trunks.    |
| `PCMA`  | 64 kbit/s   | European PSTN. Same frame size and bitrate as `PCMU`, but A-law companding instead of µ-law. |
| `PCM16` | 256 kbit/s  | Sending TTS or studio-quality content into the gateway. The gateway re-encodes to the trunk's codec. |

The gateway transcodes for you on ingress; on egress it sends whatever the
far-end negotiated. If you need a specific egress codec, set
`OriginateRequest.default_app_args` accordingly.
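
The bitrates in the table translate directly into payload sizes. The sketch below does that arithmetic, assuming the 20 ms ptime used elsewhere on this page applies to every codec; `payload_bytes` is a hypothetical helper.

```python
# Bitrates copied from the codec table, in bits per second.
BITRATE_BPS = {"PCMU": 64_000, "PCMA": 64_000, "PCM16": 256_000}


def payload_bytes(codec: str, ptime_ms: int = 20) -> int:
    """Payload size of one frame: bitrate * duration, converted to bytes."""
    # bits/s * ms / 1000 gives bits; dividing by 8 gives bytes.
    return BITRATE_BPS[codec] * ptime_ms // 8000
```

For example, `payload_bytes("PCMU")` gives 160 and `payload_bytes("PCM16")` gives 640.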

## Loss handling

Use `sequence_number` to detect dropped packets. The gateway does **not**
retransmit audio, because retransmission would defeat the purpose of a
low-latency path. For PSTN calls, each missing sequence number on egress is
heard as 20 ms of silence. That is fine for voice but catastrophic for DTMF,
so prefer DTMF sent via the SIP INFO method over in-band tones when possible.
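
Gap detection on the receiving side can be sketched as follows. The `LossDetector` class is hypothetical, and it assumes sequence numbers increase by exactly 1 per frame within each `(call_sid, direction)` pair; this page only guarantees that they are monotonic.

```python
class LossDetector:
    """Tracks sequence_number per (call_sid, direction) and counts gaps."""

    def __init__(self) -> None:
        self._next = {}  # (call_sid, direction) -> next expected sequence number

    def observe(self, call_sid: str, direction: str, seq: int) -> int:
        """Return how many frames were skipped just before this one."""
        key = (call_sid, direction)
        expected = self._next.get(key)
        if expected is None:
            missing = 0  # first frame seen for this key: nothing to compare with
        elif seq < expected:
            return 0  # late or duplicate frame: do not move the window back
        else:
            missing = seq - expected
        self._next[key] = seq + 1
        return missing
```

At a 20 ms ptime, multiplying the returned count by 20 gives the length of the gap in milliseconds, which a player could fill with silence or comfort noise.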
