# Telemetry

> Metrics, traces, CDRs, and SIP packet capture.

The ClutchCall gateway emits four streams of operational data. Each
one targets a different consumer; you can opt out of any of them via
the gateway's deployment config.

| Stream              | Protocol            | Consumer                          |
| ------------------- | ------------------- | --------------------------------- |
| Prometheus metrics  | OpenMetrics scrape  | Prometheus / Grafana / Mimir      |
| Distributed traces  | OTLP / HTTP `:4318` | SigNoz, Tempo, Honeycomb, Datadog |
| Call Detail Records | HTTP JSONEachRow    | ClickHouse                        |
| SIP packet capture  | HEPv3 UDP           | Homer / heplify-server            |

## Metrics

A Prometheus-style endpoint exposes a fixed set of metric families.
Names are stable. The ones most relevant to capacity planning are:

| Metric                              | Labels                                   | Type      |
| ----------------------------------- | ---------------------------------------- | --------- |
| `clutchcall_calls_total`            | `tenant`, `trunk`, `direction`, `result` | Counter   |
| `clutchcall_calls_active`           | `tenant`, `trunk`                        | Gauge     |
| `clutchcall_call_duration_seconds`  | `tenant`, `trunk`, `result`              | Histogram |
| `clutchcall_rtp_packets_total`      | `direction`, `codec`                     | Counter   |
| `clutchcall_rtp_loss_ratio`         | `tenant`                                 | Gauge     |
| `clutchcall_jitter_ms`              | `tenant`                                 | Histogram |
| `clutchcall_estimated_mos`          | `tenant`                                 | Histogram |
| `clutchcall_codec_engine_lag_us`    | `shard`                                  | Histogram |
| `clutchcall_quic_handshake_seconds` | (none)                                   | Histogram |
| `clutchcall_admin_requests_total`   | `method`, `result`                       | Counter   |

Scrape from `http://<gateway-host>:9091/metrics`.
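
For ad hoc capacity checks outside Prometheus, the scrape output can be parsed directly. The following is a minimal sketch, assuming the endpoint returns the usual Prometheus text exposition format with the label names from the table above. The helper name `active_calls_by_tenant` and the sample values are hypothetical.

```python
import re
from collections import defaultdict

# Matches sample lines such as: metric_name{label="value",...} 12.0
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def active_calls_by_tenant(exposition_text: str) -> dict[str, float]:
    """Sum clutchcall_calls_active across trunks, grouped by the tenant label."""
    totals: dict[str, float] = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if not match or match.group("name") != "clutchcall_calls_active":
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        totals[labels.get("tenant", "")] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical scrape output; in practice, fetch the text from
    # http://<gateway-host>:9091/metrics (for example with urllib.request).
    sample = (
        '# TYPE clutchcall_calls_active gauge\n'
        'clutchcall_calls_active{tenant="acme",trunk="t1"} 3\n'
        'clutchcall_calls_active{tenant="acme",trunk="t2"} 2\n'
    )
    print(active_calls_by_tenant(sample))
```

Summing the gauge per tenant gives the number of concurrent calls a tenant holds across all of its trunks at scrape time.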

## Tracing

Every RPC produces an OpenTelemetry trace with these standard fields:

- `service.name` = `clutchcall-gateway`
- `tenant.id`
- `call.sid` (when applicable)
- `method.id` (the RPC's `method_id` decimal)
- `transport` = `quic` | `webtransport` | `webrtc` | `sip`

Spans cover RPC parse, JWT validation, trunk lookup, dialplan node
execution, and SIP/RTP setup. Correlate traces with metrics by matching
the `tenant.id` span attribute against the `tenant` metric label.

The OTLP endpoint defaults to `http://localhost:4318` for HTTP and
`http://localhost:4317` for gRPC. Override it via the gateway's
`OTEL_EXPORTER_OTLP_ENDPOINT` env var.
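
When checking collector reachability, it helps to know which URL the traces end up at. The sketch below assumes the gateway follows the OpenTelemetry convention for OTLP/HTTP, where `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL and trace exports go to `/v1/traces` beneath it. That behavior is not confirmed here for the gateway itself, and `traces_url` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Default OTLP/HTTP endpoint as documented above.
DEFAULT_OTLP_HTTP = "http://localhost:4318"


def traces_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the OTLP/HTTP traces URL from the environment.

    Assumption: the base endpoint gets the signal path /v1/traces appended,
    as the OpenTelemetry exporter convention describes for OTLP/HTTP.
    """
    env = os.environ if env is None else env
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_HTTP
    return base.rstrip("/") + "/v1/traces"


if __name__ == "__main__":
    print(traces_url())
```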

## Call Detail Records

Each call ends with a CDR row pushed to ClickHouse via the HTTP
`JSONEachRow` interface. Schema (in the `telephony_cdrs` table,
partitioned by month, indexed on `(tenant_id, timestamp_ms)`):

| Column                | ClickHouse type           | Notes                              |
| --------------------- | ------------------------- | ---------------------------------- |
| `call_sid`            | `String`                  |                                    |
| `tenant_id`           | `LowCardinality(String)`  |                                    |
| `trunk_id`            | `LowCardinality(String)`  |                                    |
| `direction`           | `Enum8`                   | `outbound` / `inbound`.            |
| `from_number`         | `String`                  |                                    |
| `to_number`           | `String`                  |                                    |
| `start_timestamp_ms`  | `DateTime64(3)`           |                                    |
| `answer_timestamp_ms` | `Nullable(DateTime64(3))` |                                    |
| `end_timestamp_ms`    | `DateTime64(3)`           |                                    |
| `duration_seconds`    | `UInt32`                  |                                    |
| `q850_cause`          | `UInt16`                  | ITU-T Q.850 cause code.            |
| `status`              | `LowCardinality(String)`  |                                    |
| `codec`               | `LowCardinality(String)`  |                                    |
| `packets_sent`        | `UInt32`                  |                                    |
| `packets_received`    | `UInt32`                  |                                    |
| `packets_lost`        | `UInt32`                  |                                    |
| `bytes_sent`          | `UInt64`                  |                                    |
| `jitter_ms`           | `Float32`                 |                                    |
| `estimated_mos`       | `Float32`                 | R-factor → MOS estimate.           |
| `recording_url`       | `String`                  | Empty if not recorded.             |
| `agent_id`            | `LowCardinality(String)`  | If routed through an AI agent DAG. |

These fields mirror the `CallEvent` you receive on
`CHANNEL_HANGUP_COMPLETE` — same data, just persisted server-side.
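
To test a CDR sink or replay rows by hand, it can be useful to see what a `JSONEachRow` body looks like: one JSON object per line, keyed by column name. The sketch below is illustrative only. The row values are made up, only a subset of columns is shown, and `to_json_each_row` is a hypothetical helper rather than the gateway's own serializer.

```python
import json
from typing import Iterable, Mapping


def to_json_each_row(rows: Iterable[Mapping[str, object]]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


if __name__ == "__main__":
    # Hypothetical, abbreviated CDR using column names from the schema above.
    cdr = {
        "call_sid": "example-call-sid",
        "tenant_id": "acme",
        "trunk_id": "t1",
        "direction": "inbound",
        "start_timestamp_ms": "2024-01-01 12:00:00.000",
        "answer_timestamp_ms": None,  # Nullable: the call was never answered
        "end_timestamp_ms": "2024-01-01 12:00:20.000",
        "duration_seconds": 0,
        "q850_cause": 19,
    }
    body = to_json_each_row([cdr])
    # ClickHouse's HTTP interface accepts such a body as the POST payload of
    # an "INSERT INTO telephony_cdrs FORMAT JSONEachRow" query.
    print(body, end="")
```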

## SIP capture (HEPv3)

For SIP-side debugging the gateway can also emit HEPv3 frames over UDP
to a Homer / heplify-server instance. Useful when a call fails before
any `CallEvent` makes it to the SDK and you need to inspect the actual
SIP exchange.

Configure via:

```yaml
# gateway config snippet
heplify:
  enabled: true
  endpoint: 192.0.2.10:9060
  capture_id: 2001
```

Disable in production unless you're actively debugging — HEPv3 doubles
the SIP-side packet rate.
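
If no Homer instance is at hand, the frames can be inspected with a few lines of code. The following is a minimal decoder sketch based on the public HEPv3 framing: a `HEP3` magic, a 2-byte total length, then chunks that each carry a vendor ID, type ID, and length. It assumes the configured `capture_id` is sent in the generic capture agent ID chunk (type `0x000c`) and the raw SIP message in the payload chunk (type `0x000f`). `parse_hep3` is a hypothetical helper, not part of the gateway.

```python
import struct

HEP3_MAGIC = b"HEP3"
CHUNK_CAPTURE_ID = 0x000C  # capture agent ID, uint32 (assumed to carry capture_id)
CHUNK_PAYLOAD = 0x000F     # captured packet payload (the raw SIP message)


def parse_hep3(frame: bytes) -> dict[int, bytes]:
    """Return generic-vendor chunks of a HEPv3 frame as {type_id: payload}."""
    if len(frame) < 6 or frame[:4] != HEP3_MAGIC:
        raise ValueError("not a HEP3 frame")
    (total_len,) = struct.unpack_from("!H", frame, 4)
    if total_len > len(frame):
        raise ValueError("truncated HEP3 frame")
    chunks: dict[int, bytes] = {}
    offset = 6  # first chunk starts right after magic + total length
    while offset + 6 <= total_len:
        vendor, chunk_type, chunk_len = struct.unpack_from("!HHH", frame, offset)
        if chunk_len < 6 or offset + chunk_len > total_len:
            raise ValueError("invalid chunk length")
        if vendor == 0:  # keep generic chunks only, skip vendor extensions
            chunks[chunk_type] = frame[offset + 6 : offset + chunk_len]
        offset += chunk_len  # chunk length includes its 6-byte header
    return chunks


def capture_id(chunks: dict[int, bytes]) -> int:
    """Decode the capture agent ID chunk as a big-endian uint32."""
    return struct.unpack("!I", chunks[CHUNK_CAPTURE_ID])[0]
```

Feeding it datagrams received on a UDP socket bound to the configured `endpoint` port would show the capture ID and the raw SIP text for each mirrored message.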
