Call Recording and Transcription
One Call, Many Recordings
A call can have more than one recording. Each recording is
distinguished by its type:
- call - the full recording of the call.
- consumer_and_agent - a recording isolated to the consumer and Power Dialer agent legs.
- consumer_and_buyer - a recording isolated to the consumer and buyer legs.
Multi-recording mode lets the platform store focused audio for each
party pairing, which improves downstream transcription accuracy and
lets the UI surface the right snippet for the right reviewer.
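
The one-call, many-recordings relationship above can be pictured with a small data model. This is a minimal sketch, not TrackDrive's actual schema: the class names, the `duration_seconds` field, and the `recording_for` helper are hypothetical, and only the three type values come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RecordingType(Enum):
    # The three type values listed in this section.
    CALL = "call"
    CONSUMER_AND_AGENT = "consumer_and_agent"
    CONSUMER_AND_BUYER = "consumer_and_buyer"


@dataclass
class Recording:
    recording_type: RecordingType
    duration_seconds: int  # hypothetical field, for illustration only


@dataclass
class Call:
    call_id: str
    recordings: List[Recording] = field(default_factory=list)

    def recording_for(self, recording_type: RecordingType) -> Optional[Recording]:
        """Return the first recording of the given type, or None if absent."""
        for recording in self.recordings:
            if recording.recording_type is recording_type:
                return recording
        return None
```

A reviewer who only needs the consumer and buyer audio would call `call.recording_for(RecordingType.CONSUMER_AND_BUYER)` and fall back to the full `call` recording when that returns `None`.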
Recording only occurs when the offer or buyer configuration requests
it, or when a call type (such as a voice agent call) forces it.
Recordings are stored in object storage and streamed back through
TrackDrive for authenticated playback; raw storage URLs are never
exposed.
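
The rule for when recording occurs can be sketched as a single predicate. The setting name `record_calls`, the `RecordingConfig` class, and the `"voice_agent"` call type string are assumptions made for illustration; the real configuration fields may be named differently.

```python
from dataclasses import dataclass

# Assumption: call types that force recording regardless of configuration.
# The text names voice agent calls as one example; others may exist.
FORCED_RECORDING_CALL_TYPES = {"voice_agent"}


@dataclass
class RecordingConfig:
    record_calls: bool = False  # hypothetical opt-in flag on an offer or buyer


def should_record(offer: RecordingConfig, buyer: RecordingConfig, call_type: str) -> bool:
    """Record when the call type forces it, or when the offer or buyer asks for it."""
    if call_type in FORCED_RECORDING_CALL_TYPES:
        return True
    return offer.record_calls or buyer.record_calls
```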
Transcriptions Belong Directly to a Call
Transcriptions are associated directly with the call, not with a
specific recording. A call can have multiple transcriptions.
A transcription contains per-utterance timestamps, per-channel
speaker attribution (when dual-channel audio is available), and the
full text of the conversation.
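
As a rough illustration of that shape, the sketch below models a transcription as a list of timestamped utterances attached to a call. All class and field names are hypothetical, and the channel labels are placeholders; `channel` is left as `None` when dual-channel audio is not available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    start_seconds: float
    end_seconds: float
    text: str
    channel: Optional[str] = None  # None when the audio is not dual-channel


@dataclass
class Transcription:
    call_id: str  # linked to the call, not to a specific recording
    utterances: List[Utterance]

    @property
    def full_text(self) -> str:
        """Join utterance text in chronological order."""
        ordered = sorted(self.utterances, key=lambda u: u.start_seconds)
        return " ".join(u.text for u in ordered)
```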
Transcription is lazy on ordinary calls: a transcription is produced
only when requested, or when the offer or buyer has opted into
automatic transcription. This avoids paying the per-minute
transcription cost on calls nobody will ever read. Voice agent
calls always transcribe because the transcription stream drives
keyword spotting and token extraction.
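
The lazy transcription rule can be expressed as a short predicate. This is a sketch under stated assumptions: the parameter names and the `"voice_agent"` call type string are hypothetical stand-ins for whatever the platform actually stores.

```python
def should_transcribe(
    call_type: str,
    explicitly_requested: bool,
    offer_auto_transcribe: bool,
    buyer_auto_transcribe: bool,
) -> bool:
    """Decide whether a call gets transcribed.

    Voice agent calls always transcribe, because the live transcription
    stream feeds keyword spotting and token extraction. Ordinary calls
    transcribe only on request or when the offer or buyer has opted in.
    """
    if call_type == "voice_agent":  # assumption: hypothetical call type label
        return True
    return explicitly_requested or offer_auto_transcribe or buyer_auto_transcribe
```

With every flag off, an ordinary call returns `False`, so no per-minute transcription cost is incurred for it.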
Keyword Spotting on Voice Agent Calls
On voice-agent calls, a keyword spotting service watches the live
transcription stream. It runs against every keyword configured for
the active flow, matches against the caller channel, the agent
channel, or both, and fires the configured actions: setting a
disposition, running a schedule action group, or applying a call
control action. Each keyword match is deduplicated for one day, so
repeated utterances of the same keyword on the same call do not
trigger the action more than once.
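
The matching and deduplication behavior described above might look like the following. This is an illustrative sketch, not the service's implementation: the class names, the action labels, the case-insensitive substring match, and the choice of (call, keyword) as the deduplication key are all assumptions. Only the channel selection and the one-day window come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

DEDUP_WINDOW = timedelta(days=1)  # one-day deduplication window, per the text


@dataclass
class KeywordRule:
    keyword: str
    channels: Tuple[str, ...]  # ("caller",), ("agent",), or ("caller", "agent")
    action: str  # hypothetical label, e.g. "set_disposition:dnc"


class KeywordSpotter:
    def __init__(self, rules: List[KeywordRule]) -> None:
        self.rules = rules
        # Assumption: dedup is keyed on (call_id, keyword).
        self._last_fired: Dict[Tuple[str, str], datetime] = {}

    def on_utterance(self, call_id: str, channel: str, text: str, now: datetime) -> List[str]:
        """Return the actions to fire for one utterance from the live stream."""
        fired = []
        lowered = text.lower()
        for rule in self.rules:
            if channel not in rule.channels:
                continue  # rule does not watch this channel
            if rule.keyword.lower() not in lowered:
                continue  # assumption: simple case-insensitive substring match
            key = (call_id, rule.keyword)
            last = self._last_fired.get(key)
            if last is not None and now - last < DEDUP_WINDOW:
                continue  # already fired for this call within the window
            self._last_fired[key] = now
            fired.append(rule.action)
        return fired
```

Feeding the same keyword twice on one call yields the action on the first utterance and an empty list on the second, while a different call with the same keyword still fires.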
On regular (non-voice-agent) calls, transcription is a passive
artifact used for reporting and manual review.