Real-time bidirectional audio streaming enables Voice AI applications, live transcription, voice assistants, and custom audio processing on Plivo calls.
Prerequisites
1. Plivo Account
Sign up for Plivo and get your credentials:
| Credential | Where to Find |
|---|---|
| Auth ID | Plivo Console |
| Auth Token | Plivo Console |
2. Phone Number
You need a voice-enabled Plivo number to make or receive calls.
| Call Type | Number Requirement |
|---|---|
| Inbound | Callers dial your Plivo number, which triggers your Answer URL and starts the stream |
| Outbound | Your Plivo number is the Caller ID when making calls via API |
Get a number:
1. Go to Phone Numbers > Buy Numbers
2. Select country and type (local, toll-free, mobile)
3. Filter by voice_enabled = true
4. Purchase
India Numbers (Additional Requirements)
Indian phone numbers require KYC compliance:

| Requirement | Details |
|---|---|
| Account currency | Must be INR |
| KYC documents | Certificate of Incorporation (COI) + GST Certificate |
| Business registration | India-registered businesses only |
Submit compliance at Compliance Application before purchasing. See Rent India Numbers for details.
3. WebSocket Server
Your server must:
Accept WebSocket connections over wss://
Be publicly accessible (use ngrok for local development)
Handle Plivo’s stream events (start, media, dtmf, stop)
4. AI Service Credentials (Optional)
For voice AI applications, you’ll typically need:
Speech-to-Text: Deepgram, Google Speech, AWS Transcribe
LLM: OpenAI, Anthropic, Google Gemini
Text-to-Speech: ElevenLabs, Google TTS, Amazon Polly
How It Works
Plivo streams real-time audio between phone calls and your WebSocket server.
Phone Call <-> Plivo <-> WebSocket <-> Your Server <-> AI Services
Architecture
Step-by-Step Flow
1. Call Initiation: A caller dials your Plivo number, or your application initiates an outbound call.
2. Answer URL Request: Plivo makes an HTTP request to your configured Answer URL.
3. Stream XML Response: Your server responds with XML containing the <Stream> element, specifying the WebSocket URL and streaming parameters.
4. WebSocket Connection: Plivo establishes a WebSocket connection to your specified URL.
5. Start Event: Plivo sends a start event containing call metadata (call ID, stream ID, media format).
6. Media Streaming:
   - Inbound: Plivo continuously sends media events containing base64-encoded audio chunks from the caller.
   - Outbound: Your server sends playAudio events with base64-encoded audio to be played to the caller.
7. DTMF Events: When the caller presses keys, Plivo sends dtmf events with the digit information.
8. Control Events: Your server can send clearAudio to interrupt playback or checkpoint to track playback progress.
9. Connection Close: When the call ends or streaming stops, the WebSocket connection closes.
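The Stream XML Response step of the flow above can be sketched as a small Node.js helper. `buildStreamXml` is a hypothetical name for illustration, not part of the Plivo SDK:

```javascript
// Hypothetical helper: build the Answer URL response that starts a
// bidirectional mu-law stream to the given WebSocket URL.
function buildStreamXml(wsUrl) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">',
    `    ${wsUrl}`,
    '  </Stream>',
    '</Response>',
  ].join('\n');
}

console.log(buildStreamXml('wss://your-server.com/stream'));
```

Serve this string with an `application/xml` content type from your Answer URL endpoint.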
Stream XML
The <Stream> XML element initiates audio streaming for a call. Include it in your Answer URL response.
Basic Syntax
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">
    wss://your-server.com/stream
  </Stream>
</Response>
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| bidirectional | boolean | false | Enable two-way audio streaming. When true, you can send audio back to the caller. |
| keepCallAlive | boolean | false | Keep the call active after the stream ends. When false, the call ends when streaming stops. |
| contentType | string | audio/x-mulaw;rate=8000 | Audio codec and sample rate. See Supported Content Types. |
| statusCallbackUrl | string | — | URL for stream status callbacks (started, stopped, failed). |
| statusCallbackMethod | string | POST | HTTP method for status callbacks (GET or POST). |
| extraHeaders | string | — | Custom headers to include in the start event. Format: key1=value1;key2=value2 |
Supported Content Types
| Content Type | Description | Use Case |
|---|---|---|
| audio/x-mulaw;rate=8000 | mu-law codec at 8 kHz | Recommended. Standard telephony, lowest latency, best compatibility. |
| audio/x-l16;rate=8000 | Linear PCM 16-bit at 8 kHz | Higher quality for speech processing. |
| audio/x-l16;rate=16000 | Linear PCM 16-bit at 16 kHz | High-quality speech recognition. |
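If your speech pipeline works in linear PCM but you stream the recommended mu-law content type, you need to transcode between the two formats. The following is a sketch of standard G.711 mu-law conversion, not Plivo-specific code:

```javascript
// Standard G.711 mu-law encoding: 16-bit linear PCM sample -> 8-bit mu-law byte.
function linearToMuLaw(sample) {
  const BIAS = 0x84, CLIP = 32635;
  const sign = sample < 0 ? 0x80 : 0;
  if (sign) sample = -sample;
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;
  // Find the segment (exponent) of the biased sample.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Inverse: 8-bit mu-law byte -> 16-bit linear PCM sample.
function muLawToLinear(byte) {
  const u = ~byte & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -sample : sample;
}
```

Note that mu-law is lossy: a round-trip through these functions is approximate, with error growing at higher amplitudes.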
Examples
Bidirectional Stream with mu-law Codec
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Speak>Hello! I'm connecting you to our AI assistant.</Speak>
  <Stream bidirectional="true"
          keepCallAlive="true"
          contentType="audio/x-mulaw;rate=8000">
    wss://your-server.com/stream
  </Stream>
</Response>
```
Stream with Status Callbacks and Extra Headers
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream bidirectional="true"
          keepCallAlive="true"
          contentType="audio/x-mulaw;rate=8000"
          statusCallbackUrl="https://your-server.com/stream-status"
          statusCallbackMethod="POST"
          extraHeaders="userId=12345;sessionId=abc-xyz">
    wss://your-server.com/stream
  </Stream>
</Response>
```
Stream APIs
Control active streams programmatically via REST API calls.
Base URL
https://api.plivo.com/v1/Account/{auth_id}/Call/{call_uuid}/Stream/
Authentication
Use HTTP Basic Authentication with your Plivo Auth ID and Auth Token.
Stop a Stream
Endpoint: DELETE /v1/Account/{auth_id}/Call/{call_uuid}/Stream/

```shell
curl -X DELETE \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN
```
Get Stream Details
Endpoint: GET /v1/Account/{auth_id}/Call/{call_uuid}/Stream/

```shell
curl -X GET \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN
```
Using the Plivo SDK
Node.js
```javascript
const plivo = require('plivo');
const client = new plivo.Client('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN');

// Stop a stream
await client.calls.stopStream('CALL_UUID');
```
Python
```python
import plivo

client = plivo.RestClient('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN')

# Stop a stream
client.calls.stop_stream(call_uuid='CALL_UUID')
```
Stream Status Callbacks
Configure a callback URL to receive notifications about stream lifecycle events.
Configuration
```xml
<Stream bidirectional="true"
        statusCallbackUrl="https://your-server.com/stream-status"
        statusCallbackMethod="POST">
  wss://your-server.com/stream
</Stream>
```
Callback Parameters
| Parameter | Type | Description |
|---|---|---|
| CallUUID | string | The unique identifier for the call |
| StreamID | string | The unique identifier for the stream |
| Event | string | The event type: started, stopped, failed |
| Timestamp | string | ISO 8601 timestamp of the event |
| From | string | The caller's phone number |
| To | string | The called phone number |
| Direction | string | Call direction: inbound or outbound |
| StatusReason | string | Reason for the status (on stopped or failed) |
| Duration | number | Stream duration in seconds (on stopped) |
Example Handler
```javascript
app.post('/stream-status', (req, res) => {
  const { CallUUID, StreamID, Event, StatusReason, Duration } = req.body;
  switch (Event) {
    case 'started':
      console.log(`Stream ${StreamID} started for call ${CallUUID}`);
      break;
    case 'stopped':
      console.log(`Stream ${StreamID} stopped after ${Duration}s: ${StatusReason}`);
      break;
    case 'failed':
      console.error(`Stream ${StreamID} failed: ${StatusReason}`);
      break;
  }
  res.sendStatus(200);
});
```
Signature Validation
Plivo signs WebSocket connection requests to verify authenticity. Validate these signatures to ensure requests originate from Plivo.
| Header | Description |
|---|---|
| X-Plivo-Signature-V3 | The HMAC-SHA256 signature |
| X-Plivo-Signature-V3-Nonce | A unique nonce for this request |
Using the Plivo SDK
```javascript
import { validateV3Signature } from 'plivo';

const isValid = validateV3Signature(
  method,    // 'GET' for WebSocket upgrade requests
  uri,       // Full URI including protocol and path
  nonce,     // X-Plivo-Signature-V3-Nonce header value
  authToken, // Your Plivo Auth Token
  signature, // X-Plivo-Signature-V3 header value
);
```
Using the Node.js Stream SDK
The plivo-stream-sdk-node handles signature validation automatically:
```javascript
const plivoServer = new PlivoWebSocketServer({
  server,
  path: '/stream',
  validateSignature: true,
  authToken: process.env.PLIVO_AUTH_TOKEN,
});
```
When validateSignature is enabled, connections with invalid signatures are automatically rejected with a 1008 WebSocket close code.
WebSocket Events
All communication over the WebSocket uses JSON messages. Here are the essential events you need to handle.
Events from Plivo (Input)

| Event | Description |
|---|---|
| start | Sent once when the stream begins. Contains call metadata (callId, streamId, mediaFormat). |
| media | Sent continuously. Contains base64-encoded audio chunks (~20 ms each). |
| dtmf | Sent when the caller presses keys. Contains the digit pressed. |
| playedStream | Confirmation that audio with a checkpoint finished playing. |
| clearedAudio | Confirmation that the audio queue was cleared. |
Events to Plivo (Output)
| Event | Description |
|---|---|
| playAudio | Send audio to the caller. Include a base64 payload matching the stream contentType. |
| checkpoint | Mark a point in the audio queue. Receive playedStream when it is reached. |
| clearAudio | Clear all queued audio. Use for interruption handling. |
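As a sketch of the checkpoint flow described above: queue audio, mark it with a checkpoint, and treat the later playedStream confirmation as "that audio finished playing". The helper names below, and the `name` field on the checkpoint message, are assumptions for illustration; check the Audio Streaming Protocol Reference for the exact schemas.

```javascript
// Hypothetical helpers that build outgoing event messages.
function buildPlayAudio(base64Payload) {
  return JSON.stringify({
    event: 'playAudio',
    media: {
      contentType: 'audio/x-mulaw',
      sampleRate: 8000,
      payload: base64Payload,
    },
  });
}

// The checkpoint's identifying field is assumed to be `name` here.
function buildCheckpoint(name) {
  return JSON.stringify({ event: 'checkpoint', name });
}
```

Typical usage: `ws.send(buildPlayAudio(chunk)); ws.send(buildCheckpoint('sentence-1'));`, then handle the eventual playedStream event to know the caller heard the whole sentence.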
Quick Example
```javascript
// Handle incoming events
ws.on('message', (data) => {
  const event = JSON.parse(data);
  switch (event.event) {
    case 'start':
      console.log('Stream started:', event.start.streamId);
      break;
    case 'media': {
      // Forward audio to STT service
      const audio = Buffer.from(event.media.payload, 'base64');
      sttClient.send(audio);
      break;
    }
    case 'dtmf':
      console.log('DTMF pressed:', event.dtmf.digit);
      break;
  }
});

// Send audio to caller
ws.send(JSON.stringify({
  event: 'playAudio',
  media: {
    contentType: 'audio/x-mulaw',
    sampleRate: 8000,
    payload: base64EncodedAudio
  }
}));
```
For complete event schemas, TypeScript types, and detailed field documentation, see the Audio Streaming Protocol Reference .
Extra Headers
Pass custom metadata from your Stream XML to your WebSocket server.
Usage
```xml
<Stream bidirectional="true"
        extraHeaders="userId=12345;sessionId=abc-xyz;tier=premium">
  wss://your-server.com/stream
</Stream>
```
Parsing
```javascript
function parseExtraHeaders(extraHeaders) {
  const headers = {};
  if (!extraHeaders) return headers;
  for (const pair of extraHeaders.split(';')) {
    const [key, value] = pair.split('=');
    if (key && value) {
      headers[key.trim()] = decodeURIComponent(value.trim());
    }
  }
  return headers;
}

// Usage
const headers = parseExtraHeaders(event.extra_headers);
console.log(headers.userId);    // "12345"
console.log(headers.sessionId); // "abc-xyz"
```
Limits
WebSocket and Stream Limits
| Limit | Value |
|---|---|
| Maximum WebSocket URL length | 2048 characters |
| Maximum concurrent streams per call | 1 |
| Maximum stream duration | Same as call duration |
| Audio buffer size (playback queue) | ~60 seconds of audio |
| Maximum WebSocket message size | 64 KB |
| Recommended audio chunk size | 16 KB base64-encoded or less |
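To stay within the recommended 16 KB chunk size (and well under the 64 KB message cap), larger payloads can be split before sending. A sketch, with a hypothetical `chunkBase64` helper; it rounds the chunk size down to a multiple of 4 characters so each piece remains valid base64 on its own:

```javascript
// Split a base64 payload into chunks of at most maxChars characters.
// 4 base64 chars encode 3 raw bytes, so aligning chunk boundaries to
// multiples of 4 keeps each chunk independently decodable.
function chunkBase64(payload, maxChars = 16 * 1024) {
  const size = Math.max(4, maxChars - (maxChars % 4));
  const chunks = [];
  for (let i = 0; i < payload.length; i += size) {
    chunks.push(payload.slice(i, i + size));
  }
  return chunks;
}
```

For 16-bit PCM (audio/x-l16), prefer chunk sizes that are multiples of 8 characters (6 raw bytes) so splits also fall on sample boundaries; 8-bit mu-law can be split at any byte.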
Best Practices
Use mu-law 8000Hz
Why mu-law at 8kHz is recommended:
Native Telephony Format: No transcoding required, lowest latency
Bandwidth Efficient: Compresses 16-bit audio to 8-bit while maintaining voice quality
Universal Compatibility: Every STT/TTS service supports mu-law
Sufficient for Voice: Human speech is well represented at 8 kHz
```xml
<!-- Recommended configuration -->
<Stream bidirectional="true"
        contentType="audio/x-mulaw;rate=8000">
  wss://your-server.com/stream
</Stream>
```
Minimize Latency
For a responsive Voice AI experience, aim for under 1 second total response time:
| Component | Target Latency |
|---|---|
| Speech-to-Text | < 200 ms |
| LLM Processing | < 500 ms |
| Text-to-Speech | < 200 ms |
| Network (round trip) | < 100 ms |
Server Location : Deploy your WebSocket server close to your expected caller locations. Plivo routes calls through the edge location closest to the caller.
| Traffic Source | Recommended Server Location |
|---|---|
| US-focused | US East (Virginia) or US West (Oregon) |
| Europe-focused | Frankfurt or London |
| Asia-Pacific | Singapore or Mumbai |
| Global | Deploy in multiple regions with geographic routing |
Handle Interruptions
Always support user interruption using clearAudio:
```javascript
// When user speaks while AI is playing
if (userSpeaking && aiPlaying) {
  ws.send(JSON.stringify({
    event: 'clearAudio',
    streamId: streamId
  }));
}
```
Integration Guides
For complete code examples and step-by-step tutorials:
Plivo Stream SDK Official SDKs for Python, Node.js, and Java with full examples using Deepgram, OpenAI, and ElevenLabs
Pipecat Build with the Pipecat framework for simplified voice AI pipelines
Last updated: January 2026