The Plivo Stream SDK provides official libraries for Python, Node.js, and Java to build AI voice agents using Plivo’s Audio Streaming API. These SDKs handle WebSocket connections, audio encoding/decoding, and event management, letting you focus on your AI integration logic.
What You Can Build
- AI Voice Assistants - Natural conversations powered by speech-to-text, LLMs, and text-to-speech
- Real-time Transcription - Live call transcription with speech recognition services
- Voice Bots - Automated IVR systems with intelligent responses
- Call Analytics - Real-time audio analysis and sentiment detection
Get Started with Plivo
Before developing your AI voice agent, sign up for Plivo or sign in to your existing account. Purchase a voice-enabled number through the Plivo console.
Prerequisites
Required Accounts
- Plivo - Account with Auth ID and Auth Token
- Deepgram - Sign up for speech-to-text
- OpenAI - Sign up for conversational AI
- ElevenLabs - Sign up for text-to-speech
Language Requirements
Install the toolchain for the SDK you plan to use:
- Python: 3.8 or later, with pip
- Node.js: 18 or later, with npm or yarn
- Java: 17 or later, with Maven or Gradle and a Jakarta EE compatible server (Tomcat 10+, Jetty 11+)
Installation
Python
The Python SDK supports two WebSocket implementations:
- FastAPI - For production applications using ASGI
- websockets - Lightweight option for simple use cases
Node.js
npm i plivo-stream-sdk-node
Or with yarn:
yarn add plivo-stream-sdk-node
The Node.js SDK is built on the ws WebSocket library and includes TypeScript definitions.
Java
Add to your pom.xml:
<dependency>
    <groupId>com.plivo</groupId>
    <artifactId>plivo-stream-sdk</artifactId>
    <version>1.0.0</version>
</dependency>
Or with Gradle:
implementation 'com.plivo:plivo-stream-sdk:1.0.0'
The Java SDK uses Jakarta WebSocket API 2.1.1.
Core Concepts
Audio Streaming Flow
┌─────────────┐    WebSocket     ┌─────────────┐    API Calls     ┌─────────────┐
│    Plivo    │ ───────────────▶ │  Your App   │ ───────────────▶ │ AI Services │
│    Call     │ ◀─────────────── │    (SDK)    │ ◀─────────────── │ STT/LLM/TTS │
└─────────────┘   Audio Events   └─────────────┘    Text/Audio    └─────────────┘
- Caller dials your Plivo number
- Plivo connects to your WebSocket endpoint
- SDK receives START event with stream metadata
- Audio flows as MEDIA events (base64-encoded mu-law)
- Your app processes audio through AI services
- SDK sends audio back to the caller
Event Types
| Event | Description |
|---|---|
| START | Stream initialized with call metadata (stream ID, call UUID, from/to numbers) |
| MEDIA | Audio chunk received (base64-encoded; mu-law at 8 kHz or linear PCM at 16 kHz) |
| DTMF | Caller pressed a key on their phone |
| STOP | Stream ended |
Audio Formats
| Format | Encoding | Sample Rate | Use Case |
|---|---|---|---|
| audio/x-mulaw | mu-law | 8000 Hz | Standard telephony (default) |
| audio/x-l16 | Linear PCM | 16000 Hz | Higher quality for STT |
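Many STT services, Deepgram included, accept mu-law directly, so decoding is only needed when a downstream service expects linear PCM. The sketch below shows what that conversion involves, using the standard G.711 mu-law expansion in pure Python (the stdlib `audioop.ulaw2lin` was deprecated in Python 3.11 and removed in 3.13). It assumes the input is the raw mu-law bytes returned by `get_raw_media()`; the function names are hypothetical helpers, not SDK APIs. The output stays at 8 kHz; no resampling is done.

```python
import struct
from typing import List


def mulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~u & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80               # top bit set means a negative sample
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    # Rebuild the biased magnitude, then remove the bias of 0x84 (132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Convert raw mu-law bytes (8 kHz, mono) to 16-bit little-endian PCM."""
    samples: List[int] = [mulaw_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each mu-law byte becomes a 2-byte sample, so a decoded buffer is exactly twice the size of its input.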
Quick Start
Step 1: Create a WebSocket Handler
Python (FastAPI)
from fastapi import FastAPI, WebSocket
from plivo_stream import PlivoFastAPIStreamingHandler, StartEvent, MediaEvent
app = FastAPI()
@app.websocket("/stream")
async def websocket_endpoint(websocket: WebSocket):
    handler = PlivoFastAPIStreamingHandler(websocket)
    @handler.on_start
    async def handle_start(event: StartEvent):
        print(f"Stream started: {handler.get_stream_id()}")
        print(f"Call from: {event.start.from_}")
        print(f"Call to: {event.start.to}")
    @handler.on_media
    async def handle_media(event: MediaEvent):
        # Get raw audio bytes from the event
        audio_bytes = event.get_raw_media()
        # Process audio (send to STT, etc.) to produce response_audio.
        # Placeholder: echo the caller's audio back unchanged.
        response_audio = audio_bytes
        # Send audio back to caller
        await handler.send_media(response_audio)
    @handler.on_dtmf
    async def handle_dtmf(event):
        print(f"DTMF digit pressed: {event.dtmf.digit}")
    @handler.on_stop
    async def handle_stop(event):
        print("Stream ended")
    # Register all handlers before starting the receive loop
    await handler.start()
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)
Node.js
import express from 'express';
import { createServer } from 'http';
import { PlivoWebSocketServer, StartEvent, MediaEvent } from '@plivo/plivo-stream-sdk';
const app = express();
const server = createServer(app);
const plivoServer = new PlivoWebSocketServer({
  server,
  path: '/stream'
});
plivoServer
  .onStart((event: StartEvent, ws) => {
    console.log(`Stream started: ${event.start.streamId}`);
    console.log(`Call from: ${event.start.from}`);
    console.log(`Call to: ${event.start.to}`);
  })
  .onMedia((event: MediaEvent, ws) => {
    // Get raw audio buffer from the event
    const audioBuffer = event.getRawMedia();
    // Process audio (send to STT, etc.) to produce responseAudio.
    // Placeholder: echo the caller's audio back unchanged.
    const responseAudio = audioBuffer;
    // Send audio back to caller
    plivoServer.playAudio(ws, 'audio/x-mulaw', 8000, responseAudio);
  })
  .onDtmf((event, ws) => {
    console.log(`DTMF digit pressed: ${event.dtmf.digit}`);
  })
  .onStop((event, ws) => {
    console.log('Stream ended');
  })
  .start();
server.listen(5000, () => {
  console.log('Server listening on port 5000');
});
Java
import com.plivo.stream.PlivoStreamingHandler;
import com.plivo.stream.PlivoWebSocketEndpoint;
import com.plivo.stream.event.StartEvent;
import com.plivo.stream.event.MediaEvent;
import jakarta.websocket.server.ServerEndpoint;
import jakarta.websocket.Session;
@ServerEndpoint("/stream")
public class StreamEndpoint extends PlivoWebSocketEndpoint {
    @Override
    protected PlivoStreamingHandler createHandler(Session session) {
        PlivoStreamingHandler handler = new PlivoStreamingHandler(session);
        handler.onStart(event -> {
            System.out.println("Stream started: " + event.getStart().getStreamId());
            System.out.println("Call from: " + event.getStart().getFrom());
            System.out.println("Call to: " + event.getStart().getTo());
        });
        handler.onMedia(event -> {
            // Get raw audio bytes from the event
            byte[] audioBytes = event.getRawMedia();
            // Process audio (send to STT, etc.) to produce responseAudio.
            // Placeholder: echo the caller's audio back unchanged.
            byte[] responseAudio = audioBytes;
            // Send audio back to caller
            handler.playAudio(responseAudio, "audio/x-mulaw", 8000);
        });
        handler.onDtmf(event -> {
            System.out.println("DTMF digit pressed: " + event.getDtmf().getDigit());
        });
        handler.onStop(event -> {
            System.out.println("Stream ended");
        });
        return handler;
    }
}
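The handlers above do their work inline in the media callback. In a real agent, STT, LLM, and TTS calls can take longer than the gap between incoming audio chunks, so one common approach is to hand each chunk to a background task. The following is a minimal Python sketch using `asyncio.Queue`. It assumes the SDK callbacks run on the same event loop. `AudioPipe` and its methods are hypothetical, not part of the Plivo SDK. You would create it and call `start()` in `on_start`, call `push(event.get_raw_media())` in `on_media`, and `await close()` in `on_stop`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AudioPipe:
    """Decouples on_media from slow STT/LLM/TTS work with a bounded queue."""

    def __init__(self, consume: Callable[[bytes], Awaitable[None]], maxsize: int = 500) -> None:
        self._consume = consume
        self._queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue(maxsize=maxsize)
        self._task: Optional["asyncio.Task[None]"] = None

    def start(self) -> None:
        # Must be called while the event loop is running (for example, in on_start).
        self._task = asyncio.create_task(self._run())

    def push(self, chunk: bytes) -> None:
        # Non-blocking: if the consumer falls behind, drop the oldest chunk.
        if self._queue.full():
            self._queue.get_nowait()
        self._queue.put_nowait(chunk)

    async def close(self) -> None:
        # A None sentinel tells the worker to stop after draining earlier chunks.
        await self._queue.put(None)
        if self._task is not None:
            await self._task

    async def _run(self) -> None:
        while True:
            chunk = await self._queue.get()
            if chunk is None:
                break
            await self._consume(chunk)
```

Dropping the oldest audio rather than blocking keeps the media callback fast. Whether losing stale audio is acceptable depends on your STT service, so tune `maxsize` accordingly.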
Step 2: Configure Plivo XML
Create an XML application that routes calls to your WebSocket endpoint, and assign it to your Plivo number:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Speak>Connected to AI Assistant. You may begin speaking.</Speak>
<Stream keepCallAlive="true" audioTrack="both" contentType="audio/x-mulaw;rate=8000">
wss://your-domain.com/stream
</Stream>
</Response>
Step 3: Set Up Local Development
For local testing, use ngrok to expose your WebSocket endpoint:
# Install ngrok
brew install ngrok # macOS
# or download from https://ngrok.com/download
# Start tunnel
ngrok http 5000
Update your Plivo XML with the ngrok forwarding hostname, using the wss:// scheme:
<Stream keepCallAlive="true" audioTrack="both">
wss://abc123.ngrok.app/stream
</Stream>
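If you would rather generate this XML than edit it by hand, for example from a route on your own server that Plivo fetches as the application's answer URL, the following is a minimal sketch using only the Python standard library. `build_stream_xml` is a hypothetical helper, not part of the Plivo SDK. Its attribute values mirror the XML examples above, and the `wss://` check reflects those examples rather than a documented rule.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_stream_xml(
    ws_url: str,
    greeting: Optional[str] = "Connected to AI Assistant. You may begin speaking.",
    content_type: str = "audio/x-mulaw;rate=8000",
) -> str:
    """Return Plivo XML that speaks an optional greeting, then streams to ws_url."""
    if not ws_url.startswith("wss://"):
        raise ValueError("expected a wss:// stream URL")
    response = ET.Element("Response")
    if greeting:
        ET.SubElement(response, "Speak").text = greeting
    stream = ET.SubElement(
        response,
        "Stream",
        {"keepCallAlive": "true", "audioTrack": "both", "contentType": content_type},
    )
    stream.text = ws_url
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Building the document with ElementTree rather than string formatting means any special characters in the greeting are escaped automatically.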
SDK Reference
Sending Audio to Caller
Python
# Send audio bytes to the caller
await handler.send_media(audio_bytes)
# Send a checkpoint (receive callback when audio finishes playing)
await handler.send_checkpoint(name="greeting_complete")
# Clear any queued audio (useful for interruptions)
await handler.send_clear_audio()
Node.js
// Send audio to the caller
plivoServer.playAudio(ws, 'audio/x-mulaw', 8000, audioBuffer);
// Send a checkpoint (receive callback when audio finishes playing)
plivoServer.checkpoint(ws, 'greeting_complete');
// Clear any queued audio (useful for interruptions)
plivoServer.clearAudio(ws);
Java
// Send audio to the caller
handler.playAudio(audioBytes, "audio/x-mulaw", 8000);
// Send a checkpoint (receive callback when audio finishes playing)
handler.checkpoint("greeting_complete");
// Clear any queued audio (useful for interruptions)
handler.clearAudio();
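Checkpoints and clear-audio combine naturally into barge-in handling: keep track of which checkpoints you have sent but not yet seen played, and clear the queue if the caller starts talking in the meantime. The following Python sketch assumes only the three handler methods shown above. `BargeInController` and its method names are hypothetical. Reading the checkpoint name out of the `on_played_stream` event depends on the SDK's event type and is not shown.

```python
from typing import Any, Set


class BargeInController:
    """Tracks agent audio that is still queued and clears it on interruption.

    `handler` is duck-typed: it only needs the async send_media,
    send_checkpoint(name=...) and send_clear_audio methods.
    """

    def __init__(self, handler: Any) -> None:
        self.handler = handler
        self.pending: Set[str] = set()  # checkpoints sent but not yet played
        self._count = 0

    @property
    def agent_speaking(self) -> bool:
        return bool(self.pending)

    async def speak(self, audio: bytes) -> str:
        """Queue audio followed by a checkpoint that marks its end."""
        self._count += 1
        name = f"utterance_{self._count}"
        self.pending.add(name)  # register first so an early callback is not missed
        await self.handler.send_media(audio)
        await self.handler.send_checkpoint(name=name)
        return name

    def checkpoint_played(self, name: str) -> None:
        """Call from your on_played_stream handler with the checkpoint's name."""
        self.pending.discard(name)

    async def caller_started_speaking(self) -> bool:
        """Call when STT detects caller speech; returns True if audio was cleared."""
        if not self.agent_speaking:
            return False
        await self.handler.send_clear_audio()
        self.pending.clear()
        return True
```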
Event Handlers
| Event | Handler (Python / Node.js) | Description |
|---|---|---|
| Connection | on_connected / onConnection | WebSocket connected (before START) |
| Start | on_start / onStart | Stream initialized, call metadata available |
| Media | on_media / onMedia | Audio chunk received |
| DTMF | on_dtmf / onDtmf | Keypad digit pressed |
| Stop | on_stop / onStop | Stream ended |
| Checkpoint | on_played_stream / onPlayedStream | Checkpoint reached (audio finished) |
| Audio Cleared | on_cleared_audio / onClearedAudio | Audio queue cleared |
Reading call metadata in the START handler:
Python
@handler.on_start
async def handle_start(event: StartEvent):
    stream_id = handler.get_stream_id()
    call_uuid = event.start.call_id
    from_number = event.start.from_
    to_number = event.start.to
    content_type = event.start.media_format.encoding  # e.g. audio/x-mulaw
    sample_rate = event.start.media_format.sample_rate  # e.g. 8000
Node.js
plivoServer.onStart((event: StartEvent, ws) => {
  const streamId = event.start.streamId;
  const callUuid = event.start.callId;
  const fromNumber = event.start.from;
  const toNumber = event.start.to;
  const contentType = event.start.mediaFormat.encoding; // e.g. audio/x-mulaw
  const sampleRate = event.start.mediaFormat.sampleRate; // e.g. 8000
});
Java
handler.onStart(event -> {
    String streamId = event.getStart().getStreamId();
    String callUuid = event.getStart().getCallId();
    String fromNumber = event.getStart().getFrom();
    String toNumber = event.getStart().getTo();
    String contentType = event.getStart().getMediaFormat().getEncoding();
    int sampleRate = event.getStart().getMediaFormat().getSampleRate();
});
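The encoding and sample rate from the START event are what an STT connection needs to interpret the audio. The following Python sketch maps them onto Deepgram's live-transcription query parameters (`encoding`, `sample_rate`, `channels`). `deepgram_listen_url` is a hypothetical helper. Authentication with `DEEPGRAM_API_KEY` and the WebSocket connection itself are not shown. The L16 mapping assumes Plivo's 16-bit samples are little-endian, matching Deepgram's `linear16`; verify this against both services' documentation.

```python
from urllib.parse import urlencode

# Plivo stream content types mapped to Deepgram's `encoding` query values.
DEEPGRAM_ENCODINGS = {
    "audio/x-mulaw": "mulaw",
    "audio/x-l16": "linear16",
}


def deepgram_listen_url(content_type: str, sample_rate: int) -> str:
    """Build a Deepgram streaming URL that matches the stream's media format."""
    base_type = content_type.split(";", 1)[0].strip()  # drop ";rate=..." if present
    if base_type not in DEEPGRAM_ENCODINGS:
        raise ValueError(f"unsupported stream content type: {content_type}")
    query = urlencode({
        "encoding": DEEPGRAM_ENCODINGS[base_type],
        "sample_rate": sample_rate,
        "channels": 1,
    })
    return f"wss://api.deepgram.com/v1/listen?{query}"
```

Inside `handle_start`, this would be called as `deepgram_listen_url(content_type, sample_rate)`, so the STT session always matches whatever `contentType` the XML requested.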
Configuration Options
Environment Variables
Create a .env file with your credentials:
# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
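A .env file is not read into the process environment automatically. The python-dotenv package's `load_dotenv()` is one common way to load it. Once the values are in `os.environ`, failing fast on missing keys saves debugging a half-configured call later. The following is a minimal sketch; `load_config` is a hypothetical helper, and the variable names come from the example above.

```python
import os
from typing import Dict, Mapping

# Variable names from the .env example above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, failing fast with every missing name listed."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting any mapping, not just `os.environ`, keeps the function easy to test with a plain dictionary.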
Plivo Stream XML Options
<Stream
keepCallAlive="true"
audioTrack="both"
contentType="audio/x-mulaw;rate=8000"
statusCallbackUrl="https://your-domain.com/stream-status"
statusCallbackMethod="POST">
wss://your-domain.com/stream
</Stream>
| Attribute | Description |
|---|---|
| keepCallAlive | Keep call active after stream ends (true/false) |
| audioTrack | Audio direction: inbound, outbound, or both |
| contentType | Audio format: audio/x-mulaw;rate=8000 or audio/x-l16;rate=16000 |
| statusCallbackUrl | URL for stream status webhooks |
| statusCallbackMethod | HTTP method used for requests to statusCallbackUrl (for example, POST) |
Troubleshooting
WebSocket Connection Issues
- Verify ngrok is running and the URL matches your XML configuration
- Check firewall rules allow WebSocket connections on your server
- Validate SSL certificates if using custom domains
Audio Quality Issues
- Use the correct audio format - mu-law at 8 kHz for standard telephony
- Check that the sample rate matches between incoming and outgoing audio
- Monitor latency - keep processing under 200 ms for natural conversation
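A quick way to sanity-check both sample rates and latency is to convert buffer sizes into milliseconds of audio. Mono mu-law at 8 kHz uses 1 byte per sample (8000 bytes/s), while 16-bit linear PCM at 16 kHz uses 2 bytes per sample (32000 bytes/s). The following is a small Python sketch; `audio_duration_ms` is a hypothetical helper keyed by the content types from the format table.

```python
# Bytes per second of mono audio: sample rate x bytes per sample.
BYTES_PER_SECOND = {
    "audio/x-mulaw": 8000 * 1,   # 8-bit mu-law at 8 kHz
    "audio/x-l16": 16000 * 2,    # 16-bit linear PCM at 16 kHz
}


def audio_duration_ms(num_bytes: int, content_type: str) -> float:
    """How many milliseconds of audio a buffer holds in the given stream format."""
    return num_bytes * 1000 / BYTES_PER_SECOND[content_type]
```

For example, 160 bytes of mu-law and 640 bytes of L16 both hold 20 ms. If a TTS service returns 20 ms of 16-bit PCM at 8 kHz (320 bytes) and you send it labeled as mu-law, the byte count implies 40 ms, double the real duration, which is a quick sign of a format mismatch.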
No Audio Received
- Verify audioTrack is set to both or inbound in your XML
- Check that your handlers are registered before calling start()
- Confirm call is connected - START event should fire first
Clone the Example Repositories
Full working examples are available in the SDK repositories:
git clone https://github.com/plivo/plivo-stream-sdk-python.git
cd plivo-stream-sdk-python/examples/demo
pip install -r requirements.txt
python server.py
git clone https://github.com/plivo/plivo-stream-sdk-node.git
cd plivo-stream-sdk-node/examples/express-streaming
npm install
npm start
git clone https://github.com/plivo/plivo-stream-sdk-java.git
cd plivo-stream-sdk-java
mvn clean install
# Run the example in examples/voice-ai-agent
Support