Build AI Voice Agents with the Plivo Stream SDK

The Plivo Stream SDK provides official libraries for Python, Node.js, and Java to build AI voice agents using Plivo’s Audio Streaming API. These SDKs handle WebSocket connections, audio encoding/decoding, and event management, letting you focus on your AI integration logic.

What You Can Build

AI Voice Assistants - Natural conversations powered by speech-to-text, LLMs, and text-to-speech
Real-time Transcription - Live call transcription with speech recognition services
Voice Bots - Automated IVR systems with intelligent responses
Call Analytics - Real-time audio analysis and sentiment detection

Get Started with Plivo

Before developing your AI voice agent, sign up for Plivo or sign in to your existing account. Purchase a voice-enabled number through the Plivo console.

Prerequisites

Required Accounts

Plivo - Account with Auth ID and Auth Token
Deepgram - Sign up for speech-to-text
OpenAI - Sign up for conversational AI
ElevenLabs - Sign up for text-to-speech

Language Requirements

Python
Node.js
Java

Python 3.8 or later
pip package manager

Installation

Python
Node.js
Java

pip install plivo-stream

The Python SDK supports two WebSocket implementations:

FastAPI - For production applications using ASGI
websockets - Lightweight option for simple use cases

npm i plivo-stream-sdk-node

Or with yarn:

yarn add plivo-stream-sdk-node

The Node.js SDK is built on the ws WebSocket library and includes TypeScript definitions.

Add to your pom.xml:

<dependency>
    <groupId>com.plivo</groupId>
    <artifactId>plivo-stream-sdk</artifactId>
    <version>1.0.0</version>
</dependency>

Or with Gradle:

implementation 'com.plivo:plivo-stream-sdk:1.0.0'

The Java SDK uses Jakarta WebSocket API 2.1.1.

Core Concepts

Audio Streaming Flow

┌─────────────┐    WebSocket    ┌─────────────┐    API Calls    ┌─────────────┐
│   Plivo     │ ───────────────▶│  Your App   │ ───────────────▶│  AI Services│
│   Call      │ ◀─────────────── │  (SDK)      │ ◀─────────────── │  STT/LLM/TTS│
└─────────────┘   Audio Events   └─────────────┘   Text/Audio    └─────────────┘

Caller dials your Plivo number
Plivo connects to your WebSocket endpoint
SDK receives START event with stream metadata
Audio flows as MEDIA events (base64-encoded mu-law)
Your app processes audio through AI services
SDK sends audio back to the caller

Event Types

Event	Description
`START`	Stream initialized with call metadata (stream ID, call UUID, from/to numbers)
`MEDIA`	Audio chunk received (base64-encoded, mu-law at 8kHz or linear PCM at 16kHz)
`DTMF`	Caller pressed a key on their phone
`STOP`	Stream ended

Audio Formats

Format	Encoding	Sample Rate	Use Case
`audio/x-mulaw`	mu-law	8000 Hz	Standard telephony (default)
`audio/x-l16`	Linear PCM	16000 Hz	Higher quality for STT

Quick Start

Step 1: Create a WebSocket Handler

Python (FastAPI)
Node.js
Java

from fastapi import FastAPI, WebSocket
from plivo_stream import PlivoFastAPIStreamingHandler, StartEvent, MediaEvent

app = FastAPI()

@app.websocket("/stream")
async def websocket_endpoint(websocket: WebSocket):
    handler = PlivoFastAPIStreamingHandler(websocket)

    @handler.on_start
    async def handle_start(event: StartEvent):
        print(f"Stream started: {handler.get_stream_id()}")
        print(f"Call from: {event.start.from_}")
        print(f"Call to: {event.start.to}")

    @handler.on_media
    async def handle_media(event: MediaEvent):
        # Get raw audio bytes from the event
        audio_bytes = event.get_raw_media()

        # Process audio (send to STT, etc.)
        # ...

        # Send audio back to caller
        await handler.send_media(response_audio)

    @handler.on_dtmf
    async def handle_dtmf(event):
        print(f"DTMF digit pressed: {event.dtmf.digit}")

    @handler.on_stop
    async def handle_stop(event):
        print("Stream ended")

    await handler.start()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)

import express from 'express';
import { createServer } from 'http';
import { PlivoWebSocketServer, StartEvent, MediaEvent } from '@plivo/plivo-stream-sdk';

const app = express();
const server = createServer(app);

const plivoServer = new PlivoWebSocketServer({
    server,
    path: '/stream'
});

plivoServer
    .onStart((event: StartEvent, ws) => {
        console.log(`Stream started: ${event.start.streamId}`);
        console.log(`Call from: ${event.start.from}`);
        console.log(`Call to: ${event.start.to}`);
    })
    .onMedia((event: MediaEvent, ws) => {
        // Get raw audio buffer from the event
        const audioBuffer = event.getRawMedia();

        // Process audio (send to STT, etc.)
        // ...

        // Send audio back to caller
        plivoServer.playAudio(ws, 'audio/x-mulaw', 8000, responseAudio);
    })
    .onDtmf((event, ws) => {
        console.log(`DTMF digit pressed: ${event.dtmf.digit}`);
    })
    .onStop((event, ws) => {
        console.log('Stream ended');
    })
    .start();

server.listen(5000, () => {
    console.log('Server listening on port 5000');
});

import com.plivo.stream.PlivoStreamingHandler;
import com.plivo.stream.PlivoWebSocketEndpoint;
import com.plivo.stream.event.StartEvent;
import com.plivo.stream.event.MediaEvent;
import jakarta.websocket.server.ServerEndpoint;
import jakarta.websocket.Session;

@ServerEndpoint("/stream")
public class StreamEndpoint extends PlivoWebSocketEndpoint {

    @Override
    protected PlivoStreamingHandler createHandler(Session session) {
        PlivoStreamingHandler handler = new PlivoStreamingHandler(session);

        handler.onStart(event -> {
            System.out.println("Stream started: " + event.getStart().getStreamId());
            System.out.println("Call from: " + event.getStart().getFrom());
            System.out.println("Call to: " + event.getStart().getTo());
        });

        handler.onMedia(event -> {
            // Get raw audio bytes from the event
            byte[] audioBytes = event.getRawMedia();

            // Process audio (send to STT, etc.)
            // ...

            // Send audio back to caller
            handler.playAudio(responseAudio, "audio/x-mulaw", 8000);
        });

        handler.onDtmf(event -> {
            System.out.println("DTMF digit pressed: " + event.getDtmf().getDigit());
        });

        handler.onStop(event -> {
            System.out.println("Stream ended");
        });

        return handler;
    }
}

Step 2: Configure Plivo to Stream Audio

Create an XML application that routes calls to your WebSocket endpoint:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Connected to AI Assistant. You may begin speaking.</Speak>
    <Stream keepCallAlive="true" audioTrack="both" contentType="audio/x-mulaw;rate=8000">
        wss://your-domain.com/stream
    </Stream>
</Response>

Step 3: Set Up Local Development

For local testing, use ngrok to expose your WebSocket endpoint:

# Install ngrok
brew install ngrok  # macOS
# or download from https://ngrok.com/download

# Start tunnel
ngrok http 5000

Update your Plivo XML with the ngrok URL:

<Stream keepCallAlive="true" audioTrack="both">
    wss://abc123.ngrok.app/stream
</Stream>

For complete AI voice agent examples with Deepgram, OpenAI, and ElevenLabs integration, see Clone the Example Repositories or the Deepgram + OpenAI + ElevenLabs Guide.

SDK Reference

Sending Audio to Caller

Python
Node.js
Java

# Send audio bytes to the caller
await handler.send_media(audio_bytes)

# Send a checkpoint (receive callback when audio finishes playing)
await handler.send_checkpoint(name="greeting_complete")

# Clear any queued audio (useful for interruptions)
await handler.send_clear_audio()

// Send audio to the caller
plivoServer.playAudio(ws, 'audio/x-mulaw', 8000, audioBuffer);

// Send a checkpoint (receive callback when audio finishes playing)
plivoServer.checkpoint(ws, 'greeting_complete');

// Clear any queued audio (useful for interruptions)
plivoServer.clearAudio(ws);

// Send audio to the caller
handler.playAudio(audioBytes, "audio/x-mulaw", 8000);

// Send a checkpoint (receive callback when audio finishes playing)
handler.checkpoint("greeting_complete");

// Clear any queued audio (useful for interruptions)
handler.clearAudio();

Event Handlers

Event	Handler	Description
Connection	`on_connected` / `onConnection`	WebSocket connected (before START)
Start	`on_start` / `onStart`	Stream initialized, call metadata available
Media	`on_media` / `onMedia`	Audio chunk received
DTMF	`on_dtmf` / `onDtmf`	Keypad digit pressed
Stop	`on_stop` / `onStop`	Stream ended
Checkpoint	`on_played_stream` / `onPlayedStream`	Checkpoint reached (audio finished)
Audio Cleared	`on_cleared_audio` / `onClearedAudio`	Audio queue cleared

Getting Stream Information

Python
Node.js
Java

@handler.on_start
async def handle_start(event: StartEvent):
    stream_id = handler.get_stream_id()
    call_uuid = event.start.call_id
    from_number = event.start.from_
    to_number = event.start.to
    content_type = event.start.media_format.encoding  # audio/x-mulaw
    sample_rate = event.start.media_format.sample_rate  # 8000

plivoServer.onStart((event: StartEvent, ws) => {
    const streamId = event.start.streamId;
    const callUuid = event.start.callId;
    const fromNumber = event.start.from;
    const toNumber = event.start.to;
    const contentType = event.start.mediaFormat.encoding;  // audio/x-mulaw
    const sampleRate = event.start.mediaFormat.sampleRate;  // 8000
});

handler.onStart(event -> {
    String streamId = event.getStart().getStreamId();
    String callUuid = event.getStart().getCallId();
    String fromNumber = event.getStart().getFrom();
    String toNumber = event.getStart().getTo();
    String contentType = event.getStart().getMediaFormat().getEncoding();
    int sampleRate = event.getStart().getMediaFormat().getSampleRate();
});

Configuration Options

Environment Variables

Create a .env file with your credentials:

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key

Plivo Stream XML Options

<Stream
    keepCallAlive="true"
    audioTrack="both"
    contentType="audio/x-mulaw;rate=8000"
    statusCallbackUrl="https://your-domain.com/stream-status"
    statusCallbackMethod="POST">
    wss://your-domain.com/stream
</Stream>

Attribute	Description
`keepCallAlive`	Keep call active after stream ends (`true`/`false`)
`audioTrack`	Audio direction: `inbound`, `outbound`, or `both`
`contentType`	Audio format: `audio/x-mulaw;rate=8000` or `audio/x-l16;rate=16000`
`statusCallbackUrl`	URL for stream status webhooks

Troubleshooting

WebSocket Connection Issues

Verify ngrok is running and the URL matches your XML configuration
Check firewall rules allow WebSocket connections on your server
Validate SSL certificates if using custom domains

Audio Quality Issues

Use correct audio format - mu-law at 8kHz for standard telephony
Check sample rate matches between incoming and outgoing audio
Monitor latency - keep processing under 200ms for natural conversation

No Audio Received

Verify audioTrack is set to both or inbound in your XML
Check handler is registered before calling start()
Confirm call is connected - START event should fire first

Clone the Example Repositories

Full working examples are available in the SDK repositories:

Python
Node.js
Java

git clone https://github.com/plivo/plivo-stream-sdk-python.git
cd plivo-stream-sdk-python/examples/demo
pip install -r requirements.txt
python server.py

git clone https://github.com/plivo/plivo-stream-sdk-node.git
cd plivo-stream-sdk-node/examples/express-streaming
npm install
npm start

git clone https://github.com/plivo/plivo-stream-sdk-java.git
cd plivo-stream-sdk-java
mvn clean install
# Run the example in examples/voice-ai-agent

Deepgram + OpenAI + ElevenLabs Guide - Complete AI voice agent tutorial with full code examples
Audio Streaming API - API reference for audio streaming
Stream XML Element - XML configuration reference
OpenAI Realtime Integration - Alternative integration using OpenAI’s native voice API

Support

Plivo Documentation
Plivo Support for technical assistance
GitHub Issues for SDK bugs

Concepts

Integration Guides

Workflows

API Reference

XML Reference

Troubleshooting

Build AI Voice Agents with the Plivo Stream SDK

What You Can Build

Get Started with Plivo

Prerequisites

Required Accounts

Language Requirements

Installation

Core Concepts

Audio Streaming Flow

Event Types

Audio Formats

Quick Start

Step 1: Create a WebSocket Handler

Step 2: Configure Plivo to Stream Audio

Step 3: Set Up Local Development

SDK Reference

Sending Audio to Caller

Event Handlers

Getting Stream Information

Configuration Options

Environment Variables

Plivo Stream XML Options

Troubleshooting

WebSocket Connection Issues

Audio Quality Issues

No Audio Received

Clone the Example Repositories

Support

Concepts

Integration Guides

Workflows

API Reference

XML Reference

Troubleshooting

Documentation Index

​What You Can Build

​Get Started with Plivo

​Prerequisites

​Required Accounts

​Language Requirements

​Installation

​Core Concepts

​Audio Streaming Flow

​Event Types

​Audio Formats

​Quick Start

​Step 1: Create a WebSocket Handler

​Step 2: Configure Plivo to Stream Audio

​Step 3: Set Up Local Development

​SDK Reference

​Sending Audio to Caller

​Event Handlers

​Getting Stream Information

​Configuration Options

​Environment Variables

​Plivo Stream XML Options

​Troubleshooting

​WebSocket Connection Issues

​Audio Quality Issues

​No Audio Received

​Clone the Example Repositories

​Related

​Support

What You Can Build

Get Started with Plivo

Prerequisites

Required Accounts

Language Requirements

Installation

Core Concepts

Audio Streaming Flow

Event Types

Audio Formats

Quick Start

Step 1: Create a WebSocket Handler

Step 2: Configure Plivo to Stream Audio

Step 3: Set Up Local Development

SDK Reference

Sending Audio to Caller

Event Handlers

Getting Stream Information

Configuration Options

Environment Variables

Plivo Stream XML Options

Troubleshooting

WebSocket Connection Issues

Audio Quality Issues

No Audio Received

Clone the Example Repositories

Related

Support