The Plivo Stream SDK provides official libraries for Python, Node.js, and Java to build AI voice agents using Plivo’s Audio Streaming API. These SDKs handle WebSocket connections, audio encoding/decoding, and event management, letting you focus on your AI integration logic.

What You Can Build

  • AI Voice Assistants - Natural conversations powered by speech-to-text, LLMs, and text-to-speech
  • Real-time Transcription - Live call transcription with speech recognition services
  • Voice Bots - Automated IVR systems with intelligent responses
  • Call Analytics - Real-time audio analysis and sentiment detection

Get Started with Plivo

Before building your AI voice agent, sign up for a Plivo account or sign in to your existing one, then purchase a voice-enabled number through the Plivo console.

Prerequisites

Required Accounts

  • Plivo - Account with Auth ID and Auth Token
  • Deepgram - Sign up for speech-to-text
  • OpenAI - Sign up for conversational AI
  • ElevenLabs - Sign up for text-to-speech

Language Requirements

  • Python 3.8 or later
  • pip package manager

Installation

pip install plivo-stream
The Python SDK supports two WebSocket implementations:
  • FastAPI - For production applications using ASGI
  • websockets - Lightweight option for simple use cases (see the raw-frame sketch below)
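
To give a feel for the lightweight route, here is a minimal sketch of a raw websockets server that accepts the stream connection without the SDK's handler class. The frame field names ("event", "media.payload") are illustrative assumptions about the wire format, not an authoritative specification; with the SDK you would use its handler class instead.

# Minimal sketch: a raw websockets endpoint for Plivo audio streaming,
# shown without the SDK's handler class. Frame field names are
# illustrative assumptions, not an authoritative wire format.
import asyncio
import base64
import json
import websockets  # pip install websockets (v11+ handler signature)

async def stream_handler(websocket):
    async for message in websocket:
        frame = json.loads(message)
        if frame.get("event") == "media":
            # Audio arrives base64-encoded; decode before passing to STT
            audio_bytes = base64.b64decode(frame["media"]["payload"])
            # ... forward audio_bytes to your pipeline ...

async def main():
    async with websockets.serve(stream_handler, "0.0.0.0", 5000):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())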

Core Concepts

Audio Streaming Flow

┌─────────────┐    WebSocket    ┌─────────────┐    API Calls    ┌─────────────┐
│   Plivo     │ ───────────────▶│  Your App   │ ───────────────▶│  AI Services│
│   Call      │ ◀─────────────── │  (SDK)      │ ◀─────────────── │  STT/LLM/TTS│
└─────────────┘   Audio Events   └─────────────┘   Text/Audio    └─────────────┘
  1. Caller dials your Plivo number
  2. Plivo connects to your WebSocket endpoint
  3. SDK receives START event with stream metadata
  4. Audio flows as MEDIA events (base64-encoded mu-law)
  5. Your app processes audio through AI services
  6. SDK sends audio back to the caller

Event Types

Event   Description
START   Stream initialized with call metadata (stream ID, call UUID, from/to numbers)
MEDIA   Audio chunk received (base64-encoded, mu-law at 8kHz or linear PCM at 16kHz)
DTMF    Caller pressed a key on their phone
STOP    Stream ended

Audio Formats

Format          Encoding     Sample Rate   Use Case
audio/x-mulaw   mu-law       8000 Hz       Standard telephony (default)
audio/x-l16     Linear PCM   16000 Hz      Higher quality for STT
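
If your STT service expects 16 kHz linear PCM but the stream delivers 8 kHz mu-law, you can transcode in Python before forwarding the audio. A minimal sketch using the standard-library audioop module (deprecated since Python 3.11 and removed in 3.13, where the audioop-lts backport restores it):

# Sketch: transcode 8 kHz mu-law from the stream into 16 kHz 16-bit PCM.
# audioop is stdlib up to Python 3.12; on 3.13+ install audioop-lts.
import audioop

def mulaw8k_to_pcm16k(mulaw_bytes: bytes) -> bytes:
    pcm8k = audioop.ulaw2lin(mulaw_bytes, 2)  # width 2 = 16-bit samples
    pcm16k, _state = audioop.ratecv(pcm8k, 2, 1, 8000, 16000, None)
    return pcm16k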

Quick Start

Step 1: Create a WebSocket Handler

from fastapi import FastAPI, WebSocket
from plivo_stream import PlivoFastAPIStreamingHandler, StartEvent, MediaEvent

app = FastAPI()

@app.websocket("/stream")
async def websocket_endpoint(websocket: WebSocket):
    handler = PlivoFastAPIStreamingHandler(websocket)

    @handler.on_start
    async def handle_start(event: StartEvent):
        print(f"Stream started: {handler.get_stream_id()}")
        print(f"Call from: {event.start.from_}")
        print(f"Call to: {event.start.to}")

    @handler.on_media
    async def handle_media(event: MediaEvent):
        # Get raw audio bytes from the event
        audio_bytes = event.get_raw_media()

        # Process audio (send to STT, etc.)
        # ...

        # Send audio back to the caller; until you plug in an AI
        # pipeline, echo the caller's own audio as a placeholder
        await handler.send_media(audio_bytes)

    @handler.on_dtmf
    async def handle_dtmf(event):
        print(f"DTMF digit pressed: {event.dtmf.digit}")

    @handler.on_stop
    async def handle_stop(event):
        print("Stream ended")

    await handler.start()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)

Step 2: Configure Plivo to Stream Audio

Create an XML application that routes calls to your WebSocket endpoint:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Connected to AI Assistant. You may begin speaking.</Speak>
    <Stream keepCallAlive="true" audioTrack="both" contentType="audio/x-mulaw;rate=8000">
        wss://your-domain.com/stream
    </Stream>
</Response>
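
Plivo fetches this XML from your application's answer URL when a call comes in. Since the quick start already runs FastAPI, one option (a sketch; the /answer route name is just an example) is to serve the XML from the same app that hosts /stream:

# Sketch: serve the answer XML from the same FastAPI app as /stream.
# Point your Plivo application's answer URL at https://your-domain.com/answer.
from fastapi.responses import Response

ANSWER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Connected to AI Assistant. You may begin speaking.</Speak>
    <Stream keepCallAlive="true" audioTrack="both" contentType="audio/x-mulaw;rate=8000">
        wss://your-domain.com/stream
    </Stream>
</Response>"""

@app.api_route("/answer", methods=["GET", "POST"])
async def answer():
    return Response(content=ANSWER_XML, media_type="application/xml")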

Step 3: Set Up Local Development

For local testing, use ngrok to expose your WebSocket endpoint:
# Install ngrok
brew install ngrok  # macOS
# or download from https://ngrok.com/download

# Start tunnel
ngrok http 5000
Update your Plivo XML with the ngrok URL:
<Stream keepCallAlive="true" audioTrack="both">
    wss://abc123.ngrok.app/stream
</Stream>

Building an AI Voice Agent

This example shows a complete AI voice agent using Deepgram (STT), OpenAI (LLM), and ElevenLabs (TTS).
import asyncio
import os
from fastapi import FastAPI, WebSocket
from plivo_stream import PlivoFastAPIStreamingHandler, StartEvent, MediaEvent
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents
from openai import AsyncOpenAI
from elevenlabs import ElevenLabs

app = FastAPI()

# Initialize AI service clients
deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
elevenlabs = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

SYSTEM_PROMPT = """You are a helpful AI voice assistant. Keep responses
concise and conversational. Respond naturally as if speaking on a phone call."""

@app.websocket("/stream")
async def websocket_endpoint(websocket: WebSocket):
    handler = PlivoFastAPIStreamingHandler(websocket)
    conversation_history = []

    # Set up Deepgram live transcription (async client in SDK v3)
    dg_connection = deepgram.listen.asynclive.v("1")

    async def on_transcript(self, result, **kwargs):
        transcript = result.channel.alternatives[0].transcript
        if transcript and result.is_final:
            # Got a final transcript; process it with the LLM
            await process_with_ai(transcript)

    dg_connection.on(LiveTranscriptionEvents.Transcript, on_transcript)

    async def process_with_ai(user_text: str):
        conversation_history.append({"role": "user", "content": user_text})

        # Get response from OpenAI
        response = await openai_client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                *conversation_history
            ]
        )

        assistant_text = response.choices[0].message.content
        conversation_history.append({"role": "assistant", "content": assistant_text})

        # Convert to speech with ElevenLabs; the SDK call is blocking,
        # so run it in a worker thread to keep the event loop responsive
        def synthesize() -> bytes:
            audio = elevenlabs.text_to_speech.convert(
                text=assistant_text,
                voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel voice
                model_id="eleven_turbo_v2",
                output_format="ulaw_8000",  # mu-law for Plivo
            )
            return b"".join(audio)  # convert() yields audio chunks

        audio_bytes = await asyncio.to_thread(synthesize)

        # Send audio back to the caller
        await handler.send_media(audio_bytes)

    @handler.on_start
    async def handle_start(event: StartEvent):
        print(f"Call started from {event.start.from_}")
        # Tell Deepgram the wire format so raw mu-law frames decode correctly
        await dg_connection.start(LiveOptions(
            model="nova-2",
            encoding="mulaw",
            sample_rate=8000,
        ))

    @handler.on_media
    async def handle_media(event: MediaEvent):
        # Forward audio to Deepgram for transcription
        audio_bytes = event.get_raw_media()
        await dg_connection.send(audio_bytes)

    @handler.on_stop
    async def handle_stop(event):
        await dg_connection.finish()

    await handler.start()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)

SDK Reference

Sending Audio to Caller

# Send audio bytes to the caller
await handler.send_media(audio_bytes)

# Send a checkpoint (receive callback when audio finishes playing)
await handler.send_checkpoint(name="greeting_complete")

# Clear any queued audio (useful for interruptions)
await handler.send_clear_audio()

Event Handlers

Event           Handler                             Description
Connection      on_connected / onConnection         WebSocket connected (before START)
Start           on_start / onStart                  Stream initialized, call metadata available
Media           on_media / onMedia                  Audio chunk received
DTMF            on_dtmf / onDtmf                    Keypad digit pressed
Stop            on_stop / onStop                    Stream ended
Checkpoint      on_played_stream / onPlayedStream   Checkpoint reached (audio finished playing)
Audio Cleared   on_cleared_audio / onClearedAudio   Audio queue cleared
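
Checkpoints and audio clearing combine naturally for interruption (barge-in) handling: queue audio, set a checkpoint, and clear the queue if the caller starts talking over the bot. A minimal sketch using the handler names from the table above (the event payload shapes aren't specified on this page, so the handlers only log what they receive):

# Sketch: barge-in handling with checkpoints and clear-audio.
# Handler names come from the table above; event payload shapes are
# assumptions, so these handlers only log what they receive.
@handler.on_played_stream
async def on_checkpoint(event):
    print("Checkpoint reached:", event)  # queued audio finished playing

@handler.on_cleared_audio
async def on_cleared(event):
    print("Audio queue cleared:", event)

async def speak(audio_bytes: bytes):
    await handler.send_media(audio_bytes)
    await handler.send_checkpoint(name="utterance_done")

async def on_caller_interrupt():
    # The caller started speaking mid-response: drop queued audio
    await handler.send_clear_audio()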

Getting Stream Information

@handler.on_start
async def handle_start(event: StartEvent):
    stream_id = handler.get_stream_id()
    call_uuid = event.start.call_id
    from_number = event.start.from_
    to_number = event.start.to
    content_type = event.start.media_format.encoding  # audio/x-mulaw
    sample_rate = event.start.media_format.sample_rate  # 8000

Configuration Options

Environment Variables

Create a .env file with your credentials:
# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
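
The examples read these values via os.environ. To load the .env file automatically at startup, one common option is the third-party python-dotenv package:

# Sketch: load .env into os.environ before the service clients read it.
# Requires: pip install python-dotenv
from dotenv import load_dotenv

load_dotenv()  # looks for a .env file in the current working directory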

Plivo Stream XML Options

<Stream
    keepCallAlive="true"
    audioTrack="both"
    contentType="audio/x-mulaw;rate=8000"
    statusCallbackUrl="https://your-domain.com/stream-status"
    statusCallbackMethod="POST">
    wss://your-domain.com/stream
</Stream>
Attribute              Description
keepCallAlive          Keep the call active after the stream ends (true/false)
audioTrack             Audio direction: inbound, outbound, or both
contentType            Audio format: audio/x-mulaw;rate=8000 or audio/x-l16;rate=16000
statusCallbackUrl      URL for stream status webhooks
statusCallbackMethod   HTTP method for status callbacks (POST in the example above)
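
If you set statusCallbackUrl, you'll need an endpoint to receive those webhooks. A minimal sketch on the quick-start FastAPI app (the callback payload fields aren't documented on this page, so it just logs the raw form data):

# Sketch: receive stream status webhooks on the same FastAPI app.
# The payload fields are not documented here, so this endpoint just
# logs whatever Plivo posts.
from fastapi import Request

@app.post("/stream-status")
async def stream_status(request: Request):
    payload = dict(await request.form())
    print("Stream status:", payload)
    return {"received": True}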

Troubleshooting

WebSocket Connection Issues

  1. Verify ngrok is running and the URL matches your XML configuration
  2. Check firewall rules allow WebSocket connections on your server
  3. Validate SSL certificates if using custom domains

Audio Quality Issues

  1. Use correct audio format - mu-law at 8kHz for standard telephony
  2. Check sample rate matches between incoming and outgoing audio
  3. Monitor latency - keep total processing under 200 ms for natural conversation (see the timing sketch below)
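
A quick way to check item 3 is to time each conversational turn. The process_turn coroutine below is a hypothetical stand-in for your STT, LLM, and TTS chain:

# Sketch: measure per-turn latency. process_turn() is a hypothetical
# stand-in for your STT -> LLM -> TTS pipeline.
import time

async def timed_turn(audio_bytes: bytes) -> bytes:
    start = time.perf_counter()
    response_audio = await process_turn(audio_bytes)  # hypothetical
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > 200:
        print(f"Slow turn: {elapsed_ms:.0f} ms (target < 200 ms)")
    return response_audio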

No Audio Received

  1. Verify audioTrack is set to both or inbound in your XML
  2. Check handler is registered before calling start()
  3. Confirm call is connected - START event should fire first

Clone the Example Repositories

Full working examples are available in the SDK repositories:
git clone https://github.com/plivo/plivo-stream-sdk-python.git
cd plivo-stream-sdk-python/examples/demo
pip install -r requirements.txt
python server.py

Support