Build a voice agent using OpenAI’s GPT-4o Realtime API for native speech-to-speech processing. OpenAI Realtime processes audio directly, with no intermediate text conversion, which removes the separate STT and TTS hops and keeps conversational latency low.
Best for: Applications requiring minimal latency and native multimodal AI capabilities.
How Speech-to-Speech Differs
Standard Pipeline (STT → LLM → TTS):
Audio → Deepgram → Text → OpenAI → Text → Cartesia → Audio
Speech-to-Speech (Direct):
Audio → OpenAI Realtime → Audio
Speech-to-speech models process audio natively, preserving tone, emotion, and context that may be lost in text transcription.
Prerequisites
| Service | What You Need |
|---|---|
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| OpenAI | API key from platform.openai.com with Realtime API access |
Installation
pip install "pipecat-ai[openai]"
Environment Variables
# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890
# OpenAI credentials
OPENAI_API_KEY=sk-your_openai_key
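It can help to check these variables at startup so that a missing credential fails fast rather than mid-call. The following is a minimal sketch using only the standard library; the helper name `load_settings` is hypothetical and is not part of Pipecat or the Plivo SDK.

```python
import os

# Variable names taken from the environment variable list above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "OPENAI_API_KEY",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    `load_settings` is a hypothetical helper used for illustration only.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no argument reads from the process environment; passing a dict makes the check easy to exercise in tests.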
Pipeline Configuration
import os

from pipecat.services.openai import OpenAIRealtimeLLMService

# Speech-to-speech service: a single service replaces the separate STT, LLM, and TTS stages
llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    # voice="alloy",  # Choose voice: alloy, echo, fable, onyx, nova, shimmer
)
OpenAI Realtime Features
| Feature | Description |
|---|---|
| Minimal latency | Direct audio processing for fastest response times |
| Voice activity detection | Multiple VAD options including semantic-based |
| Function calling | Seamless integration with external APIs |
| Multiple voices | Choose from built-in voice personalities |
| Context management | Advanced conversation flow handling |
Available Voices
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, friendly |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Bright, energetic |
| shimmer | Clear, professional |
Architecture
With OpenAI Realtime, the pipeline is simplified:
Phone Call ↔ Plivo ↔ WebSocket ↔ Pipecat ↔ OpenAI Realtime
A single service handles:
- Speech recognition
- Language understanding
- Response generation
- Voice synthesis
Quick Start
Inbound Calls
git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound
# Configure environment
cp env.example .env
# Edit .env with Plivo and OpenAI credentials
# Modify bot.py to use OpenAIRealtimeLLMService
# Start server
uv sync && uv run server.py
# Expose with ngrok (development)
ngrok http 7860
Set your Plivo number’s Answer URL to the public URL that ngrok prints.
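When Plivo requests the Answer URL, the server replies with XML that tells Plivo to stream the call audio over a WebSocket. The example repository's server already does this for you; the sketch below only illustrates the general shape of such a response. It assumes Plivo's `<Stream>` XML element with the `bidirectional`, `keepCallAlive`, and `contentType` attributes. The function name `build_answer_xml` and the `/ws` path are hypothetical, so check the example's `server.py` for the actual route.

```python
import xml.etree.ElementTree as ET


def build_answer_xml(public_host: str, ws_path: str = "/ws") -> str:
    """Build a Plivo answer document that streams call audio to a WebSocket.

    `public_host` is the hostname ngrok exposes (no scheme); `ws_path` is an
    assumed WebSocket route, not one confirmed by the example repository.
    """
    root = ET.Element("Response")
    stream = ET.SubElement(
        root,
        "Stream",
        {
            "bidirectional": "true",  # audio flows both to and from the bot
            "keepCallAlive": "true",  # keep the call up while the stream is open
            "contentType": "audio/x-mulaw;rate=8000",  # 8 kHz mu-law telephony audio
        },
    )
    stream.text = f"wss://{public_host}{ws_path}"
    return ET.tostring(root, encoding="unicode")
```

For example, `build_answer_xml("abc123.ngrok.app")` returns a `<Response>` document whose `<Stream>` element points at `wss://abc123.ngrok.app/ws`.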
Outbound Calls
cd pipecat-examples/plivo-chatbot/outbound
cp env.example .env
uv sync && uv run server.py
# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
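The same request can be sent from Python, which is convenient when calls are triggered from application code rather than a shell. This is a minimal sketch using only the standard library; it assumes the `/start` endpoint and `phone_number` field shown in the curl command above, and the function names `build_start_request` and `start_call` are hypothetical.

```python
import json
import urllib.request


def build_start_request(phone_number: str, base_url: str = "http://localhost:7860"):
    """Build the POST request that asks the example server to place a call."""
    payload = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/start",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone_number: str, base_url: str = "http://localhost:7860") -> str:
    """Send the request and return the raw response body.

    Requires the example server to be running at `base_url`.
    """
    request = build_start_request(phone_number, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `start_call("+1234567890")` should have the same effect as the curl command. The response format depends on the example server and is not assumed here.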
When to Use OpenAI Realtime
Choose OpenAI Realtime when:
- Latency is your top priority
- You want the simplest integration
- Built-in voices meet your needs
- You’re already using OpenAI
Choose standard STT → LLM → TTS when:
- You need specific voice characteristics (ElevenLabs cloning, Cartesia emotion)
- You want to mix providers for cost optimization
- You need fine-grained control over each component