How Speech-to-Speech Differs
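In a standard pipeline (STT → LLM → TTS), one service transcribes the caller's speech to text, an LLM generates a text reply, and a separate TTS service turns that reply back into audio. A speech-to-speech model such as OpenAI Realtime takes audio in and returns audio out, so those three hops collapse into one service.
Prerequisites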
| Service | What You Need |
|---|---|
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| OpenAI | API key from platform.openai.com with Realtime API access |
Installation
Environment Variables
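A minimal sketch for loading credentials and failing fast when any are missing. The variable names `PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN` and `OPENAI_API_KEY` are assumptions chosen to match common SDK conventions. Adjust them to whatever your deployment actually uses.

```python
import os

# Assumed variable names; rename to match your own deployment.
REQUIRED_VARS = ("PLIVO_AUTH_ID", "PLIVO_AUTH_TOKEN", "OPENAI_API_KEY")


def load_credentials(env=None):
    """Return the required credentials as a dict, raising if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Call `load_credentials()` once at startup. A missing key then surfaces as a clear error before the first call arrives, not midway through a call.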
Pipeline Configuration
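In a Pipecat pipeline, the Realtime service sends the session configuration for you. This sketch shows what that configuration carries. The hypothetical helper `build_session_update` builds a raw `session.update` event using field names from OpenAI's beta Realtime API (`voice`, `instructions`, `turn_detection` and the audio formats). The numeric VAD values are illustrative, so check every field against the current API reference before relying on it.

```python
import json


def build_session_update(instructions, voice="alloy", vad="server_vad"):
    """Build a Realtime `session.update` event as a JSON string (hypothetical helper)."""
    if vad == "server_vad":
        # Silence-based turn detection with explicit timing knobs (illustrative values).
        turn_detection = {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        }
    elif vad == "semantic_vad":
        # Lets the model judge whether the caller has finished a thought.
        turn_detection = {"type": "semantic_vad", "eagerness": "auto"}
    else:
        raise ValueError(f"unknown VAD mode: {vad}")
    session = {
        "modalities": ["audio", "text"],
        "instructions": instructions,
        "voice": voice,
        # Assumes Plivo's 8 kHz mu-law audio is forwarded untranscoded (g711_ulaw).
        "input_audio_format": "g711_ulaw",
        "output_audio_format": "g711_ulaw",
        "turn_detection": turn_detection,
    }
    return json.dumps({"type": "session.update", "session": session})
```

Pass `vad="semantic_vad"` to try the semantic option described in the features table.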
OpenAI Realtime Features
| Feature | Description |
|---|---|
| Minimal latency | Processes audio directly, with no separate STT or TTS step |
| Voice activity detection | Multiple VAD options including semantic-based |
| Function calling | The model can call your functions to reach external APIs mid-conversation |
| Multiple voices | Choose from built-in voice personalities |
| Context management | Advanced conversation flow handling |
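To make function calling concrete, here is a sketch of the raw Realtime API round trip, which Pipecat normally handles for you. A tool is declared in the flat shape used in `session.tools`. When the server emits a `response.function_call_arguments.done` event, the client replies with a `function_call_output` item and a `response.create` event. These event names follow the beta Realtime API. The `get_order_status` tool is hypothetical and its body is a stand-in.

```python
import json

# Hypothetical tool definition in the flat shape the Realtime API uses.
GET_ORDER_STATUS_TOOL = {
    "type": "function",
    "name": "get_order_status",
    "description": "Look up the status of a customer's order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def get_order_status(order_id):
    # Stand-in for a call to a real external API.
    return {"order_id": order_id, "status": "shipped"}


HANDLERS = {"get_order_status": get_order_status}


def handle_function_call(event):
    """Turn a completed function-call event into the client events to send back."""
    args = json.loads(event["arguments"])
    result = HANDLERS[event["name"]](**args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        },
        # Ask the model to continue speaking now that it has the result.
        {"type": "response.create"},
    ]
```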
Available Voices
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, friendly |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Bright, energetic |
| shimmer | Clear, professional |
Architecture
With OpenAI Realtime, the pipeline is simplified. A single service handles:
- Speech recognition
- Language understanding
- Response generation
- Voice synthesis
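Because one service covers all four steps, the core of the call path is a relay of audio between Plivo and OpenAI. Pipecat's serializer and Realtime service do this for you. This sketch maps messages by hand to show the shape of that relay. It assumes Plivo's `media` and `playAudio` stream messages and the beta Realtime events `input_audio_buffer.append` and `response.audio.delta`. Both sides carry base64 audio, so with `g711_ulaw` configured the payload passes through unchanged.

```python
def plivo_to_openai(plivo_msg):
    """Map an inbound Plivo stream message to a Realtime client event, or None."""
    if plivo_msg.get("event") != "media":
        return None  # start/stop and other control events are not forwarded
    # Assumes the session uses g711_ulaw input, matching Plivo's 8 kHz mu-law audio.
    return {"type": "input_audio_buffer.append", "audio": plivo_msg["media"]["payload"]}


def openai_to_plivo(openai_event):
    """Map a Realtime audio delta to a Plivo playAudio message, or None."""
    if openai_event.get("type") != "response.audio.delta":
        return None  # transcripts, tool calls, etc. are handled elsewhere
    return {
        "event": "playAudio",
        "media": {
            "contentType": "audio/x-mulaw",
            "sampleRate": 8000,
            "payload": openai_event["delta"],
        },
    }
```

Nothing in this path transcribes or synthesizes speech. That is the practical meaning of the simplified pipeline.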
Quick Start
Inbound Calls
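For an inbound call, Plivo requests your number's answer URL, and your server responds with XML that opens an audio stream to your bot. This sketch builds that XML with the standard library. It assumes Plivo's `<Stream>` element and its `bidirectional`, `keepCallAlive` and `contentType` attributes, and the WebSocket URL is a placeholder.

```python
import xml.etree.ElementTree as ET


def build_stream_answer_xml(ws_url):
    """Return Plivo answer XML that streams call audio to ws_url."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            "bidirectional": "true",  # lets the bot send audio back to the caller
            "keepCallAlive": "true",  # keeps the call up while the stream is open
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    stream.text = ws_url
    return ET.tostring(response, encoding="unicode")
```

Building the XML with `ElementTree` instead of string formatting means characters such as `&` in a query string are escaped correctly.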
Outbound Calls
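An outbound call is placed through Plivo's Call API with an `answer_url` that returns the same kind of `<Stream>` answer XML. This sketch builds, but does not send, the HTTP request. It assumes the `POST /v1/Account/{auth_id}/Call/` endpoint with a JSON body and HTTP Basic auth. All phone numbers and URLs are placeholders.

```python
import base64
import json
import urllib.request


def build_outbound_call_request(auth_id, auth_token, from_number, to_number, answer_url):
    """Build (but do not send) a Plivo Call API request."""
    url = f"https://api.plivo.com/v1/Account/{auth_id}/Call/"
    body = json.dumps({
        "from": from_number,
        "to": to_number,
        "answer_url": answer_url,  # must return answer XML with a <Stream> element
        "answer_method": "POST",
    }).encode("utf-8")
    token = base64.b64encode(f"{auth_id}:{auth_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )
```

Pass the result to `urllib.request.urlopen` to place the call, or use Plivo's official Python SDK instead of raw HTTP.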
When to Use OpenAI Realtime
Choose OpenAI Realtime when:
- Latency is your top priority
- You want the simplest integration
- Built-in voices meet your needs
- You’re already using OpenAI
Choose a standard STT → LLM → TTS pipeline when:
- You need specific voice characteristics (ElevenLabs cloning, Cartesia emotion)
- You want to mix providers for cost optimization
- You need fine-grained control over each component
Related
- Pipecat Overview - Architecture and setup
- OpenAI Realtime Docs - Full configuration
- OpenAI Realtime Guide - Official documentation