Prerequisites
| Service | What You Need |
|---|---|
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| Deepgram | API key from console.deepgram.com |
| API key from AI Studio | |
| ElevenLabs | API key from elevenlabs.io |
Installation
Environment Variables
Pipeline Configuration
Service Details
Deepgram STT
Real-time speech recognition with interim results and language detection.| Option | Description |
|---|---|
DeepgramSTTService | Standard WebSocket transcription |
DeepgramFluxSTTService | Enhanced turn detection for conversations |
Google Gemini LLM
Streaming responses with function calling and multimodal input support.| Model | Description |
|---|---|
gemini-1.5-flash | Fast, cost-effective |
gemini-1.5-pro | Most capable |
gemini-2.0-flash-exp | Latest experimental |
- Streaming responses
- Function calling
- Multimodal inputs (text, images)
- OpenAI-compatible context format
ElevenLabs TTS
Natural voice synthesis with word-level timing and voice cloning support.| Feature | Description |
|---|---|
| WebSocket streaming | Real-time audio with low latency |
| Word-level timing | Precise synchronization |
| Voice cloning | Create custom voices |
| Multilingual | 29+ languages supported |
ElevenLabsTTSService- WebSocket-based, recommended for real-timeElevenLabsHttpTTSService- HTTP-based, simpler setup
Quick Start
Inbound Calls
Outbound Calls
Related
- Pipecat Overview - Architecture and setup
- Deepgram Docs - STT configuration
- Gemini Docs - LLM configuration
- ElevenLabs Docs - TTS configuration