Build a voice agent using Deepgram for speech recognition, Google Gemini for conversation, and ElevenLabs for natural-sounding text-to-speech synthesis.

Best for: a balance of cost efficiency and voice quality, with multilingual support.

Prerequisites

| Service | What You Need |
| --- | --- |
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| Deepgram | API key from console.deepgram.com |
| Google | API key from AI Studio |
| ElevenLabs | API key from elevenlabs.io |

Installation

pip install "pipecat-ai[deepgram,google,elevenlabs]"

Environment Variables

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
GOOGLE_API_KEY=your_google_key
ELEVENLABS_API_KEY=your_elevenlabs_key
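
Before starting the server, it can help to confirm that every variable listed above is actually set. The following is a minimal sketch using only the Python standard library; the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are illustrative and not part of Pipecat or Plivo.

```python
import os

# Variable names taken from the environment listing above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; pass a dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at startup and stopping when the returned list is non-empty surfaces a missing key immediately, rather than as an authentication error in the middle of a call.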

Pipeline Configuration

import os

from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleLLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService

# Speech-to-Text
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

# Language Model
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash",  # or gemini-1.5-pro
)

# Text-to-Speech
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your_voice_id",  # Browse voices at elevenlabs.io/voice-library
)
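
The three services above still have to be chained together with a transport. The sketch below shows one plausible ordering, following the pattern commonly seen in Pipecat examples. It assumes two objects this page does not define: a `transport` exposing `input()` and `output()`, and a `context_aggregator` exposing `user()` and `assistant()`. The function name `build_voice_pipeline` is hypothetical; check the plivo-chatbot example for the exact wiring it uses.

```python
def build_voice_pipeline(transport, stt, llm, tts, context_aggregator):
    """Chain the services in the order audio flows through a call.

    `transport` and `context_aggregator` are assumed to be created
    elsewhere; they are not defined in the configuration above.
    """
    # Imported lazily so the sketch can be read without pipecat installed.
    from pipecat.pipeline.pipeline import Pipeline

    return Pipeline(
        [
            transport.input(),               # caller audio in
            stt,                             # audio -> transcript
            context_aggregator.user(),       # add the user turn to the context
            llm,                             # context -> response text
            tts,                             # response text -> audio
            transport.output(),              # audio back to the caller
            context_aggregator.assistant(),  # record the assistant turn
        ]
    )
```

Placing the assistant aggregator after the transport output is a design choice in this sketch: the context then records what was actually sent to the caller rather than everything the model generated.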

Service Details

Deepgram STT

Real-time speech recognition with interim results and language detection.

| Option | Description |
| --- | --- |
| DeepgramSTTService | Standard WebSocket transcription |
| DeepgramFluxSTTService | Enhanced turn detection for conversations |

Google Gemini LLM

Streaming responses with function calling and multimodal input support.

| Model | Description |
| --- | --- |
| gemini-1.5-flash | Fast, cost-effective |
| gemini-1.5-pro | Most capable |
| gemini-2.0-flash-exp | Latest experimental |

Features:
  • Streaming responses
  • Function calling
  • Multimodal inputs (text, images)
  • OpenAI-compatible context format

ElevenLabs TTS

Natural voice synthesis with word-level timing and voice cloning support.

| Feature | Description |
| --- | --- |
| WebSocket streaming | Real-time audio with low latency |
| Word-level timing | Precise synchronization |
| Voice cloning | Create custom voices |
| Multilingual | 29+ languages supported |

Service options:
  • ElevenLabsTTSService - WebSocket-based, recommended for real-time
  • ElevenLabsHttpTTSService - HTTP-based, simpler setup

Quick Start

Inbound Calls

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound

# Configure environment
cp env.example .env
# Edit .env with your credentials

# Start server
uv sync && uv run server.py

# Expose with ngrok (development)
ngrok http 7860

Then set the Answer URL of your Plivo number to your ngrok URL.

Outbound Calls

# Run from the directory that contains your pipecat-examples clone
cd pipecat-examples/plivo-chatbot/outbound

cp env.example .env
uv sync && uv run server.py

# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
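
The same request can be issued from Python, which is convenient when calls are triggered by application code rather than a shell. This is a minimal sketch using only the standard library. It mirrors the curl command above (the `/start` path, the JSON body, and port 7860); the helper names `build_start_call_request` and `start_call` are illustrative, and the format of the server's response is not specified here, so it is returned as raw text.

```python
import json
import urllib.request


def build_start_call_request(phone_number, base_url="http://localhost:7860"):
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone_number, base_url="http://localhost:7860", timeout=10):
    """Send the request and return (HTTP status, response body as text)."""
    request = build_start_call_request(phone_number, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the outbound server running, `start_call("+1234567890")` would place the same call as the curl example. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.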