Deepgram + Gemini + ElevenLabs

Build a voice agent using Deepgram for speech recognition, Google Gemini for conversation, and ElevenLabs for natural-sounding text-to-speech synthesis. Best for: Balance of cost efficiency and voice quality with multilingual support.

Prerequisites

Service	What You Need
Plivo	Auth ID, Auth Token, Voice-enabled phone number
Deepgram	API key from console.deepgram.com
Google	API key from AI Studio
ElevenLabs	API key from elevenlabs.io

Installation

pip install "pipecat-ai[deepgram,google,elevenlabs]"

Environment Variables

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
GOOGLE_API_KEY=your_google_key
ELEVENLABS_API_KEY=your_elevenlabs_key

Pipeline Configuration

from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleLLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService

# Speech-to-Text
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

# Language Model
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash",  # or gemini-1.5-pro
)

# Text-to-Speech
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your_voice_id",  # Browse voices at elevenlabs.io/voice-library
)

Service Details

Deepgram STT

Real-time speech recognition with interim results and language detection.

Option	Description
`DeepgramSTTService`	Standard WebSocket transcription
`DeepgramFluxSTTService`	Enhanced turn detection for conversations

Google Gemini LLM

Streaming responses with function calling and multimodal input support.

Model	Description
`gemini-1.5-flash`	Fast, cost-effective
`gemini-1.5-pro`	Most capable
`gemini-2.0-flash-exp`	Latest experimental

Features:

Streaming responses
Function calling
Multimodal inputs (text, images)
OpenAI-compatible context format

ElevenLabs TTS

Natural voice synthesis with word-level timing and voice cloning support.

Feature	Description
WebSocket streaming	Real-time audio with low latency
Word-level timing	Precise synchronization
Voice cloning	Create custom voices
Multilingual	29+ languages supported

Service options:

ElevenLabsTTSService - WebSocket-based, recommended for real-time
ElevenLabsHttpTTSService - HTTP-based, simpler setup

Quick Start

Inbound Calls

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound

# Configure environment
cp env.example .env
# Edit .env with your credentials

# Start server
uv sync && uv run server.py

# Expose with ngrok (development)
ngrok http 7860

Configure your Plivo number’s Answer URL to your ngrok URL.

Outbound Calls

cd pipecat-examples/plivo-chatbot/outbound

cp env.example .env
uv sync && uv run server.py

# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'

Pipecat Overview - Architecture and setup
Deepgram Docs - STT configuration
Gemini Docs - LLM configuration
ElevenLabs Docs - TTS configuration

Concepts

Integration Guides

API Reference

XML Reference

Troubleshooting

Deepgram + Gemini + ElevenLabs

Prerequisites

Installation

Environment Variables

Pipeline Configuration

Service Details

Deepgram STT

Google Gemini LLM

ElevenLabs TTS

Quick Start

Inbound Calls

Outbound Calls

Concepts

Integration Guides

API Reference

XML Reference

Troubleshooting

​Prerequisites

​Installation

​Environment Variables

​Pipeline Configuration

​Service Details

​Deepgram STT

​Google Gemini LLM

​ElevenLabs TTS

​Quick Start

​Inbound Calls

​Outbound Calls

​Related

Prerequisites

Installation

Environment Variables

Pipeline Configuration

Service Details

Deepgram STT

Google Gemini LLM

ElevenLabs TTS

Quick Start

Inbound Calls

Outbound Calls

Related