Deepgram + Gemini + ElevenLabs

Build a voice agent that uses Deepgram for speech recognition, Google Gemini for conversation, and ElevenLabs for natural-sounding text-to-speech. Best for: balancing cost efficiency and voice quality, with multilingual support.

Prerequisites

Service	What You Need
Plivo	Auth ID, Auth Token, Voice-enabled phone number
Deepgram	API key from console.deepgram.com
Google	API key from AI Studio
ElevenLabs	API key from elevenlabs.io

Installation

pip install "pipecat-ai[deepgram,google,elevenlabs]"

Environment Variables

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
GOOGLE_API_KEY=your_google_key
ELEVENLABS_API_KEY=your_elevenlabs_key
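
A missing credential usually surfaces only once a call connects, so it can help to check the variables at startup. The following is a minimal sketch, not part of the example project: the helper names `missing_env_vars` and `require_env` are hypothetical, and it assumes the variables are already loaded into the process environment (for example by your shell or a dotenv loader).

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_env(env=None):
    """Raise a RuntimeError naming every missing variable (hypothetical helper)."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` once before constructing the services reports every missing credential in a single error rather than one at a time.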

Pipeline Configuration

import os

from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleLLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService

# Speech-to-Text
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

# Language Model
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash",  # or gemini-1.5-pro
)

# Text-to-Speech
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your_voice_id",  # Browse voices at elevenlabs.io/voice-library
)

Service Details

Deepgram STT

Real-time speech recognition with interim results and language detection.

Option	Description
`DeepgramSTTService`	Standard WebSocket transcription
`DeepgramFluxSTTService`	Enhanced turn detection for conversations

Google Gemini LLM

Streaming responses with function calling and multimodal input support.

Model	Description
`gemini-1.5-flash`	Fast, cost-effective
`gemini-1.5-pro`	Most capable
`gemini-2.0-flash-exp`	Latest experimental

Features:

Streaming responses
Function calling
Multimodal inputs (text, images)
OpenAI-compatible context format

ElevenLabs TTS

Natural voice synthesis with word-level timing and voice cloning support.

Feature	Description
WebSocket streaming	Real-time audio with low latency
Word-level timing	Precise synchronization
Voice cloning	Create custom voices
Multilingual	29+ languages supported

Service options:

ElevenLabsTTSService - WebSocket-based, recommended for real-time
ElevenLabsHttpTTSService - HTTP-based, simpler setup

Quick Start

Inbound Calls

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound

# Configure environment
cp env.example .env
# Edit .env with your credentials

# Start server
uv sync && uv run server.py

# Expose with ngrok (development)
ngrok http 7860

Set your Plivo number's Answer URL to the public URL that ngrok prints.
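
When Plivo requests the Answer URL, the server is expected to reply with XML that tells Plivo where to stream the call audio. The sketch below shows one way such a response could be built. It is an illustration under assumptions, not the example server's code: the `/ws` path and the helper name `build_answer_xml` are hypothetical, and the `Stream` attribute values reflect a common bidirectional mu-law setup that you should confirm against the Plivo XML reference.

```python
import xml.etree.ElementTree as ET


def build_answer_xml(public_url, ws_path="/ws"):
    """Build Plivo XML that streams call audio to a WebSocket endpoint.

    ws_path is a hypothetical route; use the path your server exposes.
    """
    # ngrok prints an https:// URL; the WebSocket endpoint uses wss://.
    if public_url.startswith("https://"):
        ws_base = "wss://" + public_url[len("https://"):]
    elif public_url.startswith("http://"):
        ws_base = "ws://" + public_url[len("http://"):]
    else:
        raise ValueError("expected an http(s) URL")

    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            # Assumed attribute values; verify against the Plivo XML reference.
            "bidirectional": "true",
            "keepCallAlive": "true",
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    stream.text = ws_base.rstrip("/") + ws_path
    return ET.tostring(response, encoding="unicode")
```

For example, `build_answer_xml("https://abc123.ngrok.app")` yields a `Response` element whose `Stream` child points at `wss://abc123.ngrok.app/ws`.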

Outbound Calls

cd pipecat-examples/plivo-chatbot/outbound

cp env.example .env
uv sync && uv run server.py

# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
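
The same request can be sent from Python, which is convenient when calls are triggered from application code. This is a minimal standard-library sketch that mirrors the curl command above; the function names `build_start_request` and `start_call` are hypothetical, and the response format of `/start` is not assumed, so the body is returned as raw text.

```python
import json
import urllib.request


def build_start_request(phone_number, base_url="http://localhost:7860"):
    """Build the POST /start request with the same JSON body as the curl call."""
    payload = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/start",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone_number, base_url="http://localhost:7860", timeout=10):
    """Send the request and return (HTTP status, response body as text)."""
    request = build_start_request(phone_number, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, `start_call("+1234567890")` initiates the call; a non-2xx response raises `urllib.error.HTTPError`.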

Related

Pipecat Overview - Architecture and setup
Deepgram Docs - STT configuration
Gemini Docs - LLM configuration
ElevenLabs Docs - TTS configuration
