Deepgram + OpenAI + Cartesia

Build a voice agent using Deepgram for speech recognition, OpenAI GPT-4o for conversation, and Cartesia for text-to-speech synthesis. Best for: low-latency applications that need expressive, controllable voices.

Prerequisites

Service	What You Need
Plivo	Auth ID, Auth Token, Voice-enabled phone number
Deepgram	API key from console.deepgram.com
OpenAI	API key from platform.openai.com
Cartesia	API key from play.cartesia.ai

Installation

pip install "pipecat-ai[deepgram,openai,cartesia]"

Environment Variables

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890

# AI service credentials
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=sk-your_openai_key
CARTESIA_API_KEY=your_cartesia_key
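
A missing credential may only surface once the first call connects, so it can help to check the environment at startup. The following is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical and not part of Pipecat; the variable names are the ones listed above.

```python
import os

# Variable names are taken from the listing above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Typical use at the top of a server script (hypothetical placement):
#     missing = missing_env_vars()
#     if missing:
#         raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

The check reads from `os.environ`, so if the values live in a `.env` file they need to be loaded into the process environment first.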

Pipeline Configuration

import os  # needed for the os.getenv() calls below

from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.cartesia import CartesiaTTSService

# Speech-to-Text
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    # Optional: Use Deepgram Flux for better turn detection
    # from pipecat.services.deepgram import DeepgramFluxSTTService
)

# Language Model
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
)

# Text-to-Speech
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="your_voice_id",  # Browse voices at play.cartesia.ai
)
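
The three services run in sequence on each conversational turn: caller audio is transcribed, the transcript is sent to the model, and the reply is synthesized back to audio. The sketch below illustrates only that ordering with plain stand-in functions. `fake_stt`, `fake_llm`, `fake_tts`, and `run_turn` are hypothetical names, strings stand in for audio, and none of this is the Pipecat API, which streams frames asynchronously.

```python
from typing import Callable, Sequence

Stage = Callable[[str], str]

AUDIO_PREFIX = "audio:"


def fake_stt(audio: str) -> str:
    """Stand-in for Deepgram: turn tagged 'audio' into a transcript."""
    if not audio.startswith(AUDIO_PREFIX):
        raise ValueError("expected audio input")
    return audio[len(AUDIO_PREFIX):]


def fake_llm(transcript: str) -> str:
    """Stand-in for GPT-4o: produce a reply to the transcript."""
    return f"You said: {transcript}"


def fake_tts(reply: str) -> str:
    """Stand-in for Cartesia: turn the reply text back into tagged 'audio'."""
    return AUDIO_PREFIX + reply


def run_turn(stages: Sequence[Stage], caller_audio: str) -> str:
    """Pass one caller utterance through each stage in order."""
    data = caller_audio
    for stage in stages:
        data = stage(data)
    return data


# Order matters: STT first, then the LLM, then TTS.
TURN_STAGES = (fake_stt, fake_llm, fake_tts)
```

Swapping the first two stages makes `fake_stt` receive text instead of audio and raise `ValueError`, which is the kind of mismatch a misordered pipeline would produce.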

Service Details

Deepgram STT

Real-time speech recognition with interim results and language detection.

Option	Description
`DeepgramSTTService`	Standard WebSocket transcription
`DeepgramFluxSTTService`	Enhanced turn detection for conversations

Tip: Use DeepgramFluxSTTService with ExternalUserTurnStrategies for better conversation flow.

OpenAI LLM

Chat completion with GPT-4o supporting streaming responses and function calling.

Model	Description
`gpt-4o`	Most capable, multimodal
`gpt-4o-mini`	Faster, cost-effective
`gpt-4-turbo`	Previous generation

Cartesia TTS

Real-time voice synthesis with word-level timing and interruption handling.

Feature	Method
Spell out text	`SPELL("ABC")`
Add emotion	`EMOTION_TAG("SARCASM")`
Insert pause	`PAUSE_TAG(0.5)`
Adjust speed	`SPEED_TAG(1.2)`
Adjust volume	`VOLUME_TAG(0.8)`

Quick Start

Inbound Calls

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound

# Configure environment
cp env.example .env
# Edit .env with your credentials

# Start server
uv sync && uv run server.py

# Expose with ngrok (development)
ngrok http 7860

Set your Plivo number's Answer URL to the public URL that ngrok prints.

Outbound Calls

# Run from the directory where you cloned pipecat-examples
cd pipecat-examples/plivo-chatbot/outbound

cp env.example .env
uv sync && uv run server.py

# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
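
The same request can be sent from Python when calls are triggered by application code rather than a shell. This is a minimal sketch using only the standard library. The `/start` path, port, and `phone_number` field come from the curl command above; the function names `build_start_request` and `start_call` are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import json
import urllib.request


def build_start_request(phone_number, base_url="http://localhost:7860"):
    """Build the POST /start request that the curl command above sends."""
    body = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone_number, base_url="http://localhost:7860", timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_start_request(phone_number, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires the outbound server to be running):
#     print(start_call("+1234567890"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before a call is placed.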

Related

Pipecat Overview - Architecture and setup
Deepgram Docs - STT configuration
OpenAI Docs - LLM configuration
Cartesia Docs - TTS configuration
