Pipecat Overview

Pipecat is an open-source framework for building conversational AI agents. It orchestrates speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) services in a single pipeline. Connect Plivo Audio Streaming to Pipecat to build AI voice agents that handle inbound and outbound phone calls.

How It Works

Phone Call ↔ Plivo ↔ WebSocket Stream ↔ Pipecat ↔ AI Services
                                            ├── STT (Deepgram)
                                            ├── LLM (OpenAI/Gemini)
                                            └── TTS (Cartesia/ElevenLabs)

Plivo routes the phone call and streams its audio in real time over a WebSocket
Pipecat receives the audio and orchestrates the AI pipeline
The STT service converts speech to text
The LLM processes the text and generates a response
The TTS service converts the response back to speech
Pipecat sends the audio back over the WebSocket, and Plivo plays it to the caller
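The STT → LLM → TTS flow above is a chain in which each stage consumes the previous stage's output. The sketch below models only that ordering, using plain Python functions. stt, llm, tts, and run_pipeline are hypothetical stand-ins, not Pipecat classes. Pipecat builds the real pipeline from its own processor and service classes and streams audio and text incrementally rather than passing whole values.

```python
from typing import Any, Callable

# Hypothetical stand-ins for real services. Each stage takes the previous
# stage's output and returns its own; real services are network calls.
def stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is UTF-8 text

def llm(text: str) -> str:
    return f"You said: {text}"  # placeholder for a model response

def tts(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder for synthesized speech

def run_pipeline(data: Any, stages: list[Callable[[Any], Any]]) -> Any:
    """Feed data through each stage in order, e.g. STT -> LLM -> TTS."""
    for stage in stages:
        data = stage(data)
    return data

reply = run_pipeline(b"hello", [stt, llm, tts])
print(reply)  # b'You said: hello'
```

A speech-to-speech setup, described below, effectively collapses the three stages into one model call that takes audio in and returns audio out.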

Choose Your Stack

Pipecat supports multiple AI service combinations. Choose based on your requirements:

Standard Pipelines (STT → LLM → TTS)

Guide	STT	LLM	TTS	Best For
OpenAI + Cartesia	Deepgram	OpenAI GPT-4o	Cartesia	Low latency, expressive voices
OpenAI + ElevenLabs	Deepgram	OpenAI GPT-4o	ElevenLabs	Natural voices, voice cloning
Gemini + Cartesia	Deepgram	Google Gemini	Cartesia	Cost-effective, fast responses
Gemini + ElevenLabs	Deepgram	Google Gemini	ElevenLabs	Balance of cost and voice quality

Speech-to-Speech (Direct Audio Processing)

Guide	Model	Best For
OpenAI Realtime	GPT-4o Realtime	Lowest latency, native multimodal
Gemini Live	Gemini Live	Multimodal with video support

Speech-to-speech models process audio directly, with no intermediate text conversion. Skipping the separate STT and TTS steps typically lowers latency and can make conversations sound more natural.

Prerequisites

Requirement	Description
Plivo Account	Sign up and note your Auth ID and Auth Token
Phone Number	Purchase a voice-enabled number
Pipecat	Install via `pip install pipecat-ai`
AI Service Accounts	Credentials for your chosen STT, LLM, and TTS providers

Quick Start

1. Clone the Examples

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot

2. Choose Inbound or Outbound

Inbound calls: cd inbound (receive calls on your Plivo number)
Outbound calls: cd outbound (initiate calls programmatically)

3. Configure Environment

cp env.example .env

Edit .env with your credentials. The required variables depend on your provider stack.
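Before starting the server, it can help to confirm that .env defines every variable your stack needs. The sketch below is a minimal .env reader that uses only the standard library. The names in REQUIRED_KEYS are hypothetical, chosen for illustration. Replace them with the names listed in the example's env.example.

```python
from pathlib import Path

# Hypothetical variable names for an OpenAI + Cartesia stack;
# copy the real names from the example's env.example.
REQUIRED_KEYS = [
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
]

def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values

def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    return [key for key in required if not values.get(key)]

def check_env_file(path: str = ".env") -> list[str]:
    """Read an .env file and report which REQUIRED_KEYS are missing."""
    return missing_keys(parse_env_text(Path(path).read_text()), REQUIRED_KEYS)
```

Run check_env_file() from the example directory. An empty list means every listed key has a value. The parser handles only plain KEY=VALUE lines, which is usually enough for a sanity check.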

4. Start the Server

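uv sync && uv run server.py

This requires the uv package manager: uv sync installs the example's dependencies, and uv run starts the server.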
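uv run server.py stays in the foreground. If you script your setup, for example to start ngrok only once the server is up, a small readiness check can wait for port 7860 to accept connections. The sketch below is a generic TCP check and an assumption of this guide, not part of the example. It confirms only that something is listening, not that the bot itself is healthy.

```python
import socket
import time

def wait_for_port(host: str = "127.0.0.1", port: int = 7860, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        try:
            # Short per-attempt timeout so the overall deadline is respected.
            with socket.create_connection((host, port), timeout=max(0.1, min(1.0, remaining))):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)  # brief pause before retrying
```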

5. Expose for Development (Inbound Only)

ngrok http 7860

Set your Plivo number’s Answer URL to the HTTPS forwarding URL that ngrok prints (for example, https://your-ngrok-url.ngrok.io/).
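A mismatch between the Answer URL and the current tunnel is a common cause of calls that never connect. The sketch below reads the public URL from the running ngrok agent and compares hosts. It assumes the agent's local API is reachable at its default address, http://127.0.0.1:4040/api/tunnels. The helper names are hypothetical. The check also requires https, matching the URL format shown above.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse

NGROK_API = "http://127.0.0.1:4040/api/tunnels"  # assumed default agent API address

def https_tunnel_url(tunnels_json: dict) -> Optional[str]:
    """Pick the first HTTPS public URL from an ngrok /api/tunnels response."""
    for tunnel in tunnels_json.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None

def fetch_tunnels(api_url: str = NGROK_API) -> dict:
    """Read the running ngrok agent's tunnel list."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return json.load(resp)

def answer_url_matches(answer_url: str, tunnel_url: Optional[str]) -> bool:
    """True if the Answer URL is HTTPS and points at the tunnel's host."""
    if tunnel_url is None:
        return False
    answer, tunnel = urlparse(answer_url), urlparse(tunnel_url)
    return answer.scheme == "https" and answer.hostname == tunnel.hostname
```

With ngrok running, answer_url_matches("https://your-ngrok-url.ngrok.io/", https_tunnel_url(fetch_tunnels())) returns False when the Answer URL points at an old tunnel.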

Inbound vs Outbound Calls

Inbound Calls

Your Plivo number receives calls and connects them to your Pipecat bot. Setup:

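Configure the Answer URL on your Plivo number
When a call arrives, Plivo requests the Answer URL on your server
Your server returns XML with a <Stream> element pointing at its WebSocket endpoint
Plivo opens the WebSocket connection, and Pipecat starts processing the call audio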
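The XML returned to the Answer URL request is what tells Plivo where to stream the call. The sketch below builds such a response. The "/ws" path, the helper names, and the <Stream> attributes are assumptions for illustration, so check them against the Plivo <Stream> XML reference and the route your server actually exposes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, urlunparse

def to_ws_url(public_https_url: str, path: str = "/ws") -> str:
    """Turn the public https:// base URL into the wss:// URL of a (hypothetical) WebSocket route."""
    parsed = urlparse(public_https_url)
    return urlunparse(("wss", parsed.netloc, path, "", "", ""))

def stream_response_xml(ws_url: str) -> str:
    """Build Plivo XML that asks Plivo to stream call audio to ws_url."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            # Assumed attributes: two-way audio, keep the call up while
            # streaming, 8 kHz mu-law audio. Confirm against Plivo's docs.
            "bidirectional": "true",
            "keepCallAlive": "true",
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    stream.text = ws_url  # ElementTree escapes characters such as "&"
    return ET.tostring(response, encoding="unicode")
```

Building the XML with ElementTree rather than string formatting ensures that a WebSocket URL with query parameters still produces valid XML.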

Outbound Calls

Your application initiates calls to phone numbers. Setup:

Start your Pipecat server
Call the /start endpoint with the target phone number
Plivo places the call and connects it to your bot

curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
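The same request can be sent from Python with only the standard library. The sketch below builds the POST to /start, optionally with extra JSON fields like those in the next example, without sending it. The phone-number check is an added assumption (E.164 with a leading +, matching the examples here), so relax it if your server accepts other formats. build_start_request is a hypothetical helper.

```python
import json
import re
import urllib.request

# Loose E.164 check: "+" then 2 to 15 digits, no leading zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

def build_start_request(phone_number: str, base_url: str = "http://localhost:7860", **extra) -> urllib.request.Request:
    """Build (but do not send) the POST /start request from the curl example."""
    if not E164.match(phone_number):
        raise ValueError(f"expected an E.164 number like +1234567890, got {phone_number!r}")
    payload = {"phone_number": phone_number, **extra}  # extra fields reach the bot
    return urllib.request.Request(
        f"{base_url}/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To place the call, pass the request to urllib.request.urlopen, for example urllib.request.urlopen(build_start_request("+1234567890", user_name="John")).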

Pass custom data to your bot:

curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{
    "phone_number": "+1234567890",
    "user_name": "John",
    "context": "appointment reminder"
  }'

Access this data in your bot via runner_args.body.
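A typical use of these fields is to personalize the LLM's instructions. The sketch below assumes runner_args.body holds the POSTed JSON object as a Python dict, or None when no extra data was sent. build_system_prompt and its wording are hypothetical.

```python
from typing import Any, Optional

def build_system_prompt(body: Optional[dict[str, Any]]) -> str:
    """Turn optional custom fields from the /start request into LLM instructions."""
    body = body or {}
    parts = ["You are a friendly voice assistant on a phone call."]
    if body.get("user_name"):
        parts.append(f"The person you are calling is {body['user_name']}.")
    if body.get("context"):
        parts.append(f"The purpose of this call: {body['context']}.")
    return " ".join(parts)
```

In the bot, this might be called as build_system_prompt(runner_args.body). Missing fields simply drop out of the prompt, so the same bot also works for calls started without custom data.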

Troubleshooting

Issue	Solution
Call doesn’t connect	Verify ngrok URL matches Plivo Answer URL
No audio	Check WebSocket connection in Pipecat logs
Bot not responding	Verify AI service API keys in `.env`
Authentication errors	Check Plivo Auth ID and Token
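For the "No audio" case, it can help to confirm that audio is actually arriving on the WebSocket. The sketch below tallies event types and decoded audio bytes from raw text messages. It assumes each message is a JSON object with an "event" field, and that "media" events carry base64 audio under media.payload. Verify this shape against the Plivo Audio Streaming docs before relying on it. summarize_stream is a hypothetical helper.

```python
import base64
import json
from collections import Counter

def summarize_stream(messages: list[str]) -> dict:
    """Count events and total decoded audio bytes in raw WebSocket text messages."""
    events: Counter = Counter()
    audio_bytes = 0
    for raw in messages:
        msg = json.loads(raw)
        event = msg.get("event", "<none>")
        events[event] += 1
        if event == "media":
            # Assumed layout: {"event": "media", "media": {"payload": "<base64>"}}
            payload = msg.get("media", {}).get("payload", "")
            audio_bytes += len(base64.b64decode(payload))
    return {"events": dict(events), "audio_bytes": audio_bytes}
```

If no media events arrive while the call is up, check the stream setup (Answer URL XML and WebSocket URL). If media arrives but the bot never replies, check the AI service keys in .env.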

Debug logs:

Server logs: Terminal running server.py
Bot logs: bot_<room_name>.log files
Plivo logs: Console > Logs > Calls

Related

Pipecat Documentation - Complete Pipecat setup
Pipecat Plivo Examples - Source code
Audio Streaming Overview - Plivo Audio Streaming docs
