Prerequisites
| Service | What You Need |
|---|---|
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| Deepgram | API key from console.deepgram.com |
| OpenAI | API key from platform.openai.com |
| Rime | API key from rime.ai |
Installation
Environment Variables
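A minimal sketch of a startup check for the credentials listed under Prerequisites. The variable names below are assumptions (the source does not list them), so rename them to match your own `.env` file.

```python
import os
from typing import Mapping

# Assumed variable names, one per credential in the Prerequisites table.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "RIME_API_KEY",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_vars()` before building the pipeline and exiting when the list is non-empty surfaces a missing key at startup rather than in the middle of a call.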
Pipeline Configuration
Service Details
Deepgram STT
Real-time speech recognition with interim results and language detection.

| Option | Description |
|---|---|
| DeepgramSTTService | Standard WebSocket transcription |
| DeepgramFluxSTTService | Enhanced turn detection for conversations |
OpenAI LLM
Chat completion with GPT-4o, supporting streaming responses and function calling.

| Model | Description |
|---|---|
| gpt-4o | Most capable, multimodal |
| gpt-4o-mini | Faster, cost-effective |
Rime TTS
Real-time voice synthesis with word-level timing and precise pronunciation control.

| Feature | Method |
|---|---|
| Spell out text | SPELL("ABC") |
| Insert pause | PAUSE_TAG(0.5) |
| Custom pronunciation | PRONOUNCE(text, word, phoneme) |
| Adjust speed inline | INLINE_SPEED(text, 1.2) |
- RimeTTSService - WebSocket-based, real-time with word timestamps
- RimeHttpTTSService - HTTP-based, simpler setup
Pronunciation Control
Rime excels at precise pronunciation control for names, technical terms, and branded content.
Custom Pronunciations
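The following stand-in illustrates what a `PRONOUNCE(text, word, phoneme)` style helper might do. It assumes Rime's curly-brace phoneme markup, which should be checked against the Rime docs. The function name `pronounce` and the placeholder phoneme string are hypothetical, and the real helper may also set request options that this sketch omits.

```python
def pronounce(text: str, word: str, phoneme: str) -> str:
    """Replace each occurrence of `word` with a brace-wrapped phoneme string.

    Assumption: the TTS engine reads text inside {...} as phonemes.
    """
    return text.replace(word, "{" + phoneme + "}")


# "PHONEMES" is a placeholder, not a real phonetic transcription.
marked_up = pronounce("Thanks for calling Plivo.", "Plivo", "PHONEMES")
```

Because the replacement is a plain substring match, it also rewrites the word when it appears inside a longer word, so choose `word` values that are unambiguous in your prompts.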
Spelling Out Text
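As a rough sketch, a `SPELL("ABC")` style helper can be thought of as wrapping the text in a spell marker so the voice reads it character by character. The `spell(...)` output format is an assumption about Rime's markup, and the function names here are hypothetical stand-ins rather than the library's own code.

```python
def spell(text: str) -> str:
    """Wrap `text` in an assumed spell(...) marker for letter-by-letter reading."""
    return f"spell({text})"


def confirmation_sentence(code: str) -> str:
    """Build a sentence in which only the confirmation code is spelled out."""
    return f"Your confirmation code is {spell(code)}."
```

Spelling out only the code, rather than the whole sentence, keeps the surrounding speech natural while making identifiers such as order numbers easier to hear.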
Dynamic Speed Control
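One way to picture inline speed control is sketched below: each adjusted span is wrapped in square brackets, and the matching speed values travel alongside the text as a comma-separated request field. Both the bracket markup and the `inlineSpeedAlpha` field name are assumptions to verify against the Rime docs, and `SpeedMarkup` is a hypothetical class, not part of any library.

```python
class SpeedMarkup:
    """Collects per-span speed values in the order the spans appear."""

    def __init__(self) -> None:
        self.speeds: list[float] = []

    def inline_speed(self, text: str, speed: float) -> str:
        """Mark `text` as a speed-adjusted span and record its speed value."""
        self.speeds.append(speed)
        return f"[{text}]"

    def request_fields(self) -> dict[str, str]:
        """Return the assumed extra request field carrying the speed values."""
        return {"inlineSpeedAlpha": ",".join(str(s) for s in self.speeds)}


markup = SpeedMarkup()
sentence = (
    "Your account number is "
    + markup.inline_speed("four two seven one", 1.2)
    + ". "
    + markup.inline_speed("Thanks for calling.", 0.8)
)
```

The order of the recorded values has to match the order of the bracketed spans, which is why the sketch appends to a list instead of keying speeds by text.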
Quick Start
Inbound Calls
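For an inbound call, Plivo requests the number's answer URL and expects XML in return. The sketch below builds a response that streams call audio to a WebSocket endpoint using Plivo's `<Stream>` element. The WebSocket URL is a placeholder, and the attribute values are one reasonable choice to check against your bot's audio settings rather than the source's confirmed configuration.

```python
import xml.etree.ElementTree as ET


def build_answer_xml(websocket_url: str) -> str:
    """Return Plivo answer XML that streams call audio to `websocket_url`."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            "bidirectional": "true",  # the bot sends audio back to the caller
            "keepCallAlive": "true",  # keep the call up while the stream is open
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    stream.text = websocket_url
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; point this at the server that runs the voice pipeline.
answer_xml = build_answer_xml("wss://example.com/ws")
```

Serve this string with an XML content type from whatever HTTP route is configured as the answer URL for the Plivo number.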
Outbound Calls
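An outbound call starts with a request to Plivo's Call API, which dials the destination and then fetches the answer URL once the call connects. The sketch below only builds the request, using the standard library. The phone numbers, credentials, and answer URL are placeholders, and `build_call_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_call_request(
    auth_id: str,
    auth_token: str,
    from_number: str,
    to_number: str,
    answer_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to Plivo's Call API."""
    url = f"https://api.plivo.com/v1/Account/{auth_id}/Call/"
    payload = {
        "from": from_number,  # the voice-enabled Plivo number
        "to": to_number,
        "answer_url": answer_url,  # endpoint that returns the answer XML
        "answer_method": "POST",
    }
    credentials = base64.b64encode(f"{auth_id}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_call_request(
    "AUTH_ID", "AUTH_TOKEN", "+14155550100", "+14155550101",
    "https://example.com/answer",
)
```

Passing the result to `urllib.request.urlopen` places a real call, so keep the build and send steps separate while testing.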
When to Use Rime
Choose Rime when:
- You need precise control over pronunciation
- Your content includes technical terms, names, or branded words
- You want word-level timing for synchronized experiences
- You need inline speed adjustments
- You need emotion/expression controls
- You want voice cloning capabilities
- You need broader multilingual support
Related
- Pipecat Overview - Architecture and setup
- Deepgram Docs - STT configuration
- OpenAI Docs - LLM configuration
- Rime Docs - TTS configuration