ElevenLabs Voice AI: TTS Configuration Guide
Complete guide to ElevenLabs API integration with Burki Voice AI. Configure voice cloning, optimize latency, and reduce costs with our step-by-step TTS setup guide.
If you have spent any time evaluating text-to-speech providers, you already know the truth: ElevenLabs produces the most natural, human-like voices in the industry. When we built Burki's TTS integration layer, ElevenLabs was the first provider we added. The voices simply sound better than anything else available.
But having great voices is only half the battle. Configuring ElevenLabs correctly for real-time voice AI conversations requires understanding the nuances of their API, pricing structure, and performance characteristics. This guide walks you through everything you need to know to get ElevenLabs running optimally on Burki.
ElevenLabs Features That Matter for Voice AI
Before diving into configuration, let's establish what makes ElevenLabs particularly suited for conversational AI applications.
Voice Quality That Actually Works
ElevenLabs offers over 100 pre-built voices across multiple languages. More importantly, their voices handle conversational speech patterns well. They do not sound robotic when reading fragments, handling interruptions, or generating short responses. This matters enormously in voice AI where responses are often just a few words.
The voice library includes:
- Multilingual Support: 32 languages with native-sounding output
- Emotional Range: Voices can express frustration, excitement, sympathy
- Speaking Styles: From professional to casual, energetic to calm
- Age and Gender Diversity: Young, old, male, female, and everything in between
Voice Cloning Capabilities
ElevenLabs offers two tiers of voice cloning that fundamentally change what you can build.
Instant Voice Cloning creates a usable voice clone from just 1-2 minutes of audio. The quality is good enough for many applications, and the turnaround is essentially immediate. This is included on all paid plans starting at $5/month.
Professional Voice Cloning requires 30+ minutes of audio (2-3 hours recommended) but produces results that are nearly indistinguishable from the original speaker. This is the option you want for brand voices, executive assistants, or any application where voice authenticity is critical.
The AI captures everything about the source voice: cadence, tonality, pause patterns, breathing sounds, and even verbal tics like "um" and "ah" if they appear in your training data.
Safety and Verification
ElevenLabs requires voice verification before creating clones. You must record yourself reading an authorization message, which prevents unauthorized cloning of other people's voices. This is both a legal compliance feature and a practical safeguard.
ElevenLabs Pricing Breakdown (2026)
Understanding pricing is essential for cost optimization. ElevenLabs uses a credit-based system where 1 character equals 1 credit for their standard models.
| Plan | Monthly Cost | Credits | Key Features |
|---|---|---|---|
| Free | $0 | 10,000 | Non-commercial use only |
| Starter | $5 | 30,000 | Commercial license, instant cloning |
| Creator | $11 | 100,000 | Professional voice cloning, 192kbps |
| Pro | $99 | 500,000 | 44.1kHz PCM, production-scale |
| Scale | $330 | Millions | Multi-seat, low-latency TTS |
| Business | $1,320 | Millions | Full enterprise features |
| Enterprise | Custom | Custom | SLAs, HIPAA/BAA, dedicated support |
Annual billing saves you 2 months (roughly 16.7% discount).
Practical Cost Estimates:
- Average AI response: 100-200 characters
- Cost per response at Creator tier: ~1.1-2.2 cents ($0.011-0.022)
- Cost per minute of conversation: 3-5 cents (assuming back-and-forth dialog)
For most voice AI applications, the Creator tier ($11/month) provides excellent value. You get professional voice cloning and 192kbps audio quality, which is indistinguishable from higher bitrates on phone calls.
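Here is a minimal sketch of the per-response arithmetic, using only the Creator figures from the pricing table. The function names are illustrative, and the result is an effective rate that assumes you use your full monthly quota.

```python
# Figures taken from the pricing table in this guide.
CREATOR_MONTHLY_USD = 11.0
CREATOR_CREDITS = 100_000  # 1 character = 1 credit on standard models


def cost_per_response_usd(chars: int,
                          plan_usd: float = CREATOR_MONTHLY_USD,
                          plan_credits: int = CREATOR_CREDITS) -> float:
    """Effective cost of synthesizing `chars` characters on a given plan."""
    return chars * plan_usd / plan_credits


def responses_per_month(avg_chars: int,
                        plan_credits: int = CREATOR_CREDITS) -> int:
    """How many responses of `avg_chars` characters fit in the monthly quota."""
    return plan_credits // avg_chars


if __name__ == "__main__":
    for chars in (100, 200):
        print(f"{chars} characters: ${cost_per_response_usd(chars):.3f}")
    print(responses_per_month(150), "responses of 150 characters per month")
```

If your usage falls well short of the quota, the real cost per response is higher, because the subscription price is fixed.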
Burki + ElevenLabs Integration
Burki supports ElevenLabs as a first-class TTS provider with full feature support. Here is how the integration works.
Supported Features
| Feature | Burki Support |
|---|---|
| Voice Selection | Browse 100+ voices |
| Emotion Control | Full parameter access |
| Voice Cloning | Both instant and professional |
| Streaming | Real-time audio streaming |
| BYO API Keys | Per-assistant configuration |
Configuration Options in Burki
When setting up an ElevenLabs voice in Burki, you can configure:
Voice Parameters:
- Stability (0-1): Higher values produce more consistent output, lower values add expressiveness
- Similarity Boost (0-1): Controls how closely output matches the original voice model
- Style (0-1): Amplifies the voice's stylistic characteristics
- Speaker Boost (on/off): Boosts similarity to the original speaker, at the cost of slightly higher latency
Model Selection:
- Turbo v2: Lowest latency, good quality
- Multilingual v2: Best quality, higher latency
- Turbo v2.5: Balanced option for most use cases
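To see how these options map onto an actual request, here is a hedged sketch that builds a call to ElevenLabs' public text-to-speech REST endpoint using only the Python standard library. This is not Burki's internal code, which this guide does not show. The field names and model IDs reflect ElevenLabs' public API as we understand it, so check them against the current API reference. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

# Model IDs as published in ElevenLabs' public docs at the time of writing;
# confirm them against your account before relying on them.
MODEL_IDS = {
    "Turbo v2": "eleven_turbo_v2",
    "Turbo v2.5": "eleven_turbo_v2_5",
    "Multilingual v2": "eleven_multilingual_v2",
}


def build_tts_request(api_key: str, voice_id: str, text: str, *,
                      model: str = "Turbo v2.5",
                      stability: float = 0.75,
                      similarity_boost: float = 0.75,
                      style: float = 0.30,
                      speaker_boost: bool = True) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    payload = {
        "text": text,
        "model_id": MODEL_IDS[model],
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": speaker_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it, and the response body is the generated audio. Validating the 0-1 ranges before sending catches configuration mistakes without spending credits.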
Setting Up Your ElevenLabs API Key
Burki supports bring-your-own (BYO) API keys at both the organization and assistant level. This gives you full control over billing and usage tracking.
To configure:
- Generate an API key from your ElevenLabs dashboard
- In Burki, navigate to Organization Settings > Provider Credentials
- Add your ElevenLabs API key
- Optionally, override at the assistant level for specific use cases
Your API key is encrypted at rest using AES-256 encryption. Burki never logs or exposes your credentials.
Voice Selection Guide
Choosing the right voice is one of the most important decisions you will make. Here is a practical framework.
For Customer Support
Priority: Clarity, warmth, neutral accent
Recommended voices:
- Rachel: Professional American female, excellent for technical support
- Josh: Friendly American male, good for conversational support
- Elli: Young female, works well for startup/tech companies
Parameter settings:
- Stability: 0.75 (consistent but not robotic)
- Similarity Boost: 0.75
- Style: 0.30 (natural without being dramatic)
For Sales and Outbound
Priority: Energy, persuasion, confidence
Recommended voices:
- Antoni: Confident American male, good closing energy
- Domi: Assertive female, projects authority
- Bella: Warm female, excellent for relationship-building
Parameter settings:
- Stability: 0.50 (more expressive for engagement)
- Similarity Boost: 0.80
- Style: 0.50 (more personality)
For Appointments and Scheduling
Priority: Professional, efficient, clear
Recommended voices:
- Sam: Neutral, professional
- Fin: Slightly robotic but extremely clear
- Charlotte: Formal British English, professional
Parameter settings:
- Stability: 0.85 (very consistent)
- Similarity Boost: 0.70
- Style: 0.20 (minimal personality)
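If you manage several assistants, it can help to keep the three parameter sets above in one place. Here is a small sketch. The `VoicePreset` class and the use-case keys are illustrative names, not part of Burki or ElevenLabs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePreset:
    stability: float
    similarity_boost: float
    style: float


# Values copied from the recommendations in this guide.
PRESETS = {
    "customer_support": VoicePreset(0.75, 0.75, 0.30),
    "sales_outbound": VoicePreset(0.50, 0.80, 0.50),
    "scheduling": VoicePreset(0.85, 0.70, 0.20),
}


def preset_for(use_case: str) -> VoicePreset:
    """Look up the recommended parameters for a use case."""
    try:
        return PRESETS[use_case]
    except KeyError:
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of {sorted(PRESETS)}"
        ) from None
```

Treat these values as starting points and adjust them after listening to real calls.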
Language Considerations
If your assistant needs to speak a language other than English, select a voice native to that language. Using an English voice to speak Spanish will produce an English accent, which may not be ideal.
ElevenLabs supports 32 languages. For multilingual deployments, create separate assistants with language-appropriate voices rather than trying to force one voice across languages.
Voice Cloning Setup
Voice cloning unlocks powerful personalization options. Here is how to set it up properly.
Instant Voice Clone (Quick Setup)
Requirements:
- 1-2 minutes of clear audio
- No background noise, reverb, or artifacts
- 22kHz+ sample rate recommended
- MP3, WAV, FLAC, M4A, or OGG format
Steps:
- Prepare your audio sample
- Navigate to Burki's Voice Library
- Click "Clone Voice" and select ElevenLabs as the provider
- Upload your audio file
- Complete the voice verification prompt
- Your cloned voice is ready in seconds
Tips for better instant clones:
- Use a sample that demonstrates your full vocal range
- Include varied intonation (questions, statements, emphasis)
- Record in a quiet room with no echo
- Speak at your natural pace
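Before uploading, you can check a sample against the duration and sample-rate guidelines locally. Here is a minimal sketch using Python's standard `wave` module. It only handles WAV files and cannot detect noise or reverb. The thresholds come from the requirements above, and `check_wav_sample` is a hypothetical helper.

```python
import wave

MIN_SECONDS = 60.0        # lower end of the 1-2 minute guideline
MIN_SAMPLE_RATE = 22_000  # "22kHz+" from the requirements list


def check_wav_sample(path_or_file) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f}s, need {MIN_SECONDS:.0f}s+")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    return problems
```

Calling `check_wav_sample("sample.wav")` returns an empty list when the file passes, or a description of each failed check.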
Professional Voice Clone (High Quality)
Requirements:
- Minimum 30 minutes of audio (2-3 hours recommended)
- Consistent recording environment across samples
- High-quality microphone (condenser recommended)
- Paid plan (Creator tier or above)
Steps:
- Gather your audio corpus (podcasts, recordings, read scripts)
- Ensure consistent audio quality across all samples
- Upload through Burki's voice cloning interface
- Processing takes 2-5 minutes
- Test extensively before deploying
What to include in training data:
- Various emotional states (neutral, happy, serious)
- Different sentence types (questions, commands, explanations)
- Technical vocabulary relevant to your use case
- Natural speech patterns (not overly scripted)
Cloned Voice Management
Once created, your cloned voices can be:
- Assigned to multiple assistants
- Tested with sample text before deployment
- Tracked for usage and costs
- Shared across your organization
Burki maintains a voice library where you can organize clones with tags, descriptions, and quality ratings.
Latency Optimization
For conversational AI, latency is everything. Users perceive delays over 400ms as unnatural. Here is how to minimize TTS latency with ElevenLabs.
Model Selection
| Model | Latency | Quality | Use Case |
|---|---|---|---|
| Turbo v2 | ~200ms | Good | Speed-critical applications |
| Turbo v2.5 | ~300ms | Very Good | Most voice AI applications |
| Multilingual v2 | ~500ms | Excellent | Quality-first applications |
For most voice AI use cases on Burki, Turbo v2.5 provides the best balance. The quality difference from Multilingual v2 is barely perceptible in real-time conversation.
Streaming Configuration
Burki streams TTS audio in real-time, meaning playback begins before the entire response is generated. This dramatically reduces perceived latency.
To optimize streaming:
- Enable chunked responses in your LLM configuration
- Use sentence-level streaming where possible
- Set appropriate buffer sizes (Burki handles this automatically)
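Sentence-level streaming means handing each sentence to TTS as soon as the LLM has finished it, rather than waiting for the full response. Here is a simplified sketch of that idea. It is not Burki's implementation, and the regex-based sentence detection is an assumption that will mis-split abbreviations such as "Dr. Smith".

```python
import re
from typing import Iterable, Iterator

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be sent to the TTS provider immediately, so audio for the first sentence plays while the LLM is still generating the rest.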
Geographic Optimization
ElevenLabs operates multiple regions. Burki automatically routes to the nearest endpoint, but you can verify your configuration:
- US users: us-east or us-west
- EU users: eu-west
- APAC users: Consider latency trade-offs
Warmup and Connection Pooling
Burki maintains warm service pools for ElevenLabs connections. This eliminates cold-start latency on new requests. The service pool:
- Reuses expensive service instances across calls
- Automatically removes idle services after 5 minutes
- Tracks pool statistics for monitoring
This optimization reduces first-response latency by 50-100ms in typical deployments.
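The pooling behavior described above can be illustrated with a small sketch. This is a simplified model, not Burki's actual code: `WarmPool` is a hypothetical class, it is not thread-safe, and the only value taken from this guide is the 5-minute idle timeout.

```python
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

IDLE_TIMEOUT_SECONDS = 300.0  # 5 minutes, per this guide


class WarmPool(Generic[T]):
    """Reuse service instances and drop ones that sit idle too long."""

    def __init__(self, factory: Callable[[], T],
                 idle_timeout: float = IDLE_TIMEOUT_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._idle: list[tuple[float, T]] = []
        self.stats = {"created": 0, "reused": 0, "evicted": 0}

    def _evict_idle(self) -> None:
        now = self._clock()
        fresh = [(t, s) for t, s in self._idle if now - t < self._idle_timeout]
        self.stats["evicted"] += len(self._idle) - len(fresh)
        self._idle = fresh

    def acquire(self) -> T:
        """Return a warm instance if one exists, otherwise create one."""
        self._evict_idle()
        if self._idle:
            self.stats["reused"] += 1
            return self._idle.pop()[1]
        self.stats["created"] += 1
        return self._factory()

    def release(self, service: T) -> None:
        """Hand an instance back so a later call can reuse it."""
        self._idle.append((self._clock(), service))
```

The `stats` dictionary is a stand-in for the pool statistics mentioned above. Injecting `clock` makes the eviction logic testable without waiting five minutes.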
Cost Optimization Strategies
With proper optimization, you can significantly reduce your ElevenLabs spend without sacrificing quality.
Character Count Reduction
Every character costs credits. Reduce waste by:
- Trimming filler words: Instruct your LLM to leave unnecessary "Well," "So," and "Actually" out of its responses
- Using contractions: "I'm" vs "I am" saves 1 character per instance
- Concise responses: Tune your LLM to be brief without being terse
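Prompting the LLM is the main lever, but you can also trim text right before it reaches TTS. Here is a minimal sketch. It only removes a leading filler word that is followed by a comma, which is a deliberately conservative assumption so that phrases like "So far so good" are left alone.

```python
import re

# Leading fillers named above, only when followed by a comma.
_LEADING_FILLER = re.compile(r"^(?:well|so|actually),\s+", re.IGNORECASE)


def trim_for_tts(text: str) -> str:
    """Strip one leading filler word and re-capitalize the sentence."""
    trimmed = _LEADING_FILLER.sub("", text.strip())
    return trimmed[:1].upper() + trimmed[1:]


def credits_saved(original: str, trimmed: str) -> int:
    """Characters (and therefore credits) saved by trimming."""
    return len(original) - len(trimmed)
```

For example, `trim_for_tts("Well, I can help with that.")` drops six characters from the response before it is billed.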
Model Tiering
Not every response needs the highest quality model. Consider:
- Turbo v2 for acknowledgments ("Got it," "One moment")
- Turbo v2.5 for standard responses
- Multilingual v2 for complex explanations or sensitive conversations
Burki does not currently support automatic model switching, but you can create separate assistants with different TTS configurations for different call types.
Caching Common Phrases
For frequently used phrases (greetings, hold messages, error responses), pre-generate and cache the audio. Burki's hold audio system already does this for hold messages and music.
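Here is a sketch of what phrase caching could look like if you build it yourself. `PhraseCache` and the `synthesize` callable are hypothetical; the callable stands in for whatever function actually calls the TTS provider. This is an in-memory illustration, not a description of Burki's hold audio system.

```python
import hashlib
from typing import Callable


class PhraseCache:
    """Cache synthesized audio keyed by voice, model, and text."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._audio: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(voice_id: str, model_id: str, text: str) -> str:
        raw = "\x1f".join((voice_id, model_id, text.strip()))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, voice_id: str, model_id: str, text: str) -> bytes:
        """Return cached audio, synthesizing it only on the first request."""
        key = self._key(voice_id, model_id, text)
        if key in self._audio:
            self.hits += 1
        else:
            self.misses += 1
            self._audio[key] = self._synthesize(voice_id, model_id, text)
        return self._audio[key]
```

In practice the key should also include any voice parameters that change the audio (stability, style, and so on), and the cache would usually live on disk or in object storage rather than in memory.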
Plan Selection
Based on your volume:
| Monthly Characters | Recommended Plan | Cost per Character |
|---|---|---|
| Under 30K | Starter ($5) | $0.00017 |
| 30K - 100K | Creator ($11) | $0.00011 |
| 100K - 500K | Pro ($99) | $0.00020 |
| Over 500K | Scale/Business | Negotiate |
The Creator tier offers the best value for small to medium deployments. Scale becomes cost-effective only at very high volumes.
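The table can be turned into a quick lookup. Here is a sketch that uses only the plan prices and credit allowances quoted in this guide. The function names are illustrative, and the boundary handling (a volume exactly at a limit stays on the lower plan) is an assumption.

```python
# (plan name, monthly USD, included credits), from this guide's pricing table
PLANS = [
    ("Starter", 5.0, 30_000),
    ("Creator", 11.0, 100_000),
    ("Pro", 99.0, 500_000),
]


def recommend_plan(monthly_chars: int) -> str:
    """Smallest listed plan whose credits cover the monthly volume."""
    for name, _price, plan_credits in PLANS:
        if monthly_chars <= plan_credits:
            return name
    return "Scale/Business"


def cost_per_character(plan_name: str) -> float:
    """Effective USD per character when the full quota is used."""
    for name, price, plan_credits in PLANS:
        if name == plan_name:
            return price / plan_credits
    raise ValueError(f"no fixed per-character price for {plan_name!r}")
```

Running `cost_per_character` on each plan reproduces the last column of the table.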
BYO Keys for Cost Control
Using Burki's BYO API key feature gives you:
- Direct billing relationship with ElevenLabs
- Access to your existing volume discounts
- Clearer usage tracking
- No markup on TTS costs (only Burki platform fees)
Frequently Asked Questions
Can I use my existing ElevenLabs voices in Burki?
Yes. Any voices in your ElevenLabs account, including cloned voices, are accessible when you configure your API key in Burki. The voice library syncs automatically.
What audio quality does Burki support?
Burki supports up to 44.1kHz PCM audio when using ElevenLabs Pro tier or higher. For phone calls, 8kHz mu-law is sufficient, so higher quality settings primarily benefit web-based voice interfaces.
How does Burki handle ElevenLabs rate limits?
Burki implements automatic retry logic with exponential backoff. If you hit rate limits, Burki will queue requests and retry. For high-volume applications, consider the Scale or Business tiers which have higher rate limits.
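For readers calling ElevenLabs directly, retry with exponential backoff looks roughly like the sketch below. The delays, attempt count, and jitter are illustrative assumptions, not Burki's actual settings, and `RateLimited` is a stand-in for however your HTTP client reports a 429 response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for an HTTP 429 (too many requests) response."""


def with_backoff(call: Callable[[], T], *,
                 max_attempts: int = 5,
                 base_delay: float = 0.5,
                 max_delay: float = 8.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller handle it
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so many callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")
```

In a live call, keep the total retry budget short: a caller waiting several seconds for audio is usually worse off than one who hears a fallback voice.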
Can I switch between TTS providers mid-call?
Not currently. TTS provider is configured at the assistant level and remains constant throughout a call. However, you can use multi-assistant graphs to route different call types to assistants with different TTS configurations.
What happens if ElevenLabs is down?
Burki supports fallback providers. You can configure backup TTS providers (Deepgram, Cartesia, OpenAI) that activate if your primary provider fails. Fallback configuration is set at the assistant level.
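Conceptually, fallback is an ordered list of providers tried one after another. Here is a minimal sketch of that pattern. The function and type names are hypothetical, and each provider is represented by a plain callable rather than a real SDK client.

```python
from typing import Callable, Sequence

Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str,
    providers: Sequence[tuple[str, Synthesizer]],
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider name, audio) on success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the audio makes it easy to log how often the fallback path is used.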
How accurate is voice cloning for different accents?
ElevenLabs voice cloning preserves accents from the training data. If you train with an English speaker, the clone will have an English accent even when speaking other languages. For accent-accurate multilingual output, train separate clones in each language.
Can I use professional voice clones for commercial purposes?
Yes, all paid plans include commercial licensing. You own your cloned voices and can use them in revenue-generating applications. ElevenLabs takes no royalty or licensing fee beyond your subscription.
Start Building with ElevenLabs and Burki
The combination of ElevenLabs voices and Burki's optimized conversation pipeline delivers voice AI that actually sounds human. Our customers consistently report that callers cannot tell they are speaking with an AI.
Getting started takes minutes:
- Sign up for Burki at burki.dev
- Create an assistant using our Voice Builder
- Select ElevenLabs as your TTS provider
- Choose your voice and configure parameters
- Assign a phone number and start testing
With 200 free minutes included on signup and a free trial phone number for 30 days, you can fully evaluate the platform before committing. Our average response latency of 0.8-1.2 seconds (compared to 4-5 seconds for competitors) means your conversations flow naturally.
The best voice AI combines great voices with great infrastructure. ElevenLabs provides the voices. Burki provides everything else.
Ready to integrate ElevenLabs with Burki? Start your free trial and experience the difference that low latency makes. No credit card required.