ElevenLabs Voice AI: TTS Configuration Guide

If you have spent any time evaluating text-to-speech providers, you already know the truth: ElevenLabs produces the most natural, human-like voices in the industry. When we built Burki's TTS integration layer, ElevenLabs was the first provider we added. The voices simply sound better than anything else available.

But having great voices is only half the battle. Configuring ElevenLabs correctly for real-time voice AI conversations requires understanding the nuances of their API, pricing structure, and performance characteristics. This guide walks you through everything you need to know to get ElevenLabs running optimally on Burki.

ElevenLabs Features That Matter for Voice AI

Before diving into configuration, let's establish what makes ElevenLabs particularly suited for conversational AI applications.

Voice Quality That Actually Works

ElevenLabs offers over 100 pre-built voices across multiple languages. More importantly, their voices handle conversational speech patterns well. They do not sound robotic when reading fragments, handling interruptions, or generating short responses. This matters enormously in voice AI where responses are often just a few words.

The voice library includes:

Multilingual Support: 32 languages with native-sounding output
Emotional Range: Voices can express frustration, excitement, sympathy
Speaking Styles: From professional to casual, energetic to calm
Age and Gender Diversity: Young, old, male, female, and everything in between

Voice Cloning Capabilities

ElevenLabs offers two tiers of voice cloning that fundamentally change what you can build.

Instant Voice Cloning creates a usable voice clone from just 1-2 minutes of audio. The quality is good enough for many applications, and the turnaround is essentially immediate. This is included on all paid plans starting at $5/month.

Professional Voice Cloning requires 30+ minutes of audio (2-3 hours recommended) but produces results that are nearly indistinguishable from the original speaker. This is the option you want for brand voices, executive assistants, or any application where voice authenticity is critical.

The AI captures everything about the source voice: cadence, tonality, pause patterns, breathing sounds, and even verbal tics like "um" and "ah" if they appear in your training data.

Safety and Verification

ElevenLabs requires voice verification before creating clones. You must record yourself reading an authorization message, which prevents unauthorized cloning of other people's voices. This is both a legal compliance feature and a practical safeguard.

ElevenLabs Pricing Breakdown (2026)

Understanding pricing is essential for cost optimization. ElevenLabs uses a credit-based system where 1 character equals 1 credit for their standard models.

Plan	Monthly Cost	Credits	Key Features
Free	$0	10,000	Non-commercial use only
Starter	$5	30,000	Commercial license, instant cloning
Creator	$11	100,000	Professional voice cloning, 192kbps
Pro	$99	500,000	44.1kHz PCM, production-scale
Scale	$330	Millions	Multi-seat, low-latency TTS
Business	$1,320	Millions	Full enterprise features
Enterprise	Custom	Custom	SLAs, HIPAA/BAA, dedicated support

Annual billing saves you 2 months (roughly 16.7% discount).

Practical Cost Estimates:

Average AI response: 100-200 characters
Cost per response at Creator tier: ~$0.011-0.022, or about 1.1-2.2 cents ($11 / 100,000 credits = $0.00011 per character)
Cost per minute of conversation: 3-5 cents (assuming back-and-forth dialog)
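
To make the arithmetic behind these estimates explicit, here is a small sketch. The plan price and credit allowance come from the pricing table above; the function names are our own, and the calculation assumes you use the full monthly allowance.

```python
# Creator tier figures from the pricing table above.
CREATOR_MONTHLY_USD = 11.00
CREATOR_CREDITS = 100_000  # 1 credit = 1 character on standard models


def cost_per_character(monthly_usd: float, credits: int) -> float:
    """Effective price of one character when the whole allowance is used."""
    return monthly_usd / credits


def cost_per_response(chars: int,
                      monthly_usd: float = CREATOR_MONTHLY_USD,
                      credits: int = CREATOR_CREDITS) -> float:
    """Dollar cost of synthesizing a response of `chars` characters."""
    return chars * cost_per_character(monthly_usd, credits)


if __name__ == "__main__":
    for chars in (100, 200):
        usd = cost_per_response(chars)
        print(f"{chars} characters: ${usd:.3f} ({usd * 100:.1f} cents)")
```

If you use only part of the allowance, the effective per-character price is higher, so treat these numbers as a lower bound.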

For most voice AI applications, the Creator tier ($11/month) provides excellent value. You get professional voice cloning and 192kbps audio quality, which is indistinguishable from higher bitrates on phone calls.

Burki + ElevenLabs Integration

Burki supports ElevenLabs as a first-class TTS provider with full feature support. Here is how the integration works.

Supported Features

Feature	Burki Support
Voice Selection	Browse 100+ voices
Emotion Control	Full parameter access
Voice Cloning	Both instant and professional
Streaming	Real-time audio streaming
BYO API Keys	Per-assistant configuration

Configuration Options in Burki

When setting up an ElevenLabs voice in Burki, you can configure:

Voice Parameters:

Stability (0-1): Higher values produce more consistent output, lower values add expressiveness
Similarity Boost (0-1): Controls how closely output matches the original voice model
Style (0-1): Amplifies the voice's stylistic characteristics
Speaker Boost (on/off): Boosts similarity to the original speaker, at the cost of slightly higher latency

Model Selection:

Turbo v2: Lowest latency, good quality
Multilingual v2: Best quality, higher latency
Turbo v2.5: Balanced option for most use cases
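
As an illustration of how these options fit together, the sketch below validates the parameter ranges and assembles a request body. The field names (`model_id`, `voice_settings`, `similarity_boost`, `use_speaker_boost`) and the model identifiers follow the ElevenLabs text-to-speech API as we understand it, so check them against the current API reference. `VoiceSettings` and `build_tts_payload` are hypothetical helpers, not part of Burki or the ElevenLabs SDK.

```python
from dataclasses import asdict, dataclass

# Model identifiers for the three models listed above (verify against the
# ElevenLabs API reference before relying on them).
MODEL_IDS = {
    "turbo_v2": "eleven_turbo_v2",
    "turbo_v2_5": "eleven_turbo_v2_5",
    "multilingual_v2": "eleven_multilingual_v2",
}


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the voice parameters described above."""
    stability: float = 0.75
    similarity_boost: float = 0.75
    style: float = 0.30
    use_speaker_boost: bool = True

    def __post_init__(self) -> None:
        # All three sliders are documented as 0-1 ranges.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def build_tts_payload(text: str, model: str, settings: VoiceSettings) -> dict:
    """Assemble a JSON-serializable request body for one TTS call."""
    if model not in MODEL_IDS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_IDS)}")
    return {
        "text": text,
        "model_id": MODEL_IDS[model],
        "voice_settings": asdict(settings),
    }
```

Validating ranges before the request is sent surfaces configuration mistakes at setup time rather than mid-call.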

Setting Up Your ElevenLabs API Key

Burki supports bring-your-own (BYO) API keys at both the organization and assistant level. This gives you full control over billing and usage tracking.

To configure:

Generate an API key from your ElevenLabs dashboard
In Burki, navigate to Organization Settings > Provider Credentials
Add your ElevenLabs API key
Optionally, override at the assistant level for specific use cases

Your API key is encrypted at rest using AES-256 encryption. Burki never logs or exposes your credentials.
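
The override behavior can be pictured as a simple precedence rule: an assistant-level key wins when present, otherwise the organization-level key applies. The sketch below is only an illustration of that rule; `resolve_api_key` is a hypothetical name, not Burki's actual code.

```python
from typing import Optional


def resolve_api_key(org_key: Optional[str],
                    assistant_key: Optional[str] = None) -> str:
    """Pick the key for a call: assistant-level override first, then org-level."""
    key = assistant_key or org_key
    if not key:
        raise LookupError("no ElevenLabs API key configured at either level")
    return key
```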

Voice Selection Guide

Choosing the right voice is one of the most important decisions you will make. Here is a practical framework.

For Customer Support

Priority: Clarity, warmth, neutral accent

Recommended voices:

Rachel: Professional American female, excellent for technical support
Josh: Friendly American male, good for conversational support
Elli: Young female, works well for startup/tech companies

Parameter settings:

Stability: 0.75 (consistent but not robotic)
Similarity Boost: 0.75
Style: 0.30 (natural without being dramatic)

For Sales and Outbound

Priority: Energy, persuasion, confidence

Recommended voices:

Antoni: Confident American male, good closing energy
Domi: Assertive female, projects authority
Bella: Warm female, excellent for relationship-building

Parameter settings:

Stability: 0.50 (more expressive for engagement)
Similarity Boost: 0.80
Style: 0.50 (more personality)

For Appointments and Scheduling

Priority: Professional, efficient, clear

Recommended voices:

Sam: Neutral, professional
Fin: Slightly robotic but extremely clear
Charlotte: Formal British English, professional

Parameter settings:

Stability: 0.85 (very consistent)
Similarity Boost: 0.70
Style: 0.20 (minimal personality)
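
If you manage several assistants, it can help to keep the three parameter sets above in one place. Here is a minimal sketch; the preset keys and the `get_preset` helper are our own naming, and the values are copied from the recommendations above.

```python
# Parameter presets copied from the three use cases above.
PRESETS = {
    "customer_support": {"stability": 0.75, "similarity_boost": 0.75, "style": 0.30},
    "sales_outbound": {"stability": 0.50, "similarity_boost": 0.80, "style": 0.50},
    "scheduling": {"stability": 0.85, "similarity_boost": 0.70, "style": 0.20},
}


def get_preset(use_case: str) -> dict:
    """Return a copy of the preset so callers can tweak it safely."""
    try:
        return dict(PRESETS[use_case])
    except KeyError:
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of {sorted(PRESETS)}"
        ) from None
```

Treat these values as starting points and adjust them after listening to real calls.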

Language Considerations

If your assistant needs to speak a language other than English, select a voice native to that language. Using an English voice to speak Spanish will produce an English accent, which may not be ideal.

ElevenLabs supports 32 languages. For multilingual deployments, create separate assistants with language-appropriate voices rather than trying to force one voice across languages.

Voice Cloning Setup

Voice cloning unlocks powerful personalization options. Here is how to set it up properly.

Instant Voice Clone (Quick Setup)

Requirements:

1-2 minutes of clear audio
No background noise, reverb, or artifacts
22kHz+ sample rate recommended
MP3, WAV, FLAC, M4A, or OGG format
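
These requirements are easy to check before uploading. The sketch below assumes you already know the sample's duration and sample rate (for example from your audio editor). The thresholds mirror the list above, reading "22kHz+" as at least 22,000 Hz, and `check_clone_sample` is a hypothetical helper. It cannot detect noise or reverb, which still need a listening check.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MIN_DURATION_S = 60           # lower end of the "1-2 minutes" guideline
MIN_SAMPLE_RATE_HZ = 22_000   # our reading of "22kHz+"


def check_clone_sample(filename: str, duration_s: float,
                       sample_rate_hz: int) -> List[str]:
    """Return a list of problems; an empty list means the sample looks usable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {extension or '(none)'}")
    if duration_s < MIN_DURATION_S:
        problems.append(f"too short: {duration_s:.0f}s, need at least {MIN_DURATION_S}s")
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {sample_rate_hz} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    return problems
```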

Steps:

Prepare your audio sample
Navigate to Burki's Voice Library
Click "Clone Voice" and select ElevenLabs as the provider
Upload your audio file
Complete the voice verification prompt
Your cloned voice is ready in seconds

Tips for better instant clones:

Use a sample that demonstrates your full vocal range
Include varied intonation (questions, statements, emphasis)
Record in a quiet room with no echo
Speak at your natural pace

Professional Voice Clone (High Quality)

Requirements:

Minimum 30 minutes of audio (2-3 hours recommended)
Consistent recording environment across samples
High-quality microphone (condenser recommended)
Paid plan (Creator tier or above)

Steps:

Gather your audio corpus (podcasts, recordings, read scripts)
Ensure consistent audio quality across all samples
Upload through Burki's voice cloning interface
Processing takes 2-5 minutes
Test extensively before deploying

What to include in training data:

Various emotional states (neutral, happy, serious)
Different sentence types (questions, commands, explanations)
Technical vocabulary relevant to your use case
Natural speech patterns (not overly scripted)

Cloned Voice Management

Once created, your cloned voices can be:

Assigned to multiple assistants
Tested with sample text before deployment
Tracked for usage and costs
Shared across your organization

Burki maintains a voice library where you can organize clones with tags, descriptions, and quality ratings.

Latency Optimization

For conversational AI, latency is everything. Users perceive delays over 400ms as unnatural. Here is how to minimize TTS latency with ElevenLabs.

Model Selection

Model	Latency	Quality	Use Case
Turbo v2	~200ms	Good	Speed-critical applications
Turbo v2.5	~300ms	Very Good	Most voice AI applications
Multilingual v2	~500ms	Excellent	Quality-first applications

For most voice AI use cases on Burki, Turbo v2.5 provides the best balance. The quality difference from Multilingual v2 is barely perceptible in real-time conversation.
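
One way to read the table is as a lookup: given a TTS latency budget, pick the highest-quality model that fits. The sketch below encodes the approximate figures from the table. The quality ranks and the `pick_model` helper are our own illustration, and real latency depends on network and load.

```python
# (model name, approximate latency in ms, quality rank) from the table above.
MODELS = [
    ("Turbo v2", 200, 1),
    ("Turbo v2.5", 300, 2),
    ("Multilingual v2", 500, 3),
]


def pick_model(latency_budget_ms: int) -> str:
    """Return the best-quality model whose typical latency fits the budget."""
    candidates = [m for m in MODELS if m[1] <= latency_budget_ms]
    if not candidates:
        raise ValueError(f"no model fits a {latency_budget_ms} ms budget")
    return max(candidates, key=lambda m: m[2])[0]
```

With a 400 ms budget this selects Turbo v2.5, which matches the recommendation above.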

Streaming Configuration

Burki streams TTS audio in real-time, meaning playback begins before the entire response is generated. This dramatically reduces perceived latency.

To optimize streaming:

Enable chunked responses in your LLM configuration
Use sentence-level streaming where possible
Set appropriate buffer sizes (Burki handles this automatically)
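
Sentence-level streaming means forwarding each sentence to TTS as soon as the LLM has finished it, instead of waiting for the full response. Here is a minimal sketch of that buffering step. It is not Burki's implementation, and the simple punctuation rule will split on abbreviations such as "Dr. Smith", so a production splitter needs more care.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end(1)].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    stream = ["Sure", ", I can", " help. ", "What time", " works", " for you?"]
    for sentence in sentences_from_tokens(stream):
        print(sentence)
```

Each yielded sentence can be handed to the TTS request immediately, so audio for the first sentence starts while the LLM is still generating the second.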

Geographic Optimization

ElevenLabs operates in multiple regions. Burki automatically routes to the nearest endpoint, but you can verify your configuration:

US users: us-east or us-west
EU users: eu-west
APAC users: Consider latency trade-offs

Warmup and Connection Pooling

Burki maintains warm service pools for ElevenLabs connections. This eliminates cold-start latency on new requests. The service pool:

Reuses expensive service instances across calls
Automatically removes idle services after 5 minutes
Tracks pool statistics for monitoring

This optimization reduces first-response latency by 50-100ms in typical deployments.
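
To illustrate the idea, here is a simplified warm pool with idle eviction. It is a sketch of the general technique, not Burki's actual code: `WarmPool` and its counters are hypothetical, the 300-second default mirrors the 5-minute idle window above, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keeps released service instances around so later calls can reuse them."""

    def __init__(self, factory: Callable[[], T], idle_ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._idle_ttl_s = idle_ttl_s
        self._clock = clock
        self._idle: List[Tuple[float, T]] = []  # (release time, service)
        self.created = 0  # pool statistics for monitoring
        self.reused = 0

    def acquire(self) -> T:
        """Return a warm instance if one exists, otherwise build a new one."""
        self._evict_idle()
        if self._idle:
            _, service = self._idle.pop()
            self.reused += 1
            return service
        self.created += 1
        return self._factory()

    def release(self, service: T) -> None:
        """Hand an instance back and record when it became idle."""
        self._idle.append((self._clock(), service))

    def _evict_idle(self) -> None:
        """Drop instances that have been idle longer than the TTL."""
        cutoff = self._clock() - self._idle_ttl_s
        self._idle = [(t, s) for (t, s) in self._idle if t >= cutoff]
```

A real pool would also need locking for concurrent calls and a way to close evicted connections cleanly.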

Cost Optimization Strategies

With proper optimization, you can significantly reduce your ElevenLabs spend without sacrificing quality.

Character Count Reduction

Every character costs credits. Reduce waste by:

Trimming filler words: Instruct your LLM to avoid opening responses with unnecessary "Well," "So," and "Actually"
Using contractions: "I'm" vs "I am" saves 1 character per instance
Concise responses: Tune your LLM to be brief without being terse
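
Prompting is the main lever here, but you can also strip leading fillers as a safety net before text reaches TTS. The sketch below is one possible post-processing step. `trim_leading_filler` is a hypothetical helper, and it only removes a filler that is followed by a comma, so phrases like "So far so good" are left alone.

```python
import re

# Match a leading filler word only when it is followed by a comma.
_LEADING_FILLER = re.compile(r"^(?:well|so|actually),\s*", re.IGNORECASE)


def trim_leading_filler(text: str) -> str:
    """Remove one leading filler word and re-capitalize the sentence."""
    trimmed = _LEADING_FILLER.sub("", text, count=1)
    if trimmed == text:
        return text
    return trimmed[:1].upper() + trimmed[1:]


def characters_saved(original: str, shortened: str) -> int:
    """Credits saved on standard models, where 1 character = 1 credit."""
    return len(original) - len(shortened)
```

For example, trimming "Well, I can help with that." saves 6 characters, which adds up across thousands of responses.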

Model Tiering

Not every response needs the highest quality model. Consider:

Turbo v2 for acknowledgments ("Got it," "One moment")
Turbo v2.5 for standard responses
Multilingual v2 for complex explanations or sensitive conversations

Burki does not currently support automatic model switching, but you can create separate assistants with different TTS configurations for different call types.

Caching Common Phrases

For frequently used phrases (greetings, hold messages, error responses), pre-generate and cache the audio. Burki's hold audio system already does this for hold messages and music.
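
If you generate audio outside Burki, the same idea can be sketched as a cache keyed on voice, model, and text. This is an illustrative in-memory version only: `PhraseCache` is a hypothetical class, and the `synthesize` callable stands in for whatever function actually calls your TTS provider.

```python
import hashlib
from typing import Callable, Dict


class PhraseCache:
    """Caches synthesized audio so repeated phrases cost credits only once."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes]) -> None:
        self._synthesize = synthesize  # (voice_id, model_id, text) -> audio bytes
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(voice_id: str, model_id: str, text: str) -> str:
        # The same text in a different voice or model is different audio.
        raw = "\x1f".join((voice_id, model_id, text.strip()))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_audio(self, voice_id: str, model_id: str, text: str) -> bytes:
        key = self._key(voice_id, model_id, text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        audio = self._synthesize(voice_id, model_id, text)
        self._store[key] = audio
        return audio
```

Voice settings also change the output, so include them in the key if you vary them per call.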

Plan Selection

Based on your volume:

Monthly Characters	Recommended Plan	Cost per Character
Under 30K	Starter ($5)	$0.00017
30K - 100K	Creator ($11)	$0.00011
100K - 500K	Pro ($99)	$0.00020
Over 500K	Scale/Business	Negotiate

The Creator tier offers the best value for small to medium deployments. Scale becomes cost-effective only at very high volumes.
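
The table assumes you use each plan's full allowance. What you actually pay per character depends on how much of it you consume, which the sketch below makes explicit. Plan prices and credits come from the pricing table above; `recommend_plan` and `effective_cost_per_character` are hypothetical helpers.

```python
# (plan name, monthly price in USD, monthly credits) from the pricing table.
PLANS = [
    ("Starter", 5.0, 30_000),
    ("Creator", 11.0, 100_000),
    ("Pro", 99.0, 500_000),
]


def recommend_plan(monthly_chars: int) -> str:
    """Smallest listed plan whose allowance covers the expected volume."""
    for name, _price, credits in PLANS:
        if monthly_chars <= credits:
            return name
    return "Scale/Business"


def effective_cost_per_character(plan: str, monthly_chars: int) -> float:
    """Plan price divided by the characters you actually use."""
    for name, price, credits in PLANS:
        if name == plan:
            if not 0 < monthly_chars <= credits:
                raise ValueError("volume must be positive and within the plan allowance")
            return price / monthly_chars
    raise ValueError(f"unknown plan {plan!r}")
```

For instance, 50,000 characters a month on Creator works out to $0.00022 per character used, double the headline rate.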

BYO Keys for Cost Control

Using Burki's BYO API key feature gives you:

Direct billing relationship with ElevenLabs
Access to your existing volume discounts
Clearer usage tracking
No markup on TTS costs (only Burki platform fees)

Frequently Asked Questions

Can I use my existing ElevenLabs voices in Burki?

Yes. Any voices in your ElevenLabs account, including cloned voices, are accessible when you configure your API key in Burki. The voice library syncs automatically.

What audio quality does Burki support?

Burki supports up to 44.1kHz PCM audio when using ElevenLabs Pro tier or higher. For phone calls, 8kHz mu-law is sufficient, so higher quality settings primarily benefit web-based voice interfaces.

How does Burki handle ElevenLabs rate limits?

Burki implements automatic retry logic with exponential backoff. If you hit rate limits, Burki will queue requests and retry. For high-volume applications, consider the Scale or Business tiers which have higher rate limits.
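
For readers unfamiliar with the pattern, here is a generic sketch of retry with exponential backoff and jitter. It is not Burki's implementation: `RateLimitError` is a stand-in exception, the delay values are illustrative defaults, and `sleep` and `jitter` are injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for whatever error your client raises on a rate-limit response."""


def retry_with_backoff(call: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay_s: float = 0.25,
                       max_delay_s: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep,
                       jitter: Callable[[], float] = random.random) -> T:
    """Retry `call` on rate limits, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay * (0.5 + 0.5 * jitter()))
    raise AssertionError("unreachable")
```

In a live call, keep the total retry budget short, since a caller waiting several seconds for audio is usually worse than switching providers.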

Can I switch between TTS providers mid-call?

Not currently. TTS provider is configured at the assistant level and remains constant throughout a call. However, you can use multi-assistant graphs to route different call types to assistants with different TTS configurations.

What happens if ElevenLabs is down?

Burki supports fallback providers. You can configure backup TTS providers (Deepgram, Cartesia, OpenAI) that activate if your primary provider fails. Fallback configuration is set at the assistant level.
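
Conceptually, fallback is an ordered list of providers tried in turn. The sketch below shows that control flow only. `synthesize_with_fallback` is a hypothetical function, and each provider is represented by a plain callable rather than a real SDK client.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], bytes]]  # (name, synthesize function)


def synthesize_with_fallback(text: str,
                             providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order and return (provider name, audio) on success."""
    errors: List[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any provider failure triggers the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))
```

Keep in mind that a backup provider will use a different voice, so choose a fallback voice close to your primary one.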

How accurate is voice cloning for different accents?

ElevenLabs voice cloning preserves accents from the training data. If you train with an English speaker, the clone will have an English accent even when speaking other languages. For accent-accurate multilingual output, train separate clones in each language.

Can I use professional voice clones for commercial purposes?

Yes, all paid plans include commercial licensing. You own your cloned voices and can use them in revenue-generating applications. ElevenLabs takes no royalty or licensing fee beyond your subscription.

Start Building with ElevenLabs and Burki

The combination of ElevenLabs voices and Burki's optimized conversation pipeline delivers voice AI that actually sounds human. Our customers consistently report that callers cannot tell they are speaking with an AI.

Getting started takes minutes:

Sign up for Burki at burki.dev
Create an assistant using our Voice Builder
Select ElevenLabs as your TTS provider
Choose your voice and configure parameters
Assign a phone number and start testing

With 200 free minutes included on signup and a free trial phone number for 30 days, you can fully evaluate the platform before committing. Our average response latency of 0.8-1.2 seconds (compared to 4-5 seconds for competitors) means your conversations flow naturally.

The best voice AI combines great voices with great infrastructure. ElevenLabs provides the voices. Burki provides everything else.

Ready to integrate ElevenLabs with Burki? Start your free trial and experience the difference that low latency makes. No credit card required.
