ElevenLabs Voice AI: TTS Configuration Guide

Complete guide to ElevenLabs API integration with Burki Voice AI. Configure voice cloning, optimize latency, and reduce costs with our step-by-step TTS setup guide.

Meeran Malik
12 min read

If you have spent any time evaluating text-to-speech providers, you already know the truth: ElevenLabs produces the most natural, human-like voices in the industry. When we built Burki's TTS integration layer, ElevenLabs was the first provider we added. The voices simply sound better than anything else available.

But having great voices is only half the battle. Configuring ElevenLabs correctly for real-time voice AI conversations requires understanding the nuances of their API, pricing structure, and performance characteristics. This guide walks you through everything you need to know to get ElevenLabs running optimally on Burki.

ElevenLabs Features That Matter for Voice AI

Before diving into configuration, let's establish what makes ElevenLabs particularly suited for conversational AI applications.

Voice Quality That Actually Works

ElevenLabs offers over 100 pre-built voices across multiple languages. More importantly, their voices handle conversational speech patterns well. They do not sound robotic when reading fragments, handling interruptions, or generating short responses. This matters enormously in voice AI where responses are often just a few words.

The voice library includes:

  • Multilingual Support: 32 languages with native-sounding output
  • Emotional Range: Voices can express frustration, excitement, sympathy
  • Speaking Styles: From professional to casual, energetic to calm
  • Age and Gender Diversity: Young, old, male, female, and everything in between

Voice Cloning Capabilities

ElevenLabs offers two tiers of voice cloning that fundamentally change what you can build.

Instant Voice Cloning creates a usable voice clone from just 1-2 minutes of audio. The quality is good enough for many applications, and the turnaround is essentially immediate. This is included on all paid plans starting at $5/month.

Professional Voice Cloning requires 30+ minutes of audio (2-3 hours recommended) but produces results that are nearly indistinguishable from the original speaker. This is the option you want for brand voices, executive assistants, or any application where voice authenticity is critical.

The AI captures everything about the source voice: cadence, tonality, pause patterns, breathing sounds, and even verbal tics like "um" and "ah" if they appear in your training data.

Safety and Verification

ElevenLabs requires voice verification before creating clones. You must record yourself reading an authorization message, which prevents unauthorized cloning of other people's voices. This is both a legal compliance feature and a practical safeguard.

ElevenLabs Pricing Breakdown (2026)

Understanding pricing is essential for cost optimization. ElevenLabs uses a credit-based system where 1 character equals 1 credit for their standard models.

| Plan | Monthly Cost | Credits | Key Features |
|---|---|---|---|
| Free | $0 | 10,000 | Non-commercial, instant cloning |
| Starter | $5 | 30,000 | Commercial license, instant cloning |
| Creator | $11 | 100,000 | Professional voice cloning, 192kbps |
| Pro | $99 | 500,000 | 44.1kHz PCM, production-scale |
| Scale | $330 | Millions | Multi-seat, low-latency TTS |
| Business | $1,320 | Millions | Full enterprise features |
| Enterprise | Custom | Custom | SLAs, HIPAA/BAA, dedicated support |

Annual billing saves you 2 months (roughly 16.7% discount).

Practical Cost Estimates:

  • Average AI response: 100-200 characters
  • Cost per response at Creator tier: ~$0.011-0.022 (about 1.1-2.2 cents, at $11 per 100,000 characters)
  • Cost per minute of conversation: 3-5 cents (assuming back-and-forth dialog)

For most voice AI applications, the Creator tier ($11/month) provides excellent value. You get professional voice cloning and 192kbps audio quality, which is indistinguishable from higher bitrates on phone calls.
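The per-response arithmetic is easy to check directly. The following is a minimal sketch, assuming the plan prices and credit counts listed above and the stated rule that 1 character equals 1 credit. The function names are illustrative and not part of any Burki or ElevenLabs API.

```python
# Plan price (USD per month) and included credits, as quoted in the pricing table.
PLANS = {
    "Starter": (5.00, 30_000),
    "Creator": (11.00, 100_000),
    "Pro": (99.00, 500_000),
}


def cost_per_character(plan: str) -> float:
    """Dollars per character, assuming 1 character = 1 credit."""
    price, credits = PLANS[plan]
    return price / credits


def cost_per_response(plan: str, characters: int) -> float:
    """Dollars for one synthesized response of the given length."""
    return cost_per_character(plan) * characters


if __name__ == "__main__":
    for chars in (100, 200):
        dollars = cost_per_response("Creator", chars)
        print(f"{chars} characters on Creator: ${dollars:.3f}")
```

On these figures, a 100-200 character response on the Creator tier works out to roughly 1.1-2.2 cents, provided you use your full monthly allotment.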

Burki + ElevenLabs Integration

Burki supports ElevenLabs as a first-class TTS provider with full feature support. Here is how the integration works.

Supported Features

| Feature | Burki Support |
|---|---|
| Voice Selection | Browse 100+ voices |
| Emotion Control | Full parameter access |
| Voice Cloning | Both instant and professional |
| Streaming | Real-time audio streaming |
| BYO API Keys | Per-assistant configuration |

Configuration Options in Burki

When setting up an ElevenLabs voice in Burki, you can configure:

Voice Parameters:

  • Stability (0-1): Higher values produce more consistent output, lower values add expressiveness
  • Similarity Boost (0-1): Controls how closely output matches the original voice model
  • Style (0-1): Amplifies the voice's stylistic characteristics
  • Speaker Boost (on/off): Boosts similarity to the original speaker, at the cost of slightly higher latency

Model Selection:

  • Turbo v2: Lowest latency, good quality
  • Multilingual v2: Best quality, higher latency
  • Turbo v2.5: Balanced option for most use cases
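To make these parameters concrete, here is a hedged sketch of how they might map onto a text-to-speech request body. Burki's own configuration schema is not shown in this guide, so the field names and model IDs below follow ElevenLabs' public REST API as commonly documented. Treat them as assumptions and verify them against the current API reference. Nothing here sends a request.

```python
from dataclasses import dataclass

# Assumed mapping from the model names in this guide to ElevenLabs model IDs.
MODEL_IDS = {
    "Turbo v2": "eleven_turbo_v2",
    "Turbo v2.5": "eleven_turbo_v2_5",
    "Multilingual v2": "eleven_multilingual_v2",
}


@dataclass
class VoiceSettings:
    stability: float = 0.75
    similarity_boost: float = 0.75
    style: float = 0.30
    use_speaker_boost: bool = True

    def __post_init__(self) -> None:
        # The three numeric parameters are documented as 0-1 ranges.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def build_tts_payload(text: str, model: str, settings: VoiceSettings) -> dict:
    """Build a JSON-serializable request body without sending it."""
    return {
        "text": text,
        "model_id": MODEL_IDS[model],
        "voice_settings": {
            "stability": settings.stability,
            "similarity_boost": settings.similarity_boost,
            "style": settings.style,
            "use_speaker_boost": settings.use_speaker_boost,
        },
    }
```

Validating the ranges up front catches a mistyped value such as 7.5 before it reaches the provider.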

Setting Up Your ElevenLabs API Key

Burki supports bring-your-own (BYO) API keys at both the organization and assistant level. This gives you full control over billing and usage tracking.

To configure:

  1. Generate an API key from your ElevenLabs dashboard
  2. In Burki, navigate to Organization Settings > Provider Credentials
  3. Add your ElevenLabs API key
  4. Optionally, override at the assistant level for specific use cases

Your API key is encrypted at rest using AES-256 encryption. Burki never logs or exposes your credentials.
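The organization-then-assistant override can be pictured as a simple lookup order. This is only an illustrative sketch of that precedence, not Burki's implementation. `resolve_api_key` and `mask_key` are hypothetical helper names.

```python
from typing import Optional


def resolve_api_key(org_key: Optional[str], assistant_key: Optional[str]) -> str:
    """Prefer the assistant-level key; fall back to the organization key."""
    for key in (assistant_key, org_key):
        if key and key.strip():
            return key.strip()
    raise LookupError("No ElevenLabs API key configured")


def mask_key(key: str) -> str:
    """Return a log-safe form that reveals only the last four characters."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

If you write your own tooling around provider keys, masking them before they reach logs follows the same principle as the no-logging guarantee described above.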

Voice Selection Guide

Choosing the right voice is one of the most important decisions you will make. Here is a practical framework.

For Customer Support

Priority: Clarity, warmth, neutral accent

Recommended voices:

  • Rachel: Professional American female, excellent for technical support
  • Josh: Friendly American male, good for conversational support
  • Elli: Young female, works well for startup/tech companies

Parameter settings:

  • Stability: 0.75 (consistent but not robotic)
  • Similarity Boost: 0.75
  • Style: 0.30 (natural without being dramatic)

For Sales and Outbound

Priority: Energy, persuasion, confidence

Recommended voices:

  • Antoni: Confident American male, good closing energy
  • Domi: Assertive female, projects authority
  • Bella: Warm female, excellent for relationship-building

Parameter settings:

  • Stability: 0.50 (more expressive for engagement)
  • Similarity Boost: 0.80
  • Style: 0.50 (more personality)

For Appointments and Scheduling

Priority: Professional, efficient, clear

Recommended voices:

  • Sam: Neutral, professional
  • Fin: Slightly robotic but extremely clear
  • Charlotte: Formal British English, professional

Parameter settings:

  • Stability: 0.85 (very consistent)
  • Similarity Boost: 0.70
  • Style: 0.20 (minimal personality)
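The three parameter sets above can be kept as named presets so that every assistant for a given use case starts from the same baseline. A minimal sketch follows. The preset keys and the `settings_for` helper are illustrative names, not a Burki feature.

```python
# Starting points taken from the recommendations above.
PRESETS = {
    "customer_support": {"stability": 0.75, "similarity_boost": 0.75, "style": 0.30},
    "sales_outbound": {"stability": 0.50, "similarity_boost": 0.80, "style": 0.50},
    "scheduling": {"stability": 0.85, "similarity_boost": 0.70, "style": 0.20},
}


def settings_for(use_case: str, **overrides: float) -> dict:
    """Return a copy of a preset, with optional per-assistant overrides."""
    if use_case not in PRESETS:
        raise KeyError(f"Unknown use case: {use_case}")
    unknown = set(overrides) - set(PRESETS[use_case])
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    # Merge into a new dict so the shared preset is never mutated.
    return {**PRESETS[use_case], **overrides}
```

For example, `settings_for("scheduling", style=0.10)` keeps the scheduling stability and similarity values while flattening the style further.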

Language Considerations

If your assistant needs to speak a language other than English, select a voice native to that language. Using an English voice to speak Spanish will produce an English accent, which may not be ideal.

ElevenLabs supports 32 languages. For multilingual deployments, create separate assistants with language-appropriate voices rather than trying to force one voice across languages.

Voice Cloning Setup

Voice cloning unlocks powerful personalization options. Here is how to set it up properly.

Instant Voice Clone (Quick Setup)

Requirements:

  • 1-2 minutes of clear audio
  • No background noise, reverb, or artifacts
  • 22kHz+ sample rate recommended
  • MP3, WAV, FLAC, M4A, or OGG format
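Two of these requirements, duration and sample rate, can be checked before upload. The sketch below uses Python's standard `wave` module, so it only handles WAV files. The thresholds are assumptions: "1-2 minutes" is read as at least 60 seconds and "22kHz+" as at least 22,050 Hz. Noise and reverb still need a listening test.

```python
import io
import wave

MIN_SECONDS = 60          # lower bound of the 1-2 minute guideline (assumption)
MIN_SAMPLE_RATE = 22_050  # "22kHz+" read as 22.05 kHz (assumption)


def check_wav_sample(source) -> list[str]:
    """Return a list of problems; an empty list means both checks pass.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f}s, need at least {MIN_SECONDS}s")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    return problems


def make_silent_wav(seconds: int, rate: int) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence for trying the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer
```

Calling `check_wav_sample("sample.wav")` on a real recording works the same way as the in-memory helper.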

Steps:

  1. Prepare your audio sample
  2. Navigate to Burki's Voice Library
  3. Click "Clone Voice" and select ElevenLabs as the provider
  4. Upload your audio file
  5. Complete the voice verification prompt
  6. Your cloned voice is ready in seconds

Tips for better instant clones:

  • Use a sample that demonstrates your full vocal range
  • Include varied intonation (questions, statements, emphasis)
  • Record in a quiet room with no echo
  • Speak at your natural pace

Professional Voice Clone (High Quality)

Requirements:

  • Minimum 30 minutes of audio (2-3 hours recommended)
  • Consistent recording environment across samples
  • High-quality microphone (condenser recommended)
  • Paid plan (Creator tier or above)

Steps:

  1. Gather your audio corpus (podcasts, recordings, read scripts)
  2. Ensure consistent audio quality across all samples
  3. Upload through Burki's voice cloning interface
  4. Processing takes 2-5 minutes
  5. Test extensively before deploying

What to include in training data:

  • Various emotional states (neutral, happy, serious)
  • Different sentence types (questions, commands, explanations)
  • Technical vocabulary relevant to your use case
  • Natural speech patterns (not overly scripted)

Cloned Voice Management

Once created, your cloned voices can be:

  • Assigned to multiple assistants
  • Tested with sample text before deployment
  • Tracked for usage and costs
  • Shared across your organization

Burki maintains a voice library where you can organize clones with tags, descriptions, and quality ratings.

Latency Optimization

For conversational AI, latency is everything. Users perceive delays over 400ms as unnatural. Here is how to minimize TTS latency with ElevenLabs.

Model Selection

| Model | Latency | Quality | Use Case |
|---|---|---|---|
| Turbo v2 | ~200ms | Good | Speed-critical applications |
| Turbo v2.5 | ~300ms | Very Good | Most voice AI applications |
| Multilingual v2 | ~500ms | Excellent | Quality-first applications |

For most voice AI use cases on Burki, Turbo v2.5 provides the best balance. The quality difference from Multilingual v2 is barely perceptible in real-time conversation.
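One way to apply the table is to pick the highest-quality model that fits your TTS latency budget. This is a sketch that treats the approximate latencies above as fixed numbers. Real latency varies with region, load, and text length, so measure before relying on it.

```python
# Approximate latencies from the table above, ordered best quality first.
MODELS_BY_QUALITY = [
    ("Multilingual v2", 500),
    ("Turbo v2.5", 300),
    ("Turbo v2", 200),
]


def pick_model(tts_budget_ms: int) -> str:
    """Return the highest-quality model whose typical latency fits the budget."""
    for name, latency_ms in MODELS_BY_QUALITY:
        if latency_ms <= tts_budget_ms:
            return name
    raise ValueError(f"No model fits a {tts_budget_ms} ms budget")
```

With a 400 ms budget for the TTS stage, this selects Turbo v2.5, which matches the recommendation above.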

Streaming Configuration

Burki streams TTS audio in real time: playback begins before the entire response has been generated, which dramatically reduces perceived latency.

To optimize streaming:

  1. Enable chunked responses in your LLM configuration
  2. Use sentence-level streaming where possible
  3. Set appropriate buffer sizes (Burki handles this automatically)
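Sentence-level streaming means handing each sentence to the TTS engine as soon as the LLM finishes it, rather than waiting for the full reply. The sketch below shows the buffering idea with a deliberately simple splitter. It is not Burki's implementation, and `sentences_from_tokens` is an illustrative name.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., !, or ? followed by whitespace. Abbreviations such as
# "Dr. Smith" will split early; a production splitter needs more care.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield each sentence once it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            sentence = sentence.strip()
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Because a sentence is only released when the whitespace after its punctuation arrives, the first audio chunk starts roughly one token after the first sentence ends.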

Geographic Optimization

ElevenLabs operates multiple regions. Burki automatically routes to the nearest endpoint, but you can verify your configuration:

  • US users: us-east or us-west
  • EU users: eu-west
  • APAC users: Consider latency trade-offs

Warmup and Connection Pooling

Burki maintains warm service pools for ElevenLabs connections. This eliminates cold-start latency on new requests. The service pool:

  • Reuses expensive service instances across calls
  • Automatically removes idle services after 5 minutes
  • Tracks pool statistics for monitoring

This optimization reduces first-response latency by 50-100ms in typical deployments.
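The pooling behavior described above (reuse, idle eviction after 5 minutes, and basic statistics) can be sketched in a few lines. This illustrates the pattern only. The `WarmPool` class and its fields are hypothetical and do not reflect Burki's internal code.

```python
import time
from typing import Callable, Dict, Tuple


class WarmPool:
    """Keep service instances alive between calls and evict idle ones."""

    def __init__(
        self,
        factory: Callable[[str], object],
        idle_seconds: float = 300.0,  # 5 minutes, as described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[object, float]] = {}
        self.hits = 0
        self.misses = 0

    def evict_idle(self) -> int:
        """Drop entries unused for longer than the idle limit; return the count."""
        now = self._clock()
        stale = [
            key
            for key, (_, last_used) in self._entries.items()
            if now - last_used > self._idle_seconds
        ]
        for key in stale:
            del self._entries[key]
        return len(stale)

    def get(self, key: str) -> object:
        """Return a warm instance for key, creating one on a miss."""
        self.evict_idle()
        if key in self._entries:
            service, _ = self._entries[key]
            self.hits += 1
        else:
            service = self._factory(key)
            self.misses += 1
        # Record the access time so active services stay warm.
        self._entries[key] = (service, self._clock())
        return service
```

The injectable `clock` makes the eviction rule testable without waiting five minutes.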

Cost Optimization Strategies

With proper optimization, you can significantly reduce your ElevenLabs spend without sacrificing quality.

Character Count Reduction

Every character costs credits. Reduce waste by:

  1. Trimming filler words: Prompt your LLM to leave unnecessary "Well," "So," and "Actually" out of its responses
  2. Using contractions: "I'm" vs "I am" saves 1 character per instance
  3. Concise responses: Tune your LLM to be brief without being terse
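Prompting the LLM is the main lever, but a small post-processing pass can catch what slips through. The following sketch uses a short, illustrative filler and contraction list. Review any such list against your own transcripts before using it, since aggressive rewriting can change tone.

```python
import re

# Leading filler words, optionally followed by a comma.
FILLERS = re.compile(r"^(?:well|so|actually),?\s+", re.IGNORECASE)

# A few contractions that rarely change meaning (illustrative, not exhaustive).
CONTRACTIONS = {
    "I am": "I'm",
    "I will": "I'll",
    "do not": "don't",
    "cannot": "can't",
}


def tighten(text: str) -> str:
    """Drop a leading filler word and apply a few contractions."""
    text = FILLERS.sub("", text.strip())
    for long_form, short_form in CONTRACTIONS.items():
        text = re.sub(rf"\b{long_form}\b", short_form, text)
    # Re-capitalize in case a filler was removed from the start.
    return text[:1].upper() + text[1:]
```

For "Well, I am checking that now." this removes seven characters: six for the filler and one for the contraction.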

Model Tiering

Not every response needs the highest quality model. Consider:

  • Turbo v2 for acknowledgments ("Got it," "One moment")
  • Turbo v2.5 for standard responses
  • Multilingual v2 for complex explanations or sensitive conversations

Burki does not currently support automatic model switching, but you can create separate assistants with different TTS configurations for different call types.

Caching Common Phrases

For frequently used phrases (greetings, hold messages, error responses), pre-generate and cache the audio. Burki's hold audio system already does this for hold messages and music.
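If you generate audio yourself for other fixed phrases, the same idea is a lookup keyed on voice, model, and text. A minimal in-memory sketch follows. `PhraseCache` is a hypothetical helper, and `synthesize` stands in for whatever function actually calls your TTS provider.

```python
import hashlib
from typing import Callable, Dict


class PhraseCache:
    """Synthesize each (voice, model, text) combination at most once."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._audio: Dict[str, bytes] = {}
        self.characters_billed = 0

    @staticmethod
    def _key(voice_id: str, model_id: str, text: str) -> str:
        # Join with a separator that cannot appear in normal text.
        raw = "\x1f".join((voice_id, model_id, text))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_audio(self, voice_id: str, model_id: str, text: str) -> bytes:
        """Return cached audio, synthesizing only on the first request."""
        key = self._key(voice_id, model_id, text)
        if key not in self._audio:
            self._audio[key] = self._synthesize(voice_id, model_id, text)
            self.characters_billed += len(text)
        return self._audio[key]
```

Including the voice and model in the key matters: changing either one should produce fresh audio rather than a stale hit.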

Plan Selection

Based on your volume:

| Monthly Characters | Recommended Plan | Cost per Character |
|---|---|---|
| Under 30K | Starter ($5) | $0.00017 |
| 30K - 100K | Creator ($11) | $0.00011 |
| 100K - 500K | Pro ($99) | $0.00020 |
| Over 500K | Scale/Business | Negotiate |

The Creator tier offers the best value for small to medium deployments. Scale becomes cost-effective only at very high volumes.
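The cost-per-character column assumes you use every included character. The sketch below picks a plan from the table and computes the effective rate for the volume you actually send. Plan figures come from this guide and the helper names are illustrative.

```python
# (plan, monthly price in USD, included characters), from the tables above.
TIERS = [
    ("Starter", 5.0, 30_000),
    ("Creator", 11.0, 100_000),
    ("Pro", 99.0, 500_000),
]


def recommend_plan(monthly_characters: int) -> str:
    """Return the smallest listed plan that covers the monthly volume."""
    for name, _price, included in TIERS:
        if monthly_characters <= included:
            return name
    return "Scale/Business"


def effective_cost_per_character(plan: str, monthly_characters: int) -> float:
    """Plan price divided by characters actually used, not characters included."""
    for name, price, included in TIERS:
        if name == plan:
            if not 0 < monthly_characters <= included:
                raise ValueError("volume must be positive and within the plan's credits")
            return price / monthly_characters
    raise KeyError(f"No fixed pricing listed for {plan}")
```

For example, 50,000 characters a month on Creator costs $0.00022 per character in practice, twice the headline rate, because half the credits go unused.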

BYO Keys for Cost Control

Using Burki's BYO API key feature gives you:

  • Direct billing relationship with ElevenLabs
  • Access to your existing volume discounts
  • Clearer usage tracking
  • No markup on TTS costs (only Burki platform fees)

Frequently Asked Questions

Can I use my existing ElevenLabs voices in Burki?

Yes. Any voices in your ElevenLabs account, including cloned voices, are accessible when you configure your API key in Burki. The voice library syncs automatically.

What audio quality does Burki support?

Burki supports up to 44.1kHz PCM audio when using ElevenLabs Pro tier or higher. For phone calls, 8kHz mu-law is sufficient, so higher quality settings primarily benefit web-based voice interfaces.

How does Burki handle ElevenLabs rate limits?

Burki implements automatic retry logic with exponential backoff. If you hit rate limits, Burki will queue requests and retry. For high-volume applications, consider the Scale or Business tiers which have higher rate limits.
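For readers unfamiliar with the pattern, exponential backoff doubles the wait after each rate-limited attempt, usually with random jitter so that many callers do not retry in lockstep. This is a generic sketch, not Burki's retry code. `RateLimited` is a hypothetical exception standing in for an HTTP 429 response, and the delay values are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the provider answers with HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.25,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimited with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay ceiling doubles each attempt: 0.25s, 0.5s, 1s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In a live call, keep the total retry window short. A caller will notice several seconds of silence long before the fifth attempt.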

Can I switch between TTS providers mid-call?

Not currently. The TTS provider is configured at the assistant level and remains constant throughout a call. However, you can use multi-assistant graphs to route different call types to assistants with different TTS configurations.

What happens if ElevenLabs is down?

Burki supports fallback providers. You can configure backup TTS providers (Deepgram, Cartesia, OpenAI) that activate if your primary provider fails. Fallback configuration is set at the assistant level.
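Conceptually, a fallback chain tries each provider in order and returns the first successful result. The sketch below shows that logic with plain callables. It is an illustration of the pattern rather than Burki's fallback implementation, and the provider functions are placeholders you would supply.

```python
from typing import Callable, List, Sequence, Tuple

Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, providers: Sequence[Tuple[str, Synthesizer]]
) -> Tuple[str, bytes]:
    """Try each (name, synthesize) pair in order; return the first success."""
    errors: List[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All TTS providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the audio makes it easy to log how often the backup is used, which is an early signal of primary-provider trouble.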

How accurate is voice cloning for different accents?

ElevenLabs voice cloning preserves accents from the training data. If you train with an English speaker, the clone will have an English accent even when speaking other languages. For accent-accurate multilingual output, train separate clones in each language.

Can I use professional voice clones for commercial purposes?

Yes, all paid plans include commercial licensing. You own your cloned voices and can use them in revenue-generating applications. ElevenLabs takes no royalty or licensing fee beyond your subscription.

Start Building with ElevenLabs and Burki

The combination of ElevenLabs voices and Burki's optimized conversation pipeline delivers voice AI that actually sounds human. Our customers consistently report that callers cannot tell they are speaking with an AI.

Getting started takes minutes:

  1. Sign up for Burki at burki.dev
  2. Create an assistant using our Voice Builder
  3. Select ElevenLabs as your TTS provider
  4. Choose your voice and configure parameters
  5. Assign a phone number and start testing

With 200 free minutes included on signup and a free trial phone number for 30 days, you can fully evaluate the platform before committing. Our average response latency of 0.8-1.2 seconds (compared to 4-5 seconds for competitors) means your conversations flow naturally.

The best voice AI combines great voices with great infrastructure. ElevenLabs provides the voices. Burki provides everything else.


Ready to integrate ElevenLabs with Burki? Start your free trial and experience the difference that sub-second latency makes. No credit card required.

