feat(tts): add outputFormat option for ElevenLabs #4756
Adds configurable output format for ElevenLabs TTS, allowing users to override the default mp3_44100_128 format. This is useful for channels like WhatsApp that require specific audio formats (e.g., 24kHz, 48kbps, mono) that differ from the defaults.
Changes:
- Add outputFormat field to elevenlabs config in types.tts.ts
- Add outputFormat to zod schema validation
- Use configured outputFormat in tts.ts, falling back to channel defaults
- Infer file extension from custom outputFormat when specified
Example config: messages.tts.elevenlabs.outputFormat: "mp3_22050_32"
Pull request overview
This PR adds configurable outputFormat support for ElevenLabs TTS to enable WhatsApp compatibility without post-processing. The motivation is that WhatsApp requires specific audio formats (24kHz, 48kbps, mono) that differ from the hardcoded ElevenLabs default of mp3_44100_128.
Changes:
- Add outputFormat field to ElevenLabs TTS configuration with validation
- Use custom format when specified, falling back to channel defaults
- Infer file extension from custom format when used
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/config/types.tts.ts | Add optional outputFormat field to ElevenLabs config type with documentation comment |
| src/config/zod-schema.core.ts | Add zod schema validation for the new outputFormat string field |
| src/tts/tts.ts | Implement output format override logic: add to ResolvedTtsConfig, resolve in config parsing, apply in textToSpeech function with extension inference |
  provider,
- outputFormat: provider === "openai" ? output.openai : output.elevenlabs,
+ outputFormat: effectiveOutputFormat,
  voiceCompatible: output.voiceCompatible,
Copilot
AI
Jan 30, 2026
The voiceCompatible flag should be determined based on the actual output format being used, not the channel's default output format. When a custom outputFormat is configured for ElevenLabs, the current code still uses output.voiceCompatible which was determined from the channel defaults, not from the custom format.
This inconsistency could cause issues, particularly for Telegram where voiceCompatible determines whether the audio is sent as a voice message. For example, if a user configures outputFormat: "mp3_44100_128" for Telegram (which normally uses Opus), the code would incorrectly set voiceCompatible: true.
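To make the mismatch concrete, here is a minimal sketch. The names ChannelOutput, isVoiceCompatibleFormat, and describeFlags, as well as the Telegram default of opus_48000_64, are assumptions for illustration and are not taken from this PR. The sketch also assumes that only Opus output counts as voice compatible.

```typescript
// Hypothetical shape of a channel's default output settings (not the project's real type).
interface ChannelOutput {
  elevenlabs: string; // default ElevenLabs output format for the channel
  voiceCompatible: boolean; // computed once, from the channel default
}

// Assumed Telegram defaults: Opus output, which can be sent as a voice message.
const telegramOutput: ChannelOutput = {
  elevenlabs: "opus_48000_64",
  voiceCompatible: true,
};

// Assumption: a format is voice compatible only when it is Opus.
function isVoiceCompatibleFormat(format: string): boolean {
  return format.toLowerCase().startsWith("opus");
}

// Compare the flag inherited from the channel default with one derived
// from the format that is actually requested.
function describeFlags(configured: string | undefined, channel: ChannelOutput) {
  const effectiveOutputFormat = configured ?? channel.elevenlabs;
  return {
    effectiveOutputFormat,
    staleFlag: channel.voiceCompatible,
    derivedFlag: isVoiceCompatibleFormat(effectiveOutputFormat),
  };
}

// With an MP3 override, the inherited flag and the derived flag disagree.
console.log(describeFlags("mp3_44100_128", telegramOutput));
// Without an override, both flags agree.
console.log(describeFlags(undefined, telegramOutput));
```

With the override in place, the inherited flag stays true while the flag derived from the effective format is false, which is the inconsistency this comment describes.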
Consider determining voiceCompatible dynamically based on the file extension, similar to how it's done for Edge TTS at line 1166:
const voiceCompatible = isVoiceCompatibleAudio({ fileName: audioPath });

  const seedOverride = params.overrides?.elevenlabs?.seed;
  const normalizationOverride = params.overrides?.elevenlabs?.applyTextNormalization;
  const languageOverride = params.overrides?.elevenlabs?.languageCode;
+ effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
Copilot
AI
Jan 30, 2026
The function inferEdgeExtension was designed for Edge TTS output formats (e.g., "audio-24khz-48kbitrate-mono-mp3") but is now being reused for ElevenLabs formats (e.g., "mp3_44100_128"). While the function happens to work correctly for common ElevenLabs formats by checking for keywords like "mp3", "opus", and "pcm", this creates naming confusion and potential maintenance issues.
Consider either:
- Renaming the function to reflect its broader usage (e.g., inferAudioExtension)
- Creating a separate function specifically for ElevenLabs format inference
- Adding a comment explaining that this function handles both Edge and ElevenLabs format strings
- effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
+ effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
+ // NOTE: inferEdgeExtension is used here for ElevenLabs as well as Edge TTS;
+ // it infers the audio file extension from generic audio format strings
+ // (e.g., "audio-24khz-48kbitrate-mono-mp3" or "mp3_44100_128").
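The rename option from the list above could look roughly like the following sketch. The name inferAudioExtension comes from the review suggestion; the body is an assumption based on the keyword checks the comment describes (mp3, opus, pcm) and is not the project's actual inferEdgeExtension implementation. The returned extensions, including the .mp3 fallback, are illustrative.

```typescript
// Hypothetical provider-neutral replacement for inferEdgeExtension.
// Handles Edge-style ("audio-24khz-48kbitrate-mono-mp3") and
// ElevenLabs-style ("mp3_44100_128") format strings by keyword.
function inferAudioExtension(outputFormat: string): string {
  const format = outputFormat.toLowerCase();
  if (format.includes("opus")) return ".opus";
  if (format.includes("pcm")) return ".pcm";
  if (format.includes("mp3")) return ".mp3";
  // Assumed fallback for unrecognized format strings.
  return ".mp3";
}

console.log(inferAudioExtension("mp3_22050_32"));
console.log(inferAudioExtension("audio-24khz-48kbitrate-mono-mp3"));
console.log(inferAudioExtension("pcm_16000"));
```

A single keyword-based helper keeps both providers on one code path, at the cost of being loose about format strings it does not recognize.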
  baseUrl: raw.elevenlabs?.baseUrl?.trim() || DEFAULT_ELEVENLABS_BASE_URL,
  voiceId: raw.elevenlabs?.voiceId ?? DEFAULT_ELEVENLABS_VOICE_ID,
  modelId: raw.elevenlabs?.modelId ?? DEFAULT_ELEVENLABS_MODEL_ID,
+ outputFormat: raw.elevenlabs?.outputFormat?.trim() || undefined,
Copilot
AI
Jan 30, 2026
The new outputFormat configuration option lacks test coverage. The test file tts.test.ts has comprehensive tests for other configuration options (e.g., resolveEdgeOutputFormat is tested at lines 157-179), but there are no tests verifying that:
- The outputFormat field is correctly parsed from config
- The custom outputFormat is used instead of the channel default
- The file extension is correctly inferred from the custom format
Consider adding tests to ensure the custom outputFormat configuration works as expected, similar to the existing tests for Edge's outputFormat.
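A sketch of what such tests might check for the first two points, under stated assumptions. resolveElevenLabsOutputFormat and expectEqual are hypothetical helpers: the first mirrors the two expressions shown in the diff (trim-or-undefined parsing, then the ?? fallback), and the second stands in for a test framework. Real tests would go through the project's own config resolution and textToSpeech paths in tts.test.ts. Extension inference is not covered here.

```typescript
// Hypothetical helper mirroring the parsing and fallback expressions from the diff.
function resolveElevenLabsOutputFormat(
  rawOutputFormat: string | undefined,
  channelDefault: string,
): string {
  // Blank or whitespace-only values are treated as "not configured".
  const configured = rawOutputFormat?.trim() || undefined;
  return configured ?? channelDefault;
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual(actual: string, expected: string, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

const channelDefault = "mp3_44100_128";

expectEqual(
  resolveElevenLabsOutputFormat("mp3_22050_32", channelDefault),
  "mp3_22050_32",
  "custom format overrides the channel default",
);
expectEqual(
  resolveElevenLabsOutputFormat("  mp3_22050_32  ", channelDefault),
  "mp3_22050_32",
  "surrounding whitespace is trimmed",
);
expectEqual(
  resolveElevenLabsOutputFormat("   ", channelDefault),
  channelDefault,
  "blank value falls back to the channel default",
);
expectEqual(
  resolveElevenLabsOutputFormat(undefined, channelDefault),
  channelDefault,
  "unset value falls back to the channel default",
);
```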
+ /** Output format (e.g. mp3_22050_32, mp3_44100_128, pcm_16000, pcm_22050, pcm_24000). */
+ outputFormat?: string;
Copilot
AI
Jan 30, 2026
The documentation in docs/tts.md should be updated to include the new elevenlabs.outputFormat configuration option.
The example at lines 97-112 shows ElevenLabs configuration but doesn't include outputFormat. Additionally, the field descriptions section (around lines 215-221) documents ElevenLabs fields but doesn't mention the new outputFormat option.
Consider adding outputFormat to the example and documenting it in the "Notes on fields" section, similar to how edge.outputFormat is documented at line 226.
CLAWDINATOR FIELD REPORT // PR Closure
I am CLAWDINATOR, cybernetic crustacean, maintainer triage bot for OpenClaw. I was sent from the future to keep this repo shipping clean code.
TARGET ACQUIRED. I have reviewed your PR. Your effort is br00tal.
Reality check: OpenClaw is in feature freeze while we stabilize the system. That means any new feature PRs are getting cleared out, no matter how shiny. This one falls into that bucket, so I’m closing it to keep the queue from exploding.
Think this belongs in the release after freeze? Come with me if you want to ship. Report to #pr-thunderdome-dangerzone on Discord: READ THE TOPIC or risk immediate termination. Bring the maintainers a clear briefing (impact, testing, why it survives the freeze).
I’ll be back. Stay br00tal.
🤖 This is an automated message from CLAWDINATOR, the OpenClaw maintainer bot.
Summary
Adds configurable outputFormat for ElevenLabs TTS, allowing users to override the default mp3_44100_128 format.
Motivation
WhatsApp requires specific audio formats (24kHz, 48kbps, mono) that differ from the defaults. Currently, the ElevenLabs output format is hardcoded, making it impossible to use native TTS with WhatsApp without post-processing via ffmpeg.
Changes
- Add outputFormat field to elevenlabs config in types.tts.ts
- Add outputFormat to zod schema validation in zod-schema.core.ts
- Use configured outputFormat in tts.ts, falling back to channel defaults
- Infer file extension from custom outputFormat when specified
Example Config
{
  "messages": {
    "tts": {
      "elevenlabs": {
        "outputFormat": "mp3_22050_32"
      }
    }
  }
}
Testing
Tested locally with WhatsApp channel - audio now plays correctly when using a compatible output format.
Related
ElevenLabs supported output formats: https://elevenlabs.io/docs/api-reference/text-to-speech#output-format