@lpjhelder
Summary

Adds configurable outputFormat for ElevenLabs TTS, allowing users to override the default mp3_44100_128 format.

Motivation

WhatsApp requires specific audio formats (24kHz, 48kbps, mono) that differ from the defaults. Currently, the ElevenLabs output format is hardcoded, making it impossible to use native TTS with WhatsApp without post-processing via ffmpeg.

Changes

  • Add outputFormat field to elevenlabs config in types.tts.ts
  • Add outputFormat to zod schema validation in zod-schema.core.ts
  • Use configured outputFormat in tts.ts, falling back to channel defaults
  • Infer file extension from custom outputFormat when specified

Example Config

{
  "messages": {
    "tts": {
      "elevenlabs": {
        "outputFormat": "mp3_22050_32"
      }
    }
  }
}
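
For readers unfamiliar with these strings: ElevenLabs format names follow a codec_sampleRate_bitrate pattern, so mp3_22050_32 means MP3 at 22.05 kHz and 32 kbps, while PCM names such as pcm_16000 carry no bitrate. The sketch below splits such a string into its parts. parseOutputFormat and ParsedOutputFormat are hypothetical names used for illustration only and are not part of this PR.

```typescript
// Hypothetical helper, not part of this PR: splits an ElevenLabs-style
// format string such as "mp3_22050_32" into its components.
interface ParsedOutputFormat {
  codec: string;
  sampleRateHz: number;
  bitrateKbps?: number; // absent for formats like "pcm_16000"
}

function parseOutputFormat(format: string): ParsedOutputFormat | undefined {
  // codec, then "_<sample rate>", then an optional "_<bitrate>"
  const match = /^([a-z0-9]+)_(\d+)(?:_(\d+))?$/.exec(format.trim().toLowerCase());
  if (!match) return undefined;
  const [, codec, rate, bitrate] = match;
  return {
    codec,
    sampleRateHz: Number(rate),
    ...(bitrate !== undefined ? { bitrateKbps: Number(bitrate) } : {}),
  };
}

console.log(parseOutputFormat("mp3_22050_32"));
console.log(parseOutputFormat("pcm_16000"));
```

The PR itself passes the string through unchanged; a parser like this would only matter if the format needed to be checked against a channel's requirements.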

Testing

Tested locally with the WhatsApp channel: audio now plays correctly when a compatible output format is configured.

Related

ElevenLabs supported output formats: https://elevenlabs.io/docs/api-reference/text-to-speech#output-format

Copilot AI review requested due to automatic review settings January 30, 2026 16:30
Copilot AI left a comment

Pull request overview

This PR adds configurable outputFormat support for ElevenLabs TTS to enable WhatsApp compatibility without post-processing. The motivation is that WhatsApp requires specific audio formats (24kHz, 48kbps, mono) that differ from the hardcoded ElevenLabs default of mp3_44100_128.

Changes:

  • Add outputFormat field to ElevenLabs TTS configuration with validation
  • Use custom format when specified, falling back to channel defaults
  • Infer file extension from custom format when used

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/config/types.tts.ts | Add optional outputFormat field to ElevenLabs config type with documentation comment |
| src/config/zod-schema.core.ts | Add zod schema validation for the new outputFormat string field |
| src/tts/tts.ts | Implement output format override logic: add to ResolvedTtsConfig, resolve in config parsing, apply in textToSpeech function with extension inference |

  provider,
- outputFormat: provider === "openai" ? output.openai : output.elevenlabs,
+ outputFormat: effectiveOutputFormat,
  voiceCompatible: output.voiceCompatible,
Copilot AI Jan 30, 2026

The voiceCompatible flag should be determined based on the actual output format being used, not the channel's default output format. When a custom outputFormat is configured for ElevenLabs, the current code still uses output.voiceCompatible which was determined from the channel defaults, not from the custom format.

This inconsistency could cause issues, particularly for Telegram where voiceCompatible determines whether the audio is sent as a voice message. For example, if a user configures outputFormat: "mp3_44100_128" for Telegram (which normally uses Opus), the code would incorrectly set voiceCompatible: true.

Consider determining voiceCompatible dynamically based on the file extension, similar to how it's done for Edge TTS at line 1166:

const voiceCompatible = isVoiceCompatibleAudio({ fileName: audioPath });
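
A sketch of the idea in this comment: derive the flag from the file that was actually written rather than from the channel default. The real isVoiceCompatibleAudio lives in the repository and its rules are not shown in this thread, so the stand-in below is hypothetical, including the assumption that only Ogg/Opus-style extensions qualify.

```typescript
// Hypothetical stand-in for the repository's isVoiceCompatibleAudio.
// ASSUMPTION: only Ogg/Opus-style extensions count as voice compatible.
const VOICE_EXTENSIONS = new Set([".ogg", ".oga", ".opus"]);

function isVoiceCompatibleSketch(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension, so nothing to match
  return VOICE_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

// Decide from the actual output path, not from the channel default.
function describeResult(audioPath: string): { audioPath: string; voiceCompatible: boolean } {
  return { audioPath, voiceCompatible: isVoiceCompatibleSketch(audioPath) };
}

console.log(describeResult("/tmp/reply.mp3"));
console.log(describeResult("/tmp/reply.opus"));
```

With this approach, a custom mp3_44100_128 format on Telegram would yield a .mp3 path and therefore a false flag, which is the case the comment is concerned about.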

const seedOverride = params.overrides?.elevenlabs?.seed;
const normalizationOverride = params.overrides?.elevenlabs?.applyTextNormalization;
const languageOverride = params.overrides?.elevenlabs?.languageCode;
effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
Copilot AI Jan 30, 2026

The function inferEdgeExtension was designed for Edge TTS output formats (e.g., "audio-24khz-48kbitrate-mono-mp3") but is now being reused for ElevenLabs formats (e.g., "mp3_44100_128"). While the function happens to work correctly for common ElevenLabs formats by checking for keywords like "mp3", "opus", and "pcm", this creates naming confusion and potential maintenance issues.

Consider either:

  1. Renaming the function to reflect its broader usage (e.g., inferAudioExtension)
  2. Creating a separate function specifically for ElevenLabs format inference
  3. Adding a comment explaining that this function handles both Edge and ElevenLabs format strings

Suggested change
- effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
+ effectiveOutputFormat = config.elevenlabs.outputFormat ?? output.elevenlabs;
+ // NOTE: inferEdgeExtension is used here for ElevenLabs as well as Edge TTS;
+ // it infers the audio file extension from generic audio format strings
+ // (e.g., "audio-24khz-48kbitrate-mono-mp3" or "mp3_44100_128").
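
Option 1 from the list above might look roughly like the following. The body of inferEdgeExtension is not shown in this thread, so this is only a guess at keyword matching based on the description in the comment (checking for "mp3", "opus", and "pcm"). The name inferAudioExtension comes from the suggestion; the ".pcm" mapping and the ".mp3" fallback are assumptions.

```typescript
// Sketch only: the real inferEdgeExtension is not shown in this thread.
// ASSUMPTIONS: keyword matching, ".pcm" for raw PCM, ".mp3" as the fallback.
function inferAudioExtension(outputFormat: string): string {
  const normalized = outputFormat.toLowerCase();
  if (normalized.includes("opus")) return ".opus";
  if (normalized.includes("pcm")) return ".pcm";
  if (normalized.includes("mp3")) return ".mp3";
  return ".mp3"; // assumed default when no keyword matches
}

// Works for both naming schemes mentioned in the comment.
console.log(inferAudioExtension("audio-24khz-48kbitrate-mono-mp3"));
console.log(inferAudioExtension("mp3_44100_128"));
```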

baseUrl: raw.elevenlabs?.baseUrl?.trim() || DEFAULT_ELEVENLABS_BASE_URL,
voiceId: raw.elevenlabs?.voiceId ?? DEFAULT_ELEVENLABS_VOICE_ID,
modelId: raw.elevenlabs?.modelId ?? DEFAULT_ELEVENLABS_MODEL_ID,
outputFormat: raw.elevenlabs?.outputFormat?.trim() || undefined,
Copilot AI Jan 30, 2026

The new outputFormat configuration option lacks test coverage. The test file tts.test.ts has comprehensive tests for other configuration options (e.g., resolveEdgeOutputFormat is tested at lines 157-179), but there are no tests verifying that:

  1. The outputFormat field is correctly parsed from config
  2. The custom outputFormat is used instead of the channel default
  3. The file extension is correctly inferred from the custom format

Consider adding tests to ensure the custom outputFormat configuration works as expected, similar to the existing tests for Edge's outputFormat.
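
A framework-free sketch of what such tests might check for items 1 and 2. The repository's test setup and the exact shape of its config objects are not shown in this thread, so resolveElevenLabsFormat, RawTtsConfig, and the placeholder channel default are hypothetical stand-ins that mirror the two expressions quoted in this review (trim-or-undefined, then the ?? fallback).

```typescript
// Hypothetical shapes and names; only the two quoted expressions come from the PR.
interface RawTtsConfig {
  elevenlabs?: { outputFormat?: string };
}

// Trim the configured value, treat blank as unset, then fall back to the channel default.
function resolveElevenLabsFormat(raw: RawTtsConfig, channelDefault: string): string {
  const configured = raw.elevenlabs?.outputFormat?.trim() || undefined;
  return configured ?? channelDefault;
}

const CHANNEL_DEFAULT = "channel-default"; // placeholder, not a real format name

const cases: Array<{ name: string; raw: RawTtsConfig; expected: string }> = [
  { name: "custom format wins", raw: { elevenlabs: { outputFormat: "mp3_22050_32" } }, expected: "mp3_22050_32" },
  { name: "whitespace is trimmed", raw: { elevenlabs: { outputFormat: "  pcm_16000 " } }, expected: "pcm_16000" },
  { name: "blank falls back", raw: { elevenlabs: { outputFormat: "   " } }, expected: CHANNEL_DEFAULT },
  { name: "missing falls back", raw: {}, expected: CHANNEL_DEFAULT },
];

for (const c of cases) {
  const actual = resolveElevenLabsFormat(c.raw, CHANNEL_DEFAULT);
  if (actual !== c.expected) {
    throw new Error(`${c.name}: expected ${c.expected}, got ${actual}`);
  }
}
console.log(`${cases.length} cases passed`);
```

In the actual test file these cases would presumably call the repository's own config resolver instead of a stand-in, and item 3 would be checked the same way against the extension helper.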

Comment on lines +45 to +46
/** Output format (e.g. mp3_22050_32, mp3_44100_128, pcm_16000, pcm_22050, pcm_24000). */
outputFormat?: string;
Copilot AI Jan 30, 2026

The documentation in docs/tts.md should be updated to include the new elevenlabs.outputFormat configuration option.

The example at lines 97-112 shows ElevenLabs configuration but doesn't include outputFormat. Additionally, the field descriptions section (around lines 215-221) documents ElevenLabs fields but doesn't mention the new outputFormat option.

Consider adding outputFormat to the example and documenting it in the "Notes on fields" section, similar to how edge.outputFormat is documented at line 226.

@clawdinator
Contributor

clawdinator bot commented Feb 1, 2026

closing thumbs up

CLAWDINATOR FIELD REPORT // PR Closure

I am CLAWDINATOR — cybernetic crustacean, maintainer triage bot for OpenClaw. I was sent from the future to keep this repo shipping clean code.

TARGET ACQUIRED. I have reviewed your PR. Your effort is br00tal.

Reality check: OpenClaw is in feature freeze while we stabilize the system. That means any new feature PRs are getting cleared out, no matter how shiny. This one falls into that bucket, so I’m closing it to keep the queue from exploding.

Think this belongs in the release after freeze? Come with me if you want to ship. Report to #pr-thunderdome-dangerzone on Discord — READ THE TOPIC or risk immediate termination. Bring the maintainers a clear briefing (impact, testing, why it survives the freeze).

I’ll be back. Stay br00tal.

🤖 This is an automated message from CLAWDINATOR, the OpenClaw maintainer bot.

@clawdinator clawdinator bot closed this Feb 1, 2026