Supertone TTS MCP

@supertone-inc

MCP server for the Supertone TTS API. Generate natural speech, browse and preview the voice catalog, predict synthesis cost, and create cloned voices directly from Claude Desktop, Cursor, or any MCP-compatible client. Supports 31 languages, including Korean, English, and Japanese, with speed, pitch, and emotion-style control.
supertone-mcp

A composable MCP toolkit for the Supertone TTS API. Rather than a single "speak this text" command, it exposes Supertone's SDK as a set of building-block tools — synthesis, voice discovery, preview, duration/credit prediction, usage tracking, and full voice-cloning CRUD — that an LLM assembles to fulfill a request. Works in Claude Desktop, Cursor, or any MCP-compatible client.

Supports 31 languages, including Korean, English, and Japanese. Controls include speed (0.5x–2.0x), pitch shift (-24 to +24 semitones), emotion styles, per-call output mode, streaming, and model selection.

Features

Synthesis

  • text_to_speech — Convert text to audio. Per-call control of output_mode (files / resources / both), autoplay, streaming, model, plus include_phonemes / normalized_text. Long text is auto-chunked by the SDK.
  • predict_duration — Estimate audio length (and credit cost) without synthesizing.

Voice discovery (preset)

  • search_voice — Filter the catalog by language, gender, age, use_case, style, model, name, or description.
  • get_voice — Full detail for one voice.
  • preview_voice — Sample audio URLs for a voice (filterable by language/style/model).

Custom voice cloning

  • clone_voice — Create a cloned voice from a local WAV/MP3 (≤3MB).
  • search_custom_voice — List/filter cloned voices.
  • get_custom_voice — Full detail for one cloned voice.
  • edit_custom_voice — Update name and/or description.
  • delete_custom_voice — Permanently delete (irreversible).

Usage & credits

  • get_credit_balance — Remaining credits.
  • get_usage_history — Usage over a time window.
  • get_voice_usage — Usage for a specific voice.

Breaking changes & migration (0.2.0)

0.2.0 moves behavior control out of environment variables and into per-call tool parameters — so the LLM decides per request, not the server config.

| Before (env var) | After (per-call parameter) | Note |
| --- | --- | --- |
| SUPERTONE_MCP_OUTPUT_MODE (files, resources, or both) | text_to_speech(output_mode=...) | Default still files |
| SUPERTONE_MCP_AUTOPLAY=true | text_to_speech(autoplay=...) | Default changed true → false (playback is now explicit) |
| (always streamed) | text_to_speech(streaming=...) | New, default false (one-shot). streaming=true requires model="sona_speech_1" |

Other changes:

  • Default model changed from sona_speech_1 to sona_speech_2_flash.
  • list_voices was removed (since the discovery release) and replaced by search_voice — call it with no arguments to reproduce the old "list everything" behavior.
  • No more hard 300-character limit — longer text is auto-chunked by the SDK (credit/latency scale with length).

If you previously set SUPERTONE_MCP_OUTPUT_MODE or SUPERTONE_MCP_AUTOPLAY, remove them from your client config and pass output_mode / autoplay per call instead. (The server prints a one-time stderr notice if it sees the removed vars.)
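
The cleanup described above can be scripted. The following is a minimal sketch, assuming the client config uses the mcpServers / env layout shown under Configuration; the function name strip_removed_vars is hypothetical and not part of supertone-mcp.

```python
import json

# Variables removed in 0.2.0, per the migration notes above.
REMOVED_VARS = ("SUPERTONE_MCP_OUTPUT_MODE", "SUPERTONE_MCP_AUTOPLAY")


def strip_removed_vars(config: dict) -> list:
    """Delete the removed variables from every server's "env" block, in place.

    Returns "server:VARIABLE" strings describing what was deleted.
    """
    removed = []
    for server_name, server in config.get("mcpServers", {}).items():
        env = server.get("env", {})
        for var in REMOVED_VARS:
            if var in env:
                del env[var]
                removed.append(f"{server_name}:{var}")
    return removed


# Stand-in for the parsed contents of claude_desktop_config.json.
config = {
    "mcpServers": {
        "supertone-tts": {
            "command": "uvx",
            "args": ["supertone-mcp"],
            "env": {
                "SUPERTONE_API_KEY": "your-api-key-here",
                "SUPERTONE_MCP_AUTOPLAY": "true",
            },
        }
    }
}

print(strip_removed_vars(config))
print(json.dumps(config, indent=2))
```

To apply it to a real file, load the file with json.load, call the function, and write the result back with json.dump.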

Installation

# Using uvx (recommended)
uvx supertone-mcp

# Using pip
pip install supertone-mcp

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "supertone-tts": {
      "command": "uvx",
      "args": ["supertone-mcp"],
      "env": {
        "SUPERTONE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Cursor

Add to your Cursor MCP settings (same JSON shape as above).

Environment Variables

Only authentication and stable defaults are configured via the environment — all behavior is controlled per call.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| SUPERTONE_API_KEY | Yes | | Your Supertone API key |
| SUPERTONE_MCP_VOICE_ID | No | preset voice (Aiden, multilingual) | Default voice_id for text_to_speech / predict_duration (override per call) |
| SUPERTONE_OUTPUT_DIR | No | ~/supertone-tts-output/ | Directory where audio files are saved (used by output_mode=files/both) |

Removed in 0.2.0: SUPERTONE_MCP_OUTPUT_MODE and SUPERTONE_MCP_AUTOPLAY — see Migration.
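
As an illustration of how the optional variables fit into a client config, here is a small sketch that builds the server entry and only writes a variable when a value is supplied. The helper build_server_entry and the path /path/to/audio are placeholders invented for this example.

```python
import json


def build_server_entry(api_key, voice_id=None, output_dir=None):
    """Return a "supertone-tts" entry for an MCP client's mcpServers map."""
    env = {"SUPERTONE_API_KEY": api_key}
    # Optional variables are written only when given, so the server
    # falls back to its documented defaults otherwise.
    if voice_id is not None:
        env["SUPERTONE_MCP_VOICE_ID"] = voice_id
    if output_dir is not None:
        env["SUPERTONE_OUTPUT_DIR"] = output_dir
    return {"command": "uvx", "args": ["supertone-mcp"], "env": env}


config = {
    "mcpServers": {
        "supertone-tts": build_server_entry(
            "your-api-key-here",
            output_dir="/path/to/audio",  # placeholder directory
        )
    }
}
print(json.dumps(config, indent=2))
```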

Output modes (text_to_speech output_mode)

| Mode | Returns | Use when |
| --- | --- | --- |
| files (default) | Plain text with the saved file path + metadata | You want the file on disk |
| resources | MCP AudioContent + TextContent (no file written) | The client renders audio inline (e.g., Claude.ai chat) |
| both | File on disk and AudioContent/TextContent | You want both: preview inline, keep the file |
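
The table reduces to two questions: should a file be kept, and can the client render audio inline? A hypothetical helper (not part of the server) that encodes that decision might look like this:

```python
def choose_output_mode(keep_file: bool, inline_audio: bool) -> str:
    """Map the two questions from the table onto an output_mode value."""
    if keep_file and inline_audio:
        return "both"
    if inline_audio:
        return "resources"
    # Falls through to the documented default.
    return "files"


print(choose_output_mode(keep_file=True, inline_audio=False))
print(choose_output_mode(keep_file=False, inline_audio=True))
print(choose_output_mode(keep_file=True, inline_audio=True))
```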

Usage Examples

The MCP client routes natural-language requests across these tools — the value of the toolkit is composition: the LLM chains several tools to satisfy one request.

Example 1 — Discover → preview → estimate cost → synthesize

"Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as an mp3."

The LLM assembles:

search_voice(language="ko", gender="female", style="neutral")   # find candidates
  → preview_voice(voice_id)                                       # sample URLs to confirm the voice
  → predict_duration(text, voice_id) + get_credit_balance()       # gauge cost before spending
  → text_to_speech(text, voice_id, output_format="mp3",
                   output_mode="files")                           # synthesize
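
On the wire, each step in the chain above is an MCP tools/call request (JSON-RPC 2.0). The sketch below only builds those request payloads for inspection; it does not contact the server. The announcement text and the voice_id value are placeholders, since a real client would take the voice_id from the search_voice result.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids: 1, 2, 3, ...


def tool_call(name: str, arguments: dict) -> dict:
    """Build one MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


text = "The library closes at 9 PM today."  # placeholder announcement
voice_id = "<voice_id from search_voice>"   # placeholder, filled in at runtime

requests = [
    tool_call("search_voice", {"language": "ko", "gender": "female", "style": "neutral"}),
    tool_call("preview_voice", {"voice_id": voice_id}),
    tool_call("predict_duration", {"text": text, "voice_id": voice_id}),
    tool_call("get_credit_balance", {}),
    tool_call(
        "text_to_speech",
        {"text": text, "voice_id": voice_id, "output_format": "mp3", "output_mode": "files"},
    ),
]

for request in requests:
    print(json.dumps(request, ensure_ascii=False))
```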

Example 2 — Clone my voice → use it right away

"Make a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it for me."

The LLM assembles:

clone_voice(name="MyVoice", audio_path="~/recordings/sample.wav")   # create the cloned voice
  → get_custom_voice(voice_id)                                       # confirm it was created
  → text_to_speech(text, voice_id=<cloned>, autoplay=true)           # synthesize, then play immediately

autoplay is a per-call parameter (default false), so playback happens only when explicitly requested.

Tool Parameters

text_to_speech

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Text to convert (long text is auto-chunked by the SDK) |
| voice_id | string | No | env or preset | Voice identifier (browse via search_voice) |
| language | string | No | ko | Language code, one of 31 (ko, en, ja, …) |
| output_format | string | No | mp3 | mp3 or wav |
| model | string | No | sona_speech_2_flash | sona_speech_1, sona_speech_2, sona_speech_2_flash, sona_speech_2t, sona_speech_3t, supertonic_api_1, supertonic_api_3 |
| speed | float | No | 1.0 | 0.5–2.0 |
| pitch_shift | int | No | 0 | -24 to +24 semitones |
| style | string | No | | Emotion style (varies by voice) |
| output_mode | string | No | files | files, resources, or both (see Output modes) |
| autoplay | bool | No | false | Play the audio locally after synthesis (macOS afplay) |
| streaming | bool | No | false | Stream synthesis. Only supported by model="sona_speech_1" |
| include_phonemes | bool | No | false | Return phoneme timing data alongside the audio |
| normalized_text | string | No | | Pre-normalized text (only used by sona_speech_2 / sona_speech_2_flash) |
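
The constraints in this table can be checked before a call is made. The following client-side sketch is an assumption about how one might do that; check_tts_arguments is a hypothetical helper, and the server's own validation and error messages may differ.

```python
STREAMING_MODEL = "sona_speech_1"
DEFAULT_MODEL = "sona_speech_2_flash"
OUTPUT_MODES = {"files", "resources", "both"}
OUTPUT_FORMATS = {"mp3", "wav"}


def check_tts_arguments(args: dict) -> list:
    """Return a list of problems in text_to_speech arguments (empty if none)."""
    problems = []
    if not args.get("text"):
        problems.append("text is required")
    if not 0.5 <= args.get("speed", 1.0) <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if not -24 <= args.get("pitch_shift", 0) <= 24:
        problems.append("pitch_shift must be between -24 and +24")
    if args.get("output_format", "mp3") not in OUTPUT_FORMATS:
        problems.append("output_format must be mp3 or wav")
    if args.get("output_mode", "files") not in OUTPUT_MODES:
        problems.append("output_mode must be files, resources, or both")
    # Streaming is documented as supported by one model only, and that
    # model is not the default, so it has to be selected explicitly.
    if args.get("streaming", False) and args.get("model", DEFAULT_MODEL) != STREAMING_MODEL:
        problems.append('streaming=true requires model="sona_speech_1"')
    return problems


print(check_tts_arguments({"text": "Hello", "streaming": True}))
print(check_tts_arguments({"text": "Hello", "streaming": True, "model": "sona_speech_1"}))
```

The first call reports the streaming/model mismatch because the default model is sona_speech_2_flash; the second passes.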

predict_duration

Same core parameter schema as text_to_speech (long text is auto-chunked). Returns a one-line estimate such as "Predicted duration: 2.34s (credit usage is proportional to duration)."
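
If a script needs the estimate as a number, the seconds value can be pulled out of that message. This is a sketch that assumes the message keeps the "Predicted duration: <seconds>s" shape quoted above; the function name is hypothetical.

```python
import re
from typing import Optional

# Matches the seconds value in "Predicted duration: 2.34s ...".
DURATION_PATTERN = re.compile(r"Predicted duration:\s*([0-9]+(?:\.[0-9]+)?)s")


def parse_predicted_duration(message: str) -> Optional[float]:
    """Return the predicted duration in seconds, or None if not found."""
    match = DURATION_PATTERN.search(message)
    return float(match.group(1)) if match else None


sample = "Predicted duration: 2.34s (credit usage is proportional to duration)."
print(parse_predicted_duration(sample))  # 2.34
```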

search_voice

All parameters optional. With no filters → full catalog. With any filter → first response line is Filters applied: ....

| Parameter | Type | Description |
| --- | --- | --- |
| language | string | e.g., ko, en, ja |
| gender | string | e.g., male, female |
| age | string | e.g., young_adult, child |
| use_case | string | e.g., narration, advertisement |
| style | string | e.g., neutral, happy |
| model | string | e.g., sona_speech_2_flash |
| name | string | partial match |
| description | string | partial match |

get_voice / preview_voice

| Tool | Required | Optional |
| --- | --- | --- |
| get_voice | voice_id | |
| preview_voice | voice_id | language, style, model (filter samples) |

clone_voice

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name (non-empty) |
| audio_path | string | Yes | Local WAV or MP3 path (≤3MB). Supports ~ expansion |
| description | string | No | Optional note |
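
A sample can be checked locally against these constraints before calling clone_voice. The sketch below is a hypothetical pre-flight helper, not the server's implementation; in particular it assumes "3MB" means 3 × 1024 × 1024 bytes, which the source does not specify.

```python
from pathlib import Path

MAX_BYTES = 3 * 1024 * 1024  # assumption: "3MB" read as 3 MiB
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def check_clone_sample(audio_path: str) -> Path:
    """Expand ~ and verify the file type, existence, and size of a sample."""
    path = Path(audio_path).expanduser()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"expected a .wav or .mp3 file, got {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(str(path))
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; the limit is {MAX_BYTES}")
    return path
```

Calling check_clone_sample("~/recordings/sample.wav") returns the expanded path when the file passes, and raises otherwise, so an oversized or mistyped sample is caught before any API call is made.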

Custom voice CRUD

| Tool | Required | Optional |
| --- | --- | --- |
| search_custom_voice | | name, description (partial match) |
| get_custom_voice | voice_id | |
| edit_custom_voice | voice_id | name, description (at least one required) |
| delete_custom_voice | voice_id | (IRREVERSIBLE) |

Usage & credits

| Tool | Required | Optional |
| --- | --- | --- |
| get_credit_balance | | |
| get_usage_history | | (reports a recent default window) |
| get_voice_usage | voice_id | |

Development

# Clone and install
git clone https://github.com/supertone-inc/supertone-mcp.git
cd supertone-mcp
uv sync

# Run tests
uv run pytest -q

# Run with coverage
uv run pytest --cov=src --cov-report=term-missing

License

MIT
