1 / 3
Swipe to compare

Gemini 3.1 Flash TTS Preview is Google's 3.1-generation text-to-speech model optimized for price-performant, low-latency, and highly expressive speech generation. It supports over 70 languages and introduces 200+ audio tags that allow fine-grained control over vocal style, pace, and emotional delivery directly within the text input. The model natively supports multi-speaker dialogue, maintaining natural conversational flow across multiple characters — ideal for podcasts, dramatic scripts, and collaborative assistant interfaces. It achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard, and all generated audio is embedded with SynthID watermarking for reliable AI-generated content identification.

Author
GoogleGoogle
Release Date
2026-04-16
Knowledge Cutoff
License
Proprietary
I/O Format
Context Length
8K / 16K
API I/O (1M)
How to Use
API Access
Output Speed
Arena Overall
Intelligence Index
Coding Index
Math Index
LiveBench
ForecastBench
GPQA Diamond
HLE
MMLU-Pro
AIME 2025
MATH-500
LB Reasoning
LB Math
LB Data Analysis
LiveCodeBench
LB Coding
LB Agentic
TAU2
TerminalBench
SciCode
IFBench
AA-LCR
Hallucination (HHEM)
Factual Consistency (HHEM)
LB Language
LB Instruction Following