Gemini 3.1 Flash TTS Preview is Google's 3.1-generation text-to-speech model optimized for price-performant, low-latency, and highly expressive speech generation. It supports over 70 languages and introduces 200+ audio tags that allow fine-grained control over vocal style, pace, and emotional delivery directly within the text input. The model natively supports multi-speaker dialogue, maintaining natural conversational flow across multiple characters — ideal for podcasts, dramatic scripts, and collaborative assistant interfaces. It achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard, and all generated audio is embedded with SynthID watermarking for reliable AI-generated content identification.
Gemini 3.1 Flash TTS Preview is Google's 3.1-generation text-to-speech model optimized for price-performant, low-latency, and highly expressive speech generation. It supports over 70 languages and introduces 200+ audio tags that allow fine-grained control over vocal style, pace, and emotional delivery directly within the text input. The model natively supports multi-speaker dialogue, maintaining natural conversational flow across multiple characters — ideal for podcasts, dramatic scripts, and collaborative assistant interfaces. It achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard, and all generated audio is embedded with SynthID watermarking for reliable AI-generated content identification.