Gemini 3.5 Live Translate: real-time voice translation without headphones

For years, Google showcased real-time voice translation prototypes on stage, but each one demanded specific hardware — Pixel Buds earphones, a Google smartphone, or a dedicated setup. On June 9, 2026, the company announced Gemini 3.5 Live Translate: a speech-to-speech model that dissolves those hardware requirements and brings live voice translation to the standard Google Translate app on Android and iOS.

Key takeaways

Gemini 3.5 Live Translate supports over 70 languages in real time
The model preserves the original speaker's tone, pacing, and pitch
Every generated audio stream carries a SynthID watermark that, according to Google, cannot currently be removed
"Listening mode" (translation through the phone earpiece) is available on Android only
Google Meet integration launches for select enterprise customers in June 2026

End of the headphone mandate

Until recently, live translation in Google's ecosystem required Pixel Buds connected to an Android phone. In late 2025, Google expanded support to any earbuds and the iOS app, but it still assumed users had something in their ears.

Gemini 3.5 Live Translate goes a step further. On Android, a new "listening mode" lets you hold the phone to your ear as if taking a call. Translated audio streams directly through the phone's earpiece — handy for listening to a foreign-language tour guide without any additional accessory. The feature is Android-only for now; iOS support has not been announced.

The model handles automatic language detection, so neither developers nor users need to manually configure language pairs. Google states that Gemini 3.5 Live Translate keeps pace with normal conversation with only a few seconds of lag — small enough not to disrupt the exchange.

How the model sounds and what SynthID adds

Previous voice translation systems often produced mechanical-sounding output. Gemini 3.5 Live Translate processes not just the content but also the speaker's voice characteristics — intonation, pacing, and pitch. Official Google demos suggest a noticeably more natural result than generic text-to-speech.

Even so, Google has chosen not to obscure the AI's role. Every audio stream from Gemini 3.5 Live Translate carries a SynthID digital watermark embedded in the audio waveform. According to Google's announcement, the watermark cannot currently be removed, meaning every translated stream, however natural-sounding, is identifiable as AI-generated content. Before this release, no publicly available speech-to-speech system offered this level of capability.

SynthID is an imperceptible digital watermark that Google previously used for generated images. Extending the technology to audio represents the first broad commercial deployment of audio watermarking in a Google product.

Where and when it becomes available

The model enters Google's ecosystem through several channels simultaneously. Developers can already use a public preview in the Gemini Live API and Google AI Studio — the model processes continuous speech without manual multilingual configuration.
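For developers trying the preview, one practical consequence of continuous speech processing is that audio is usually sent in small chunks rather than as one finished file. The sketch below shows only that client-side chunking step. It does not call the Gemini Live API, and the sample rate, sample width, chunk duration, and function names are illustrative assumptions, not values taken from Google's announcement.

```python
from typing import Iterator

# All three values are assumptions for illustration; check the API
# documentation for the audio format the preview actually expects.
SAMPLE_RATE_HZ = 16_000   # assumed input sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
CHUNK_MS = 100            # assumed duration of each streamed chunk


def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Return how many bytes of mono PCM audio cover chunk_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000


def iter_pcm_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of pcm, each at most size bytes long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    # One second of silence stands in for microphone input.
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    size = chunk_size_bytes()
    chunks = list(iter_pcm_chunks(one_second, size))
    print(f"{len(chunks)} chunks of {size} bytes each")
```

Smaller chunks reduce the delay before the first audio reaches the service but increase the number of messages sent, so the chunk duration is a trade-off to tune rather than a fixed rule.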

In Google Meet, live translation powered by Gemini 3.5 will reach select enterprise customers in June 2026, ahead of a broader rollout. Google has announced changes to Meet's interface to make Live Translate more prominent and accessible.

The broadest distribution channel is the Google Translate app on Android and iOS, where the Gemini 3.5 update is due "soon," without a specific date. The previous model in Translate ran on an older architecture and did not preserve voice characteristics to the same degree as the new Gemini 3.5 Flash-based model.

For comparison: Apple presented similar live voice translation capabilities at WWDC 2026 as part of Siri AI, but they are based on Gemini 3 rather than Gemini 3.5. Microsoft Translator does not offer equivalent speaker-characteristic preservation. Gemini 3.5 Live Translate is currently the most advanced publicly available speech-to-speech solution that retains voice characteristics.

Why this matters

Real-time voice translation is technically complex not because of text translation itself, which Google has done for years, but because of the need to preserve speech fluency and naturalness with minimal latency. Gemini 3.5 Live Translate raises that bar: it eliminates the specialized earphone requirement and preserves individual voice characteristics.

The broader context matters more. Billions of people still do not share a language with their doctors, teachers, or employers. A tool that runs inside a standard phone app without any additional hardware has real potential to lower that barrier — especially in developing markets where a smartphone is the only device a user has access to.

At the same time, the mandatory SynthID watermark sets a precedent: Google openly acknowledges that mass AI translation creates disinformation risks and is setting its own accountability standard. Embedding a marker in the audio layer makes misuse harder, though not impossible.

What's next

Google announced that a Pro version of the Gemini 3.5 model will arrive in the coming weeks, likely with better voice quality and lower latency than the current Flash version
Broader rollout in Google Meet (beyond selected enterprise customers) has no announced date
"Listening mode" on iOS is not yet available — Apple and Google have not commented on plans for its launch beyond Android