Prompt Engineering in Practice · Multimodality

Audio and Video

Introduction

How to prompt audio and video models in 2024, using Whisper, GPT-4o Realtime, and Gemini 1.5 Pro. Topics covered: transcription, diarization, text-to-speech (TTS), voice cloning, video understanding, latency budgets for voice agents, and production monitoring.