Prompt Engineering in Practice · Multimodality
Audio and Video
Introduction
Whisper, GPT-4o Realtime, and Gemini 1.5 Pro: how to prompt audio and video models in 2024. Topics covered: transcription, diarization, TTS, voice cloning, video understanding, latency budgets for voice agents, and production monitoring.