Gemini will handle pictures, videos, and audio just as well as it handles text

Google made a big deal about Gemini’s “multimodal” capabilities, meaning it can comprehend different kinds of information such as text, images, video, audio, and more. According to the company, Google trained Gemini on a variety of media from the ground up, rather than tacking those capabilities on after the chat features were up and running.
Google shared a video in which a Gemini-powered Bard helps a student with physics homework, starting from a photo of the assignment with handwritten questions. The AI then seamlessly transitions to written advice, complete with equations and step-by-step answers.