AI FOR FASHION AND OTHER AI ANNOUNCEMENTS FROM GOOGLE I/O

Google is fully embracing AI. Throughout its keynote at the I/O developer conference, the company mentioned "AI" more than 120 times. Not every AI announcement was equally significant, so we've gathered the top new AI products and features unveiled at Google I/O 2024, starting with fashion.

AI for Fashion and Accessibility

Google is enhancing TalkBack, its accessibility feature for Android, with generative AI capabilities. Soon, TalkBack will use Gemini Nano to provide audio descriptions of objects for users who are blind or have low vision. For example, it might describe an item of clothing as, "A close-up of a black and white gingham dress with a short length, collar, and long sleeves, tied at the waist with a large bow." The integration is meant to help users who encounter approximately 90 unlabeled images a day by describing their content automatically, potentially eliminating the need for manual input.

Generative AI in Search

Google plans to use generative AI to organize entire Google Search results pages. The layout of these pages will vary with the search query and may include AI-generated summaries of reviews, discussions from platforms like Reddit, and lists of suggestions. Initially, these AI-organized results pages will appear for searches related to inspiration, such as trip planning. Eventually, they will extend to queries about dining options, recipes, movies, books, hotels, e-commerce, and more.

Project Astra and Gemini Live

Google is improving its AI-powered chatbot, Gemini, so that it can better understand its surroundings. The company showcased a new Gemini experience called Gemini Live, which lets users hold detailed voice conversations with Gemini on their smartphones. Users can interrupt Gemini to ask clarifying questions, and the chatbot adapts to their speech patterns in real time. Gemini Live can also analyze and respond to users' surroundings through photos or videos captured by their smartphone cameras. These technical advancements stem from Project Astra, an initiative within DeepMind aimed at creating AI-powered applications and agents for real-time, multimodal understanding.

Google Veo

Google introduced Veo, an AI model that rivals OpenAI's Sora and can generate 1080p video clips up to a minute long from text prompts. Veo can produce a range of visual and cinematic styles, interpret camera movements and visual effects described in prompts, and demonstrate a basic understanding of physics, which makes the generated videos more realistic. Veo also supports masked editing for targeted changes to a video and can generate longer videos from a sequence of prompts that together tell a cohesive story.

Ask Photos

Google Photos receives an AI boost with the experimental feature Ask Photos, powered by Google's Gemini AI models. Scheduled for release later this summer, Ask Photos lets users search their Google Photos collection using natural language queries, drawing on Gemini's understanding of photo content and metadata. For instance, users can run complex searches such as finding the "best photo from each National Park visited," for which Gemini evaluates photo quality, geolocation data, and dates to return relevant images.