blog.google | ksl
Google published a technical explainer on how AI Mode in Search handles visual queries through a method called “query fan-out.” When a user uploads an image (say, a photo of an outfit), Gemini analyzes the full frame, identifies individual items like the hat, jacket, and shoes, then fires off parallel Lens searches for each object. The results are synthesized into a single response with shopping links and context. It’s a clean example of multimodal orchestration: Gemini acts as the reasoning layer while Lens serves as the retrieval backend across billions of indexed images. Perplexity and OpenAI have both been adding visual search to their products, but Google’s advantage here is a decade of Lens training data and Shopping Graph integration that no competitor can replicate overnight.
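
For a sense of the pattern, here’s a minimal sketch of query fan-out in Python. The `identify_items` and `lens_search` functions are hypothetical stand-ins for the Gemini and Lens steps (Google hasn’t published the actual interfaces); what the sketch shows is the shape of the orchestration: one detection pass, concurrent per-item retrieval, then a synthesis step.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class DetectedItem:
    label: str   # e.g. "hat", "jacket", "shoes"
    crop: bytes  # image region for that item

async def identify_items(image: bytes) -> list[DetectedItem]:
    """Stub for the reasoning layer: parse the full frame into items.
    (Hypothetical; stands in for Gemini's object detection.)"""
    return [DetectedItem("hat", image), DetectedItem("jacket", image),
            DetectedItem("shoes", image)]

async def lens_search(item: DetectedItem) -> dict:
    """Stub for one retrieval call against the image index.
    (Hypothetical; stands in for a Lens search.)"""
    await asyncio.sleep(0.1)  # simulate network latency
    return {"item": item.label,
            "links": [f"https://shop.example/{item.label}"]}

async def answer_visual_query(image: bytes) -> dict:
    # 1. Reasoning layer: detect the individual objects in the frame.
    items = await identify_items(image)
    # 2. Fan-out: issue one retrieval call per item, concurrently.
    results = await asyncio.gather(*(lens_search(it) for it in items))
    # 3. Synthesis: merge per-item results into a single response.
    return {"summary": f"Found {len(results)} items", "results": list(results)}

if __name__ == "__main__":
    print(asyncio.run(answer_visual_query(b"...image bytes...")))
```

The design point is that the per-item searches are independent, so `asyncio.gather` runs them concurrently and the user pays roughly one search’s worth of latency for the whole outfit rather than one per item.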
