Gate News message, April 23 — Google researchers, including Kaiming He and Saining Xie, published a paper introducing Vision Banana, a general-purpose vision understanding model created through lightweight instruction fine-tuning of the company's Nano Banana Pro (Gemini 3 Pro Image) image generation model. The key innovation is to unify the outputs of all vision tasks as RGB images, so that segmentation, depth estimation, and surface normal prediction can all be produced through image generation, without task-specific architectures or loss functions.
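The idea of casting every task output as an RGB image can be illustrated with semantic segmentation: class labels are encoded as colors, and the generated image is decoded back to a label map. This is a minimal sketch of that encode/decode round trip; the palette and nearest-color decoding are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

# Hypothetical palette mapping class ids to RGB colors (illustrative only;
# the paper's actual color encoding is not specified in this article).
PALETTE = np.array([
    [0, 0, 0],        # 0: background
    [128, 64, 128],   # 1: road
    [70, 70, 70],     # 2: building
], dtype=np.uint8)

def mask_to_rgb(mask: np.ndarray) -> np.ndarray:
    """Encode an HxW class-id mask as an HxWx3 RGB image via palette lookup."""
    return PALETTE[mask]

def rgb_to_mask(rgb: np.ndarray) -> np.ndarray:
    """Decode by nearest palette color, tolerating small generation noise."""
    flat = rgb.reshape(-1, 1, 3).astype(np.int32)
    dists = np.abs(flat - PALETTE[None, :, :].astype(np.int32)).sum(axis=2)
    return dists.argmin(axis=1).reshape(rgb.shape[:2])

mask = np.array([[0, 1], [2, 1]])
rgb = mask_to_rgb(mask)
recovered = rgb_to_mask(rgb)
```

Decoding by nearest palette color rather than exact match matters here: a generative model emits approximate pixel values, so the decoder must be robust to small deviations.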
In semantic segmentation, Vision Banana outperformed the specialized model SAM 3 by 4.7 percentage points on Cityscapes, and in referring expression segmentation it surpassed SAM 3 Agent; it lagged behind SAM 3 only in instance segmentation. For 3D tasks, its metric depth estimation reached 0.929 average accuracy across four standard datasets, exceeding Depth Anything V3's 0.918, despite being trained only on synthetic data and requiring no real depth annotations or camera parameters at inference. Surface normal estimation achieved state-of-the-art results on three indoor benchmarks.
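The article does not name the depth accuracy metric; a common convention for figures in this range is the threshold accuracy δ1, the fraction of pixels where the ratio between predicted and ground-truth depth stays below 1.25. A sketch of that metric, under the assumption that this is what is being reported:

```python
import numpy as np

def delta1_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    """Threshold accuracy delta1: fraction of pixels with
    max(pred/gt, gt/pred) < 1.25 (a standard depth-estimation metric)."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < 1.25).mean())

pred = np.array([1.0, 2.0, 3.0, 10.0])   # predicted depths (meters)
gt   = np.array([1.1, 2.0, 4.0, 10.5])   # ground-truth depths (meters)
acc = delta1_accuracy(pred, gt)          # 3 of 4 pixels within the threshold
```

Here only the third pixel fails (ratio 4/3 ≈ 1.33), so the accuracy is 0.75; a model-level score like 0.929 would be this value averaged over all pixels of a benchmark.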
Fine-tuning mixed a minimal amount of vision-task data into the original image generation training data, preserving the model's generation capabilities: in generation quality tests, performance matched the original Nano Banana Pro. The paper argues that image generation pretraining plays the same role for vision that text generation pretraining plays for language: the model already learns the internal representations needed for image understanding during generative pretraining, and instruction fine-tuning merely unlocks this capability.
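Mixing a small fraction of task data into the original training stream can be sketched as weighted batch sampling. The 10% ratio and example names below are arbitrary placeholders; the article does not disclose the actual mixture.

```python
import random

def sample_batch(gen_data, task_data, batch_size=8, task_frac=0.1, seed=0):
    """Draw a batch where each slot is a vision-task example with
    probability task_frac, otherwise an image-generation example.
    task_frac=0.1 is an illustrative placeholder, not the paper's ratio."""
    rng = random.Random(seed)
    return [
        rng.choice(task_data) if rng.random() < task_frac else rng.choice(gen_data)
        for _ in range(batch_size)
    ]

gen = ["gen_example"] * 100
task = ["seg_example", "depth_example", "normal_example"]
batch = sample_batch(gen, task, batch_size=8)
```

Keeping the generation data dominant in every batch is what lets the fine-tuned model retain its original generation quality while acquiring the task behaviors.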