Insights | GoPomelo

Introducing Gemini: The Largest and Most Capable AI Model from Google

Written by Oliver Machwirth | Dec 7, 2023 11:13:10 AM

Making AI More Helpful for Everyone

A Note from Sundar Pichai, CEO of Google and Alphabet:
The current technological shift, particularly in AI, is a monumental opportunity for scientific discovery, human progress, and improving lives. This transition, arguably the most profound in recent history, surpasses the shifts to mobile and the web. AI holds the promise of creating opportunities for people globally, heralding new waves of innovation, economic progress, and unparalleled advancements in knowledge, learning, creativity, and productivity.

The excitement lies in the potential of AI to be helpful to everyone, everywhere. Google has been an AI-first company for nearly eight years, and the pace of progress is only accelerating. Millions of people now use generative AI in Google products for tasks that were impossible a year ago. Developers are using these models and infrastructure to build new AI applications, and businesses around the world are growing with these AI tools.

This progress is just the beginning. Google's approach to AI is both bold and responsible. It focuses on ambitious research and developing capabilities that benefit society, integrating safeguards and collaborating with governments and experts to manage emerging risks.

With the launch of Gemini, Google marks a new phase in AI capabilities. Gemini 1.0, with versions Ultra, Pro, and Nano, signifies a major leap in AI development. This era of models is a significant scientific and engineering endeavor at Google, opening new doors for global opportunities.

 

Introducing Gemini

By Demis Hassabis, CEO and Co-Founder of Google DeepMind

AI has been a lifelong focus for many researchers, including myself. From the early days of programming AI for computer games to years in neuroscience, the goal has always been to build smarter machines for humanity's benefit.

Google DeepMind has always aimed to create a new generation of AI models inspired by the way people understand and interact with the world. The aim is to develop AI that feels less like smart software and more like a useful and intuitive assistant.

With Gemini, this vision is closer to reality. It represents a collaborative effort across Google, including Google Research. Gemini is multimodal, able to understand and operate across various types of information like text, code, audio, image, and video. It's Google's most flexible model, optimized for use from data centers to mobile devices, enhancing the way developers and customers use AI.

Gemini is also the most flexible model yet — able to efficiently run on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.

Google optimized Gemini 1.0, its first version, for three different sizes:

  • Gemini Ultra — the largest and most capable model for highly complex tasks.
  • Gemini Pro — the best model for scaling across a wide range of tasks.
  • Gemini Nano — the most efficient model for on-device tasks.

State-of-the-art performance

State-of-the-art performance is a hallmark of Gemini models, which have been meticulously tested and evaluated across a diverse array of tasks. These tasks range from understanding natural images, audio, and video to mathematical reasoning. In large language model (LLM) research and development, Gemini Ultra has demonstrated exceptional results, surpassing the current state-of-the-art on 30 of the 32 widely recognized academic benchmarks.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding). This benchmark uses a combination of 57 subjects, including mathematics, physics, history, law, medicine, and ethics, to test both world knowledge and problem-solving skills.

The new benchmark approach to MMLU adopted for Gemini lets the model use its reasoning abilities more effectively. By thinking more carefully before answering difficult questions, Gemini achieves significant improvements over answering from its first impression alone.
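One way to picture "thinking more carefully before answering" is a voting scheme: draw several reasoning samples, accept the majority answer only when agreement is high, and otherwise keep the model's single direct answer. The sketch below is an illustration under that assumption, not Google's implementation. The function name `route_answer`, the 0.6 threshold, and the sample answers are all hypothetical.

```python
from collections import Counter


def route_answer(sampled_answers, direct_answer, threshold=0.6):
    """Pick the majority answer when agreement is high, else fall back.

    sampled_answers: answers extracted from several reasoning samples
                     (hypothetical input; no model is called here).
    direct_answer:   the model's single first-impression answer.
    threshold:       minimum share of votes required to trust the majority
                     (an assumed value chosen for illustration).
    """
    if not sampled_answers:
        return direct_answer
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    agreement = votes / len(sampled_answers)
    return answer if agreement >= threshold else direct_answer


# Four of five samples agree on "B", so the majority answer is used.
confident = route_answer(["B", "B", "B", "C", "B"], direct_answer="C")

# No answer reaches the threshold, so the direct answer "C" is kept.
uncertain = route_answer(["A", "B", "C", "B", "D"], direct_answer="C")

print(confident, uncertain)
```

The fallback matters because a weak majority may reflect noise rather than reasoning; in that case the sketch simply trusts the direct answer.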

Gemini surpasses state-of-the-art performance on a range of benchmarks including text and coding.

A chart compares Gemini Ultra with GPT-4 on common text benchmarks, with GPT-4 figures calculated through the API where no reported numbers were available. In both text and coding, Gemini Ultra sets a new state of the art on most of these benchmarks.

Additionally, Gemini Ultra has achieved a state-of-the-art score of 59.4% on the novel MMMU benchmark. This benchmark encompasses multimodal tasks across various domains that require thoughtful reasoning.

On image benchmarks, Gemini Ultra outperformed previous models without help from optical character recognition (OCR) systems, which extract text from images for further processing. These benchmarks highlight Gemini's native multimodality and offer an early glimpse of its more advanced reasoning abilities.

Next-generation capabilities

Google took a new approach to building multimodal models. Until now, the standard approach was to train separate components for different modalities and then stitch them together to approximate multimodal behavior. Such models can handle certain tasks well, such as describing images, but they often struggle with more abstract and complex reasoning.

In contrast, Google's Gemini model has been architected to embrace multimodality from its inception. The initial training encompasses a variety of modalities, laying a robust foundation. Subsequently, the model undergoes fine-tuning with additional multimodal data, enhancing its precision and effectiveness. This foundational approach empowers Gemini to intuitively understand and reason across diverse inputs, surpassing existing multimodal models in performance. The capabilities of Gemini are, therefore, regarded as state-of-the-art across numerous domains.

Sophisticated reasoning

Gemini 1.0, developed by Google, showcases sophisticated multimodal reasoning capabilities, enabling the system to interpret complex written and visual information. This advanced skill set positions it as a unique tool in revealing knowledge that might otherwise remain obscured in extensive data pools.

The system's extraordinary capacity to derive insights from an immense volume of documents, by effectively reading, filtering, and comprehending information, is set to accelerate innovation across various sectors, including science and finance. This ability to process information at digital speeds promises to drive significant breakthroughs.

Gemini unlocks new scientific insights

Understanding text, images, audio and more

Gemini 1.0 was trained to recognize and understand text, images, audio, and more at the same time, so it handles nuanced information well and can answer questions on complex topics. It is especially strong at explaining its reasoning in intricate subjects such as math and physics.

Gemini explains reasoning in math and physics

Advanced coding

Gemini demonstrates proficiency in understanding, explaining, and generating high-quality code in major programming languages like Python, Java, C++, and Go. Its cross-language capabilities and complex information reasoning place it at the forefront of coding models globally.

Gemini Ultra performs strongly on key coding benchmarks, including HumanEval, an industry-standard benchmark for evaluating performance on coding tasks, and Natural2Code, Google's internal held-out dataset, which uses author-generated sources instead of web-based content.

Gemini can also serve as the engine for more advanced coding systems. Two years ago, Google introduced AlphaCode, the first AI code generation system to reach a competitive level of performance in programming contests.

Using a specialized version of Gemini, Google developed AlphaCode 2, a more advanced code generation system. AlphaCode 2 excels at competitive programming problems that go beyond coding to involve complex mathematics and theoretical computer science.

Evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows significant improvements, solving nearly twice as many problems. It is estimated to perform better than 85% of competition participants, up from nearly 50% for AlphaCode. AlphaCode 2 performs even better when programmers collaborate with it by defining properties that the code samples must follow.

Google is enthusiastic about the growing utilization of advanced AI models by programmers. These tools offer collaborative support, aiding in problem-solving, proposing code designs, and assisting with implementation. This collaboration enables the faster development and release of apps and the design of superior services.

More reliable, scalable, and efficient

Gemini 1.0 was trained at scale on AI-optimized infrastructure, utilizing the in-house designed Tensor Processing Units (TPUs) v4 and v5e from Google. This model was crafted to be highly reliable and scalable for training, as well as efficient in deployment.

On Google's TPUs, Gemini runs significantly faster than earlier, smaller, and less capable models. These custom-designed AI accelerators are at the heart of AI-driven products used by billions, such as Search, YouTube, Gmail, Google Maps, Google Play, and Android. They have also enabled companies around the world to train large-scale AI models cost-effectively.

The latest advancement in this technology is the introduction of the Cloud TPU v5p, the most powerful, efficient, and scalable TPU system to date, specifically designed for training advanced AI models. This next-generation TPU is set to accelerate the development of Gemini and assist developers and enterprise customers in training large-scale generative AI models more swiftly. This enhancement paves the way for quicker delivery of new products and capabilities to customers.

Built on a Foundation of Responsibility and Safety

GoPomelo recognizes the importance of advancing AI boldly and responsibly, a commitment that Google shares. Building on Google's AI Principles and the robust safety policies that apply across its products, Google is adding new protections to account for Gemini's multimodal capabilities. At each stage of development, potential risks are considered and then tested and mitigated.

Gemini has undergone the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. Google has also conducted new research into potential risk areas such as cyber-offense, persuasion, and autonomy, and has applied Google Research’s adversarial testing techniques to identify critical safety issues before Gemini’s deployment.

To uncover potential blind spots in internal evaluation methods, Google is working with a diverse group of external experts and partners to stress-test the models across a range of issues.

To diagnose content safety issues during Gemini’s training phases and to check that its output follows established policies, benchmarks such as Real Toxicity Prompts are used. This benchmark, developed at the Allen Institute for AI, consists of 100,000 prompts with varying degrees of toxicity, drawn from the web. More information about this work will be available shortly.

To reduce risks, specialized safety classifiers have been developed by Google. These are tasked with recognizing, labeling, and filtering content related to violence or negative stereotypes. This, coupled with strong filters, contributes to a layered strategy designed to enhance Gemini’s safety and inclusivity. Moreover, ongoing efforts are in place to tackle known challenges faced by models, such as factuality, grounding, attribution, and corroboration.

For GoPomelo, like Google, responsibility and safety remain at the forefront of AI model development and deployment. This enduring commitment necessitates collaborative building. In this vein, engagement with the industry and broader ecosystem is crucial in establishing best practices and safety and security benchmarks. This is evident through participation in platforms like MLCommons, the Frontier Model Forum and its AI Safety Fund, and the Secure AI Framework (SAIF), formulated to address AI-specific security risks in both public and private sectors. The journey continues with partnerships involving researchers, governments, and civil society groups globally in the development of Gemini.

Making Gemini available to the world

Gemini 1.0 is now rolling out across a range of products and platforms:

Gemini Pro in Google products

Gemini Pro is being introduced to billions through Google products. Bard is now powered by a fine-tuned version of Gemini Pro, enhancing its capabilities in reasoning, planning, and understanding. This marks Bard's most significant update since its debut, available in English across more than 170 countries and territories. Plans are in place to extend this to more modalities, languages, and locations soon.

Gemini is also coming to Pixel. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features such as Summarize in the Recorder app and Smart Reply in Gboard, starting with WhatsApp. Support for additional messaging apps is expected next year.

Over the coming months, Gemini will be integrated into more Google products and services, including Search, Ads, Chrome, and Duet AI. Currently, Gemini is being tested in Search, enhancing the Search Generative Experience (SGE) with a 40% reduction in latency for English searches in the U.S., along with quality improvements.

Building with Gemini

From December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI. Google AI Studio is a web-based tool for app prototyping and launching, providing easy access with an API key. Vertex AI offers a fully-managed AI platform with customization options for Gemini, incorporating Google Cloud's enterprise-level security, privacy, and data governance.
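For developers curious what access with an API key might look like in practice, here is a minimal sketch that builds, but does not send, a text request to Gemini Pro. It assumes the public REST endpoint and the `gemini-pro` model name as documented around launch, and both may have changed since. The helper `build_request` and the placeholder key are hypothetical, and the current API reference should be checked before relying on any of this.

```python
import json
import urllib.request

# Assumed base URL of the public Gemini REST API at launch; verify before use.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, api_key, model="gemini-pro"):
    """Build a POST request for a single text prompt (hypothetical helper)."""
    url = f"{API_ROOT}/models/{model}:generateContent?key={api_key}"
    # The request body wraps the prompt in a list of content parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; a real key would come from Google AI Studio.
request = build_request("Summarize what a multimodal model is.", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) is deliberately left out, since it requires a valid key and network access. Enterprise users on Vertex AI would go through Google Cloud's own client libraries and authentication instead.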

Android developers will have the opportunity to utilize Gemini Nano for on-device tasks through AICore, a new system capability in Android 14, debuting on Pixel 8 Pro devices. Early previews of AICore are available for sign-up.

Gemini Ultra coming soon

Regarding Gemini Ultra, extensive trust and safety evaluations, including external red-teaming, are underway, alongside model refinements through fine-tuning and reinforcement learning from human feedback (RLHF). Select customers, developers, partners, and safety and responsibility experts will have early access to Gemini Ultra for feedback before its broader release to developers and enterprise customers next year.

Additionally, Bard Advanced is set to launch early next year, offering an advanced AI experience with access to top-tier models and capabilities, beginning with Gemini Ultra.

The Gemini era: enabling a future of innovation

This represents a significant milestone in the development of AI, marking the beginning of a new era in which rapid innovation and responsible advancement of model capabilities are key focuses.

Great strides have been made in the development of Gemini, with ongoing efforts to enhance its abilities in future versions. This includes advancements in planning and memory, as well as expanding the context window to process an even greater amount of information, thereby improving response quality.

The potential of a world responsibly empowered by AI is truly exciting. This future, driven by innovation, is poised to amplify creativity, expand knowledge, propel scientific advancements, and transform the lives and work of billions of people globally.