Day two of Google Cloud Next ’24 in Las Vegas was quite eventful, with significant emphasis on Gemini and AI agents within the Google Cloud ecosystem. The developer keynote, led by Google Cloud's Chief Evangelist, Richard Seroter, and Senior Developer Advocate, Chloe Condon, focused on demonstrating the capabilities of Gemini in Google Cloud. They highlighted its potential not only to align with current user needs but also to drive innovation and advancement beyond current standards.
The presentation, which included a variety of demonstrations, gave Richard, Chloe, and their fellow Googlers and partners an opportunity to explore in depth the AI technologies and integrations offered by Google Cloud. These technologies are designed to assist with the essential tasks that Google Cloud customers undertake daily, reflecting Google Cloud's strategy of using AI to simplify complex processes, improve efficiency, and help users achieve more with their cloud-based projects.
Google Cloud’s generative AI experience for developers starts with Gemini Code Assist. Google Cloud VP and GM Brad Calder showed the audience how support for Gemini 1.5 in Code Assist enables a 1M token context window, which Google described as the largest in the industry.
Jason Davenport, a Google Cloud Developer Advocate, showcased the utility of Gemini Cloud Assist in enhancing the design, operation, troubleshooting, and optimization of applications through a deep understanding and utilization of the specific context from one's cloud environment. This includes leveraging a broad range of resources such as error logs, load balancer configurations, and firewall rules, illustrating the power of contextual AI to streamline cloud management tasks.
Furthermore, the integration of Gemini across Google Cloud applications such as BigQuery and Looker, alongside support for AI features like vector search and embeddings in Google Cloud databases, highlights Google's commitment to embedding AI deeply within its cloud ecosystem. These advancements, coupled with AI integrations in developer tools like Cloud Workstations and user interface libraries like React, let developers incorporate multi-modal inputs (e.g., text and images) into their applications. This makes it possible to build AI-driven recommendations, predictions, and syntheses in far less time.
The session led by Google Cloud Product Manager Femi Akinde and Chloe Condon exemplified this by demonstrating the rapid development process from conceptualization to the deployment of an immersive and inspirational AI application. This represents a significant leap forward in making AI more accessible and functional for developers, thereby unlocking new possibilities for innovation and application development within the Google Cloud platform.
New things that make this possible:
App Hub - Announced today, and with a deep integration into Gemini Cloud Assist, App Hub provides an accurate, up-to-date representation of deployed applications and their resource dependencies, regardless of the specific Google Cloud products they use.
BigQuery continuous queries - In preview, BigQuery can now provide continuous SQL processing over data streams, enabling real-time pipelines with AI operators or reverse ETL.
Natural language support in AlloyDB - With support for Google’s state-of-the-art ScaNN algorithm, AlloyDB users get the enhanced vector performance that powers some of Google’s most popular services.
Gemini Code Assist in Apigee API management - Use Gemini to help you build enterprise-grade APIs and integrations using natural language prompts.
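To make the vector search idea mentioned above more concrete, here is a minimal sketch of what an embedding lookup does conceptually. It is an exact, brute-force cosine-similarity search in plain Python, not ScaNN (which is an approximate method built for much larger datasets) and not AlloyDB's API. The catalog entries and their three-dimensional vectors are made-up values for illustration; real embeddings would come from an embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the names of the k items most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical catalog: item name -> toy embedding (assumed values).
CATALOG = {
    "red sneakers": [0.9, 0.1, 0.0],
    "blue sneakers": [0.8, 0.2, 0.1],
    "garden hose": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # A query vector that points in the "sneakers" direction.
    print(top_k([1.0, 0.0, 0.0], CATALOG, k=2))
```

A brute-force scan like this compares the query against every stored vector, which is why databases rely on approximate indexes once the collection grows.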
Kaslin Fields, a Google Cloud Developer Advocate, addressed a crucial aspect of AI application development during her talk: transitioning from building a generative AI app to making it ready for production. This step is vital as it involves ensuring the application is scalable, reliable, and capable of handling real-world demands.
Google Cloud offers solutions to this challenge through platforms like Cloud Run and Google Kubernetes Engine (GKE). Cloud Run is designed to offer developers a seamless experience for deploying and scaling applications quickly, making it an excellent choice for those looking to swiftly bring their AI applications from development to production. Its fully managed platform allows for easy adjustments to scale and performance needs, accommodating varying levels of user demand without requiring extensive infrastructure management.
On the other hand, GKE offers a comprehensive feature set that caters to the needs of more demanding or specialized AI applications. It provides a robust, secure, and scalable environment for deploying containerized applications, including those powered by AI, enabling developers to manage their applications with flexibility and efficiency. GKE's strength lies in its ability to support complex workloads and configurations, making it suitable for applications that require advanced computing, storage, and networking capabilities.
Together, Cloud Run and GKE present developers with a range of options for making their generative AI apps production-grade, whether they're building straightforward applications or those with unique and demanding requirements. These platforms illustrate Google Cloud's commitment to providing developers with the tools they need to succeed in the rapidly evolving landscape of AI application development.
New things that make this possible:
Cloud Run application canvas - Generate, modify and deploy AI applications in Cloud Run, with integrations to Vertex AI so you can consume generative APIs from Cloud Run services in just a few clicks.
Gen AI Quick Start Solutions for GKE - Run AI on GKE with a Retrieval Augmented Generation (RAG) pattern, or integrated with Ray.
Support for Gemma on GKE - GKE offers many paths for running Gemma, Google’s open model built from the same research and technology as Gemini. Better yet, the performance is excellent.
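For readers unfamiliar with the Retrieval Augmented Generation (RAG) pattern named in the list above, the following is a rough sketch of its two steps: retrieve the most relevant documents, then place them in the prompt sent to a model. It is not the Quick Start Solution itself. The word-overlap scoring stands in for a real embedding search, the sample documents and function names are hypothetical, and the final model call is omitted.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())

def retrieve(question, documents, k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base (illustrative sentences only).
DOCS = [
    "Cloud Run scales containers automatically",
    "GKE runs containerized workloads on Kubernetes",
]

if __name__ == "__main__":
    # In a real system this prompt would be sent to a generative model.
    print(build_prompt("what does cloud run do", DOCS))
```

The point of the pattern is that the model answers from the retrieved context rather than from its training data alone, so the retrieval step largely determines answer quality.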
Steve McGhee, a Google Cloud Reliability Advocate, brought an important perspective to the conversation during the developer keynote, highlighting the complexity of AI applications. He noted that AI apps can exhibit emergent behaviors, which can lead to unforeseen issues. This underscores the unpredictable nature of AI systems and the unique challenges they present in terms of reliability and maintenance.
Charity Majors, cofounder and CTO at Honeycomb.io, further elaborated on this theme by contrasting current system failures with those of the past. She observed that while system failures used to occur in relatively predictable patterns, today's systems are much more dynamic and chaotic. Modern architectures are increasingly complex, characterized by their diversity, constant evolution, and the distributed nature of their components. This complexity makes troubleshooting and ensuring the reliability of these systems a significantly more challenging task.
These insights from McGhee and Majors point to the evolving landscape of technology, where the advent of dynamic, AI-driven architectures demands new approaches to system design, monitoring, and maintenance. As systems become more capable and autonomous, ensuring their reliability and predicting their behavior becomes a more intricate task, requiring advanced tools and methodologies. This shift underscores the importance of developing robust monitoring and analysis tools, like those offered by Honeycomb.io, to manage the complexity and unpredictability of modern, AI-powered systems.
But what generative AI taketh away — the predictability of the same old, same old — it also giveth back in the form of new tools to help you understand and deal with change.
New things that make this possible:
Shadow API detection - In preview in Advanced API Security, shadow API detection helps you find APIs that don’t have proper oversight or governance, and so could be the source of damaging security incidents.
Confidential Accelerators for AI workloads - Confidential VMs on the A3 machine series with NVIDIA Tensor Core H100 GPUs extend hardware-based data and model protection from the CPU to the GPUs that handle sensitive AI and machine learning data.
GKE container and model preloading - In preview, GKE can now accelerate workload cold-start to improve GPU utilization, save money, and keep AI inference latency low.