Serving large language models (LLMs) can be costly because it requires hardware accelerators, which are often used inefficiently. Google Kubernetes Engine (GKE) offers features such as autoscaling and load balancing to help serve LLMs at scale while reducing costs. To help optimize LLM serving, a performance benchmarking tool for GKE automates cluster setup, inference server deployment, and benchmarking so that performance trade-offs can be measured. The article gives recommendations for maximizing throughput on NVIDIA GPUs with GKE and guidance on optimizing the model server platform for efficient inference.
Google's enterprise IT teams manage the infrastructure that supports Googlers, aiming to provide secure, scalable solutions for optimal productivity. To meet internal demands, they leveraged Google's own cloud infrastructure, successfully migrating over 100 vendor-supplied services. This transformation eliminated low-value tasks, enhanced team skills, and enabled the delivery of more valuable features. With the lift-and-shift and cloud modernization complete, Google now shares its journey, the benefits gained, and the risks encountered during the process.
Generative AI is transforming how users interact with products and services, especially in analytics. Instead of manually building queries, users can now engage with data using natural language, with AI acting as the bridge between them and the data. Over the past decade, Bytecode has helped over 1,000 organizations build data stacks and adopt Looker for trusted self-service analytics. A common question from clients is how to integrate AI into their analytics. For those using Google Cloud's data and AI tools (BigQuery, Looker, and Vertex AI), the process is easier and faster than expected.
Businesses seek customer data to drive sales and gain insights, but challenges arise when retailers hesitate to share sensitive information. For example, a consumer goods brand could benefit from understanding customer behavior on a retailer's website, but data privacy concerns hinder that collaboration. As a result, company leaders are looking to marketing teams and CMOs for solutions.
Google is announcing the preview of the TreeAH vector index in BigQuery, which leverages advanced approximate nearest neighbor algorithms from Google's research. The new index offers significant latency and cost reductions compared to the previously implemented inverted file index (IVF). The announcement covers the architectural differences between TreeAH and IVF, performance results, and guidance on when and how to use TreeAH for optimal results.
Businesses generate vast amounts of real-time data, yet many still rely on slow, batch-oriented analysis. BigQuery is popular for handling large datasets, but users are now demanding real-time capabilities. In response, BigQuery has been transformed into a real-time, event-driven analytical platform. Today, Google announces the preview launch of BigQuery continuous queries, enabling users to manage continuous data streams for faster, real-time insights.
Cloud SQL Studio, a lightweight tool for querying databases directly from the console, is now generally available for MySQL, PostgreSQL, and SQL Server. It provides a consistent, intuitive interface across all supported database engines, simplifying connectivity and security. Key features include tools for easily creating, editing, and managing databases, as well as an AI assistant that helps users write queries using natural language, boosting productivity and efficiency in database administration.
Compliance is an ongoing process that evolves with your organization and regulatory changes. Assured Workloads has updated its portfolio with new software-defined controls and policies to better support compliance requirements on Google Cloud.
Google is exploring how generative AI tools can enhance code security. Recently, the company used Gemini 1.5 Pro to reverse engineer WannaCry malware and identify its kill switch in just 34 seconds. The model shows potential for improving code vulnerability detection and remediation. However, the approach is still experimental, and further development and validation are needed before it can be considered a reliable security tool.
Source: https://cloud.google.com/blog/products/gcp