Matthew Berman

Google Cloud CEO: Anthropic, TPUs, Mythos, NVIDIA and more

54 min · Apr 24, 2026

Key Themes

AI infrastructure, TPU strategy, Inference economics, Agentic workflows, Full-stack co-design, Enterprise adoption, Cybersecurity, Data center capacity

Summary

Google Cloud’s AI stack bets on TPUs, inference demand, and full-stack co-design

This conversation centers on Google Cloud’s strategy to scale AI infrastructure profitably by expanding compute supply, monetizing TPUs, and optimizing the entire stack from chips to storage and networking. The discussion contrasts TPUs with NVIDIA GPUs through a total-cost-of-ownership lens, highlights the shift from training-heavy workloads toward inference and agents, and shows how Google is using its own AI tools internally for coding and security. It also frames enterprise adoption and cybersecurity as major drivers of future demand and operational complexity.

Inference economics appear to be the key monetization driver for AI infrastructure, not just raw training capacity.

The discussion explicitly ties sustainable returns on AI infrastructure to monetizing inference, because training spend alone is not a durable business model.

Workload-specific silicon design can be a competitive moat when customer demand is strong enough to justify specialization.

Google describes splitting its TPU line into training and inference chips because workloads are diverging and demand is high.

Power efficiency and deployment flexibility are material advantages in AI infrastructure, not just engineering details.

Thomas Kurian repeatedly emphasizes dollars per watt, tokens per watt, and the need to deploy more broadly, including in air-cooled and power-constrained environments.

Enterprise AI adoption is accelerating and may be a more durable demand source than consumer hype cycles.

The conversation cites rising Gemini Enterprise token usage and fast user growth, along with several enterprise customer examples.

AI is becoming a core productivity and security layer inside large organizations, which supports ongoing spend on cloud, tooling, and model serving.

Google describes internal coding, review, debugging, and security workflows that rely on AI, showing that the technology is moving into operational infrastructure.


01 Compute, TPUs, Data Centers, and AGI Job Fears

Google Cloud’s AI strategy is presented as a large-scale infrastructure challenge: secure enough power, land, and compute, then monetize that capacity through TPUs and cloud services. The discussion argues that Google’s vertical integration improves margins and that the company must keep sharing compute with customers rather than hoarding it. It also addresses public concerns about AI-driven job loss by pointing to examples where AI improved productivity without eliminating jobs, while stressing that Google continues to hire in several functions.

Google prepared for the AI wave by diversifying energy sources, securing land, and reworking data center construction and deployment.

TPUs are increasingly used beyond AI labs, including by capital markets and HPC customers.

Owning silicon improves unit economics whether Google is selling TPUs or using them internally.

Compute remains supply-constrained, but Google cannot simply reserve all of it for internal use: selling capacity to customers is what funds further expansion.

AI job displacement fears are countered with examples showing productivity gains without layoffs.

Google Cloud says it is still hiring in product, sales, deployment engineering, and new product areas.

AI is also being positioned as a cybersecurity tool for prioritizing and fixing vulnerabilities.

02 NVIDIA vs TPU: TCO, 8th-Gen Split, and the Training-to-Inference Shift

The conversation compares NVIDIA’s TCO narrative with Google’s TPU positioning, arguing that customers select the best platform on economics and performance, not branding. Google explains TPU efficiency as a system-level advantage spanning chips, networking, latency, memory throughput, and compiler tooling. It then describes splitting the 8th-generation TPU line into training and inference versions because usage is shifting toward inference, multimodal generation, and agentic workflows that require different memory and deployment characteristics.

Customers choose TPUs based on total cost of ownership.

TPU advantage comes from the full stack, not just the chip.

Google optimized for dollars per watt and tokens per watt because power constraints were expected.

Inference demand for the 8th-generation TPU has exceeded expectations.

The 8th-gen TPU line is split into training-oriented and inference-oriented chips.

Gemini usage is moving from search/chat to content generation and then agents.

Agent workflows change memory and token patterns, which affects chip design.

Inference chips are intended for broader deployment, including air-cooled environments.

03 Extreme Co-Design, Anthropic, and Cybersecurity

Google Cloud describes an end-to-end co-design approach across models, chips, storage, and networking to support agentic AI workloads. The chapter then covers Anthropic as a customer within a broader relationship with a platform company, Google's ability to serve very large models, the growing value of structured enterprise data, and the company's internal use of AI for coding and security. The closing discussion focuses on the cybersecurity implications of more capable models, including AI-assisted offense, continuous red teaming, and large-scale vulnerability remediation.

Google optimizes the full stack for agentic usage rather than treating TPU performance in isolation.

Higher-throughput storage and low-latency serving are important for training and agent workflows.

Anthropic is framed as a customer even while competing with Google in enterprise AI.

Google says its TPU serving stack can handle the largest models.

The company does not currently see demand for pre-training capacity slowing down.

Structured enterprise data improves retrieval, citations, and agent reliability.

Google's internal AI coding tools are widely used for writing code, code review, debugging, and incident response.

Cybersecurity concerns include model-enabled attacks and the need for continuous red teaming.

04 What Keeps Thomas Up at Night?

Thomas Kurian says his main concern is balancing long-term infrastructure planning with near-term product execution. He emphasizes securing enough capital, along with data center, network, and TPU capacity, while making sure Google solves the right customer problems as the market changes quickly. He closes by pointing to rapid Gemini Enterprise growth as evidence that the strategy is resonating, while noting that cybersecurity remains a major area to stay ahead of.

Long-term capital planning for infrastructure is a major concern.

Data center, network, and TPU capacity must keep pace with demand.

Cybersecurity is increasingly important as AI capabilities advance.

Google is focused on whether it is solving the right customer problems.

Gemini Enterprise usage is growing quickly.

The team wants proactive solutions that stay ahead of market shifts.