The Circuit

Breaking the Memory Wall: Micron’s Strategy for the AI Era

53 min | May 5, 2026

Key Themes

AI memory demand, inference bottleneck, KV cache, memory hierarchy, bandwidth constraints, storage growth, power efficiency, fab expansion

Summary

Micron argues AI is creating a durable memory and storage boom beyond the usual chip cycle.

This episode frames AI as a structural change in how data centers use memory: not just training large models, but keeping them responsive during inference, managing longer context windows, and supporting agentic workflows. The guest explains why KV cache, bandwidth, power efficiency, and persistent storage are becoming central constraints, then broadens the discussion to AI-driven SSD demand, fab buildouts, and semiconductor supply limits. The overall message is that memory and storage are moving from support components to strategic AI infrastructure.

Treat AI memory demand as a structural theme rather than a short-lived cycle.

The discussion repeatedly argues that inference, longer context windows, and agentic workloads create persistent demand for high-performance memory and storage.

Bandwidth and power efficiency may matter as much as raw capacity in AI infrastructure.

The episode stresses that AI scaling is increasingly limited by memory bandwidth and data-center power budgets, not just FLOPS or storage size.

AI storage demand could expand as models make more data effectively “hot.”

The guest argues that AI-generated content and broader retrieval patterns increase how often data is accessed, lifting demand for SSDs and related infrastructure.

Semiconductor supply may remain tight because capacity buildouts take years.

Micron says the industry is already behind demand and that new fabs take a long time to come online, limiting near-term supply relief.

AI itself is becoming a tool for semiconductor manufacturing and yield improvement.

The company describes using AI to speed engineering, detect issues, improve yields, and accelerate fab ramps, which could compound future supply advantages.


01 AI Transforms Memory Demand: From Training to Inference, KV Cache, and Context Windows

The conversation opens by arguing that AI is changing memory from a background component into a strategic data-center asset, especially as inference becomes more important than training. The guest explains how Micron has been preparing across HBM, LPDDR5, SSDs, and related products, then introduces KV cache and longer context windows as the key reason memory demand rises sharply during decoding. The central idea is that inference requires systems to remember prior state, and if they cannot, recomputation drives much higher compute usage.

AI is reshaping memory’s role in the data center, especially for inference and advanced model training.

The guest says the current cycle feels different from past memory cycles because demand is tied to a structural AI shift, not just normal cyclicality.

Micron had early awareness of AI trends and has been developing multiple AI-adjacent memory and storage technologies over several years.

The conversation distinguishes the training era from the inference era, arguing inference creates a new memory bottleneck.

Inference is described as using memory to remember, while training uses memory to learn and then discard.

KV cache is explained as stored intermediate state used during decoding so the model does not need to recompute prior steps.

Longer context windows, more parameters, and more concurrent agentic workloads all increase KV cache demand per GPU.

Without enough memory, systems must recompute prior work, so compute demand grows much faster than linearly with the number of generated tokens.
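The two effects described above (KV cache size growing with context and concurrency, and recomputation growing sharply without a cache) can be illustrated with a rough sketch. This is a back-of-the-envelope model, not something presented in the episode: the sizing formula is the standard one for a transformer that stores one key and one value tensor per layer, and every model dimension below (layer count, KV heads, head size, 2-byte values) is a hypothetical assumption.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens,
                   sequences=1, bytes_per_value=2):
    """Approximate KV cache size for a decoder-only transformer.

    The leading factor of 2 covers one key and one value tensor per layer.
    All dimensions are caller-supplied assumptions, not figures from the episode.
    """
    return (2 * layers * kv_heads * head_dim
            * context_tokens * sequences * bytes_per_value)


def token_positions_processed(new_tokens, prompt_tokens, cached):
    """Count token positions pushed through the model during decoding.

    With a cache, the prompt is processed once and each later step handles
    only the newest token. Without one, every step reprocesses the prompt
    plus everything generated so far.
    """
    if cached:
        return prompt_tokens + (new_tokens - 1)
    return sum(prompt_tokens + i for i in range(new_tokens))


# Hypothetical model: 32 layers, 8 KV heads, head size 128, 8192-token context.
one_sequence = kv_cache_bytes(32, 8, 128, 8192)
print(one_sequence / 2**30, "GiB for one sequence")
print(kv_cache_bytes(32, 8, 128, 8192, sequences=64) / 2**30, "GiB for 64 sequences")

# Generating 100 tokens after a 1000-token prompt.
print(token_positions_processed(100, 1000, cached=True))
print(token_positions_processed(100, 1000, cached=False))
```

Under these assumed dimensions a single sequence needs 1 GiB of cache and 64 concurrent sequences need 64 GiB, which falls inside the 10 to 100 GB range the episode cites for HBM-resident KV cache. The second function shows why losing the cache is costly: the uncached count grows roughly with the square of the generation length, while the cached count grows linearly.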

02 Memory Hierarchy, Power Constraints, and AI-Driven Storage Demand

This chapter lays out the AI memory hierarchy from HBM near the GPU down through main memory, expansion memory, SSD-backed context storage, and finally large networked data lakes. It then turns to bottlenecks: DRAM and SSD shortages, bandwidth limits, and power constraints at the data-center level. The conclusion is that AI is not only increasing demand for memory, but also increasing demand for storage because more data is generated, consumed, and kept hot.

HBM sits closest to the GPU and is used for training and inference; typical KV cache in HBM is about 10 to 100 GB.

If HBM is insufficient, KV cache moves to main memory attached to a CPU, which is larger but slower.

Expansion memory is described as a concept that is not yet deployed: high-capacity DIMMs housed in a separate box and connected, possibly via optical links.

Context memory storage uses SSDs to extend capacity further, trading latency and bandwidth for much higher total storage.

At the bottom of the hierarchy are large networked data lakes backed by high-capacity SSDs.

Current pain points are described as both DRAM and SSD shortages throughout the stack.

The speaker says the main bottleneck is increasingly memory bandwidth rather than compute FLOPS.

HBM4 is described as offering more than 2x the bandwidth of HBM3e.

Power becomes a major data-center bottleneck; more performance is only useful if it fits within a fixed power budget.

AI drives storage demand by generating more data, enabling more creation, and warming previously cold data so it is accessed more often.
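The hierarchy in this chapter amounts to a placement rule: keep the KV cache in the fastest tier that can hold it, and accept lower bandwidth when it spills further down. The sketch below is one hypothetical way to express that trade-off. The tier names follow the chapter, but every capacity and bandwidth figure is a made-up placeholder rather than a Micron specification, and the not-yet-deployed expansion tier and the data lake are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_gb: float      # hypothetical usable capacity
    bandwidth_gb_s: float   # hypothetical sustained bandwidth


# Ordered fastest to slowest. All numbers are illustrative assumptions.
TIERS = [
    Tier("HBM", 100, 4000),
    Tier("CPU main memory", 1000, 400),
    Tier("SSD context storage", 30000, 10),
]


def place_kv_cache(size_gb, tiers=TIERS):
    """Return the fastest tier whose capacity can hold the cache."""
    for tier in tiers:
        if size_gb <= tier.capacity_gb:
            return tier
    raise ValueError("KV cache exceeds every tier in the hierarchy")


def seconds_to_stream(size_gb, tier):
    """Time to read the whole cache once at the tier's bandwidth."""
    return size_gb / tier.bandwidth_gb_s


for size in (50, 500, 5000):
    tier = place_kv_cache(size)
    print(f"{size} GB -> {tier.name}, {seconds_to_stream(size, tier):.4f} s per full read")
```

Even with placeholder numbers, the shape of the result matches the chapter's argument: each step down the hierarchy buys capacity at the cost of a much longer time to read the same data, which is why bandwidth rather than raw capacity is described as the binding constraint.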

03 Personal AI Memory, Data Center SSDs, and the Global Fab Buildout

The discussion broadens from model memory to personal AI memory, arguing that today’s agentic workflows still lack durable session-to-session continuity and therefore rely on workarounds. It then pivots to enterprise storage, where large, power-efficient SSDs are positioned as a way to consolidate footprint and improve gigabyte-per-watt economics. The final stretch focuses on Micron’s view that AI is accelerating engineering and manufacturing, but the industry is still supply-constrained because too few fabs were built and new capacity takes years.

AI agents still lack persistent session-to-session memory, creating demand for better storage and context management.

Users are currently relying on workarounds such as file structures and harnesses to simulate memory.

Context length is growing rapidly, reportedly at about 30x per year, increasing memory needs.

Micron positions SSDs as a key enabler for enterprise AI because they can store large amounts of data with lower power and footprint.

A 245 TB SSD was cited as an example of consolidating storage, reducing data center space by more than 80%.

SSDs improve gigabyte-per-watt efficiency and reduce supporting infrastructure like networking, power supplies, and fans.

The pace of engineering, yield improvement, and fab construction is accelerating, with AI used internally to speed design and development.

The market may underestimate the long-term AI opportunity and the depth of enterprise adoption.

Micron says the industry is already behind demand because too few fabs were built, and new capacity takes years to come online.

Micron is building or expanding fabs in multiple regions, including Boise, New York, Virginia, Singapore, Japan, and Taiwan.
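Two of the figures cited in this chapter, context length growing about 30x per year and a 245 TB SSD cutting data center space by more than 80%, are easier to appreciate with a little arithmetic. The sketch below is illustrative only: the starting context length, the total capacity, and the 30 TB baseline drive are hypothetical assumptions, and drive count is used as a crude stand-in for floor space because the episode summary does not state the baseline behind its 80% figure.

```python
import math


def context_after(years, start_tokens, growth_per_year=30):
    """Compound the reported ~30x yearly growth in context length."""
    return start_tokens * growth_per_year ** years


def drives_needed(total_tb, drive_tb):
    """Whole drives required to hold total_tb of data."""
    return math.ceil(total_tb / drive_tb)


def drive_count_reduction(total_tb, old_drive_tb, new_drive_tb):
    """Fractional drop in drive count when moving to larger drives."""
    old = drives_needed(total_tb, old_drive_tb)
    new = drives_needed(total_tb, new_drive_tb)
    return 1 - new / old


# Hypothetical starting point: a 100,000-token context today.
print(context_after(2, 100_000), "tokens after two years at 30x per year")

# Hypothetical 10 PB deployment, comparing assumed 30 TB drives with 245 TB drives.
print(drives_needed(10_000, 30), "drives at 30 TB each")
print(drives_needed(10_000, 245), "drives at 245 TB each")
print(f"{drive_count_reduction(10_000, 30, 245):.0%} fewer drives")
```

Under these assumptions the drive count falls by more than 80%, which is at least consistent with the consolidation claim, though real space savings also depend on enclosure density, networking, and cooling.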

04 Closing Remarks

The hosts close by reflecting on how quickly AI capabilities are improving and how much of that progress depends on the underlying memory and storage stack. They note that the work being done in infrastructure makes AI more useful and economically viable, particularly for hyperscalers, and then sign off.

AI systems are improving so quickly that today's models will seem much less capable in retrospect.

The ongoing engineering work behind AI infrastructure helps increase usefulness and value.

Memory and storage are described as a major part of AI economics.

Hyperscalers benefit as AI monetization improves with broader usage.

The hosts thank the guest and close the episode.