The CTE Group

A longstanding industry flaw is the tendency to choose familiar solutions over foundational change

Originally a LinkedIn Post, April 10, 2026

Friday thoughts

We keep defaulting to what’s familiar, even when the pressure shows up elsewhere.

Years ago, when I was training a new SysAdmin, I told him the next thing we’d cover was backup, and he said, “I’m not interested in learning backup. Backup isn’t sexy.” That stuck with me because the reason we aren’t addressing the right pressure points for AI might be the same. The pressure is showing up in memory, but memory isn’t sexy.

Just like backup, memory isn’t visible. You don’t demo it, and you don’t get that instant “wow” moment like you did when SSDs replaced spinning disks. Storage gave us something we could point to as bigger, faster, and measurable. Memory doesn’t behave that way. When it’s working, nothing happens; things just don’t slow down. All systems GO. So the industry keeps coming back to faster storage, more bandwidth, and better pipelines, as if the problem is how quickly we can move data from disk to compute. It’s not.

We already lived through this once with 3D XPoint, or Intel Optane. It was a great technology, but it was packaged in a way that confused the market. Instead of leaning into memory, Intel split it into an SSD and a persistent memory product, and the result was an expensive SSD most people didn’t understand and a persistent memory model only a few really knew how to use. The real disruption was memory, but it got packaged as storage because that was familiar. The industry followed suit, comparing it dollar-for-dollar to NAND, which made it look expensive instead of valuable, and the shift was missed.

Now we’re doing it again, trying to solve a memory locality problem with storage protocols. Yes, NVMe has “memory” in the name, but it’s a block storage interface, typically in front of NAND, not system memory. It’s fast and efficient, but it is still not memory. GPUs keep getting faster, HBM capacity isn’t scaling at the same rate, and datasets are only getting larger, so in real production we spend more time moving data than executing on it.
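A back-of-envelope sketch makes the movement-versus-execution imbalance concrete. All numbers below are illustrative assumptions, not measurements: an 80 GB working set, roughly PCIe Gen5 x16 host-to-GPU bandwidth, and roughly HBM3-class on-package bandwidth.

```python
# Back-of-envelope: time to move a working set over the host bus
# versus time for the GPU to stream the same bytes from HBM once.
# All figures are illustrative assumptions.
GB = 1e9
data_bytes = 80 * GB        # assumed working set: 80 GB
pcie_bps = 64 * GB          # ~PCIe Gen5 x16, ~64 GB/s one direction
hbm_bps = 3350 * GB         # ~HBM3-class bandwidth, ~3.35 TB/s

move_s = data_bytes / pcie_bps   # pulling the data in over the bus
touch_s = data_bytes / hbm_bps   # streaming it once from local memory

print(f"move over bus: {move_s:.2f} s")
print(f"touch in HBM:  {touch_s:.3f} s")
print(f"ratio: ~{move_s / touch_s:.0f}x")
```

Under these assumptions, one pass over the data costs tens of times more to fetch than to execute on, which is the locality gap the paragraph above describes.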

We’re not ignoring what we can improve today. For example, persistent KV cache, better locality, and smarter data movement are real optimizations, but they’re mitigations, not resolutions. This is Amdahl’s Law at work, reducing the penalty but not removing the constraint.
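Amdahl’s Law puts a hard ceiling on those mitigations. As a minimal sketch, with hypothetical numbers: if data movement is 60% of a step and an optimization makes that part 5x faster, the step as a whole improves by less than 2x, and even an infinitely fast data path caps out at 1 / (1 − 0.6) = 2.5x.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical: 60% of a step is data movement, optimization gives 5x there.
print(round(amdahl_speedup(0.6, 5.0), 2))   # ~1.92x overall

# Even an effectively infinite speedup of that 60% caps at 1 / 0.4 = 2.5x.
print(round(amdahl_speedup(0.6, 1e9), 2))   # ~2.5x ceiling
```

That ceiling is why these optimizations reduce the penalty but never remove the constraint: the unaccelerated fraction still dominates.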

Right now, we don’t have enterprise-ready, composable memory at scale, but that’s where this is going. What we do have is a growing gap between how fast we compute and how effectively we place and move data. I’ve been calling that the Nebula Gap. The memory wall didn’t go away; it moved inside the GPU. And if that’s true, then the answer isn’t faster storage, it’s a different architecture.

CXL, driven by the CXL Consortium, opens the door, not as bigger DRAM, but as shared memory at rack scale. If memory becomes pooled, shared, and dynamic, who decides where data lives when the GPU needs it?

That’s not a storage problem, it’s a control plane problem. What’s emerging here is a new category: the AI Memory Control Plane.
