“High Optane” Fuel For Performance

Introduction

In early August of 2020, Intel presented to a group of delegates at GestaltIT’s Storage Field Day 20 (#SFD20). I knew the topic was Optane and had already had some exposure to it, but what I was familiar with was the Intel Optane SSD; I wasn’t familiar with Intel Optane Persistent Memory (PMem). I will admit that when I heard Intel was presenting, I was prepared for a long, deep-in-the-weeds presentation that would leave us little room to see the real customer benefit. I was dead wrong. Intel knocked it out of the park.

Intel readily admitted they are relatively new to this game. In April 2019, Intel launched its first-generation PMem product, code-named Apache Pass, which they call the 100 series, and in June 2020 released the next generation, code-named Barlow Pass, their 200 series product. According to Intel, the second generation delivers about 25% more bandwidth than the first while saving about 3 watts of power per DIMM. Intel calls it a “Solution on a DIMM,” which I thought was clever, but I soon found out just how accurate that statement is.

What’s the difference?

PMem v DRAM

From a packaging and presentation perspective, PMem and DRAM are similar: both reside on the same memory bus, both are packaged as DIMMs, and both can operate the same way, storing volatile data. Where they differ is scale. Intel Optane PMem comes in 128GB, 256GB, and 512GB modules, while DRAM DIMMs are smaller and typically max out around 32GB. They also differ in what the name promises: Persistent Memory means Intel Optane PMem can retain data without power.

PMem v SSD

While they both use the same memory media, the presentation and packaging are different.

PMem: DIMM, co-exists with DRAM expanding memory and improving overall performance
SSD: packaged like a conventional NAND SSD, a fast storage tier typically used as a complement to existing storage tiers

Intel Optane PMem can operate in multiple modes. Memory Mode is transparent to the application and lets you pair persistent memory with DRAM. In this mode, DRAM is used as a cache and PMem provides the extra capacity. However, just like typical DRAM, the memory is volatile in this mode, not persistent. The other mode, App Direct Mode, is where you begin to see the real power of persistent memory. It requires a persistent memory-aware OS and application. In this mode, you have two pools of memory: high-speed DRAM and a separate pool of persistent memory. The good news is that most contemporary server operating systems are PMem aware, and several ISVs (independent software vendors) have already optimized, or are currently developing, their enterprise applications to be PMem aware because they see the value of a persistent memory stack. Below is just a shortlist of some of the companies and operating systems I found that have announced support for Intel Optane Persistent Memory.

Aerospike
Dell EMC VxRail (see the ESG Lab Validation Report from July of 2020)
SAP HANA
VMware vSphere, with support for both App Direct Mode and Memory Mode
Compatible Operating Systems for Intel Optane Persistent Memory

And this is where it got interesting for me. Intel mentioned a project called DAOS during the PMem discussion and said that maybe we’d talk more about it later. When “later” came, Intel blew me away.

Now, I could go into how fast Optane SSDs are and give you a bunch of data points, but what I found supremely interesting was the application of Intel Optane Persistent Memory in a software-aware solution. Don’t get me wrong, the Optane SSDs are equally impressive, but I found PMem to potentially be the game-changer in high-performance computing. The Intel Optane software-aware solution we talked about was DAOS, Distributed Asynchronous Object Storage, a project Intel has been working on for about two years or so. DAOS is a standalone, distributed, high-performance storage technology and an open-source project. DAOS, unlike Lustre, is designed for exascale performance and capacity.

I have been working with high-performance file systems for a good part of my career across a few different companies, and one of the things I learned when working with some of the largest labs in the country was that many of today’s filesystems struggle when approaching billions and billions of files in a high-performance computing environment. This may not seem too essential to you, but when you consider the price of these supercomputers and the cost to operate them, feeding the compute cores as fast as possible is vital. The last thing you want is for the compute cores to sit idle waiting for data, but in some cases, that is precisely what happens to these very expensive cores.

Several years ago, I was working on a project with a large company that delivered a Lustre-based storage solution to solve precisely that problem: performance at scale, exascale. Lustre seemed to do fine with hundreds of millions of files, but when we approached billions, we began to see issues, and the fragility of the filesystem was exposed. When it came right down to it, the leading challenge for the Lustre filesystem, or any filesystem for that matter, was that it carried the heavy burden of being a filesystem. We needed a new approach that didn’t necessarily resemble what we had in the past. We decided to look at creating a net-new high-performance storage solution based on object storage. Object stores are flat and typically perform quite handily when retrieving data, and lots of it. While all of our research at the time pointed to this as the next generation of high-performance storage, sadly the work never began, which is another reason I perked up when we started talking about DAOS.

What is Exascale?

Exascale Computing Project

First, if you want to learn more about Exascale Computing, visit the Exascale Computing Project website for more information and details. ECP is a multi-laboratory initiative in conjunction with the DOE-led Exascale Computing Initiative (ECI), a partnership between two DOE organizations, the Office of Science (SC) and the National Nuclear Security Administration (NNSA). It was formed in 2016 to accelerate research, development, acquisition, and deployment projects to deliver exascale computing capability to the DOE laboratories by the early to mid-2020s.

Brief history

When we measure the performance of these high-performance computing platforms, we talk about FLOPS, or Floating Point Operations per Second. FLOPS is the number of floating-point operations a system is capable of performing each second. The table below shows the various units of measurement.

Name	Unit	Value
KiloFLOPS	kFLOPS	10³
MegaFLOPS	MFLOPS	10⁶
GigaFLOPS	GFLOPS	10⁹
TeraFLOPS	TFLOPS	10¹²
PetaFLOPS	PFLOPS	10¹⁵
ExaFLOPS	EFLOPS	10¹⁸
ZettaFLOPS	ZFLOPS	10²¹
YottaFLOPS	YFLOPS	10²⁴

In the 70s, compute power was measured in MFLOPS, or MegaFLOPS. In the 90s, we reached TFLOPS, and since the late 2000s we have sat at PFLOPS, or PetaFLOPS. Today, the world’s most powerful supercomputer is Japan’s Fugaku, reaching over 415 PFLOPS, and while impressive, that doesn’t come close to one EFLOPS, which is 10^18 FLOPS, or a quintillion FLOPS. A quintillion, by the way, is a 1 followed by 18 zeroes. If you had one quintillion bytes of data, for example, you’d have an Exabyte. Hence, ExaFLOPS.
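To put the gap between Fugaku and exascale in perspective, here is a small back-of-the-envelope sketch in Python. The prefix values come from the table above and the 415 PFLOPS figure from the paragraph above; the function and variable names are my own, purely for illustration.

```python
# Powers of ten for each FLOPS prefix, taken from the table above.
FLOPS_PREFIXES = {
    "kilo": 10**3,
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}


def to_flops(value, prefix):
    """Convert a value such as 415 'peta' into raw FLOPS."""
    return value * FLOPS_PREFIXES[prefix]


# Fugaku at roughly 415 PFLOPS versus a 1 EFLOPS target.
fugaku_flops = to_flops(415, "peta")
exascale_flops = to_flops(1, "exa")

# How many Fugaku-class systems would add up to one exaFLOPS?
fugaku_equivalents = exascale_flops / fugaku_flops

if __name__ == "__main__":
    print(f"1 EFLOPS is about {fugaku_equivalents:.2f}x Fugaku")
```

In other words, an exascale system would need to be roughly 2.4 times as fast as the fastest machine on the planet at the time of writing.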

Why is Exascale Computing so Important?

In a 2010 report by the Department of Energy called “The Opportunities and Challenges of Exascale Computing,” the authors answer the question, why?

“The most obvious question – the key question really – is of course: why go to the exascale? This question is not meant in the trivial sense that one would pose for any expenditure whatsoever in leading-edge computing technologies but rather is motivated by the fact that the transition from current petascale computing to the exascale will involve investments across the board – from hardware to fundamental algorithms, programming models, compilers, and application codes – that will dwarf previous levels of investment made as computer architectures have evolved in the past. That is, we recognize that the values to society extracted from this change in computing paradigm have to be commensurate with the costs of developing this type of computing – and given the substantial costs, we need to be sure that the extracted values are similarly substantial. We will make the argument in the following that the extracted values are very large – but will do so in two stages, first by making some general points about the present frontiers of computing – independent of discipline – and then by focusing on a few example disciplines to illustrate the more general point.”

The shift from petascale to exascale is not evolutionary; it is revolutionary. This shift requires, as the authors explain above, serious investment across the board to reach this new plateau in computing. Back in 2010, this report and others like it projected 2020 as the year we would see exascale computing. I haven’t seen an exascale system listed in the Top 500, so it is safe to say we haven’t hit the target yet. However, imagine if we had this today, and the computational benefits we would all realize. Perhaps we would see more significant movement toward a solution to COVID-19, or cancer research accelerated by a factor of 10. We might also have seen our aerospace industry develop more efficient jet engines that put fewer emissions into our atmosphere. Exascale is essential because it allows us to get greater benefits from the science and research currently being conducted, at a much faster rate. However, as I mentioned earlier, without the ability to feed an exascale computing platform data at a rate comparable to its processing, investing in such a large-scale project would be moot. This is why I found DAOS so exciting and compelling.

DAOS Performance

During the presentation, Intel shared a slide (see below) showing DAOS breaking world records as measured by the independent benchmarking organization Virtual Institute for IO. Impressive, yes, but the most remarkable part is that it took a mere 30 servers and only 52 clients to achieve this performance.

This is based on the combination of Intel Optane Persistent Memory and the DAOS software stack. What I didn’t talk about much early on is how PMem, when it is in App Direct Mode, operates efficiently to maximize its memory usage and, ultimately, performance. Typically in a storage system, you need to align your data to blocks to maximize efficiency and eliminate waste. When Optane PMem is in App Direct Mode, it breaks down the block barrier and allows PMem-aware software to take advantage of byte addressability. This is critical and certainly part of the secret sauce of the DAOS performance.
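To make the block-versus-byte point a little more concrete, here is a minimal Python sketch. It is not how DAOS or Intel’s own persistent memory libraries work; it simply memory-maps an ordinary temporary file as a stand-in for a PMem region (on real hardware the file would live on a DAX-enabled filesystem), and the 4096-byte block size and all function names are my own assumptions.

```python
import mmap
import os
import struct
import tempfile

BLOCK_SIZE = 4096  # assumed block size for the comparison, not an Intel figure


def padded_size(record_bytes, block_size=BLOCK_SIZE):
    """Bytes consumed when a record must be rounded up to whole blocks."""
    blocks = -(-record_bytes // block_size)  # ceiling division
    return blocks * block_size


def store_u64_in_place(path, offset, value):
    """Store one 8-byte integer at an arbitrary byte offset via a memory map."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            # A plain store into the mapping; no block-sized read-modify-write
            # is issued by the application.
            struct.pack_into("<Q", region, offset, value)
            region.flush()  # ask the OS to push the change to the backing file
            return struct.unpack_from("<Q", region, offset)[0]


def demo():
    """Write 8 bytes at offset 100 and report what block alignment would cost."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * BLOCK_SIZE)  # ordinary file standing in for PMem
        os.close(fd)
        stored = store_u64_in_place(path, offset=100, value=42)
        return stored, padded_size(8)
    finally:
        os.remove(path)


if __name__ == "__main__":
    value, block_cost = demo()
    print(f"stored {value}; 8 bytes would occupy {block_cost} bytes as one block")
```

The application touches only the 8 bytes it cares about, whereas a block-aligned layout would set aside a full 4096-byte block for the same record. That difference, multiplied across billions of small objects, is the kind of waste byte addressability avoids.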

Intel has not only delivered, as they do, a quality hardware solution in Optane; they are also jumping into software development, an area Intel is not typically known for, by building a scalable, distributed object storage solution on top of its persistent memory that can potentially achieve exascale levels of performance. To be fair, Intel has always been involved in software, especially open source; they were the third-largest contributor to the Ceph code base, though not many know this. So, if Intel has delivered, and continues to deliver, this massively scalable persistent memory-based storage solution, then it makes sense to make the investments necessary to revolutionize computing and achieve a true exascale platform, with a data store capable of feeding those cores at a “persistent” scale.

This combination is deadly: Optane SSDs, Optane Persistent Memory, and DAOS. Why is it deadly? Just think if a startup or an existing software vendor wrapped some additional services around DAOS and delivered them to the Life Sciences community researching cures for rare diseases, or to the Aerospace industry to radically change jet engine emissions, or to the government as intelligent facial recognition to help identify terror groups, or to anything else your imagination can conjure up. Think of the possibilities; I know I am.

Closing

This certainly was one of the highlights during Storage Field Day 20 for me; I cannot wait to learn more about this solution and how Intel’s customers are using this combination to achieve their own goals and objectives for their businesses. Intel Optane Persistent Memory will forever change how we think of compute and storage going forward, or persistent memory-based storage to be more accurate.

I will leave you with this taken from The Exascale Computing Project website:

“Information technology and applied science engineering play an essential role in society, from improving decision-making to advancing humanity’s knowledge of the world and the universe. Supercomputing, or high-performance computing (HPC), enables scientists and engineers to push the edge of what is possible for US science and innovation. Using HPC-based modeling and simulation, they can study systems that otherwise would be impractical or impossible to investigate in the real world due to their complexity, size, dangerousness, or fleeting nature.

Applying the leading-edge capabilities of HPC-based modeling and simulation is essential to the execution of the US Department of Energy (DOE) missions in science and engineering and DOE’s responsibility for stewardship of the nation’s nuclear stockpile. To address threats to security and future challenges in economic impact areas, the United States is making a strategic move in HPC—a grand convergence of advances in co-design, modeling and simulation, data analytics, machine learning, and artificial intelligence. The success of this convergence hinges on achieving exascale, the next leap forward in computing.

The exponential increase in memory, storage, and compute power made possible by exascale systems will drive breakthroughs in energy production, storage, and transmission; materials science; additive manufacturing; chemical design; artificial intelligence and machine learning; cancer research and treatment; earthquake risk assessment; and many other areas.

With exascale computing, scientists and engineers will be able to solve problems that previously were out of reach, and the effects on the lives of the American people and the world will be profound.”

Yes, the effects on the lives of the American people and the world will be profound.

This is why exascale computing is critically important and why I’m very glad Intel is involved.

David A. Chapa, Chief Analyst, The CTE Group
