Nvidia debuts upgraded GH200 Grace Hopper chip with high-speed memory

Nvidia Corp. today debuted an upgraded version of its GH200 Grace Hopper chip that will enable companies to run more sophisticated language models.

Nvidia Chief Executive Officer Jensen Huang detailed the chip during a keynote at Siggraph, a computer graphics conference taking place this week in Los Angeles. Huang said the GH200 (pictured) is currently in production. Nvidia plans to start sampling the chip toward the end of the year and estimates that it will start shipping with servers in the second half of 2024.

According to Nvidia, the new edition of the GH200 is the world’s first processor to include HBM3e memory. That’s a type of high-speed memory designed to store the data that a chip is actively processing. It’s 50% faster than HBM3, the technology used in the original version of the GH200.

HBM3e is manufactured by SK hynix Inc. using a 10-nanometer process. Nvidia’s adoption of the technology is not entirely surprising. In June, reports suggested that the chip giant had approached SK hynix for HBM3e samples.

Equipping a chip with faster memory such as HBM3e enables it to run large language models with higher performance. The reason has to do with the way language models, and neural networks in general, are architected.

An artificial intelligence model is made up of numerous software building blocks called layers. Each layer performs a small portion of the task that users assign to the AI.

The first layer of an AI takes raw data as input, analyzes it and generates a set of intermediate results. Those results are then passed to a second layer, which carries out further processing. The results of that further processing are then sent to a third layer for another round of calculations and the same process is repeated many more times from there.
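To make the flow concrete, here is a minimal Python sketch of that layer-by-layer process; it is illustrative only, not Nvidia code or a real model, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy three-layer network: each "layer" is just a weight matrix
# followed by a nonlinearity. The 512-wide size is an arbitrary choice.
layers = [rng.standard_normal((512, 512)) * 0.02 for _ in range(3)]

def forward(raw_input: np.ndarray) -> np.ndarray:
    activations = raw_input
    for weights in layers:
        # Each layer transforms the previous layer's intermediate
        # results and hands the new results to the next layer.
        activations = np.maximum(activations @ weights, 0.0)  # ReLU
    return activations

output = forward(rng.standard_normal((1, 512)))
print(output.shape)  # (1, 512)
```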

Whenever one of an AI’s layers generates intermediate processing results, the chip on which the AI runs has to save those results to memory. The same data must then be pulled from memory into the next AI layer for further analysis. That means data is constantly moving between a chip’s logic circuits and its RAM.
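A back-of-envelope sketch shows why that traffic adds up. All the figures below are hypothetical assumptions chosen for illustration, not GH200 measurements:

```python
# Estimate the memory traffic at one layer boundary of a hypothetical model.
hidden_size = 4096        # activation width of an assumed model layer
sequence_length = 2048    # tokens processed per pass (assumption)
bytes_per_value = 2       # 16-bit floating point

# Each layer writes its intermediate results to memory, and the next
# layer reads them back: one write plus one read per layer boundary.
activation_bytes = hidden_size * sequence_length * bytes_per_value
traffic_per_boundary = 2 * activation_bytes

print(f"{traffic_per_boundary / 1e6:.0f} MB moved per layer boundary")
# With dozens of layers, this repeats at every boundary, which is why
# memory bandwidth can become the limit on inference speed.
```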

The HBM3e memory in Nvidia’s upgraded GH200 chip allows data to move faster to and from logic circuits than before, which speeds up processing. According to Nvidia, that performance boost will allow companies to run more advanced AI models. The chipmaker says that a server with two of the new GH200 chips can run AI models 3.5 times larger than a similarly configured system based on the original GH200.
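A rough sizing sketch illustrates how memory capacity caps model size. The 141-gigabyte figure is the widely reported HBM3e capacity of the upgraded GH200; treat it and the other numbers as assumptions for illustration rather than a spec sheet:

```python
# Back-of-envelope: how many model weights fit in a dual-GH200 server.
memory_per_chip_gb = 141  # reported HBM3e capacity per upgraded GH200
chips = 2                 # the dual-chip server configuration Nvidia cites
bytes_per_parameter = 2   # 16-bit weights (assumption)

budget_bytes = memory_per_chip_gb * 1e9 * chips
max_parameters = budget_bytes / bytes_per_parameter
print(f"~{max_parameters / 1e9:.0f}B parameters fit in weights alone")
# Real deployments also need room for activations and caches, so the
# practical limit is lower, but capacity scales model size roughly linearly.
```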

The new GH200 runs AI models on two onboard compute modules. The first is a central processing unit with 72 cores based on Arm Ltd.’s Neoverse chip design. The GH200’s other compute module, in turn, is a graphics processing unit that offers four petaflops of AI performance.

Onstage at Siggraph today, Huang described the GH200’s design as memory- and cache-coherent. That means the onboard GPU and CPU can carry out calculations on the same data instead of using separate data copies, as is usually required. According to Nvidia, that arrangement increases processing efficiency.
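As a loose software analogy, with hypothetical helper functions standing in for real device APIs, the difference is between synchronizing private copies and working on one shared buffer:

```python
import numpy as np

# Conceptual analogy only; these functions are hypothetical stand-ins,
# not Nvidia APIs. Without coherence, CPU and GPU each hold a private
# copy of the data, plus explicit transfers to keep the copies in sync.
def non_coherent_round_trip(cpu_data: np.ndarray) -> np.ndarray:
    gpu_copy = cpu_data.copy()   # simulate copying data to GPU memory
    gpu_copy *= 2.0              # "GPU" computation on its private copy
    return gpu_copy.copy()       # simulate copying the result back

# With a coherent design, both processors can read and write the same
# buffer, so the copies (and the time they cost) disappear.
def coherent_round_trip(shared_data: np.ndarray) -> np.ndarray:
    shared_data *= 2.0           # both sides see this result directly
    return shared_data

data = np.ones(4)
print(non_coherent_round_trip(data), coherent_round_trip(data))
```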

“You can take just about any large language model, put it into it and it will inference like crazy,” Huang said. “The inference cost of large language models will drop significantly.”

The GH200 is compatible with Nvidia’s MGX reference architecture, a kind of blueprint for designing servers. As a result, it should be relatively simple for hardware makers to incorporate the chip into their MGX-based servers.

Huang said that the upgraded, HBM3e-equipped version of the GH200 also forms the basis of a data center system called the Grace Hopper Compute Tray. Each such system combines a single GH200 with Nvidia’s BlueField-3 and ConnectX-7 chips. The latter chips are both designed to speed up network traffic to and from a server, but the BlueField-3 can also accelerate certain other computing tasks.

Up to 256 Grace Hopper Compute Trays may be linked together in a single cluster. According to Huang, such a cluster can provide 1 exaflop of AI performance. One exaflop corresponds to a quintillion calculations per second.
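Those figures line up with the per-chip specification cited earlier, as a quick arithmetic check shows:

```python
# Consistency check of the cluster math described above.
cluster_flops = 1e18   # 1 exaflop of AI performance, per Huang
trays = 256            # one GH200 per Grace Hopper Compute Tray

per_tray_flops = cluster_flops / trays
print(f"{per_tray_flops / 1e15:.1f} petaflops per tray")  # ~3.9
# Roughly four petaflops per GH200, matching the GPU figure quoted earlier.
```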

Image: Nvidia
