Nvidia recently introduced a cutting-edge AI hardware platform powered by its next-generation chip, designed to efficiently train complex generative AI models and complement high-performance servers.
The Nvidia GH200 Grace Hopper platform pairs the company’s superchip with new HBM3e memory, which boasts 3x faster memory bandwidth than its predecessor.
The GH200 chip is highly configurable to meet the specific requirements of customers. When used in a dual configuration, with two chips paired together, it can deliver eight petaflops of performance, utilizing 144 Arm Neoverse cores and 282GB of HBM3e memory.
HBM3e memory is 50% faster than current-generation HBM3 memory, offering a combined 10TB/sec of bandwidth in the dual configuration and enabling the platform to run models 3.5x larger than before.
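The dual-configuration totals are straightforward to reconstruct. In the sketch below, the per-chip figures are assumptions inferred by halving the article’s combined numbers; Nvidia’s official per-chip specifications may be stated differently.

```python
# Back-of-the-envelope check of the dual-configuration figures cited above.
# Per-chip values are assumed by halving the article's dual-config totals.

PER_CHIP_CORES = 72        # Arm Neoverse cores per GH200 (assumed)
PER_CHIP_HBM3E_GB = 141    # HBM3e capacity per chip in GB (assumed)
PER_CHIP_BW_TBS = 5.0      # HBM3e bandwidth per chip in TB/sec (assumed)

chips = 2  # dual configuration: two GH200 chips paired together

print(f"Total cores:     {chips * PER_CHIP_CORES}")           # 144
print(f"Total HBM3e:     {chips * PER_CHIP_HBM3E_GB} GB")     # 282 GB
print(f"Total bandwidth: {chips * PER_CHIP_BW_TBS} TB/sec")   # 10.0 TB/sec
```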
This breakthrough in on-chip performance benefits organizations seeking to train and refine large language models (LLMs), which currently consist of hundreds of billions of parameters and are projected to exceed a trillion parameters in the future. It allows AI models to be trained and served on a more compact cluster of machines, reducing the need for costly, large-scale supercomputers dedicated to AI training.
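To see why memory capacity is the gating factor, consider a rough back-of-the-envelope sketch. The figures below are illustrative assumptions, not Nvidia’s numbers: they assume 2 bytes per parameter (FP16/BF16) and count only the model weights, ignoring optimizer state and activations, which multiply the requirement several times over during training.

```python
# Rough memory estimate for holding model weights alone, assuming
# 2 bytes per parameter (FP16/BF16). Training adds optimizer state
# and activations on top of this, multiplying the total several times.

def weights_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory in GB needed to store a model's weights."""
    return params * bytes_per_param / 1e9

print(f"500B parameters: {weights_gb(500e9):,.0f} GB of weights")  # ~1,000 GB
print(f"1T parameters:   {weights_gb(1e12):,.0f} GB of weights")   # ~2,000 GB
```

Against a 282GB dual-configuration node, a trillion-parameter model still has to be sharded, but each increase in per-node capacity shrinks the number of nodes, and therefore the network, needed to hold it.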
Jensen Huang, Nvidia’s founder and CEO, highlighted the significant drop in inference costs for large language models and emphasized the scalability of the GH200, stating that it can easily be deployed across global data centers.
Huang also stressed the importance of delivering more powerful solutions for data centers to keep up with the rapid development of AI technologies.
“To meet the growing demand for generative AI, data centers require accelerated computing platforms with specialized capabilities,” he said.
The GH200 Grace Hopper Superchip, the first generation of GH200 chips, entered full production in May, catering to hyperscalers and supercomputing customers with 900 GB/sec of NVLink-C2C bandwidth linking its CPU and GPU. Systems based on the updated GH200 Grace Hopper platform are expected to be available in Q2 2024.
While Nvidia’s announcement amounts to a product refresh, offering faster AI training and inference, James Sanders, a principal analyst at CCS Insight, said that such advancements matter in these early stages of AI development.
Nvidia emphasized the compatibility of the new platform with its MGX modular server designs, facilitating widespread deployment across various server configurations.
The GH200 chips are NVLink compatible, enabling the creation of a supercomputer configuration called Nvidia DGX GH200, consisting of 256 GH200 Grace Hopper Superchips and providing 144TB of shared memory. This configuration is designed to tackle future generative AI models, including trillion-parameter LLMs and large-scale deep learning algorithms.
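The 144TB figure is consistent with simple multiplication. In the sketch below, the per-superchip memory total (GPU HBM plus the Grace CPU’s attached memory) is an assumption inferred from the article’s numbers rather than an official breakdown.

```python
# Sanity check of the DGX GH200 shared-memory figure: 256 superchips
# sharing roughly 144TB. The per-superchip total (GPU HBM plus the
# Grace CPU's LPDDR5X) is an assumption inferred from the article.

SUPERCHIPS = 256
PER_SUPERCHIP_GB = 576  # assumed combined CPU + GPU memory per superchip

total_tb = SUPERCHIPS * PER_SUPERCHIP_GB / 1024  # using 1024 GB per TB
print(f"Shared memory across the system: {total_tb:.0f} TB")  # 144 TB
```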
Nvidia has established itself as a leading provider of AI hardware and infrastructure, leveraging its expertise in designing graphics processing units (GPUs). The company offers a range of purpose-built chips for high-performance workloads, such as data analytics and training machine learning and generative AI models. Nvidia AI architectures, including Grace, Hopper, Ada Lovelace, and BlueField, encompass CPUs, GPUs, and DPUs.
In addition, Nvidia’s Spectrum-X Ethernet platform aims to enhance the efficiency of Ethernet networks in data centers, enabling faster data processing and AI workload execution in the cloud.
Hopper chips, including the H100, have been instrumental in training OpenAI’s GPT-4 and have been adopted by cloud computing giants like Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.
Several major tech companies, including Dell and Snowflake, have partnered with Nvidia to offer their customers customized generative AI solutions. These partnerships use Nvidia’s NeMo end-to-end AI framework to deploy industry-specific models built on proprietary data stored in platforms like Snowflake’s Data Cloud.