What’s New: Today, MLCommons published results of its MLPerf Inference v3.1 performance benchmark for GPT-J, the 6 billion parameter large language model, as well as computer vision and natural language processing models. Intel submitted results for Habana® Gaudi®2 accelerators, 4th Gen Intel® Xeon® Scalable processors, and Intel® Xeon® CPU Max Series. The results show Intel’s competitive performance for AI inference and reinforce the company’s commitment to making artificial intelligence more accessible at scale across the continuum of AI workloads – from client and edge to the network and cloud.

“As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers’ needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models – from the smallest to the largest – with leading price/performance.”

–Sandra Rivera, Intel executive vice president and general manager of the Data Center and AI Group

Why It Matters: Building on the MLCommons AI training update from June and the Hugging Face performance benchmarks that validate that Gaudi2 can outperform Nvidia’s H100 on a state-of-the-art vision language model, today’s results further reinforce that Intel offers the only viable alternative to Nvidia’s H100 and A100 for AI compute needs.

Every customer has unique considerations, and Intel is bringing AI everywhere with products that can address inference and training across the continuum of AI workloads. Intel’s AI products give customers flexibility and choice in selecting an optimal AI solution based on their own performance, efficiency and cost targets, while helping them break from closed ecosystems.

About the Habana Gaudi2 Results: The Habana Gaudi2 inference performance results for GPT-J provide strong validation of its competitive performance.

Gaudi2 inference performance on GPT-J-99 and GPT-J-99.9 is 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario.
Gaudi2 delivers compelling performance vs. Nvidia’s H100, with H100 showing a slight advantage of 1.09x (server) and 1.28x (offline) performance relative to Gaudi2.
Gaudi2 outperforms Nvidia’s A100 by 2.4x (server) and 2x (offline).
The Gaudi2 submission employed FP8 and reached 99.9% accuracy on this new data type.
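
The relative figures above can be turned into approximate absolute numbers with simple arithmetic. The following is a minimal sketch, not part of the MLPerf submission: it takes only the Gaudi2 throughput and the rounded ratios quoted above and computes the throughput those ratios would imply for H100 and A100. The names GAUDI2, RATIO_VS_GAUDI2 and implied_throughput are hypothetical, and the outputs are implied values, not published scores, since the ratios are rounded and the text does not state the system configurations being compared.

```python
# Gaudi2 GPT-J throughput quoted in the text (per second).
GAUDI2 = {"server": 78.58, "offline": 84.08}

# Relative performance quoted in the text, expressed as other / Gaudi2.
# H100 is quoted as 1.09x / 1.28x faster than Gaudi2. Gaudi2 is quoted as
# 2.4x / 2x faster than A100, so the A100 ratio is the reciprocal.
RATIO_VS_GAUDI2 = {
    "H100": {"server": 1.09, "offline": 1.28},
    "A100": {"server": 1 / 2.4, "offline": 1 / 2.0},
}


def implied_throughput(device: str, scenario: str) -> float:
    """Throughput implied by the quoted ratio (an estimate, not a published score)."""
    return GAUDI2[scenario] * RATIO_VS_GAUDI2[device][scenario]


if __name__ == "__main__":
    for device in RATIO_VS_GAUDI2:
        for scenario in GAUDI2:
            value = implied_throughput(device, scenario)
            print(f"{device} {scenario}: ~{value:.1f} per second (implied)")
```

For authoritative per-system numbers, the MLCommons results tables remain the reference; this sketch only makes the quoted ratios easier to compare.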

With Gaudi2 software updates released every six to eight weeks, Intel expects to continue delivering performance advancements and expanded model coverage in MLPerf benchmarks.

About the Intel Xeon Results: Intel submitted all seven inference benchmarks, including GPT-J, on 4th Gen Intel Xeon Scalable processors. These results show great performance for general-purpose AI workloads, including vision, language processing, speech and audio translation models, as well as the much larger DLRM v2 recommendation and GPT-J models. Additionally, Intel remains the only vendor to submit public CPU results with industry-standard deep learning ecosystem software.

The 4th Gen Intel Xeon Scalable processor is ideal for building and deploying general-purpose AI workloads with the most popular AI frameworks and libraries. For the GPT-J 100-word summarization task of a news article of approximately 1,000 to 1,500 words, 4th Gen Intel Xeon processors summarized two paragraphs per second in offline mode and one paragraph per second in real-time server mode.
For the first time, Intel submitted MLPerf results for Intel Xeon CPU Max Series, which provides up to 64 gigabytes (GB) of high-bandwidth memory. For GPT-J, it was the only CPU able to achieve 99.9% accuracy, which is critical for applications in which the highest accuracy is of paramount importance.
Intel collaborated with its original equipment manufacturer (OEM) customers to deliver their own submissions, further showcasing AI performance scalability and wide availability of general-purpose servers powered by Intel Xeon processors that can meet customer service level agreements (SLAs).

What’s Next: MLPerf, generally regarded as the most reputable benchmark for AI performance, enables fair and repeatable performance comparisons. Intel anticipates submitting new AI training performance results for the next MLPerf benchmark. The ongoing performance updates show Intel’s commitment to support customers and address every node of the AI continuum: from low-cost AI processors to the highest-performing AI hardware accelerators and GPUs for the network, cloud and enterprise customers.

More Context: Performance Metrics Based on MLPerf v3.1 Inference (Benchmark Results) | MLCommons Announcement