Intel Shines in MLCommons MLPerf Inference v3.1 Benchmark for AI Performance

In a significant stride towards making artificial intelligence accessible at scale, Intel has demonstrated its prowess in the AI space through the latest MLPerf Inference v3.1 benchmark results.
These results underscore Intel’s competitive performance in AI inference and highlight the company’s dedication to catering to the full spectrum of AI workloads, from client and edge computing to network and cloud-based applications.

Impressive AI Product Portfolio

Sandra Rivera, Intel’s Executive Vice President and General Manager of the Data Center and AI Group, emphasized the significance of these results, stating, “As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers’ needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models – from the smallest to the largest – with leading price/performance.”

Intel vs. Nvidia: The Battle Continues

Intel’s AI products have been shown to offer a compelling alternative to Nvidia’s H100 and A100 for AI computing needs. While every customer’s requirements are unique, Intel is committed to offering flexible and cost-effective AI solutions, breaking free from closed ecosystems and providing customers with options that align with their performance, efficiency, and cost targets.

Habana Gaudi2 Impresses

Intel’s submission for the MLPerf Inference benchmark featured the Habana Gaudi2 accelerators, which delivered impressive results for GPT-J, a 6 billion parameter language model. In particular:

Gaudi2 achieved a remarkable 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario on the GPT-J-99 benchmark.

Gaudi2 demonstrated competitive performance against Nvidia’s H100, which held only a slight advantage of 1.09x in server performance and 1.28x in offline performance.

Gaudi2 outperformed Nvidia’s A100 by a substantial margin, delivering 2.4x better server performance and 2x better offline performance.

The Gaudi2 submission used FP8 and achieved an impressive 99.9% accuracy on this new data type.
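As a rough sanity check, the relative factors quoted above can be combined with Gaudi2’s absolute numbers to estimate the implied H100 and A100 throughputs. This is a back-of-envelope sketch derived from the article’s figures, not published MLPerf results:

```python
# Gaudi2 GPT-J-99 results quoted in the article (queries/samples per second).
gaudi2 = {"server": 78.58, "offline": 84.08}

# H100 is reported as 1.09x (server) and 1.28x (offline) ahead of Gaudi2,
# so its implied throughput is Gaudi2's multiplied by those factors.
h100_est = {
    "server": gaudi2["server"] * 1.09,
    "offline": gaudi2["offline"] * 1.28,
}

# Gaudi2 is reported as 2.4x (server) and 2x (offline) ahead of A100,
# so A100's implied throughput is Gaudi2's divided by those factors.
a100_est = {
    "server": gaudi2["server"] / 2.4,
    "offline": gaudi2["offline"] / 2.0,
}

for name, est in (("H100 (est.)", h100_est), ("A100 (est.)", a100_est)):
    print(f"{name}: server ~ {est['server']:.1f}, offline ~ {est['offline']:.1f}")
```

The derived values (H100 roughly 85.7 server / 107.6 offline; A100 roughly 32.7 server / 42.0 offline) are estimates only; the actual submitted figures are in the MLCommons results tables.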

With regular Gaudi2 software updates scheduled every six to eight weeks, Intel anticipates continued advancements in performance and expanded model coverage in MLPerf benchmarks.

Intel Xeon Processors Excel

Intel also showcased the 4th Gen Intel Xeon Scalable processors, submitting results for all seven inference benchmarks, including GPT-J. These processors demonstrated excellent performance across various AI workloads, including vision, language processing, and speech and audio translation models, as well as larger models like the DLRM v2 recommender and GPT-J. Intel remains the only vendor to submit public CPU results with industry-standard deep learning ecosystem software.

The 4th Gen Intel Xeon Scalable processors are ideally suited for building and deploying general-purpose AI workloads with popular AI frameworks and libraries. In the GPT-J 100-word summarization task, these processors achieved two paragraphs per second in offline mode and one paragraph per second in real-time server mode.

Intel also introduced the Intel Xeon CPU Max Series, featuring up to 64 gigabytes of high-bandwidth memory. For GPT-J, this CPU series was the only one capable of achieving a critical 99.9 percent accuracy level, which is essential for applications requiring the highest level of precision.