AMD provided an in-depth look at its latest arsenal of AI accelerators for data centers and supercomputers, as well as consumer client devices, but software support, optimization and developer adoption will be essential.
Advanced Micro Devices held its Advancing AI event in San Jose this week, and in addition to launching new AI accelerators for data centers, supercomputers and client laptops, the company also presented its software and ecosystem enablement strategy, with a focus on open-source accessibility. Market demand for AI computing resources currently exceeds the supply from incumbents like Nvidia, which is why AMD is working to offer compelling alternatives. Highlighting this emphatically, AMD CEO Dr. Lisa Su noted that the company is raising its TAM forecast for AI accelerators from the $150 billion figure it projected this time a year ago to $400 billion by 2027, a compound annual growth rate of 70%. Artificial intelligence obviously represents a huge opportunity for the major chip players, but no one can truly gauge the potential market demand. AI will be so transformational that it will impact virtually every industry in one way or another. Regardless, the market will likely be welcoming and eager to put these new silicon AI engines and tools from AMD through their paces.
Instinct MI300X and MI300A: tip of the AMD AI spear
AMD’s data center group officially launched two major product families this week, the MI300X and MI300A, for the AI and enterprise supercomputing and cloud markets, respectively. Both products are purpose-built for their respective applications, but are based on similar chiplet-based architectures, with advanced 3D packaging techniques and a blend of optimized 5nm and 6nm semiconductor manufacturing processes. AMD’s high-performance computing AI accelerator is the Instinct MI300A, which combines the company’s CDNA 3 data center GPU architecture with Zen 4 CPU core chiplets (24 EPYC Genoa cores) and 128 GB of shared, unified HBM3 memory that both the GPU accelerators and CPU cores can access, along with 256 MB of Infinity Cache. The chip comprises a massive 146 billion transistors and offers up to 5.3 TB/s of peak memory bandwidth, with its CPU, GPU and IO chiplets interconnected via AMD’s high-speed Infinity Fabric.
This AMD accelerator can also function as both a PCIe-attached add-in device and a root-complex host processor. Overall, the company is making bold claims for the MI300A in the HPC space: up to a 4x performance increase over Nvidia’s H100 accelerator in applications like OpenFOAM for computational fluid dynamics, and up to 2x the performance per watt of Nvidia’s GH200 Grace Hopper Superchip. The AMD MI300A will also power HPE’s El Capitan at Lawrence Livermore National Laboratory, where it will succeed Frontier (also powered by AMD) as the world’s first two-exaflop supercomputer, which would apparently make it the fastest and most powerful supercomputer in the world.
However, the MI300X is a different kind of beast, aimed squarely at cloud data center and enterprise AI workloads such as large language models, natural language recognition and generative AI. The MI300X forgoes the integrated Zen 4 CPU chiplets (what AMD calls CCDs) in favor of additional CDNA 3 Accelerator Complex Die (XCD) chiplets in an all-GPU design. There are a total of eight XCDs on board the MI300X, totaling 304 GPU compute units. The MI300X also has a larger memory capacity, with 192 GB of HBM3. Like the MI300A, the MI300X offers roughly 5.3 TB/s of peak memory bandwidth, along with a whopping 17 TB/s of maximum bandwidth from its 256 MB of AMD Infinity Cache.
Once again, AMD’s performance claims are bold, with Su proclaiming a 1.4x performance increase (latency reduction) in Llama 2 (Meta’s large language model) and a 1.6x increase in Bloom, a transformer-based open LLM alternative to GPT-3, compared to competing offerings from Nvidia. In inference on these kinds of workloads, AMD claims performance leadership over Nvidia, although the MI300X is expected to offer roughly performance parity with the H100 in AI training workloads. Of course, Nvidia just released an update to its Llama 2-optimized software, so it’s likely AMD didn’t factor that into the benchmark results above. Furthermore, Nvidia’s H200 Hopper GPU is waiting in the wings and should bring even more gains for Nvidia’s inference performance.
AMD Ryzen 8040 series will bring AI improvements to laptops
From a hardware perspective, the rest of AMD’s Advancing AI announcements centered on Ryzen AI and a new line of Ryzen 8040 series mobile processors for laptops. Codenamed Hawk Point, these APUs are similar to AMD’s current-generation Ryzen 7040 series, with up to eight Zen 4 CPU cores and up to twelve RDNA 3 compute units for graphics, now at increased clock speeds. However, Hawk Point’s neural processing unit has been optimized at both the hardware and firmware level, and AMD claims its new XDNA NPU delivers up to 16 trillion operations per second (TOPS) of throughput for AI workloads, a 60% performance increase over the previous-generation 7040 series.
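A quick back-of-the-envelope check on those throughput figures, in plain Python (the roughly 10 TOPS prior-generation number is inferred here from AMD’s stated 60% uplift, not quoted in this article):

```python
# Sanity check on AMD's NPU throughput claim:
# 16 TOPS at a claimed 60% generational uplift implies the
# previous Ryzen 7040 XDNA NPU delivered roughly 10 TOPS.
hawk_point_tops = 16.0    # Ryzen 8040 "Hawk Point" XDNA NPU, per AMD
claimed_uplift = 0.60     # 60% increase over the previous generation
phoenix_tops = hawk_point_tops / (1 + claimed_uplift)
print(round(phoenix_tops, 1))  # ~10.0 TOPS implied for the 7040 series
```

The two claims are consistent: a 60% gain over a ~10 TOPS NPU lands right at 16 TOPS.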
AMD claims this will translate to up to a 40% uplift in real-world AI applications on this new class of laptops, with AI models like Llama 2 and other applications involving machine vision. Since the Ryzen 8040’s XDNA NPU is essentially a slice of Xilinx FPGA fabric, optimizations have likely been made to this circuit block, reconfiguring it for better performance and efficiency. AMD notes that Ryzen 8040 series AI-enabled PCs will be available in the first quarter of 2024 and that it is sampling OEM partners now.
Software enablement is key: enter ROCm 6 and Ryzen AI software
All this powerful new silicon will require a significant software enablement effort on AMD’s part, and in that regard the company announced two new releases in its developer-facing software stack: ROCm 6, which will work in concert with its Xilinx Vitis AI development and deployment tools, and Ryzen AI software for client machines. AMD notes that a follow-on release of ROCm 6 for training workloads is also coming. ROCm is AMD’s open-source software development platform and supports many leading AI frameworks such as ONNX, TensorFlow and PyTorch. AMD also notes that data center AI developers using Nvidia’s CUDA language can easily port and optimize their existing models and applications with ROCm. AMD CEO Dr. Su also made a show of force on stage, with representatives from Lamini, Databricks and Essential AI extolling the virtues of working with ROCm, and Lamini CEO Sharon Zhou specifically highlighting that Lamini has achieved functional and performance parity with CUDA.
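The CUDA-porting point is worth a concrete sketch. On ROCm builds of PyTorch, AMD GPUs are exposed through the familiar `torch.cuda` API (backed by HIP under the hood), so device-selection code written for Nvidia hardware often runs unchanged. The snippet below is a minimal illustration of that idea, guarded so it also runs on machines without PyTorch installed:

```python
# Minimal sketch: existing CUDA-era PyTorch device selection works as-is
# on ROCm, because ROCm builds of PyTorch route torch.cuda calls to HIP.
try:
    import torch
    # On a ROCm system with an AMD GPU, is_available() returns True and
    # "cuda" targets the Instinct/Radeon device -- no source changes needed.
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # PyTorch not installed; fall back for illustration
print(device)
```

This is the essence of AMD’s porting pitch: the model code stays the same, and ROCm supplies the backend.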
On client machines, Ryzen AI software will take pre-trained models, quantize them, and optimize them to run on AMD silicon for easy deployment. In conversations with AMD, I’ve been told the goal is a simple, one-click interface for developers, with support for ONNX, TensorFlow and PyTorch live right now in the first release of Ryzen AI software. The folks in Redmond are also preparing Windows support, but AMD will ultimately be at Microsoft’s mercy in that regard.
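To make the quantization step less abstract, here is a generic illustration of what post-training quantization does: map float32 weights to int8 so a model fits the NPU’s integer math, then dequantize at runtime. This is a textbook symmetric-quantization sketch, not AMD’s actual toolchain code:

```python
# Generic post-training quantization sketch (not AMD's implementation):
# compress float weights to int8 and recover approximate values.

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.95]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding keeps the error within half a quantization step per weight
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err <= scale / 2 + 1e-9)
```

The payoff is a 4x smaller weight footprint and integer arithmetic the NPU can execute efficiently, at the cost of a small, bounded rounding error.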
To conclude this quick summary of Advancing AI day, I would say that AMD’s success will depend heavily on its software enablement efforts, which will need to be an ongoing, long-term investment in ease of use, performance and efficiency optimization, and ultimately developer adoption. The company appears to have the hardware muscle to take on its main rivals, Nvidia and Intel. And with AMD President Victor Peng spearheading its AI strategy, backed by the deep software bench he fostered at Xilinx before the company’s acquisition, AMD also appears to have the leadership and resources needed to execute this side of the equation. This is going to be a dogfight with Nvidia, no doubt about it. With intensive model optimization and tuning underway now, the AI performance landscape can and will change quickly. And let’s face it, AI is still in its infancy.