There's no denying that ChatGPT and generative AI have captured the world's attention. Artificial intelligence has been part of computing since its inception, and a staple of science fiction for longer still. AI has developed over time, of course, and history shows its advances, and the opportunities they open up, are directly tied to the availability of computational power.
Machine learning, for example, brought great advances in AI by flipping problems on their head: instead of trying to teach a computer to recognise an image of a cat based on adjacent pixels and colours, researchers could feed a large set of images to a machine learning model and effectively say, "this is a bunch of cat photos; you work out what's a cat." And machine learning itself became possible thanks to the power, scalability, and capacity of cloud computing.
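To make that "you work out what's a cat" idea concrete, here is a minimal sketch of the labelled-data approach using scikit-learn. The data here is a random stand-in for real photos, and flattened pixels are a deliberately naive feature choice; the point is only the shape of the workflow, not a production classifier.

```python
# A minimal sketch of supervised learning: rather than hand-coding rules
# about pixels, we hand the model labelled examples and let it find the
# pattern itself. Stand-in data keeps the example self-contained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 200 fake "images" of 32x32 greyscale pixels, half labelled cat (1) and
# half not-cat (0). In a real system these would be real photographs.
X = rng.random((200, 32 * 32))
y = np.array([1] * 100 + [0] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "This is a bunch of cat photos; you work out what's a cat."
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```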
Yet machine learning, for all its strengths, has been dwarfed by the massive explosion of interest in generative AI.
Generative AI explosion requires greater infrastructure
It scarcely seems only 12 months since OpenAI unveiled ChatGPT to the world and demonstrated the impressive, expressive power generative AI offers. AMD chair, president, and CEO Dr Lisa Su said, "AI is the most transformational technology in 50 years. Maybe the only thing close is the introduction of the Internet, but with AI the adoption has been much, much quicker and we're only at the beginning of the AI era."
She continued, "ChatGPT has sparked a revolution that transformed the technology landscape. AI hasn't simply progressed but exploded. The year has shown us AI isn't simply a cool new thing, but the future of IT."
Generative AI is being used everywhere, she said: healthcare, climate research, AI assistants, robotics, security, and a wealth of tools for content creation.
Yet this takes power. Massive power. Generative AI has become the most demanding data centre workload, requiring infrastructure, and especially GPUs, to train models with billions of parameters, and then demanding similar power again at run time to answer user questions against those models. The amount of infrastructure that can be given to generative AI directly determines how extensive a model can be, and how rapidly and deeply it can answer questions.
Dr Su explained that last year AMD sought to estimate the infrastructure growth generative AI would drive. At the time AMD figured a 50% CAGR (compound annual growth rate), meaning a spend of 30 billion US dollars in 2023 would become over 150 billion US dollars in 2027. "That felt like a big number," Su said, "but as we look at everything that happened in the last 12 months and the rate of change in industry across the world, it's clear demand is growing faster."
Thus, AMD revised its figures and now forecasts a 70% CAGR in data centre AI accelerators, growing from an actual spend of 45 billion US dollars in 2023 to well over 400 billion US dollars in 2027.
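For context, CAGR compounds annually, so both forecasts are a simple exponential. A quick sketch of the arithmetic, using the figures from the article (the formula itself is standard):

```python
# CAGR compounds annually: spend_end = spend_start * (1 + rate) ** years.
def project(spend_start_billions: float, cagr: float, years: int) -> float:
    """Project spend forward at a compound annual growth rate."""
    return spend_start_billions * (1 + cagr) ** years

# AMD's original forecast: US$30B in 2023 at 50% CAGR, 2023 to 2027.
print(f"50% CAGR: ${project(30, 0.50, 4):.0f}B in 2027")  # ~$152B

# Revised forecast: US$45B in 2023 at 70% CAGR, 2023 to 2027.
print(f"70% CAGR: ${project(45, 0.70, 4):.0f}B in 2027")  # ~$376B
```

Note that a flat 70% over four years lands a little under the 400 billion US dollar headline figure, which suggests the rate AMD actually has in mind is somewhat above 70%.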
AMD's AI strategy
Expressing the company's commitment to advancing end-to-end AI infrastructure across cloud, HPC, enterprise, embedded, and PC, Dr Su said, "our AI strategy is centred around three big priorities."
Specifically:
- broad portfolio of training and inference compute engines
- open and proven software capabilities
- AI ecosystem with deep co-innovation
Introducing the AMD MI300X
"The availability and capability of GPU compute is the single biggest driver of AI adoption," Dr Su said, before unveiling the AMD Instinct MI300X. "It is the highest performance accelerator in the world for generative AI," she said.
The MI300X is no small chip; it's a hefty silicon sandwich combining GPU compute dies, I/O dies, and stacked high-bandwidth memory, building upon AMD's previous Instinct accelerators. The MI100, launched in 2020, was the first purpose-built GPU architecture to accelerate FP64 and FP32 HPC workloads. The second generation, the MI200, introduced a denser compute architecture with leading memory capacity and bandwidth. And now the MI300, launched today, brings focused improvements in unified memory, AI data format performance, and in-node networking. It is optimised for performance and power efficiency, and allows generative AI models to be trained for longer, or more models to be trained simultaneously, than ever before.
Spec-wise, the AMD MI300X is a beast. It sports 192GB of HBM3 memory, with a peak theoretical memory bandwidth of 5.3 TB/s, and up to 896 GB/s of AMD Infinity Fabric bandwidth. It stacks 8x XCD compute dies on 4x IODs (I/O dies), with 8x HBM3 stacks, 3.5D packaging, and 256MB of AMD Infinity Cache technology.
AMD says the nearest competitor is the Nvidia H100, and while bandwidth and network performance are roughly equivalent between the two GPU beasts, the MI300X brings 2.4x more memory (192GB versus the H100's 80GB) and 1.3x more compute. In short, with less rack space, lower capex, and lower opex, customers can run more models, or larger models, on the same server; AMD claims twice the number of LLMs for training and inference compared with Nvidia.
AMD already has partners taking up the MI300X, with Microsoft Azure and Oracle Cloud Infrastructure on board from day one. Additionally, OpenAI's Philippe Tillet said, "OpenAI is working with AMD in support of an open ecosystem. We plan to support AMD's GPUs including MI300 in the standard Triton distribution starting with the upcoming 3.0 release."
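Triton is OpenAI's Python-embedded language for writing GPU kernels, and the significance of MI300 support is that the same kernel source can target AMD hardware as well as Nvidia's. To give a flavour, here is a minimal tutorial-style vector-addition kernel using the standard Triton API; this is an illustrative sketch, not AMD's or OpenAI's own code.

```python
# A minimal Triton kernel: element-wise vector addition. The same Python
# source compiles for whichever GPU backend Triton targets, which is the
# point of the MI300 support mentioned above.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)            # which block this instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements            # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)         # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device="cuda")        # ROCm builds of PyTorch also
y = torch.rand(4096, device="cuda")        # expose AMD GPUs via "cuda"
assert torch.allclose(add(x, y), x + y)
```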
Software and co-innovation
AMD also announced ROCm 6, the latest version of its software stack, bringing advanced LLM optimisations and performant AI; combined with the MI300X, AMD says it delivers about 8x the performance of the previous generation. So much so that AMD president Victor Peng said, "this is an inflection point for developers."
"Innovators are advancing the state of AI on AMD GPUs now," but with the MI300X and ROCm 6, "we're empowering innovators to realise the transformational power of generative AI faster."
AMD also announced partnerships with Hugging Face, PyTorch, ONNX, JAX, and others, in addition to those with Azure, Oracle, and OpenAI above.
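In practice, these partnerships mean existing model code runs on AMD hardware largely unchanged: ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda interface. A minimal sketch of what that looks like with a Hugging Face pipeline; "gpt2" here is just a small public model chosen for illustration.

```python
# Running a Hugging Face model on an AMD GPU: with a ROCm build of PyTorch
# the device appears through the usual torch.cuda interface, so this code
# is identical to what you'd run on any other GPU.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU if present, else CPU
generator = pipeline("text-generation", model="gpt2", device=device)

print(generator("Generative AI is", max_new_tokens=20)[0]["generated_text"])
```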
Further, AMD has partnered with Meta, with Meta AI senior director of engineering Ajit Mathews saying Meta's testing with the MI300X and ROCm 6 has shown significant optimisations and promising performance numbers.
Additionally, Dell president of core business operations, global infrastructure solutions group, Arthur Lewis announced the Dell PowerEdge XE9680 server using the AMD MI300X accelerator, bringing a smaller-footprint, low-latency server with an out-of-the-box LLM experience. "As of today we're open for business, ready to quote, and taking orders," Lewis said.
Additional partners include Supermicro and Lenovo.