Grace – named after computing pioneer Grace Hopper – is Nvidia's first data centre CPU, designed for extremely large-scale AI and HPC workloads.
Model sizes are doubling every two and a half months, said Nvidia senior director of accelerated computing Paresh Kharya, pointing to the 175 billion parameters of the GPT-3 model. At that rate, 100 trillion parameter models could exist by 2023.
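As a rough sanity check on that extrapolation (a back-of-the-envelope sketch, not Nvidia's own calculation):

```python
import math

# Rough extrapolation: parameter counts double every 2.5 months,
# starting from GPT-3's 175 billion parameters.
start_params = 175e9    # GPT-3
target_params = 100e12  # 100 trillion

doublings = math.log2(target_params / start_params)  # how many doublings are needed
months = doublings * 2.5                             # time at the claimed doubling rate
print(f"{doublings:.1f} doublings ~ {months:.0f} months")
```

About 9 doublings, or roughly two years from GPT-3's mid-2020 debut, which is where the 2023 figure comes from.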
System bandwidth is already a bottleneck, he said, because large models do not fit into GPU memory, and communication between the GPU and CPU memory is limited to 64GBps.
So Grace will support giant-scale AI and HPC applications by incorporating next-generation NVLink, which will provide bidirectional bandwidth of 900GBps – roughly 14 times that of the current CPU–GPU link.
It will also feature next-generation Arm Neoverse cores, and a new memory subsystem running at 500GBps with error correction that provides a tenfold improvement in energy efficiency.
Together, these changes will mean models that currently take a month to train will instead take three days, and realtime inferencing will be possible with half-trillion parameter models.
"Leading-edge AI and data science are pushing today's computer architecture beyond its limits – processing unthinkable amounts of data," said Nvidia founder and CEO Jensen Huang.
"Using licensed Arm IP, Nvidia has designed Grace as a CPU specifically for giant-scale AI and HPC. Coupled with the GPU and DPU, Grace gives us the third foundational technology for computing, and the ability to re-architect the data centre to advance AI. Nvidia is now a three-chip company."
The Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory intend to build Grace-based supercomputers.
"Nvidia's novel Grace CPU allows us to converge AI technologies and classic supercomputing for solving some of the hardest problems in computational science," said CSCS Director Professor Thomas Schulthess.
Los Alamos National Laboratory director Thom Mason said: "With an innovative balance of memory bandwidth and capacity, this next-generation system will shape our institution's computing strategy.
"Thanks to Nvidia's new Grace CPU, we'll be able to deliver advanced scientific research using high-fidelity 3D simulations and analytics with datasets that are larger than previously possible."
Grace should be available at the beginning of 2023.
BlueField-3, Nvidia's next-generation data processing unit (DPU), is the followup to BlueField-2.
DPUs provide hardware acceleration for software-defined networking, storage and security.
The BlueField-3 provides performance equivalent to 300 conventional CPU cores for these workloads, freeing CPU resources for the real work of running business applications.
BlueField-3 is said to be the industry's first 400GbE/NDR DPU, providing ten times the accelerated compute power of its predecessor thanks to 16 Arm Cortex-A78 cores, as well as four times the acceleration for cryptography.
It also purports to be the first DPU supporting fifth-generation PCIe and offering time-synchronised data centre acceleration.
Samples are expected in the first quarter of 2022, and Dell Technologies, Inspur, Lenovo and Supermicro are integrating BlueField DPUs into their systems.
BlueField is also being supported by Canonical, Red Hat, VMware, Fortinet, Guardicore, DDN, NetApp, WekaIO, Cloudflare, F5 and Juniper Networks.
"Modern hyperscale clouds are driving a fundamental new architecture for data centers," said Huang.
"A new type of processor, designed to process data centre infrastructure software, is needed to offload and accelerate the tremendous compute load of virtualisation, networking, storage, security and other cloud-native AI services. The time for BlueField DPU has come."
The Nvidia Morpheus cybersecurity application framework applies machine learning to BlueField-3 telemetry to identify, capture and take action on threats and anomalies including leaks of unencrypted sensitive data, phishing attacks and malware.
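To illustrate the general idea behind this kind of telemetry-driven detection – flagging traffic that deviates sharply from a baseline – here is a deliberately simple sketch. It is not the Morpheus framework, which uses trained deep-learning models rather than a fixed statistical rule:

```python
import statistics

# Illustrative anomaly flag on network telemetry: a sketch of the idea
# behind ML-on-telemetry detection, not the Morpheus framework itself.

def flag_anomalies(samples, threshold=2.0):
    """Flag samples more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return [x for x in samples if abs(x - mean) > threshold * stdev]

# Packets-per-second telemetry from a server: the sudden spike stands out.
telemetry = [1000, 1020, 990, 1010, 980, 1005, 9500, 995, 1015]
print(flag_anomalies(telemetry))  # flags the 9500 spike
```

A production system would use rolling baselines or learned models instead of a global z-score, since a single large outlier inflates the standard deviation it is measured against.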
Hardware, software, cybersecurity and cloud providers planning to take advantage of Morpheus include Aria Cybersecurity Solutions, Cloudflare, F5, Fortinet, Guardicore, Splunk, Canonical, Red Hat and VMware.
"Splunk is excited to collaborate with Nvidia to uncover more ways to utilise GPU-accelerated deep learning to enhance how we help our joint customers turn their data into doing," said Splunk vice president and CTO Tim Tully.
"We look forward to using the Morpheus framework to potentially provide a path for our team to quickly prototype and integrate new capabilities in our platform, as well as offload compute intensive tasks to GPU architectures to aid our customers."
Huang said: "Zero-trust security models demand we monitor every transaction in the data centre in realtime. This poses a significant technical challenge: sensing intrusion within the server, detecting threats immediately, and operating at the data rate of modern data centres.
"Nvidia Morpheus combines Mellanox in-server networking and Nvidia AI to do real-time, all-packet inspection to anticipate threats and eliminate them as they arise."
On the subject of BlueField, Nvidia announced the next-generation DGX SuperPod AI supercomputer featuring BlueField-2 DPUs to isolate each tenant's data.
"AI is the most powerful technology the world has ever known, and Nvidia DGX systems are the most effective tool for harnessing it," said Nvidia vice president and general manager of DGX systems Charlie Boyle.
"The new DGX SuperPod, which combines multiple DGX systems, provides a turnkey AI data centre that can be securely shared across entire teams of researchers and developers."
Nvidia Omniverse Enterprise, which has been in open beta for the last three months, is now generally available.
It enables global 3D design teams working at multiple locations and across multiple software suites to collaborate in real time in a shared virtual space, according to Nvidia.
"Every few decades, technologies converge to enable a whole new thing – Omniverse is such an invention," said Huang.
"Building on Nvidia's entire body of work, Omniverse lets us create and simulate shared virtual 3D worlds that obey the laws of physics. The immediate applications of Omniverse are incredible, from connecting design teams for remote collaboration to simulating digital twins of factories and robots. The science-fiction metaverse is near."
Omniverse Enterprise combines the Omniverse Nucleus server, which manages the database shared among clients; Omniverse Connectors, which plug into leading design applications; Omniverse Create, which allows users to interactively assemble, light, simulate and render scenes in real time; Omniverse View, for collaborative design and visualisation of architectural and engineering projects with photorealistic rendering; and RTX Virtual Workstation, so collaborators can run graphics-intensive 3D applications from anywhere.
Omniverse Enterprise is optimised to run on Nvidia RTX laptops and desktops, and Nvidia certified systems based on the EGX platform.
Early adopters include The BMW Group, which has built an end-to-end digital twin of an entire factory.
Marketing services organisation WPP is using the Omniverse platform to collaboratively design, build and simulate photorealistic locations instead of shooting them in the real world.
And Ericsson is using the Omniverse platform to simulate and visualise future 5G networks.
"The Nvidia Omniverse platform lets our teams virtually explore any city's unique geography – whether it is San Francisco's hills or Frankfurt's high-rises – and its impact on radio network performance," said Ericsson head of development unit networks Joakim Sorelius.
"By combining our extensive simulation expertise with the stunning visualisations of Omniverse, we bring radio network analysis to a new level, creating insights that ensure our customers get the best possible 5G experience. We see Omniverse as the future of collaboration and planning."
The Omniverse ecosystem includes applications from software companies such as Bentley Systems, Adobe, Autodesk, Epic Games, ESRI, Graphisoft, Trimble, McNeel & Associates, Blender, Marvelous Designer, Reallusion and wrnch.
Omniverse Enterprise software is sold on subscription, and is supported by vendors including Asus, Boxx Technologies, Cisco, Dell Technologies, HP, Lenovo and Supermicro.
The Jarvis framework for building conversational AI services is also available, following an early access program.
It can convert speech in five languages to text, determine an appropriate response, and turn that back into speech in under 100 milliseconds.
Applications include transcription, translation, virtual assistants, and agent assistance.
According to Nvidia, "Developers can select a Jarvis pre-trained model from Nvidia's NGC catalog, fine-tune it using their own data with the Nvidia Transfer Learning Toolkit, optimise it for maximum throughput and minimum latency in real-time speech services, and then easily deploy the model with just a few lines of code so there is no need for deep AI expertise."
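The structure of such a service – speech recognition, response generation, and speech synthesis, all inside the 100-millisecond budget mentioned above – can be sketched as follows. The stage functions and their return values here are illustrative placeholders, not the actual Jarvis or NGC API:

```python
import time

# Illustrative conversational-AI pipeline skeleton. The three stage
# functions below are stand-ins for real models, not the Jarvis API.

def speech_to_text(audio: bytes) -> str:
    return "what is the weather today"    # stand-in for an ASR model

def generate_response(text: str) -> str:
    return "It is sunny and 22 degrees."  # stand-in for a dialog/NLU model

def text_to_speech(text: str) -> bytes:
    return text.encode()                  # stand-in for a TTS model

def handle_utterance(audio: bytes, budget_ms: float = 100.0) -> bytes:
    """Run ASR -> dialog -> TTS and report time against the latency budget."""
    start = time.perf_counter()
    transcript = speech_to_text(audio)
    reply = generate_response(transcript)
    speech = text_to_speech(reply)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"pipeline latency: {elapsed_ms:.2f}ms (budget {budget_ms}ms)")
    return speech

audio_out = handle_utterance(b"\x00" * 16000)
```

The point of the sub-100ms target is that all three stages must complete within it for the exchange to feel conversational, which is why each model is optimised for latency before deployment.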
The cuQuantum SDK accelerates quantum circuit simulations. For example, a simulation that took ten days to run on a conventional dual-CPU system was completed in two hours on a DGX-A100 system using cuQuantum – a speedup of roughly 120 times.
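For readers unfamiliar with what such a simulation involves: a state-vector simulator represents an n-qubit circuit as a vector of 2^n amplitudes and applies each gate as a matrix. The NumPy sketch below shows the idea on three qubits; it is not the cuQuantum API, and its brute-force construction of full gate matrices is exactly the cost that GPU-accelerated simulators work to tame:

```python
import numpy as np

# Minimal state-vector simulation sketch (plain NumPy, not cuQuantum).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def apply_gate(state, gate, qubit, n_qubits):
    """Apply a single-qubit gate to `qubit` of an n-qubit state vector."""
    full = np.array([[1.0]])
    for q in range(n_qubits):
        full = np.kron(full, gate if q == qubit else np.eye(2))
    return full @ state

n = 3
state = np.zeros(2**n)
state[0] = 1.0                     # start in |000>
for q in range(n):                 # Hadamard on every qubit
    state = apply_gate(state, H, q, n)
print(np.round(state**2, 3))       # uniform distribution over all 8 basis states
```

Memory and compute grow as 2^n, so adding even a few qubits multiplies the workload severalfold, which is why moving the arithmetic onto GPUs yields the large speedups reported above.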
"This sets the benchmark for quantum circuit simulation performance and will help advance the field of quantum computing by improving our ability to verify the behaviour of quantum circuits," said Garnet Chan, a chemistry professor at Caltech whose lab hosted the work.
Nvidia and AWS announced a collaboration to offer new EC2 instances combining AWS Graviton2 processors and Nvidia GPUs in the second half of 2021.
A primary application will be to allow Android games to be streamed from the cloud with accurate rendering and encoding, but without the need for emulation software.
Drive Hyperion is a reference platform for using the company's hardware and software to develop autonomous vehicles.
When it ships later this year, the eighth-generation Hyperion will include two Nvidia Drive Orin SoCs (said to provide enough power for level 4 self-driving), 12 exterior cameras, three interior cameras, nine radars and one lidar sensor, allowing customers to concentrate on building software using the Drive AV and Drive IX software stacks.
Drive Sim is being updated to version 2.0. Built on the Omniverse platform, it allows multi-GPU modelling of scene and vehicle dynamics, rendering sensor data in realtime with raytracing, for scalable and repeatable simulations.