How can Nvidia, launching a saturation attack of “AI nuclear bombs”, win in the new AI computing arena?

In the 2012 ImageNet Challenge (ILSVRC), the deep convolutional neural network AlexNet made its debut and achieved a qualitative leap in image classification and recognition, an event widely seen as the beginning of the deep learning boom.

Before that, a major obstacle keeping deep learning from breaking into the mainstream was insufficient computing power for training deep neural networks. The key to AlexNet’s breakthrough was that the researchers trained it on NVIDIA GPUs.

The GPU made its name in that first battle and became infrastructure that has evolved alongside AI technology. NVIDIA, in turn, seized the new growth opportunity in AI computing. As demand for AI computing power has exploded, the NVIDIA GPU product line has gone through multiple rounds of upgrades.

Now, NVIDIA’s GPU family has received its “biggest ever” performance upgrade. It has been three years since the release of the Tesla V100, the previous “most powerful AI chip on the planet”.

Three years of silence, then a blockbuster.

NVIDIA unveiled its eighth-generation GPU architecture, Ampere, together with the first GPU based on it, the NVIDIA A100. Built on a 7nm process, the A100 packs more than 54 billion transistors onto a die of almost the same area as the previous-generation Volta-based V100 GPU: the transistor count is about 2.5 times that of the V100, while the die is only 1.3% larger. AI training and inference performance is up to 20 times that of the Volta generation, and HPC performance is up to 2.5 times.
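The figures quoted above imply a rough gain in transistor density, which a few lines of Python can make explicit. This is only back-of-the-envelope arithmetic on the article's own rounded numbers (about 2.5 times the transistors on a die 1.3% larger); the variable names are illustrative.

```python
# Back-of-the-envelope check using only the rounded figures quoted in the article.
transistor_ratio = 2.5   # A100 transistor count relative to V100 (article's figure)
area_ratio = 1.013       # A100 die area relative to V100 (1.3% larger)

# Transistors per unit area, relative to V100.
density_ratio = transistor_ratio / area_ratio

print(f"Implied density gain: about {density_ratio:.2f}x")
```

In other words, almost all of the extra transistors come from the denser process rather than from a bigger chip.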

The A100 GPU is unique in that, as an end-to-end machine learning accelerator, it unifies AI training and inference on one platform for the first time. It will also serve as an accelerator for general-purpose workloads such as data analysis, scientific computing, and cloud graphics. Simply put, the A100 GPU is made for data centers.

Building on the A100 GPU, NVIDIA also released the HGX A100, billed as the world’s most powerful AI and HPC server platform; the DGX A100, billed as the world’s most advanced AI system; and a DGX SuperPOD cluster made up of 140 DGX A100 systems. It also announced platform products covering smart network cards, edge AI servers, and autonomous driving platform partnerships, along with a series of software products.

It is fair to say that NVIDIA did not release a single “nuclear bomb” this time but a whole “nuclear bomb cluster”, a kind of saturation attack. From cloud to edge to endpoint, and from hardware to software to the open source ecosystem, NVIDIA has built an almost impregnable barrier around AI computing and raised the AI chip competition to a level that small players cannot reach.

What is changing in NVIDIA’s AI server chip business? What impact will the A100 GPU have on the AI server chip market, and what changes will it bring to the cloud computing market? These are the questions worth discussing while we “watch the excitement”.

AI server chips: NVIDIA’s new peak in AI computing growth

As is well known, gaming, data centers, professional visualization, and emerging businesses such as autonomous driving are NVIDIA’s four core business segments. Gaming is still the pillar of revenue, but as the PC gaming market saturates and shifts toward mobile, the share of the discrete graphics business is gradually shrinking. Professional visualization has contributed stable revenue, but its share also continues to decline as other businesses grow. Emerging segments such as autonomous driving currently account for only a small part of overall revenue and are growing slowly, but they can be regarded as NVIDIA’s long-term future market.

(Nvidia: Sequential Revenue Change)

The most striking growth is in NVIDIA’s data center segment. In recent years its revenue has grown rapidly most of the time, and its share of revenue has gradually approached that of the gaming business.

According to NVIDIA’s financial results for the fourth quarter of fiscal 2020, gaming revenue was $1.49 billion, about 47% of total revenue. In the fast-growing data center segment, revenue from AI server chips reached $968 million, a year-on-year increase of 42.6%, approaching the $1 billion mark and far exceeding market expectations of $829 million.

On the whole, as demand for AI chips in data centers worldwide, especially hyperscale data centers, expands ever faster, NVIDIA’s AI server chips have grown rapidly and are becoming the NVIDIA business with the greatest potential for market expansion.

From the perspective of business growth, what NVIDIA wants to defend by launching the A100 GPU server chip and its AI system clusters is its dominance of the AI server market in today’s data centers.

So, how is Nvidia building this AI server chip product system?

Generally speaking, training a deep neural network model involves an enormous volume of data and computation, but the individual operations are relatively simple. Training therefore calls for large numbers of highly parallel, high-throughput operations with high data transfer rates, performed in the cloud. Compared with CPUs, which excel at complex logic but have relatively few cores, GPUs with their many computing units are better suited to training deep neural networks.
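To illustrate why this kind of workload suits hardware with many simple computing units, here is a minimal pure-Python sketch of one fully connected layer. It is not how any GPU library is implemented; the function names and the layer sizes are hypothetical, chosen only to show that each output value is an independent dot product and that the operation count grows very quickly.

```python
# A minimal sketch of why dense-layer math parallelizes well.
# Each output value is an independent dot product, so all of them
# could in principle be computed at the same time.

def dense_forward(inputs, weights):
    """Compute outputs[j] = sum_i inputs[i] * weights[i][j]."""
    n_out = len(weights[0])
    # Every j below is independent of every other j: no shared state,
    # no branching, just multiply-accumulate operations.
    return [
        sum(x * row[j] for x, row in zip(inputs, weights))
        for j in range(n_out)
    ]

def multiply_accumulates(batch_size, n_in, n_out):
    """Number of multiply-accumulate operations for one dense layer."""
    return batch_size * n_in * n_out

# Tiny example: 2 inputs, 3 outputs.
outputs = dense_forward([1.0, 2.0], [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])
print(outputs)

# Hypothetical layer size: a single 4096x4096 layer on a batch of 256
# already needs billions of these simple operations.
print(multiply_accumulates(batch_size=256, n_in=4096, n_out=4096))
```

The logic per operation is trivial, but there are billions of them per layer and per step, which is the pattern the paragraph above describes.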

This is the fundamental reason NVIDIA’s GPUs seized the opportunity in the global cloud AI server chip market, especially on the training side. At the same time, the complete Tesla GPU product line that NVIDIA developed for AI workloads and the successful rollout of its CUDA development platform for GPUs are the main reasons for NVIDIA’s dominance of the AI server chip market.

From the launch of Pascal, its first GPU architecture optimized for deep learning, in 2016, to the Volta architecture in 2017, which was 5 times faster than Pascal, to today’s Ampere architecture, with up to 20 times the performance of Volta, NVIDIA’s data center GPUs have delivered rapid and steady performance gains.

In addition, NVIDIA offers TensorRT, a neural network inference accelerator that provides low-latency, high-throughput inference for deployed deep learning applications. It is compatible with almost all mainstream deep learning frameworks, which lets NVIDIA meet the needs of large data centers across the full AI pipeline, from training to inference deployment.

In March last year, NVIDIA announced that it would acquire Mellanox, an Israeli networking chip company, for $6.8 billion. By integrating Mellanox’s accelerated networking platform, NVIDIA can connect large numbers of fast computing nodes through an intelligent network fabric to form a huge data-center-scale computing engine.

Alongside the A100 GPU, NVIDIA also launched a secure and efficient 25G/50G Ethernet SmartNIC based on Mellanox technology, which it describes as a world first. It is aimed at large cloud computing data centers, where it greatly optimizes network and storage workloads to deliver higher security and network performance for AI computing.

Of course, the significance of the Mellanox acquisition goes further. Besides solving the problems of high-performance networking and delivering computing power, NVIDIA will also have three kinds of processors, GPU, SoC, and NPU, for different segments, which means that NVIDIA is now basically able to build an AI data center on its own.

On the whole, as cloud data centers are evolving from traditional data storage to deep learning, high-performance computing (HPC) and big data analysis, NVIDIA will also play a more important role as an AI computing service provider.

Beyond Nvidia’s sturdy walls, the AI computing race intensifies

Of course, the shape of the cloud AI server chip market is far from settled; in fact, competition was at its most intense in 2019.

NVIDIA’s GPUs, with their high power consumption and high prices, have long kept the cost of AI computing power in cloud data centers high. From Intel, the other big player in server chips, to AMD and Qualcomm, to cloud service providers such as Amazon, Google, Alibaba, and Huawei, to many AI chip startups, all are investing heavily in cloud AI server chips in search of alternatives to the GPU. The industry, it seems, has long been chafing under the GPU.

In 2019, while NVIDIA stayed relatively quiet, other companies launched their own AI server chips. In the first half of the year, Intel, Amazon, Facebook, and Qualcomm successively launched or announced dedicated AI server chips, aiming to replace GPUs and FPGAs in AI inference. In the middle of the year, China’s major cloud AI chip makers also made a collective push. Cambricon announced its second-generation cloud AI chip, the Siyuan 270, in June; in August, Huawei officially released the Ascend 910, which it called the most powerful AI processor, together with MindSpore, an all-scenario AI computing framework; and in September, Alibaba launched the Hanguang 800, billed at the time as the world’s most powerful AI inference chip, essentially benchmarking against NVIDIA’s T4 series products.

Among all the AI chip competitors, Intel, in second place, clearly wants to challenge NVIDIA’s dominance and is also the contender most likely to do so.

As the traditional giant of general-purpose server chips, Intel’s most likely strategy is to fold GPU and AI capabilities into its own CISC instruction set and CPU ecosystem. In other words, it would deploy CPU and GPU together, so that cloud service providers need to buy only a single vendor’s products to get better AI computing performance.

Having gone all in on AI, how has Intel built this AI computing strategy?

Intel’s first move was to fill the gaps in its AI hardware lineup, and acquisition was the fastest way to do so. In 2015, Intel acquired FPGA maker Altera at a sky-high price, and a year later it acquired Nervana, laying the foundation for a new generation of AI accelerator chips.

In December last year, Intel spent another $2 billion to acquire Habana Labs, a three-year-old Israeli maker of data center AI chips. Much like NVIDIA’s acquisition of Mellanox, the Habana deal helps Intel fill out its communication and AI capabilities in the data center.

Following this acquisition, Intel announced that it was discontinuing the Nervana NNP-T for AI training, released only last August, and would instead focus on Habana Labs’ Gaudi and Goya processors, which target NVIDIA’s Tesla V100 and its T4 inference chip respectively. In addition, a GPU based on the Xe architecture is due in the middle of this year.

At the software level, in response to the challenges posed by heterogeneous computing, Intel published a public release of oneAPI in November last year. Whether the target is a CPU, GPU, FPGA, or other accelerator, oneAPI attempts to simplify and unify development across the SVMS architecture as much as possible in order to unlock hardware performance.

Intel has thrown itself into AI computing with an all-out attitude, assembled an array of AI chip products spanning GPUs, FPGAs, and ASICs, and built a broadly applicable software and hardware ecosystem. Even so, it still has some way to go before it can challenge NVIDIA’s general-purpose GPU products.

First, Intel’s strategy of applying CPUs to AI computing has not won over the major cloud computing vendors. Most still prefer CPU+GPU or CPU+FPGA solutions for their AI training hardware. The GPU remains NVIDIA’s home turf, and the V100 and T4 are still the mainstream general-purpose GPU and inference accelerator in today’s data centers.

Second, Intel’s AI chip lineup is only just taking shape. The Nervana AI chips were repeatedly delayed, and the Habana products are only beginning to be integrated, so it will be difficult for Intel to challenge NVIDIA’s share of the AI server chip market in the short term.

Now the release of NVIDIA’s latest Ampere-based A100 GPU and AI system clusters amounts to a saturation attack on Intel and the other competitors in the market. In the long run, custom chips developed by cloud computing vendors and AI server chip makers will erode part of the GPU’s share, but first they must get over the high, hard walls of AI computing that NVIDIA has built with the A100.

AI computing upgrade brings a new layout plan for data centers

Let’s first look at the changes in the data center itself. With the explosive growth of AI applications and scenarios, small and medium-sized data centers can no longer bear the enormous burden of AI computing, and market demand for hyperscale data centers is growing stronger.

First, the public cloud giants, represented by Amazon AWS, Microsoft Azure, Alibaba, and Google, hold most of the hyperscale data center market. On the one hand, hyperscale data centers drive more growth in servers and supporting hardware; on the other hand, increasingly complex AI algorithms and a growing volume of AI processing tasks require continuous upgrades to server configurations and architectures.

Some AI companies focused on visual recognition need to deploy tens of thousands of GPUs to build a supercomputing center. For the cloud data centers of top-tier cloud service providers, the number of GPUs required to support deep learning training is likewise massive.

Second, cloud service providers are launching self-developed chips to ease the soaring cost of GPU computing caused by high prices and huge data volumes. Most of them are launching inference chips to free up the general-purpose computing power of the GPU. However, these inference chips lack versatility, which makes it hard for them to move beyond being developed and used in-house.

So, what changes will NVIDIA’s A100 GPU bring to cloud computing data centers? And what kind of bar does it set for rival AI server chips?

First, the A100 GPU, built on the new Ampere architecture, supports 1.5TB per second of memory bandwidth as well as TF32 operations and FP64 double-precision operations, delivering up to 20 times the AI computing performance of FP32 and up to 2.5 times the performance for HPC applications. It also introduces the new MIG (multi-instance GPU) architecture, NVLink 3.0, and support for sparsity in AI computation. Together these make the A100 accelerator suitable not only for AI training and AI inference but also for a range of general-purpose computing tasks such as scientific simulation, conversational AI, genomics, high-performance data analysis, seismic modeling, and financial calculations. This may relieve the inference computing pressure on many cloud service providers, and it also puts competitive pressure on other vendors’ inference chips.
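To make the TF32 format mentioned above a little more concrete, here is a rough sketch that emulates its reduced precision in Python. TF32 keeps the 8-bit exponent of FP32 but only 10 bits of mantissa. The helper name `to_tf32_truncated` is hypothetical, and the sketch simply truncates the extra mantissa bits, which is an assumption made for illustration; real hardware may round differently.

```python
import struct

def to_tf32_truncated(value):
    """Emulate TF32 precision by zeroing the low 13 of FP32's 23 mantissa bits.

    Assumption: simple truncation, for illustration only.
    """
    # Reinterpret the value as a 32-bit float bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Keep sign (1 bit), exponent (8 bits) and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A step of 2**-10 fits in 10 mantissa bits and survives unchanged.
print(to_tf32_truncated(1.0 + 2**-10))
# A step of 2**-11 is too fine for TF32 and is dropped by the truncation.
print(to_tf32_truncated(1.0 + 2**-11))
```

The range of representable values stays the same as FP32 because the exponent is untouched; only the fine detail of each number is reduced, which is the trade-off that lets the hardware process such values faster.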

Second, the third-generation DGX A100 AI system released by NVIDIA significantly reduces data center costs while increasing throughput. Thanks to the elastic computing technology built into the A100, computing resources can be partitioned flexibly. The multi-instance GPU capability allows each A100 GPU to be split into up to seven independent instances for inference tasks, and multiple A100s can also be combined to run as one giant GPU for larger training tasks.
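As a rough illustration of the partitioning described above, the following sketch counts how many isolated inference instances a system could expose. The limit of seven instances per GPU comes from the text; the figure of eight A100 GPUs per DGX A100 system is an assumption not stated in this article, and the function name is hypothetical.

```python
MAX_INSTANCES_PER_GPU = 7  # from the article: up to seven instances per A100

def max_inference_instances(num_gpus, instances_per_gpu=MAX_INSTANCES_PER_GPU):
    """Upper bound on independent instances when every GPU is partitioned."""
    if not 1 <= instances_per_gpu <= MAX_INSTANCES_PER_GPU:
        raise ValueError("an A100 can be split into at most seven instances")
    return num_gpus * instances_per_gpu

# Assumption: a DGX A100 system contains eight A100 GPUs.
GPUS_PER_SYSTEM = 8
print(max_inference_instances(GPUS_PER_SYSTEM))
```

The same hardware can therefore be carved up for many small inference jobs during the day and pooled again for a large training job, which is where the claimed cost savings come from.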

(“The more you buy, the more money you save!”)

Take the example given by NVIDIA CEO Jensen Huang: a typical AI data center has 50 DGX-1 systems for AI training and 600 CPU systems for AI inference, which requires 25 racks, consumes 630kW of power, and costs more than $11 million. To do the same work with the same performance, a single rack of 5 DGX A100 systems consumes 28kW of power and costs about $1 million.

That is to say, with one rack, the DGX A100 system can replace an entire AI data center with 1/10 the cost, 1/20 the power, and 1/25 the space.
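The ratios above can be checked directly from the figures in the example. The short sketch below uses only the numbers quoted in the article; the dictionary keys are illustrative names.

```python
# Figures as quoted in the article for the two configurations.
legacy = {"cost_usd": 11_000_000, "power_kw": 630, "racks": 25}
dgx_a100 = {"cost_usd": 1_000_000, "power_kw": 28, "racks": 1}

# Ratio of old to new for each metric.
savings = {key: legacy[key] / dgx_a100[key] for key in legacy}

for key, ratio in savings.items():
    print(f"{key}: 1/{ratio:g} of the original")
```

The exact ratios work out to 1/11 of the cost and 1/22.5 of the power, so the quoted 1/10 and 1/20 are rounded, slightly conservative figures, while 1/25 of the space is exact.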

In general, NVIDIA has delivered a new upgrade of the AI data center computing platform with an impressive and innovative AI computing architecture and AI server chip hardware. NVIDIA’s ambition is no longer just to supply GPU hardware with better performance, but to redefine the rules of AI computing in the data center, treating the data center itself as the basic unit of computing.

In fact, a single DGX A100 system costs about $200,000. For cloud computing vendors that want to buy thousands of enterprise-grade GPUs for AI training, the cost is easy to imagine. For now, only the world’s major cloud computing vendors, IT giants, governments, and laboratories have placed initial orders for the DGX A100.

For other competitors, the strong walls NVIDIA has built around AI server chips and AI data center computing platforms look insurmountable in the short term. At the same time, the A100 will become the performance standard that AI server chip makers strive to match over the next few years. Of course, the challenge to the NVIDIA A100 begins right here. Whether it comes from Intel and AMD or from AWS and Google, we will wait and see.
