Huawei Unveils the World's Fastest AI Training Cluster Atlas 900
Yicai Global

(Yicai Global) Sept. 19 -- Huawei Technologies has revealed its first computing strategy, which will focus on architecture innovation, investment in all-scenario processors, clear business boundaries and building an open ecosystem. Huawei also unveiled the world's fastest artificial intelligence training cluster, the Atlas 900, at Huawei Connect 2019, the company's fourth annual customer conference, held in Shanghai. This supercomputer will help make AI more readily available to different fields of scientific research and business innovation.

Huawei Releases First Computing Strategy

Computing is a model through which humans know the world. The development from mainframe computers to personal computers, from smartphones to wearable devices shows computing power is increasingly becoming an extension of human capabilities. The industry's approach to computing is evolving and statistical computing will become mainstream. Over the next five years, Huawei estimates that AI computing will account for more than 80 percent of society's total computing power. Computing is entering a new era of intelligence.

"The future of computing is a massive market worth more than USD2 trillion," said Deputy Chairman Ken Hu. "Our strategy is to keep investing in four key areas. We will push the boundaries of architecture, invest in processors for all scenarios, keep clear business boundaries and build an open ecosystem."

First, architecture innovation. Huawei will continue to invest in basic research and has already developed the Da Vinci architecture, an innovative processor architecture designed to provide a steady and abundant supply of affordable computing power. 

Second, investment in all-scenario processors. Huawei has a full lineup of processors: Kunpeng for general purpose computing, Ascend for AI, Kirin for smart devices and Honghu for smart screens.

Third, clear business boundaries. Huawei won't sell its processors directly to consumers. Instead, it will provide them in the form of cloud services. It will supply its business partners with components, prioritizing support for integrated solutions.

Fourth, building an open ecosystem. In the next five years, Huawei will invest a further USD1.5 billion in its developer enablement program. The aim is to expand the program to support five million developers and enable Huawei's worldwide partners to develop the next generation of intelligent applications and solutions.

Atlas 900 Sets a New World Record

Huawei has spent 10 years developing the Atlas 900. It takes the Atlas 900 only 59.8 seconds to train ResNet-50, the gold standard for measuring large-scale cluster computer capabilities. This is 10 seconds faster than the previous world record.

The Atlas 900 will mostly provide extraordinary computing power for training neural networks on large-scale datasets and will be widely used in scientific research and business innovation. It will enable researchers to train AI models faster using images, videos and sound. It will allow for more efficient studies of the universe, weather forecasting and oil exploration, and will speed up the commercialization of autonomous driving.

The Atlas 900 also provides cloud services, offering abundant and economical computing resources based on highly efficient and fully inclusive AI platforms that are accessible, affordable and practical to use. Huawei has deployed the Atlas 900 on the Huawei Cloud as a cluster service, making extraordinary computing power more broadly accessible to its customers across different industries. Huawei offers these services at a great discount to universities and scientific research institutes worldwide.

Gao Wen, a member of the Chinese Academy of Engineering and director of the Peng Cheng Lab, shared the mission and vision of the Peng Cheng Lab at the event. He explained how the lab will work with Huawei to build China's first evolving AI supercomputing system that supports exascale computing. The two organizations will work together to establish a new generation of platforms for AI basic research and innovation. Zheng Yelai, president of Huawei's Cloud BU, talked about how AI can be applied in a variety of scenarios. Based on Huawei's experience in over 500 projects across more than 10 industries, Zheng pointed out that the AI industry is crossing the commercial chasm and becoming a key driver for reshaping how companies go digital.

"This is a new age of exploration," Hu concluded. "An ocean of boundless potential is waiting, but just one ship is not enough. Today we launch a thousand ships. Let's work together, seize this historic opportunity and advance intelligence to new heights."

Atlas 900 Represents the Peak of Global Computing Power

Neural network architectures trained on large datasets are used in image recognition, natural language processing, real-time video analysis and intelligent recommendation systems. Training these neural network models requires a great deal of floating-point computing power. Significant progress has been made in the computing power of individual AI processors and in training methods in recent years, but the time required to train a model on a single machine is still impractically long. It is therefore necessary to increase the floating-point computing power of neural network training systems through large-scale distributed AI clusters.

The Atlas 900 consists of thousands of Ascend 910 AI processors. It is the world's fastest AI training cluster and represents the peak of global computing power. The Atlas 900 delivers 256 to 1,024 PFLOPS at FP16, equivalent to the combined computing power of 500,000 personal computers. The Atlas 900 has the following features.

Industry-Leading AI Computing Power

The Atlas 900 uses Ascend 910 AI processors, which boast the greatest single-chip computing density in the field. Each Ascend 910 AI processor has 32 built-in Da Vinci AI cores. A single chip delivers 256 TFLOPS at FP16, which Huawei puts at twice the computing power of other chips in the industry. The Atlas 900 interconnects thousands of Ascend 910 AI processors to create the industry's number one computing power cluster.
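The relationship between the per-chip figure and the cluster figure can be checked with simple arithmetic. The sketch below is only an illustration: it multiplies the 256 TFLOPS per chip quoted above by a chip count and converts to PFLOPS using 1 PFLOPS = 1,000 TFLOPS. The chip counts of 2,048 and 4,096 are assumptions chosen for illustration, since the article only says "thousands", and the function name is hypothetical.

```python
# Theoretical peak FP16 throughput of a cluster: chip count x per-chip peak.
TFLOPS_PER_CHIP = 256  # per-chip FP16 figure quoted in the article


def cluster_peak_pflops(num_chips: int, tflops_per_chip: float = TFLOPS_PER_CHIP) -> float:
    """Return the theoretical peak in PFLOPS, using 1 PFLOPS = 1,000 TFLOPS."""
    if num_chips < 0:
        raise ValueError("num_chips must be non-negative")
    return num_chips * tflops_per_chip / 1000


# 1,024 chips is the deployed configuration described in the article;
# 2,048 and 4,096 are illustrative assumptions only.
for chips in (1024, 2048, 4096):
    print(f"{chips} chips -> {cluster_peak_pflops(chips):.1f} PFLOPS peak")
```

If the conversion used 1,024 instead of 1,000, then 1,024 and 4,096 chips would give exactly 256 and 1,024 PFLOPS, which matches the range quoted for the Atlas 900. These are theoretical peaks, not sustained training throughput.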

The Ascend 910 AI processor adopts a System-on-a-Chip design that integrates AI computing power, general computing power and high-speed, high-bandwidth I/O, which offloads much of the data pre-processing work from the host CPU and improves training efficiency.

Best Cluster Network

The Atlas 900 integrates three types of high-speed interfaces, HCCS, PCIe 4.0 and 100G RoCE, and uses a 100TB fully interconnected, non-blocking network dedicated to parameter synchronization. This reduces gradient synchronization latency by 10 percent to 70 percent.
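To show what gradient synchronization involves, here is a minimal sketch of synchronous data-parallel training in plain Python: each worker computes gradients on its own share of the data, the gradients are averaged across workers, and every worker applies the same update. This is a generic illustration of the idea, not Huawei's implementation, and the function names and numbers are hypothetical. In a real cluster the averaging step runs over the network, which is why its latency matters.

```python
from typing import List


def all_reduce_mean(worker_grads: List[List[float]]) -> List[float]:
    """Average gradients element-wise across workers.

    This computes the same result an all-reduce (mean) would deliver
    to every worker, but locally and without any network traffic.
    """
    if not worker_grads:
        raise ValueError("need at least one worker")
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must report gradients of the same length")
    num_workers = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / num_workers for i in range(length)]


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Apply one plain gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Hypothetical gradients from four workers for a two-parameter model.
grads = [[0.2, -0.4], [0.4, 0.0], [0.0, -0.2], [0.2, -0.2]]
averaged = all_reduce_mean(grads)
weights = sgd_step([1.0, 1.0], averaged, lr=0.5)
print("averaged gradient:", averaged)
print("updated weights:", weights)
```

Because every worker must wait for the averaged gradient before taking the next step, any reduction in synchronization delay shortens each training step.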

Inside the server, the Ascend 910 AI processors are interconnected through the HCCS high-speed bus. The Ascend 910 AI processor connects to the CPU via the latest PCIe 4.0 technology, whose per-lane rate of 16 GT/s is double the 8 GT/s of the more commonly used PCIe 3.0, resulting in faster and more efficient data transfers. At the cluster level, the CloudEngine 8800 data center switch provides a single-port switching rate of 100Gbps and connects all AI servers in the cluster to the high-speed switched network.
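The PCIe figures of 16 and 8 quoted above are per-lane transfer rates, so the usable bandwidth of a link also depends on the number of lanes and on the line encoding. The sketch below works this out for a x16 link. The lane count is an assumption for illustration, as the article does not say how wide the link is. The 128b/130b encoding is the scheme used by both PCIe 3.0 and PCIe 4.0.

```python
def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s for PCIe 3.0/4.0.

    gt_per_s: per-lane transfer rate in GT/s (8 for PCIe 3.0, 16 for PCIe 4.0).
    lanes:    link width; x16 is an assumption, not a figure from the article.
    """
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_byte = 8
    return gt_per_s * encoding_efficiency * lanes / bits_per_byte


gen3 = pcie_bandwidth_gbytes(8.0)
gen4 = pcie_bandwidth_gbytes(16.0)
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s per direction")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s per direction")
print(f"ratio: {gen4 / gen3:.1f}x")
```

Protocol overhead beyond line encoding is ignored here, so real-world throughput would be somewhat lower than these figures.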

Huawei's original iLossless intelligent lossless switching algorithm learns from the cluster's network traffic in real time to achieve zero packet loss and end-to-end latency at the microsecond level.

System-Level Optimization

The Atlas 900 integrates the HCCS, PCIe 4.0 and 100G RoCE high-speed interfaces through the Huawei cluster communication library and job scheduling platform, fully unlocking the powerful performance of the Ascend 910 chips.

The Huawei cluster communication library provides the distributed parallel library required for network training. Together, the communication library, network topology and training algorithm are optimized at the system level, achieving cluster linearity of over 80 percent and greatly improving job scheduling efficiency.
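Cluster linearity is commonly understood as the fraction of ideal linear speedup that a cluster actually achieves. The article does not give its exact definition, so the sketch below assumes that common one: measured cluster throughput divided by single-device throughput times the number of devices. The throughput numbers are hypothetical and serve only to show the calculation.

```python
def cluster_linearity(single_throughput: float,
                      cluster_throughput: float,
                      num_devices: int) -> float:
    """Return the fraction of ideal linear scaling achieved (1.0 = perfect)."""
    if num_devices <= 0 or single_throughput <= 0:
        raise ValueError("num_devices and single_throughput must be positive")
    ideal = single_throughput * num_devices
    return cluster_throughput / ideal


# Hypothetical figures, not taken from the article.
one_chip_images_per_s = 1_000.0
cluster_images_per_s = 850_000.0  # measured on 1,024 chips in this example
linearity = cluster_linearity(one_chip_images_per_s, cluster_images_per_s, 1024)
print(f"cluster linearity: {linearity:.1%}")
```

Under this definition, a linearity above 80 percent means that adding processors keeps yielding most of their theoretical benefit, with the remainder lost to communication and scheduling overhead.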

Extraordinary Cooling System

Conventional data centers mostly rely on air cooling, which is inadequate for the AI era. High-power devices such as CPUs and AI chips cause more pronounced heat island effects and require more efficient cooling. Liquid cooling can meet data centers' demands for high power, high-density deployment and low power usage effectiveness (PUE).

The Atlas 900 uses a full liquid-cooling solution with the industry's strongest cabinet-level, closed insulation technology, which supports a liquid-cooling ratio of more than 95 percent. A single cabinet supports ultra-high heat dissipation of up to 50 kilowatts, allowing data centers to achieve an extraordinary energy efficiency of a PUE of less than 1.1.
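PUE is the ratio of a facility's total power draw to the power consumed by its IT equipment, so a PUE of 1.1 means that every kilowatt of computing needs only 0.1 kilowatt of overhead for cooling and power delivery. The sketch below applies that definition to the 50-kilowatt cabinet mentioned above. The comparison PUE of 1.6 is a hypothetical value for illustration and does not come from the article.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power delivery) implied by a given PUE."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_equipment_kw * (pue_value - 1.0)


cabinet_kw = 50.0  # per-cabinet figure quoted in the article
print(f"overhead at PUE 1.1: {overhead_kw(cabinet_kw, 1.1):.1f} kW")
# 1.6 is a hypothetical comparison value, not a figure from the article.
print(f"overhead at PUE 1.6: {overhead_kw(cabinet_kw, 1.6):.1f} kW")
```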

The machine room occupies 79 percent less space than one built with 8 kW air-cooled cabinets. The liquid-cooling technology meets the requirements of high-power, high-density equipment deployment and low PUE, and greatly reduces customers' total cost of ownership.

1,024 Ascend 910 AI Processors

Huawei has deployed an Atlas 900 cluster consisting of 1,024 Ascend 910 AI processors on Huawei Cloud. Using the typical ResNet-50 v1.5 model and the ImageNet-1k dataset, the Atlas 900 takes only 59.8 seconds to finish training, ranking number one worldwide.

The ImageNet-1k dataset contains 1.28 million images, and the training run reached an accuracy of 75.9 percent. Two other mainstream manufacturers' results for the same test are 70.2 seconds and 76.8 seconds. The Atlas 900 therefore finishes in about 15 percent less time than the faster of the two.
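The comparison can be verified directly from the three training times quoted above. The sketch below computes, for each competing result, how much less time the Atlas 900 needs and the equivalent speedup factor. The variable and function names are illustrative only.

```python
ATLAS_900_SECONDS = 59.8
OTHER_RESULTS_SECONDS = (70.2, 76.8)  # the two other results quoted in the article


def time_saved_fraction(ours: float, theirs: float) -> float:
    """Fraction of the other system's training time that is saved."""
    return (theirs - ours) / theirs


def speedup(ours: float, theirs: float) -> float:
    """How many times faster our run is, as a ratio of training times."""
    return theirs / ours


for other in OTHER_RESULTS_SECONDS:
    saved = time_saved_fraction(ATLAS_900_SECONDS, other)
    factor = speedup(ATLAS_900_SECONDS, other)
    print(f"vs {other} s: {saved:.1%} less time, {factor:.2f}x speedup")
```

Against the 70.2-second result the saving is about 15 percent, and against the 76.8-second result it is about 22 percent. The 10.4-second gap to the faster result also matches the roughly 10 seconds cited earlier in the article.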

Keywords: Huawei, Atlas 900, AI Training Cluster