A Brief History of Huawei's AI Chips

Note: This article is about 8,332 words long; reading time is about 40 minutes.
Summary: Data, computing power, and algorithms are driving the third wave of AI. Faced with explosive growth in demand for AI computing, what has Huawei been doing in recent years?

Seemingly sophisticated artificial intelligence (AI) technology has in fact quietly made its way into everyday life. A single Huawei Mate 20 phone in your hand can perform AI functions such as face recognition, object recognition, object detection, image segmentation, and intelligent translation.

Behind this lies a dramatic increase in the computing power of mobile phones. Huawei's Kirin 980 phone chip, about the size of a penny, integrates 6.9 billion transistors and can perform trillions of operations per second. You may not realize it, but an ordinary smartphone in your hand today has tens of millions of times (or more) the computing power of the most advanced computer used in NASA's 1969 Moon landing program.

Computing Power

Computing power is one of AI's key cornerstones. The computing power of mobile chips has grown remarkably in recent years, yet AI chips in the cloud must process far larger volumes of data in complex scenarios such as autonomous driving. So how much computing power is needed? Research published by OpenAI shows that the compute used in the largest AI training runs has grown more than 300,000-fold in the roughly six years since 2012, about tenfold per year on average, far outpacing Moore's law. The reason is that deep learning neural networks require large-scale parallel computation on tensors (which can be loosely thought of as matrices), a break from traditional one-at-a-time floating-point computation, so demand for computing power is growing exponentially. For example, a traditional compute unit performs one floating-point operation per clock cycle, whereas a new matrix operator can process an N × N matrix at once. With N = 10, that is 100 operations at the same time, 100 times the original throughput. These new operators place strong new demands on chip design.
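
To make the N × N example concrete, here is a toy Python model, not a description of any real chip. One function stands in for a unit that performs one floating-point addition per clock cycle; the other stands in for a hypothetical matrix unit that updates a whole tile × tile block per cycle. The function names and the `tile` parameter are illustrative assumptions; only the cycle counts matter.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def add_scalar_unit(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Element-wise a + b (n x n) on a unit that performs one floating-point add per cycle."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    cycles = 0
    for i in range(n):
        for j in range(n):
            out[i][j] = a[i][j] + b[i][j]
            cycles += 1  # one addition costs one cycle
    return out, cycles

def add_matrix_unit(a: Matrix, b: Matrix, tile: int) -> Tuple[Matrix, int]:
    """Same addition on a hypothetical unit that updates a tile x tile block per cycle."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    cycles = 0
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            # All elements of this block are updated in a single cycle.
            for i in range(bi, min(bi + tile, n)):
                for j in range(bj, min(bj + tile, n)):
                    out[i][j] = a[i][j] + b[i][j]
            cycles += 1
    return out, cycles

N = 10
a = [[float(i * N + j) for j in range(N)] for i in range(N)]
b = [[1.0] * N for _ in range(N)]

scalar_out, scalar_cycles = add_scalar_unit(a, b)
matrix_out, matrix_cycles = add_matrix_unit(a, b, tile=N)
assert scalar_out == matrix_out  # same result, different cycle counts
print(scalar_cycles, matrix_cycles)  # 100 1
```

With N = 10 and a 10 × 10 tile, the scalar unit needs 100 cycles and the matrix unit needs 1, the 100-fold difference in the example. With a smaller tile of 4, the same matrix takes 9 cycles, because partially filled edge blocks still cost a full cycle in this model.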

If Huawei is the ICT and smart-device maker drawing the most attention in 2019, then the Ascend 910, billed since its announcement last October as the industry's most powerful AI processor, is the most anticipated chip in the AI community this year.

Huawei was also the first to bring a dedicated NPU (neural processing unit) AI chip into mobile phones.

In AI development, China has an advantage in data, but it still needs to catch up in computing power and algorithms, especially in the computing power of chips and hardware. Algorithm scientists, engineers, and application vendors face scarce and costly AI computing power, which greatly raises the barrier to algorithm research and innovation and hinders the adoption of AI across industries.

For this reason, despite the high cost of AI chip R&D in money, time, and manpower, companies competing in AI are building their own chip systems. Most of them are Internet companies and chip makers that focus on a single application or scenario, though many ICT vendors are also in the race. Among domestic manufacturers, Huawei's AI chip lineup can fairly be called a "classic".

Before AlphaGo rose to fame in a single match, most people in China had already moved from feature phones to their first smartphones, and a growing number of built-in phone features and third-party apps kept refreshing the user experience. Whether for built-in AI functions or scenario-based AI services, phones must run complex deep learning models, which are computationally intensive and must respond in real time. At the same time, the operating environment is constrained, with tight limits on power consumption, memory, and storage, so powerful and efficient computing is essential.

Bringing Artificial Intelligence to Mobile Devices

How to bring artificial intelligence to mobile devices was a problem that Apple, Huawei, and other phone makers were trying to solve at the time.

At the IFA consumer electronics show in Berlin in September 2017, Huawei officially released the Kirin 970, which it billed as the world's first mobile AI chip, and a month later launched its flagship Mate 10 phone powered by it. The Kirin 970 was the world's first AI chip with a dedicated neural processing unit (NPU), making Huawei the first to bring an NPU into mobile chips. Apple, Samsung, and other manufacturers have since followed, and AI capability is now a flagship feature for many phone makers. On AI workloads, the Kirin 970's built-in NPU far outperforms its CPU, GPU, and DSP: compared with the CPU, it delivers about 25 times the performance and about 50 times the energy efficiency. This means the Kirin 970 can complete AI computing tasks faster while using less energy.

With the NPU's support, phone features become more powerful. For example, when you use voice features, AI analyzes the current context and content in detail to deliver highly accurate recognition and raise the success rate of speech recognition. Intelligent assistants built around voice can then replace traditional manual input and play a more important role. Perhaps in the future, instead of people walking with their heads bowed over their phones, we will see more people seemingly "talking to themselves" into them.

AI also brings many benefits to users who enjoy mobile photography. The Kirin 970 includes a dual-channel ISP (image signal processor), which greatly improves motion capture and low-light photography. With dual cameras and dual ISPs optimized in both software and hardware, plus AI-based computer vision, the phone can automatically analyze the subject of a scene and select the best shooting mode. It can even track and focus on moving objects and anticipate the moment the user wants to capture, providing an unprecedented photography experience.

The launch of the Kirin 970 marked an important watershed between traditional smartphones and future AI phones. AI phone development moved beyond pure algorithm optimization into a genuine AI competition based on hardware capability.

In August 2018, at the IFA consumer electronics show in Berlin, Huawei released the Kirin 980, billed as the world's first 7 nm AI mobile phone chip.

A nanometer is one billionth of a meter, about the length of 10 atoms. A human hair is about 0.1 mm (100,000 nm) in diameter, so 7 nm is roughly one fourteen-thousandth of a hair's width. The Kirin 980 packs up to 6.9 billion transistors into less than 1 square centimeter. In chip manufacturing terms, 7 nm is equivalent to about 70 atomic diameters, approaching the physical limits of silicon-based semiconductor technology, so the Kirin 980 is effectively dancing on the tip of a needle. Yu Chengdong (Richard Yu), CEO of Huawei's Consumer Business Group, said that the Kirin 980's 7 nm process was the result of three years of careful refinement by a team of more than 1,000 semiconductor engineers, involving more than 5,000 engineering validations.

Compared with the Kirin 970, the Kirin 980 is upgraded across the board. Take image recognition speed: the Kirin 970 can recognize about 2,005 images per minute, while the Kirin 980, powered by its dual NPU, reaches 4,500 images per minute. That is about 120% faster than the previous generation and well above other chips of the same period. Face recognition, the voice assistant, AI photography, and the phone's various smart photo-editing apps have been upgraded accordingly.

Meanwhile, to reach a wider range of users, the Kirin 710 lets more consumers enjoy AI features. In 2019, Huawei launched the Kirin 810, a 7 nm phone chip with Huawei's self-developed NPU based on the Da Vinci architecture. This means more and more users can enjoy the flagship AI experience that a dedicated NPU brings.

With this, Huawei has completed its first round of AI chip deployment on phones (Kirin 970, Kirin 980, Kirin 710, and Kirin 810), and the phone industry has officially entered the AI era.

“Da Vinci” Lays the Foundation for a Surge in Device, Edge, and Cloud Computing Power

The AI race affects more than phones. On the edge and in the cloud, hardware computing, data, algorithms, and other elements are all locked in white-hot competition, with new papers and new products appearing almost every day.

If Huawei's sustained investment in chips reflects "preparing for danger in times of safety" and shows its vision and determination, then its ambitions in artificial intelligence go even further. This time, Huawei not only covers cloud, edge, and device scenarios but also forms a closed loop from application systems down to chips.

In October 2018, at its HUAWEI CONNECT conference, Huawei first proposed its full-stack, all-scenario AI solution. Huawei's rotating chairman Xu Zhijun (Eric Xu) said: “All-scenario refers to deployment environments including public cloud, private cloud, all kinds of edge computing, IoT industry devices, and consumer devices. Full-stack is a technical, functional perspective, referring to a complete stack that includes chips, chip enablement, training and inference frameworks, and application enablement.”

The Da Vinci architecture is designed around the characteristics of AI computing. Built on a high-performance 3D Cube computing engine, it delivers significant gains in computing power and energy efficiency. Starting from the real needs of AI running independently or collaboratively across cloud, edge, and device, and spanning scenarios from ultra-low power to ultra-high performance, it provides a unified architecture for coordinating, migrating, deploying, upgrading, and maintaining algorithms across cloud, edge, and device. This greatly lowers the barrier to developing and iterating AI algorithms and reduces AI deployment and business costs.
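
The article credits Da Vinci's efficiency to a "3D Cube" computing engine. The following Python sketch is a hypothetical model, not Huawei's design or API: it assumes a cube unit that finishes one t × t × t block of multiply-accumulates (MACs) per step, and counts the steps a tiled matrix multiplication would need. The cube edge t = 16 is chosen only for illustration, and the function names are invented for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference result computed one multiply-accumulate (MAC) at a time."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def matmul_cube(a: Matrix, b: Matrix, t: int) -> Tuple[Matrix, int]:
    """Tiled matmul where one t x t x t block of MACs counts as one cube step.

    Hypothetical model: a cube unit computes
    C[i0:i0+t, j0:j0+t] += A[i0:i0+t, p0:p0+t] @ B[p0:p0+t, j0:j0+t]
    in a single step. Edge blocks smaller than t still cost one full step.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    steps = 0
    for i0 in range(0, m, t):
        for j0 in range(0, n, t):
            for p0 in range(0, k, t):
                # The i/j/p loops below together form one cube step.
                for i in range(i0, min(i0 + t, m)):
                    for j in range(j0, min(j0 + t, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + t, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
                steps += 1
    return c, steps

size, t = 32, 16  # t = 16 is an illustrative cube edge, not a spec
a = [[float((i + 2 * j) % 5) for j in range(size)] for i in range(size)]
b = [[float((3 * i + j) % 7) for j in range(size)] for i in range(size)]

c, steps = matmul_cube(a, b, t)
assert c == matmul_naive(a, b)  # tiling changes the schedule, not the result
print(f"{size**3} scalar MACs vs {steps} cube steps of {t**3} MACs each")
```

In this model, multiplying two 32 × 32 matrices takes 32,768 scalar MACs but only 8 cube steps, because each step covers 4,096 MACs. A flat two-dimensional array would cover t² operations per step, while a cube covers t³, which is one way to see why a three-dimensional engine suits the matrix multiplications at the heart of deep learning.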

The Ascend 310, one member of the Ascend chip family, is already in commercial use. Based on it, Huawei has released the Atlas 200, Atlas 300, Atlas 500, Atlas 800, and other products, which are widely used in security, finance, healthcare, transportation, electric power, automotive, and other industries, in product forms including cameras, drones (UAVs), robots, smart stations, and MDC (Mobile Data Center). More than 50 APIs built on the Ascend 310, such as Huawei Cloud's Image Analysis, OCR, and Video Intelligent Analysis services, are invoked more than 100 million times a day. In addition, many enterprise customers are developing their own algorithm services on the Ascend 310 chip.

With Ascend 310 products rolling out at scale, expectations for the Ascend 910 are even higher. Last October, Xu Zhijun announced at the conference: “The Ascend 910 is the single chip with the highest computing density. Its maximum power consumption is 350 W, and its half-precision (FP16) performance reaches 256 TeraFLOPS, about twice the 125 TeraFLOPS of NVIDIA's V100. Combining 1,024 Ascend 910 chips would create the largest AI computing cluster in the world so far, with performance reaching 256 PFLOPS, enough to train even the most complex models with ease.” Simply put, the Ascend 910 is presented as the most powerful AI processor in the industry: at the same power consumption, it offers twice the computing power of other industry chips and 50 times that of the strongest CPU.
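
The cluster figure follows from simple multiplication. A minimal sketch, using only the per-chip numbers quoted in the article and assuming perfect linear scaling across chips (real clusters lose some efficiency to communication). Note that 1,024 × 256 TFLOPS equals "256 P" only when 1 PFLOPS is counted as 1,024 TFLOPS; with standard SI prefixes the same total is about 262 PFLOPS.

```python
def cluster_tflops(chips: int, per_chip_tflops: float) -> float:
    """Peak cluster throughput under an idealized perfect-linear-scaling assumption."""
    return chips * per_chip_tflops

ASCEND_910_FP16_TFLOPS = 256  # per-chip FP16 figure quoted in the article
V100_FP16_TFLOPS = 125        # V100 figure quoted in the article
CLUSTER_CHIPS = 1024          # cluster size quoted in the article

total_tflops = cluster_tflops(CLUSTER_CHIPS, ASCEND_910_FP16_TFLOPS)
pflops_si = total_tflops / 1000      # SI prefixes: 1 PFLOPS = 1,000 TFLOPS
pflops_binary = total_tflops / 1024  # binary-style counting, as in "256 P"
ratio_vs_v100 = ASCEND_910_FP16_TFLOPS / V100_FP16_TFLOPS

print(f"Cluster peak: {total_tflops:,} TFLOPS")
print(f"= {pflops_si:.1f} PFLOPS (SI) or {pflops_binary:.0f} 'P' (binary)")
print(f"Ascend 910 vs V100: {ratio_vs_v100:.2f}x")
```

The per-chip ratio comes out at about 2.05, so "about twice" is a fair summary of the quoted FP16 figures; it says nothing about sustained performance on real workloads.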

Full-Stack, All-Scenario AI Gradually Takes Shape

Beyond the Ascend 310, users and developers also have their own entry point: Huawei Cloud ModelArts, which went into commercial use in 2019. As a one-stop AI development platform, ModelArts provides massive data preprocessing and semi-automatic annotation, large-scale distributed training, automated model generation, and on-demand model deployment across device, edge, and cloud, helping users quickly build and deploy models and manage the full AI workflow lifecycle. In May 2019, Huawei Cloud ModelArts took first place for image-recognition training time on Stanford's DAWNBench leaderboard: it trained a ResNet-50 model on the ImageNet-1k dataset in just 2 minutes 43 seconds using 128 V100 GPUs. In October 2017, the training time recorded on DAWNBench was 13 days, 10 hours, and 41 minutes. DAWNBench draws nearly all the leading AI vendors in China and abroad. If ModelArts is upgraded with the Ascend 910, could it set a new world record? And what would happen with 1,024 Ascend 910 chips forming the world's largest AI computing cluster?
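
The size of the DAWNBench improvement can be checked with Python's standard `datetime` module. A minimal sketch using only the two training times quoted in the article; the two entries likely differ in hardware and software, so the ratio is an overall speedup, not a like-for-like chip comparison.

```python
from datetime import timedelta

# Training times quoted in the article (ResNet-50 on ImageNet-1k, Stanford DAWNBench).
oct_2017 = timedelta(days=13, hours=10, minutes=41)
may_2019 = timedelta(minutes=2, seconds=43)  # ModelArts, 128 V100 GPUs

# Dividing one timedelta by another yields a plain float ratio.
speedup = oct_2017 / may_2019

print(f"October 2017: {oct_2017.total_seconds():,.0f} s")
print(f"May 2019:     {may_2019.total_seconds():,.0f} s")
print(f"Speedup:      about {speedup:,.0f}x")
```

The quoted times work out to 1,161,660 seconds versus 163 seconds, roughly a 7,000-fold reduction in wall-clock training time in about a year and a half.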

From device to edge to cloud, and from underlying hardware through deep learning frameworks to upper-layer application enablement, Huawei's full-stack, all-scenario AI strategy is gradually taking shape.

At the core of this full stack is a series of AI chips built on the unified Da Vinci architecture, spanning IoT devices, terminals (such as the NPU in Kirin chips), the edge, and the cloud. At the conference, Xu Zhijun also said: “There have long been rumors that Huawei is developing AI chips. I want to tell you that this is true: today we are releasing two AI chips, the Ascend 910 and the Ascend 310.” The statement made waves in AI circles in China and abroad: Huawei had finally made its big move.

In addition to the Ascend series chips, Huawei's full-stack AI includes MindSpore, a unified training and inference framework that supports independent and collaborative operation across device, edge, and cloud; CANN, a chip operator library and highly automated operator development toolkit; and ModelArts, which provides application enablement capabilities, layered APIs, and pre-integrated solutions. A year after these AI announcements, how will Huawei follow through on them in the market? We'll see.
